NAME

WWW::Crawl4AI::Client - UA-agnostic REST client for the Crawl4AI Docker API

VERSION

version 0.001

SYNOPSIS

my $client = WWW::Crawl4AI::Client->new( base_url => 'http://localhost:11235' );

my $pages = $client->crawl(
  WWW::Crawl4AI::Request->new( urls => 'https://example.com' )
);
print $pages->[0]{markdown};

my $job = $client->job_submit($request);
my $st  = $client->job_status( $job->{task_id} );   # { status => 'COMPLETED', pages => [...] }

DESCRIPTION

A thin, UA-agnostic wrapper over the Crawl4AI Docker REST API (default port 11235). Every endpoint comes in three flavours:

foo_request(...)          → builds an HTTP::Request (no I/O)
parse_foo_response($res)  → decodes/normalizes (no I/O)
foo(...)                  → convenience: build + fire (LWP, with retry) + parse

Covered endpoints: crawl (/crawl), md (/md), job_submit / job_status (/crawl/job), health, plus the single-URL action endpoints screenshot, pdf, html, execute_js, llm and token.

Page results are normalized to a flat hash (url, final_url, status_code, markdown, html, title, metadata, error, raw) across the several response shapes Crawl4AI has used.

base_url

Crawl4AI server URL. Defaults to $ENV{CRAWL4AI_URL}, then $ENV{CRAWL4AI_BASE_URL}, then http://localhost:11235.

api_token

Optional bearer token ($ENV{CRAWL4AI_API_TOKEN}). Only needed when the server has JWT auth enabled.

user_agent_string

The User-Agent string sent with every request (default WWW-Crawl4AI/$VERSION). Net::Async::Crawl4AI reads this to configure its Net::Async::HTTP.

timeout

LWP request timeout in seconds (default 120). Crawling can be slow.

ua

The LWP::UserAgent. Lazily built; inject your own to swap transports.

max_attempts

Number of request attempts before giving up (default 3). Retries apply to transport failures and retryable HTTP statuses (see "retry_statuses").

retry_backoff

Arrayref of inter-attempt delays in seconds (default [1, 2, 4]). The last value is reused when there are more retries than entries. Overridden by a numeric Retry-After header.

retry_statuses

Arrayref of HTTP status codes that trigger a retry (default [429, 502, 503, 504]).

on_retry

Optional CodeRef called before each retry, with ($attempt, $delay, $response). Useful for logging.

sleep_sub

CodeRef that performs the inter-attempt sleep. Defaults to Time::HiRes::sleep. Override in tests to avoid wall-clock delays.

is_request

True if the argument is an HTTP::Request.

crawl_request

md_request

job_submit_request

job_status_request

health_request

Build the HTTP::Request for each endpoint without performing I/O.

screenshot_request

pdf_request

html_request

execute_js_request

llm_request

token_request

Build the HTTP::Request for each of the single-URL action endpoints without performing I/O. screenshot_request accepts wait_for (seconds), wait_for_images (bool) and output_path; pdf_request accepts output_path; execute_js_request takes a script string or arrayref; llm_request takes a query plus optional provider/temperature/base_url.

parse_crawl_response

parse_md_response

parse_job_submit_response

parse_job_status_response

parse_health_response

Decode and normalize a response. They throw a WWW::Crawl4AI::Error on API, transport or job failures.

parse_screenshot_response

parse_pdf_response

Decode the { success, screenshot|pdf } response and return the raw bytes (the base64 payload decoded), ready to write to a file. Throw a type=content WWW::Crawl4AI::Error when the server reports failure or returns no artifact.

parse_html_response

Decode the { html, url, success } response and return the preprocessed HTML string.

parse_execute_js_response

Decode the /execute_js response into a normalized page (the same shape "parse_crawl_response" produces) with the script return values added under js_result.

parse_llm_response

Decode the /llm response and return the answer text (the answer field when present).

parse_token_response

Decode the /token response into { email, access_token, token_type }.

do_request

Fire an HTTP::Request through the UA with the retry policy applied. Returns the HTTP::Response; throws a type=transport WWW::Crawl4AI::Error on persistent connection failure.

crawl

md

job_submit

job_status

health

Convenience methods: build request, fire via LWP::UserAgent, parse response. See the *_request/parse_*_response pairs for the no-I/O building blocks.

screenshot

my $png = $client->screenshot( $url, wait_for => 2, wait_for_images => 1 );
open my $fh, '>:raw', 'shot.png'; print $fh $png;

pdf

my $pdf = $client->pdf($url);

POST /screenshot and POST /pdf: render the page and return the raw image or PDF bytes (decoded from the base64 the server sends).

html

my $html = $client->html($url);

POST /html: the server-preprocessed (sanitized) HTML as a string.

execute_js

my $page = $client->execute_js( $url, 'return document.title' );
my $page = $client->execute_js( $url, [ 'window.scrollTo(0,9e9)', 'return document.title' ] );
print $page->{js_result}{results}[0];

POST /execute_js: run one or more JS snippets in the page and return a normalized page (as "crawl") with the raw script output under js_result ({ results => [ ... ], success => bool }).

llm

my $answer = $client->llm( $url, 'Who is the author?' );

GET /llm/{url}: ask an LLM a question about the page. Requires the Crawl4AI server to have an LLM provider configured (e.g. OPENAI_API_KEY); optional provider, temperature and base_url are forwarded.

token

my $tok = $client->token('me@example.com');   # { access_token, token_type, ... }

POST /token: obtain a JWT from a server with auth enabled. Feed $tok->{access_token} back as "api_token".

SUPPORT

Issues

Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-www-crawl4ai/issues.

CONTRIBUTING

Contributions are welcome! Please fork the repository and submit a pull request.

AUTHOR

Torsten Raudssus <torsten@raudssus.de> https://raudss.us/

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

To install WWW::Crawl4AI, copy and paste the appropriate command in to your terminal.

cpanm

cpanm WWW::Crawl4AI

CPAN shell

perl -MCPAN -e shell
install WWW::Crawl4AI

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	Go to GitHub issues (only if GitHub is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)