NAME
WWW::Crawl4AI::Client - UA-agnostic REST client for the Crawl4AI Docker API
VERSION
version 0.001
SYNOPSIS
my $client = WWW::Crawl4AI::Client->new( base_url => 'http://localhost:11235' );
my $pages = $client->crawl(
WWW::Crawl4AI::Request->new( urls => 'https://example.com' )
);
print $pages->[0]{markdown};
my $job = $client->job_submit($request);
my $st = $client->job_status( $job->{task_id} ); # { status => 'COMPLETED', pages => [...] }
DESCRIPTION
A thin, UA-agnostic wrapper over the Crawl4AI Docker REST API (default port 11235). Every endpoint comes in three flavours:
foo_request(...) → builds an HTTP::Request (no I/O)
parse_foo_response($res) → decodes/normalizes (no I/O)
foo(...) → convenience: build + fire (LWP, with retry) + parse
Covered endpoints: crawl (/crawl), md (/md), job_submit / job_status (/crawl/job), health, plus the single-URL action endpoints screenshot, pdf, html, execute_js, llm and token.
Page results are normalized to a flat hash (url, final_url, status_code, markdown, html, title, metadata, error, raw) across the several response shapes Crawl4AI has used.
base_url
Crawl4AI server URL. Defaults to $ENV{CRAWL4AI_URL}, then $ENV{CRAWL4AI_BASE_URL}, then http://localhost:11235.
api_token
Optional bearer token ($ENV{CRAWL4AI_API_TOKEN}). Only needed when the server has JWT auth enabled.
user_agent_string
The User-Agent string sent with every request (default WWW-Crawl4AI/$VERSION). Net::Async::Crawl4AI reads this to configure its Net::Async::HTTP.
timeout
LWP request timeout in seconds (default 120). Crawling can be slow.
ua
The LWP::UserAgent. Lazily built; inject your own to swap transports.
max_attempts
Number of request attempts before giving up (default 3). Retries apply to transport failures and retryable HTTP statuses (see "retry_statuses").
retry_backoff
Arrayref of inter-attempt delays in seconds (default [1, 2, 4]). The last value is reused when there are more retries than entries. Overridden by a numeric Retry-After header.
retry_statuses
Arrayref of HTTP status codes that trigger a retry (default [429, 502, 503, 504]).
on_retry
Optional CodeRef called before each retry, with ($attempt, $delay, $response). Useful for logging.
sleep_sub
CodeRef that performs the inter-attempt sleep. Defaults to Time::HiRes::sleep. Override in tests to avoid wall-clock delays.
is_request
True if the argument is an HTTP::Request.
crawl_request
md_request
job_submit_request
job_status_request
health_request
Build the HTTP::Request for each endpoint without performing I/O.
screenshot_request
pdf_request
html_request
execute_js_request
llm_request
token_request
Build the HTTP::Request for each of the single-URL action endpoints without performing I/O. screenshot_request accepts wait_for (seconds), wait_for_images (bool) and output_path; pdf_request accepts output_path; execute_js_request takes a script string or arrayref; llm_request takes a query plus optional provider/temperature/base_url.
parse_crawl_response
parse_md_response
parse_job_submit_response
parse_job_status_response
parse_health_response
Decode and normalize a response. They throw a WWW::Crawl4AI::Error on API, transport or job failures.
parse_screenshot_response
parse_pdf_response
Decode the { success, screenshot|pdf } response and return the raw bytes (the base64 payload decoded), ready to write to a file. Throw a type=content WWW::Crawl4AI::Error when the server reports failure or returns no artifact.
parse_html_response
Decode the { html, url, success } response and return the preprocessed HTML string.
parse_execute_js_response
Decode the /execute_js response into a normalized page (the same shape "parse_crawl_response" produces) with the script return values added under js_result.
parse_llm_response
Decode the /llm response and return the answer text (the answer field when present).
parse_token_response
Decode the /token response into { email, access_token, token_type }.
do_request
Fire an HTTP::Request through the UA with the retry policy applied. Returns the HTTP::Response; throws a type=transport WWW::Crawl4AI::Error on persistent connection failure.
crawl
md
job_submit
job_status
health
Convenience methods: build request, fire via LWP::UserAgent, parse response. See the *_request/parse_*_response pairs for the no-I/O building blocks.
screenshot
my $png = $client->screenshot( $url, wait_for => 2, wait_for_images => 1 );
open my $fh, '>:raw', 'shot.png'; print $fh $png;
my $pdf = $client->pdf($url);
POST /screenshot and POST /pdf: render the page and return the raw image or PDF bytes (decoded from the base64 the server sends).
html
my $html = $client->html($url);
POST /html: the server-preprocessed (sanitized) HTML as a string.
execute_js
my $page = $client->execute_js( $url, 'return document.title' );
my $page = $client->execute_js( $url, [ 'window.scrollTo(0,9e9)', 'return document.title' ] );
print $page->{js_result}{results}[0];
POST /execute_js: run one or more JS snippets in the page and return a normalized page (as "crawl") with the raw script output under js_result ({ results => [ ... ], success => bool }).
llm
my $answer = $client->llm( $url, 'Who is the author?' );
GET /llm/{url}: ask an LLM a question about the page. Requires the Crawl4AI server to have an LLM provider configured (e.g. OPENAI_API_KEY); optional provider, temperature and base_url are forwarded.
token
my $tok = $client->token('me@example.com'); # { access_token, token_type, ... }
POST /token: obtain a JWT from a server with auth enabled. Feed $tok->{access_token} back as "api_token".
SEE ALSO
WWW::Crawl4AI, WWW::Crawl4AI::Request, WWW::Crawl4AI::Error
SUPPORT
Issues
Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-www-crawl4ai/issues.
CONTRIBUTING
Contributions are welcome! Please fork the repository and submit a pull request.
AUTHOR
Torsten Raudssus <torsten@raudssus.de> https://raudss.us/
COPYRIGHT AND LICENSE
This software is copyright (c) 2026 by Torsten Raudssus.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.