NAME
WWW::Firecrawl - Firecrawl v2 API bindings (self-hosted first, cloud-compatible)
VERSION
version 0.001
SYNOPSIS
use WWW::Firecrawl;

# Self-hosted
my $fc = WWW::Firecrawl->new(
    base_url => 'http://localhost:3002',
);

# Cloud
my $fc = WWW::Firecrawl->new(
    api_key => 'fc-...',
);

# Synchronous calls (uses LWP::UserAgent)
my $doc     = $fc->scrape( url => 'https://example.com', formats => ['markdown'] );
my $links   = $fc->map( url => 'https://example.com' );
my $results = $fc->search( query => 'perl firecrawl', limit => 5 );
my $job     = $fc->crawl( url => 'https://example.com', limit => 50 );
my $status  = $fc->crawl_status( $job->{id} );

# Request builders (bring your own UA / async framework)
my $req  = $fc->scrape_request( url => 'https://example.com' );
my $res  = $my_ua->request($req);
my $data = $fc->parse_scrape_response($res);
DESCRIPTION
Firecrawl (https://firecrawl.dev, https://github.com/firecrawl/firecrawl) is an open-source web scraping and crawling API. This module provides Perl bindings for the v2 API, with a focus on self-hosted deployments (cloud works too).
Every endpoint is exposed in three flavours:
$fc->foo_request(%args) — returns an HTTP::Request, no network I/O
$fc->parse_foo_response($http_response) — decodes JSON, dies on error, returns the payload
$fc->foo(%args) — convenience: builds the request, fires it via LWP::UserAgent, parses the response
The split makes the module trivial to use with any async framework; see Net::Async::Firecrawl for the IO::Async integration.
ERROR HANDLING
All failures throw a WWW::Firecrawl::Error object (stringifies to its message — so existing die "..." / $@-matching code keeps working).
Five error types:
transport — could not reach Firecrawl (DNS / connect / TLS / socket).
api — Firecrawl returned a non-2xx HTTP response, invalid JSON, or {success: false}.
job — for flows using *_status: the Firecrawl job ended with status failed or cancelled.
scrape — single scrape: the target URL was classified as failed by "is_failure". Only thrown when "strict" is on.
page — surfaced in the failed[] arrayref of "scrape_many" / "retry_failed_pages": an individual URL's scrape was classified as failed but the overall operation continued.
Retries are automatic for transport and retryable api statuses (see "retry_statuses"). Never for job, scrape, or page — Firecrawl already retries target-level failures server-side, and re-running a failed job is a caller decision. See "retry_failed_pages" for the manual re-scrape helper.
Usage:
use Try::Tiny;

try {
    my $data = $fc->scrape( url => $u, strict => 1 );
    ...
}
catch {
    my $e = $_;
    if ( ref $e && $e->isa('WWW::Firecrawl::Error') ) {
        if    ( $e->is_transport ) { ... }
        elsif ( $e->is_scrape )    { warn "target dead: ", $e->url }
        else                       { warn "firecrawl: $e" }
    }
};
base_url
Base URL of the Firecrawl server. Defaults to $ENV{FIRECRAWL_BASE_URL} or https://api.firecrawl.dev.
api_key
Bearer token for authentication. Defaults to $ENV{FIRECRAWL_API_KEY}. Optional — self-hosted instances can run without auth.
api_version
Defaults to v2.
ua
LWP::UserAgent instance used by the synchronous convenience methods. Lazily built.
strict
When true, parse_scrape_response (and therefore scrape) throws a WWW::Firecrawl::Error with type=scrape if the target URL is classified as failed by "is_failure". Default is false — partial results are returned and the caller inspects via "is_scrape_ok" / "scrape_error".
Can be overridden per call: $fc->scrape( url => ..., strict => 1 ).
strict affects only single-URL scrape. Flow helpers (crawl, batch-scrape, scrape_many) always return partial-success results regardless.
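A sketch of both modes against a hypothetical flaky URL; is_scrape_ok and scrape_error are the helpers documented below, and reading $page->{markdown} assumes the markdown format was requested:

use WWW::Firecrawl;
use Try::Tiny;

my $fc = WWW::Firecrawl->new( base_url => 'http://localhost:3002' );

# Lenient (default): partial results come back, inspect them yourself
my $page = $fc->scrape( url => 'https://example.com/flaky', formats => ['markdown'] );
if ( $fc->is_scrape_ok($page) ) {
    print $page->{markdown};
}
else {
    warn 'scrape looked bad: ', $fc->scrape_error($page);
}

# Strict (per call): the same failure throws a type=scrape error instead
try {
    my $ok = $fc->scrape( url => 'https://example.com/flaky', strict => 1 );
}
catch {
    warn 'target dead: ', $_->url
        if ref $_ && $_->isa('WWW::Firecrawl::Error') && $_->is_scrape;
};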
max_attempts
Number of attempts for each request (default 3). Set to 1 to disable retries. Retries apply only to transport errors and retryable API statuses (see "retry_statuses"). Never retries target-level (scrape/page) or job-level (job) failures — Firecrawl already retries targets server-side, and re-running a failed job is a caller decision.
retry_backoff
Arrayref of delays in seconds between attempts (default [1, 2, 4]). If fewer entries than max_attempts - 1, the last value is reused. Overridden by a numeric Retry-After response header.
retry_statuses
Arrayref of HTTP status codes that trigger a retry (default [429, 502, 503, 504]).
on_retry
Optional CodeRef called before each retry, with ($attempt, $delay, $error). Useful for logging.
sleep_sub
CodeRef that performs the inter-attempt sleep. Defaults to Time::HiRes::sleep. Override in tests to avoid wall-clock delays.
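A sketch wiring the retry knobs together (the values are illustrative, not recommendations), plus the test-time sleep override:

my $fc = WWW::Firecrawl->new(
    base_url       => 'http://localhost:3002',
    max_attempts   => 5,
    retry_backoff  => [ 1, 2, 4, 8 ],
    retry_statuses => [ 429, 500, 502, 503, 504 ],
    on_retry       => sub {
        my ( $attempt, $delay, $error ) = @_;
        warn "attempt $attempt failed ($error), retrying in ${delay}s\n";
    },
);

# In tests: disable the wall-clock delay between attempts
my $fc_test = WWW::Firecrawl->new(
    base_url  => 'http://localhost:3002',
    sleep_sub => sub { },
);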
is_failure
Code reference that classifies a scrape result hash as a failure. Defaults to: metadata.error non-empty OR metadata.statusCode >= 500. Mutually exclusive with "failure_codes".
failure_codes
Constructor sugar for common "is_failure" variants. Pass an arrayref of HTTP status codes (e.g. [ 404, 500..599 ]) or the string 'any-non-2xx'. Compiled into an is_failure predicate at construction time.
Passing both is_failure and failure_codes raises at construction.
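The sugar form, and a hand-rolled predicate in the same spirit (the coderef is a sketch assuming the scrape result carries metadata.error / metadata.statusCode as documented above):

# Sugar: any 404 or 5xx target status counts as a failed scrape
my $fc = WWW::Firecrawl->new(
    base_url      => 'http://localhost:3002',
    failure_codes => [ 404, 500 .. 599 ],
);

# Roughly the same classification, written out as an is_failure coderef
my $fc2 = WWW::Firecrawl->new(
    base_url   => 'http://localhost:3002',
    is_failure => sub {
        my ($page) = @_;
        my $code = $page->{metadata}{statusCode} // 0;
        return 1 if $page->{metadata}{error};
        return $code == 404 || $code >= 500;
    },
);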
endpoint_url(@path_parts)
Builds <base_url>/<api_version>/<path_parts>.
parse_response($http_response)
Generic JSON decoder used by all parse_* helpers. On failure, throws a WWW::Firecrawl::Error object (type api or transport). Throws on HTTP errors and on {success: false} payloads.
is_response / is_request
Boolean helpers to check if a value is an HTTP::Response or HTTP::Request.
is_scrape_ok($page)
Returns true if the given scrape result hash is not classified as a failure by "is_failure".
scrape_status($page)
Returns the target URL's HTTP status code (metadata.statusCode), or 0 if absent.
scrape_error($page)
Returns a combined error string for a failed scrape (metadata.error and/or non-2xx statusCode), or undef if nothing looks wrong.
scrape / scrape_request / parse_scrape_response
POST /v2/scrape. Returns the data hash on success.
crawl / crawl_request / parse_crawl_response
POST /v2/crawl. Returns { success, id, url }.
crawl_status / crawl_status_request / parse_crawl_status_response
GET /v2/crawl/{id}. Returns the full status object including status, total, completed, data, and next (pagination URL).
crawl_status_next($next_url)
Follow the next URL verbatim for subsequent pages of a large crawl result.
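A polling sketch for a whole crawl. Per the error types above, crawl_status throws a type=job error if the job ends failed or cancelled, so the loop only waits for completed; the two-second interval is an arbitrary choice:

my $job = $fc->crawl( url => 'https://example.com', limit => 50 );

my $status;
while (1) {
    $status = $fc->crawl_status( $job->{id} );   # throws type=job on failed/cancelled
    last if ( $status->{status} // '' ) eq 'completed';
    sleep 2;
}

# Collect every page, following pagination for large results
my @pages = @{ $status->{data} // [] };
my $next  = $status->{next};
while ($next) {
    my $more = $fc->crawl_status_next($next);
    push @pages, @{ $more->{data} // [] };
    $next = $more->{next};
}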
crawl_cancel / crawl_cancel_request
DELETE /v2/crawl/{id}.
crawl_errors / crawl_errors_request
GET /v2/crawl/{id}/errors.
crawl_active / crawl_active_request
GET /v2/crawl/active.
crawl_params_preview / crawl_params_preview_request
POST /v2/crawl/params/preview.
map / map_request / parse_map_response
POST /v2/map. Returns the links array.
search / search_request / parse_search_response
POST /v2/search.
batch_scrape / batch_scrape_request
POST /v2/batch/scrape. Returns { id, url, invalidURLs }.
batch_scrape_status / batch_scrape_status_request
GET /v2/batch/scrape/{id}.
batch_scrape_status_next($next_url)
Follow pagination for batch-scrape results.
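Batch scraping follows the same submit-then-poll shape as crawl; this sketch assumes batch_scrape takes a urls arrayref plus ordinary scrape options, mirroring the upstream v2 endpoint:

my $batch = $fc->batch_scrape(
    urls    => [ 'https://example.com/a', 'https://example.com/b' ],
    formats => ['markdown'],
);
warn "rejected up front: @{ $batch->{invalidURLs} }\n"
    if @{ $batch->{invalidURLs} // [] };

my $bs;
do {
    sleep 2;
    $bs = $fc->batch_scrape_status( $batch->{id} );   # throws type=job on failed/cancelled
} until ( $bs->{status} // '' ) eq 'completed';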
batch_scrape_cancel / batch_scrape_cancel_request
DELETE /v2/batch/scrape/{id}.
batch_scrape_errors / batch_scrape_errors_request
GET /v2/batch/scrape/{id}/errors.
extract / extract_request
POST /v2/extract.
extract_status / extract_status_request
GET /v2/extract/{id}.
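A structured-extraction sketch; the urls / prompt / schema fields are the upstream v2 extract parameters and are assumed here to pass straight through as the JSON body, and the terminal status values in the poll loop are likewise assumptions:

my $job = $fc->extract(
    urls   => ['https://example.com/pricing'],
    prompt => 'Extract plan names and monthly prices',
    schema => {
        type       => 'object',
        properties => {
            plans => { type => 'array', items => { type => 'string' } },
        },
    },
);

my $st;
do {
    sleep 2;
    $st = $fc->extract_status( $job->{id} );
} until ( $st->{status} // '' ) =~ /^(?:completed|failed|cancelled)$/;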
agent / agent_status / agent_cancel
POST /v2/agent, GET/DELETE /v2/agent/{id}.
browser_create / browser_list / browser_delete / browser_execute
POST/GET/DELETE /v2/browser/....
scrape_execute / scrape_browser_stop
Interactive scrape session endpoints.
credit_usage / credit_usage_historical / token_usage / token_usage_historical / queue_status / activity
GET monitoring endpoints.
scrape_many(\@urls, %scrape_opts)
Sequential per-URL scrape with partial-success semantics. Returns a hashref:
{
    ok     => [ { url => ..., data  => $scrape_data }, ... ],
    failed => [ { url => ..., error => $WWW_Firecrawl_Error }, ... ],
    stats  => { ok => N, failed => M, total => N+M },
}
Transport/API errors and target-level failures (per "is_failure") all land in failed[]. The outer call never throws for per-URL failures.
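Iterating a scrape_many result; the formats option is just an ordinary scrape option passed through per URL:

my $res = $fc->scrape_many(
    [ 'https://example.com/a', 'https://example.com/b', 'https://example.com/missing' ],
    formats => ['markdown'],
);

printf "scraped %d of %d URLs\n", $res->{stats}{ok}, $res->{stats}{total};

for my $hit ( @{ $res->{ok} } ) {
    print "OK  $hit->{url}\n";    # $hit->{data} is the same hash a single scrape() returns
}
for my $miss ( @{ $res->{failed} } ) {
    warn "BAD $miss->{url}: $miss->{error}\n";    # error object stringifies to its message
}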
retry_failed_pages(\%crawl_or_batch_result, %scrape_opts)
Pulls URLs out of a crawl/batch result's failed[] array and re-scrapes them via "scrape_many". Returns the standard { ok, failed, stats } hashref. The scrape options you pass here are applied to the retry round — caller is responsible for matching them to the original crawl's scrape options if needed.
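A re-scrape sketch, assuming $result is a crawl or batch result carrying the failed[] arrayref described above, and that the original run asked for markdown:

my $retry = $fc->retry_failed_pages( $result, formats => ['markdown'] );

printf "recovered %d of %d previously failed URLs\n",
    $retry->{stats}{ok}, $retry->{stats}{total};
warn "still failing: $_->{url}\n" for @{ $retry->{failed} };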
SEE ALSO
Net::Async::Firecrawl, https://firecrawl.dev, https://docs.firecrawl.dev/api-reference/v2-introduction
SUPPORT
Issues
Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-www-firecrawl/issues.
CONTRIBUTING
Contributions are welcome! Please fork the repository and submit a pull request.
AUTHOR
Torsten Raudssus <torsten@raudssus.de> https://raudss.us/
COPYRIGHT AND LICENSE
This software is copyright (c) 2026 by Torsten Raudssus.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.