NAME

WWW::Firecrawl - Firecrawl v2 API bindings (self-host first, cloud compatible)

VERSION

version 0.001

SYNOPSIS

use WWW::Firecrawl;

# Self-hosted
my $fc = WWW::Firecrawl->new(
  base_url => 'http://localhost:3002',
);

# Cloud
my $fc = WWW::Firecrawl->new(
  api_key => 'fc-...',
);

# Synchronous calls (uses LWP::UserAgent)
my $doc     = $fc->scrape( url => 'https://example.com', formats => ['markdown'] );
my $links   = $fc->map( url => 'https://example.com' );
my $results = $fc->search( query => 'perl firecrawl', limit => 5 );

my $job = $fc->crawl( url => 'https://example.com', limit => 50 );
my $status = $fc->crawl_status( $job->{id} );

# Request builders (bring your own UA / async framework)
my $req = $fc->scrape_request( url => 'https://example.com' );
my $res = $my_ua->request($req);
my $data = $fc->parse_scrape_response($res);

DESCRIPTION

Firecrawl (https://firecrawl.dev, https://github.com/firecrawl/firecrawl) is an open-source web scraping and crawling API. This module provides Perl bindings for the v2 API, with a focus on self-hosted deployments (cloud works too).

Every endpoint is exposed in three flavours:

  • $fc->foo_request(%args) — returns an HTTP::Request, no network I/O

  • $fc->parse_foo_response($http_response) — decodes JSON, dies on error, returns the payload

  • $fc->foo(%args) — convenience: builds, fires via LWP::UserAgent, parses

The split makes the module trivial to use with any async framework; see Net::Async::Firecrawl for the IO::Async integration.

ERROR HANDLING

All failures throw a WWW::Firecrawl::Error object. The object stringifies to its message, so existing die "..." / $@-matching code keeps working.

Five error types:

  • transport — Could not reach Firecrawl (DNS / connect / TLS / socket).

  • api — Firecrawl returned a non-2xx HTTP response, invalid JSON, or {success: false}.

  • job — For flows using *_status: the Firecrawl job ended with status failed or cancelled.

  • scrape — Single scrape: the target URL was classified as failed by "is_failure". Only thrown when "strict" is on.

  • page — Surfaced in the failed[] arrayref of "scrape_many" / "retry_failed_pages": an individual URL's scrape was classified as failed but the overall operation continued.

Retries are automatic for transport and retryable api statuses (see "retry_statuses"). Never for job, scrape, or page — Firecrawl already retries target-level failures server-side, and re-running a failed job is a caller decision. See "retry_failed_pages" for the manual re-scrape helper.

Usage:

use Try::Tiny;
try {
  my $data = $fc->scrape( url => $u, strict => 1 );
  ...
}
catch {
  my $e = $_;
  if (ref $e && $e->isa('WWW::Firecrawl::Error')) {
    if ($e->is_transport) { ... }
    elsif ($e->is_scrape) { warn "target dead: ", $e->url }
    else                  { warn "firecrawl: $e" }
  }
};

ATTRIBUTES

base_url

Base URL of the Firecrawl server. Defaults to $ENV{FIRECRAWL_BASE_URL} or https://api.firecrawl.dev.

api_key

Bearer token for authentication. Defaults to $ENV{FIRECRAWL_API_KEY}. Optional — self-hosted instances can run without auth.

api_version

Defaults to v2.

ua

LWP::UserAgent instance used by the synchronous convenience methods. Lazily built.

strict

When true, parse_scrape_response (and therefore scrape) throws a WWW::Firecrawl::Error with type=scrape if the target URL is classified as failed by "is_failure". Default is false — partial results are returned and the caller inspects via "is_scrape_ok" / "scrape_error".

Can be overridden per call: $fc->scrape( url => ..., strict => 1 ).

strict affects only the single-URL scrape path. Flow helpers (crawl, batch-scrape, scrape_many) always return partial-success results regardless.
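
For example, with strict left off, a dead target can be inspected instead of thrown (a minimal sketch; the markdown key follows from the requested format, and process_markdown() stands in for your own handler):

my $page = $fc->scrape( url => $url, formats => ['markdown'] );

if ( $fc->is_scrape_ok($page) ) {
  process_markdown( $page->{markdown} );
}
else {
  warn sprintf "scrape of %s failed (%d): %s\n",
    $url, $fc->scrape_status($page), $fc->scrape_error($page) // 'unknown';
}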

max_attempts

Number of attempts for each request (default 3). Set to 1 to disable retries. Retries apply only to transport errors and retryable API statuses (see "retry_statuses"). Never retries target-level (scrape/page) or job-level (job) failures — Firecrawl already retries targets server-side, and re-running a failed job is a caller decision.

retry_backoff

Arrayref of delays in seconds between attempts (default [1, 2, 4]). If it has fewer entries than max_attempts - 1, the last value is reused for the remaining waits. A numeric Retry-After response header overrides the configured delay.

retry_statuses

Arrayref of HTTP status codes that trigger a retry (default [429, 502, 503, 504]).

on_retry

Optional CodeRef called before each retry, with ($attempt, $delay, $error). Useful for logging.

sleep_sub

CodeRef that performs the inter-attempt sleep. Defaults to Time::HiRes::sleep. Override in tests to avoid wall-clock delays.
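
Putting the retry knobs together (a sketch; the values are illustrative):

my $fc = WWW::Firecrawl->new(
  base_url       => 'http://localhost:3002',
  max_attempts   => 5,
  retry_backoff  => [ 1, 2, 4, 8 ],
  retry_statuses => [ 429, 500, 502, 503, 504 ],
  on_retry       => sub {
    my ($attempt, $delay, $error) = @_;
    warn "attempt $attempt failed ($error), retrying in ${delay}s\n";
  },
);

# In tests, skip the real sleeps:
my $fc_test = WWW::Firecrawl->new(
  base_url  => 'http://localhost:3002',
  sleep_sub => sub { },   # no-op instead of Time::HiRes::sleep
);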

is_failure

Code reference that classifies a scrape result hash as a failure. Defaults to: metadata.error non-empty OR metadata.statusCode >= 500. Mutually exclusive with "failure_codes".

failure_codes

Constructor sugar for common "is_failure" variants. Pass an arrayref of HTTP status codes (e.g. [ 404, 500..599 ]) or the string 'any-non-2xx'. Compiled into an is_failure predicate at construction time.

Passing both is_failure and failure_codes raises at construction.
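
For example (a sketch; use one style or the other, not both):

# Sugar: 404 and any 5xx count as target failures
my $fc = WWW::Firecrawl->new(
  failure_codes => [ 404, 500 .. 599 ],
);

# Or a custom predicate over the scrape result hash
my $fc2 = WWW::Firecrawl->new(
  is_failure => sub {
    my ($page) = @_;
    my $meta = $page->{metadata} // {};
    return 1 if $meta->{error};
    return ( $meta->{statusCode} // 0 ) >= 400;
  },
);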

METHODS

endpoint_url(@path_parts)

Builds <base_url>/<api_version>/<path_parts>.

parse_response($http_response)

Generic JSON decoder used by all parse_* helpers. Throws a WWW::Firecrawl::Error (type api or transport) on HTTP errors, invalid JSON, and {success: false} payloads.

is_response / is_request

Boolean helpers to check if a value is an HTTP::Response or HTTP::Request.

is_scrape_ok($page)

Returns true if the given scrape result hash is not classified as a failure by "is_failure".

scrape_status($page)

Returns the target URL's HTTP status code (metadata.statusCode), or 0 if absent.

scrape_error($page)

Returns a combined error string for a failed scrape (metadata.error and/or non-2xx statusCode), or undef if nothing looks wrong.

scrape / scrape_request / parse_scrape_response

POST /v2/scrape. Returns the data hash on success.

crawl / crawl_request / parse_crawl_response

POST /v2/crawl. Returns { success, id, url }.

crawl_status / crawl_status_request / parse_crawl_status_response

GET /v2/crawl/{id}. Returns the full status object including status, total, completed, data, and next (pagination URL).

crawl_status_next($next_url)

Follow the next URL verbatim for subsequent pages of a large crawl result.
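
A typical poll-and-paginate loop looks like this (a sketch; the 'scraping' status string and the 5-second interval are assumptions for illustration, check your Firecrawl version):

my $job = $fc->crawl( url => 'https://example.com', limit => 50 );

my $status = $fc->crawl_status( $job->{id} );
while ( $status->{status} eq 'scraping' ) {
  sleep 5;
  $status = $fc->crawl_status( $job->{id} );
}

# Collect all pages, following pagination
my @pages = @{ $status->{data} // [] };
my $next  = $status->{next};
while ($next) {
  my $chunk = $fc->crawl_status_next($next);
  push @pages, @{ $chunk->{data} // [] };
  $next = $chunk->{next};
}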

crawl_cancel / crawl_cancel_request

DELETE /v2/crawl/{id}.

crawl_errors / crawl_errors_request

GET /v2/crawl/{id}/errors.

crawl_active / crawl_active_request

GET /v2/crawl/active.

crawl_params_preview / crawl_params_preview_request

POST /v2/crawl/params/preview.

map / map_request / parse_map_response

POST /v2/map. Returns the links array.

search / search_request / parse_search_response

POST /v2/search.

batch_scrape / batch_scrape_request

POST /v2/batch/scrape. Returns { id, url, invalidURLs }.
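
For example (a sketch; the urls and formats keys mirror the v2 endpoint's request body):

my $batch = $fc->batch_scrape(
  urls    => [ 'https://example.com', 'https://example.org' ],
  formats => ['markdown'],
);

warn "rejected by Firecrawl: @{ $batch->{invalidURLs} }\n"
  if @{ $batch->{invalidURLs} // [] };

my $status = $fc->batch_scrape_status( $batch->{id} );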

batch_scrape_status / batch_scrape_status_request

GET /v2/batch/scrape/{id}.

batch_scrape_status_next($next_url)

Follow pagination for batch-scrape results.

batch_scrape_cancel / batch_scrape_cancel_request

DELETE /v2/batch/scrape/{id}.

batch_scrape_errors / batch_scrape_errors_request

GET /v2/batch/scrape/{id}/errors.

extract / extract_request

POST /v2/extract.

extract_status / extract_status_request

GET /v2/extract/{id}.
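
A typical extract round-trip (a hedged sketch; the urls/prompt keys and the 'completed' status string follow the v2 extract endpoint, verify against your deployment):

my $job = $fc->extract(
  urls   => ['https://example.com'],
  prompt => 'Extract the page title and a one-line summary.',
);

my $status = $fc->extract_status( $job->{id} );
# poll until $status->{status} eq 'completed', then read $status->{data}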

agent / agent_status / agent_cancel

POST /v2/agent, GET/DELETE /v2/agent/{id}.

browser_create / browser_list / browser_delete / browser_execute

POST/GET/DELETE /v2/browser/....

scrape_execute / scrape_browser_stop

Interactive scrape session endpoints.

credit_usage / credit_usage_historical / token_usage / token_usage_historical / queue_status / activity

GET monitoring endpoints.

scrape_many(\@urls, %scrape_opts)

Sequential per-URL scrape with partial-success semantics. Returns a hashref:

{
  ok     => [ { url => ..., data => $scrape_data }, ... ],
  failed => [ { url => ..., error => $WWW_Firecrawl_Error }, ... ],
  stats  => { ok => N, failed => M, total => N+M },
}

Transport/API errors and target-level failures (per "is_failure") all land in failed[]. The outer call never throws for per-URL failures.
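
For example (a sketch; store() stands in for your own handler):

my $result = $fc->scrape_many( \@urls, formats => ['markdown'] );

store( $_->{url}, $_->{data} ) for @{ $result->{ok} };

for my $miss ( @{ $result->{failed} } ) {
  warn "failed $miss->{url}: $miss->{error}\n";   # error object stringifies
}

printf "%d ok, %d failed of %d\n",
  @{ $result->{stats} }{qw(ok failed total)};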

retry_failed_pages(\%crawl_or_batch_result, %scrape_opts)

Pulls URLs out of a crawl/batch result's failed[] array and re-scrapes them via "scrape_many". Returns the standard { ok, failed, stats } hashref. The scrape options you pass here are applied to the retry round — caller is responsible for matching them to the original crawl's scrape options if needed.
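
For example (a sketch; $crawl_result is a status object carrying the failed[] array described above):

my $retry = $fc->retry_failed_pages( $crawl_result, formats => ['markdown'] );

printf "recovered %d, still failing %d\n",
  $retry->{stats}{ok}, $retry->{stats}{failed};

warn "gave up on $_->{url}\n" for @{ $retry->{failed} };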

SEE ALSO

Net::Async::Firecrawl, https://firecrawl.dev, https://docs.firecrawl.dev/api-reference/v2-introduction

SUPPORT

Issues

Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-www-firecrawl/issues.

CONTRIBUTING

Contributions are welcome! Please fork the repository and submit a pull request.

AUTHOR

Torsten Raudssus <torsten@raudssus.de> https://raudss.us/

COPYRIGHT AND LICENSE

This software is copyright (c) 2026 by Torsten Raudssus.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.