NAME

Net::Async::Firecrawl - IO::Async Firecrawl v2 client with flow helpers

VERSION

version 0.001

SYNOPSIS

use IO::Async::Loop;
use Net::Async::Firecrawl;

my $loop = IO::Async::Loop->new;
my $fc = Net::Async::Firecrawl->new(
  base_url      => 'http://localhost:3002',  # or https://api.firecrawl.dev
  api_key       => 'fc-...',                 # optional for self-hosted
  poll_interval => 3,
);
$loop->add($fc);

# Single scrape
my $doc = $fc->scrape( url => 'https://example.com', formats => ['markdown'] )->get;

# Crawl a site, poll to completion, collect all paginated pages, split by is_failure.
my $result = $fc->crawl_and_collect(
  url   => 'https://example.com',
  limit => 100,
)->get;
# $result->{data}     — ok pages only
# $result->{failed}   — [{ url, statusCode, error, page }, ...]
# $result->{raw_data} — all pages in original order
# $result->{stats}    — { ok, failed, total }

# Batch scrape; waits for all results.
my $batch = $fc->batch_scrape_and_wait(
  urls    => [ 'https://a', 'https://b' ],
  formats => ['markdown'],
)->get;

# Structured extraction.
my $extract = $fc->extract_and_wait(
  urls   => [ 'https://example.com/*' ],
  prompt => 'extract pricing and product names',
)->get;

# Concurrent per-URL scrape (partial-success).
my $many = $fc->scrape_many(
  [qw( https://a https://b https://c )],
  formats => ['markdown'],
)->get;
# $many->{ok}     — [{ url, data }, ...]
# $many->{failed} — [{ url, error }, ...]   — each error is a WWW::Firecrawl::Error

# Retry the failed URLs from a prior crawl/batch.
my $retried = $fc->retry_failed_pages($result, formats => ['markdown'])->get;

DESCRIPTION

IO::Async-flavoured client for the Firecrawl v2 API. Wraps WWW::Firecrawl's request builders and response parsers, dispatches through Net::Async::HTTP, and returns Future objects.

Every endpoint exposed by WWW::Firecrawl is available here as a Future-returning method with identical argument signatures. On top of that, high-level flow helpers automate the start-job → poll → collect-pages pattern common to crawl, batch-scrape, extract, and agent operations — including partial-success splitting by the classification policy of the underlying WWW::Firecrawl.

CONSTRUCTOR PARAMETERS

  • base_url, api_key, api_version — forwarded to WWW::Firecrawl.

  • firecrawl — pass a pre-built WWW::Firecrawl instance (overrides the above three).

  • http — pass a pre-built Net::Async::HTTP (otherwise one is created and parented to this notifier).

  • poll_interval — seconds between status polls for flow helpers (default 3).

  • delay_sub — optional CodeRef that returns a Future for inter-attempt and polling delays. If omitted, $loop->delay_future is used. Mainly a test hook.

Retry attributes (max_attempts, retry_backoff, retry_statuses, on_retry) and classification attributes (is_failure, failure_codes, strict) live on the underlying WWW::Firecrawl. Pass them to this constructor or build the WWW::Firecrawl instance yourself and pass it as firecrawl.
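
A minimal sketch of the pre-built route; the attribute values here are illustrative:

use WWW::Firecrawl;

my $www = WWW::Firecrawl->new(
  base_url     => 'http://localhost:3002',
  max_attempts => 5,      # retry attribute (lives on WWW::Firecrawl)
  strict       => 1,      # classification attribute
);
my $fc = Net::Async::Firecrawl->new(
  firecrawl     => $www,  # overrides base_url/api_key/api_version
  poll_interval => 5,
);
$loop->add($fc);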

ERROR HANDLING

Every failure path fails the Future as Future->fail($error, 'firecrawl', $attempt?), where $error is a WWW::Firecrawl::Error object (it stringifies to its message). $f->failure returns ($error, 'firecrawl', $attempt?).

Five error types (same model as WWW::Firecrawl):

  • transport — Firecrawl unreachable. Retried automatically up to max_attempts.

  • api — Firecrawl returned non-2xx or {success: false}. Retried only for retry_statuses (default 429/502/503/504).

  • job — A flow reported status: failed or status: cancelled. Never retried; it always propagates as a Future failure.

  • scrape — A single scrape's target URL was classified as failed (raised only when strict is on).

  • page — A target URL inside a flow (scrape_many, or a failed entry within crawl/batch) was classified as failed. Surfaced in failed[], not thrown.

Typical usage:

$fc->scrape( url => $u )->then(sub {
  my ( $data ) = @_;
  ...
})->else(sub {
  my ( $err ) = @_;                   # WWW::Firecrawl::Error object
  if ($err->is_transport) { ... }     # Firecrawl unreachable
  elsif ($err->is_job)    { ... }     # job failed/cancelled
  else { warn "firecrawl: $err"; Future->fail($err) }  # re-propagate
});

scrape

crawl

crawl_status

crawl_cancel

crawl_errors

crawl_active

crawl_params_preview

map

batch_scrape

batch_scrape_status

batch_scrape_cancel

batch_scrape_errors

extract

extract_status

agent

agent_status

agent_cancel

browser_create

browser_list

browser_delete

browser_execute

scrape_execute

scrape_browser_stop

credit_usage

credit_usage_historical

token_usage

token_usage_historical

queue_status

activity

One Future-returning method per WWW::Firecrawl endpoint, with the same argument signature. Each resolves to the parsed payload on success. See WWW::Firecrawl for per-endpoint details.
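
These compose by hand as well. A hedged sketch of the start → poll loop that the flow helpers below automate, assuming crawl resolves to a hash carrying the job id and crawl_status accepts that id (key names taken from the crawl_and_collect shape below):

use Future::Utils qw( try_repeat );

my $final = $fc->crawl( url => 'https://example.com', limit => 10 )->then(sub {
  my ( $started ) = @_;
  my $id = $started->{id};          # job id; key name assumed from the status shape
  try_repeat {
    $fc->loop->delay_future( after => $fc->poll_interval )
             ->then( sub { $fc->crawl_status($id) } );
  } until => sub {
    my ( $trial ) = @_;
    return 1 if $trial->failure;                 # stop; the failure propagates
    return $trial->get->{status} eq 'completed'; # sketch ignores failed/cancelled
  };
})->get;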

crawl_status_next($next_url)

batch_scrape_status_next($next_url)

Follow a pagination URL from a previous status response.
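
Given a $job_id from a started crawl, a hedged sketch of walking the chain by hand; the next and data key names are assumed from the v2 status responses that the flow helpers below consume for you:

my $status = $fc->crawl_status($job_id)->get;
my @pages  = @{ $status->{data} // [] };
while ( my $next = $status->{next} ) {           # follow pagination to the end
  $status = $fc->crawl_status_next($next)->get;
  push @pages, @{ $status->{data} // [] };
}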

crawl_and_collect(%crawl_args)

Fires crawl and polls crawl_status every poll_interval seconds until the job reports completed (failed/cancelled fail the Future with type=job). It then walks the next pagination chain, classifies each collected page via the underlying WWW::Firecrawl's is_failure, and resolves to:

{
  status      => 'completed',
  id          => $job_id,
  creditsUsed => ...,
  data        => [ ok_page,   ... ],   # ok only
  failed      => [ { url, statusCode, error, page }, ... ],
  raw_data    => [ page, ... ],         # all, original order
  stats       => { ok, failed, total },
}

batch_scrape_and_wait(%batch_args)

Same contract as crawl_and_collect but against the batch-scrape endpoints. Same return shape.

extract_and_wait(%extract_args)

Starts an extract job and resolves once extract_status reports completed. Fails (type=job) on failed/cancelled. Returns the final status hash.

agent_and_wait(%agent_args)

Like extract_and_wait, for agent jobs.

scrape_many(\@urls, %common_scrape_args)

Fires a scrape per URL concurrently. Resolves to:

{
  ok     => [ { url, data }, ... ],
  failed => [ { url, error }, ... ],     # error is a WWW::Firecrawl::Error
  stats  => { ok, failed, total },
}

The outer Future never fails for per-URL failures (transport, api, or target-level); it only fails for local errors (for example, the notifier not having been added to a loop).
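
For example, logging the failures while keeping the successes:

my $many = $fc->scrape_many( \@urls, formats => ['markdown'] )->get;
warn "skipped $_->{url}: $_->{error}\n" for @{ $many->{failed} };  # error stringifies
my @docs = map { $_->{data} } @{ $many->{ok} };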

retry_failed_pages($result, %scrape_opts)

Takes a result from crawl_and_collect / batch_scrape_and_wait / scrape_many and re-scrapes the URLs in $result->{failed} via scrape_many. Returns a Future of the standard { ok, failed, stats } hashref.
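
A hedged sketch of a single retry pass that merges recovered pages back into the original result, using only the result shapes documented above:

$fc->crawl_and_collect( url => 'https://example.com', limit => 100 )->then(sub {
  my ( $result ) = @_;
  return Future->done( $result->{data} ) unless @{ $result->{failed} };
  $fc->retry_failed_pages( $result, formats => ['markdown'] )->then(sub {
    my ( $retry ) = @_;
    # ok pages from the first pass, plus whatever the retry recovered
    Future->done( [ @{ $result->{data} }, map { $_->{data} } @{ $retry->{ok} } ] );
  });
})->get;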

do_request($http_request)

Low-level: dispatch an arbitrary HTTP::Request (typically one built via $self->firecrawl->foo_request) through the underlying Net::Async::HTTP with retry applied. Returns a Future of HTTP::Response.
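
A hedged sketch with a hand-built request; the path and auth header here are assumptions for illustration, not documented endpoints:

use HTTP::Request;

my $req = HTTP::Request->new( GET => 'http://localhost:3002/v2/team/credit-usage' );
$req->header( Authorization => 'Bearer fc-...' );  # hosted API only; key elided
my $res = $fc->do_request($req)->get;              # HTTP::Response, retry applied
print $res->decoded_content, "\n";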

firecrawl

The underlying WWW::Firecrawl instance.

http

The underlying Net::Async::HTTP instance (lazily built and parented to this notifier).

poll_interval

Read/write accessor for the default poll interval (seconds) used by flow helpers.

SEE ALSO

WWW::Firecrawl, IO::Async, Net::Async::HTTP, Future, https://firecrawl.dev, https://docs.firecrawl.dev/api-reference/v2-introduction

SUPPORT

Issues

Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-net-async-firecrawl/issues.

CONTRIBUTING

Contributions are welcome! Please fork the repository and submit a pull request.

AUTHOR

Torsten Raudssus <torsten@raudssus.de> https://raudss.us/

COPYRIGHT AND LICENSE

This software is copyright (c) 2026 by Torsten Raudssus.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.