NAME

WWW::Crawl4AI::DeepCrawlIterator - breadth-first iterator for deep_crawl, separating frontier management from crawl logic

VERSION

version 0.001

SYNOPSIS

my $iter = WWW::Crawl4AI::DeepCrawlIterator->new(
  crawler    => $crawler,
  start_url  => 'https://example.com',
  max_pages  => 50,
  max_depth  => 3,
  same_host  => 1,
  url_filter => sub { $_[0] !~ m{/login} },
);

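# Caller-defined callback; declared here so the loop runs under "use strict".
my $on_page = sub {
  my ( $result, $depth ) = @_;
  print "crawled a page at depth $depth\n";
};

while ( my $page = $iter->next ) {
  my ( $result, $depth ) = @$page;
  $on_page->( $result, $depth );
}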

DESCRIPTION

Iterates over the pages produced by "deep_crawl" in WWW::Crawl4AI. It encapsulates the breadth-first (BFS) frontier management: URL deduplication, same-host filtering, and depth capping. Each call to "next" crawls one page (through the strategy chain) and schedules that page's links for future traversal.

Replaces the inline BFS loop in WWW::Crawl4AI::deep_crawl, enabling alternative crawl orders and isolated testing of the frontier logic.
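
The frontier logic described above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: the %links graph stands in for real crawl results, and make_frontier and host_of are hypothetical helpers invented for this example. It shows how a FIFO queue, a seen-set, a host check, and a depth check combine into a breadth-first traversal.

```perl
use strict;
use warnings;

# Hypothetical link graph standing in for the links found on each crawled page.
my %links = (
  'https://example.com/'  => [ 'https://example.com/a', 'https://example.com/b', 'https://other.test/x' ],
  'https://example.com/a' => [ 'https://example.com/b', 'https://example.com/c' ],
  'https://example.com/b' => [ 'https://example.com/' ],
  'https://example.com/c' => [ 'https://example.com/d' ],
);

# Extract the host of an absolute http(s) URL (deliberately simplified).
sub host_of {
  my ($url) = @_;
  return $url =~ m{^https?://([^/:]+)}i ? lc $1 : '';
}

# Returns a closure that yields [ $url, $depth ] per call, or nothing when done.
sub make_frontier {
  my (%args) = @_;
  my @queue = ( [ $args{start_url}, 0 ] );    # FIFO queue gives breadth-first order
  my %seen  = ( $args{start_url} => 1 );      # deduplication
  my $host  = host_of( $args{start_url} );
  my $count = 0;

  return sub {
    return if !@queue || $count >= $args{max_pages};
    my ( $url, $depth ) = @{ shift @queue };
    $count++;
    if ( $depth < $args{max_depth} ) {        # depth capping
      for my $link ( @{ $links{$url} || [] } ) {
        next if $seen{$link}++;
        next if $args{same_host} && host_of($link) ne $host;
        push @queue, [ $link, $depth + 1 ];
      }
    }
    return [ $url, $depth ];
  };
}

my $next = make_frontier(
  start_url => 'https://example.com/',
  max_pages => 10,
  max_depth => 2,
  same_host => 1,
);

my @visited;
while ( my $page = $next->() ) {
  push @visited, $page;
  printf "%d %s\n", $page->[1], $page->[0];
}
```

Swapping the shift for a pop would turn the same structure into a depth-first traversal, which is the kind of alternative crawl order that separating the frontier from the crawl logic makes easy to try.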

ATTRIBUTES

crawler

A WWW::Crawl4AI instance (or any object with a crawl method).
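
Because any object with a crawl method is accepted, a stub can stand in for the real crawler when testing. The sketch below is an assumption-laden illustration: My::StubCrawler is a hypothetical class, and both the single-URL call signature and the hashref it returns (url, links) are assumed shapes, not the documented WWW::Crawl4AI interface.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::Crawl4AI, intended for unit tests only.
package My::StubCrawler;

sub new {
  my ( $class, %pages ) = @_;
  return bless { pages => {%pages}, calls => [] }, $class;
}

# The one method a crawler object is documented to need.
# The returned hashref is an assumed shape, not the real result class.
sub crawl {
  my ( $self, $url ) = @_;
  push @{ $self->{calls} }, $url;    # record the call so tests can inspect it
  return { url => $url, links => $self->{pages}{$url} || [] };
}

package main;

my $crawler = My::StubCrawler->new(
  'https://example.com/' => ['https://example.com/about'],
);

# Duck-typing check: only the presence of a crawl method matters.
die "crawler must provide a crawl method" unless $crawler->can('crawl');

my $result = $crawler->crawl('https://example.com/');
print "links: @{ $result->{links} }\n";
```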

start_url

Starting URL for the crawl.

max_pages

Hard cap on the number of pages crawled.

max_depth

Maximum link-following depth; the start URL is depth 0.

same_host

If true, only follow links whose host matches the start URL's host.

url_filter

Optional coderef ($url) -> bool; return false to skip a URL.
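
As an illustration of the ($url) -> bool contract, here is a small filter that can be exercised on its own before being passed to the constructor. The specific patterns (login pages, a few binary file extensions) and the candidate URLs are arbitrary choices for this example.

```perl
use strict;
use warnings;

# Return false to skip a URL, true to keep it.
my $url_filter = sub {
  my ($url) = @_;
  return 0 if $url =~ m{/login\b};                           # skip login pages
  return 0 if $url =~ m{\.(?:pdf|zip|png|jpe?g)(?:\?|$)}i;   # skip binary downloads
  return 1;
};

my @candidates = (
  'https://example.com/docs/intro',
  'https://example.com/login?next=/docs',
  'https://example.com/files/report.PDF',
);

# Apply the filter the same way a frontier would: keep only truthy results.
my @kept = grep { $url_filter->($_) } @candidates;
print "$_\n" for @kept;
```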

on_page

Optional coderef ($result, $depth) called as each page completes.

METHODS

next

Returns an arrayref [$result, $depth] for the next page, or undef when the crawl is exhausted or max_pages has been reached.

results

Returns an arrayref of the WWW::Crawl4AI::Result objects accumulated so far.

is_exhausted

True when the queue is empty or max_pages has been reached.

SUPPORT

Issues

Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-www-crawl4ai/issues.

CONTRIBUTING

Contributions are welcome! Please fork the repository and submit a pull request.

AUTHOR

Torsten Raudssus <torsten@raudssus.de> https://raudss.us/

COPYRIGHT AND LICENSE

This software is copyright (c) 2026 by Torsten Raudssus.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.