NAME

Net::Async::WebSearch - IO::Async multi-provider web search aggregator

VERSION

version 0.002

SYNOPSIS

use strict;
use warnings;
use feature 'say';
use IO::Async::Loop;
use Net::Async::WebSearch;
use Net::Async::WebSearch::Provider::DuckDuckGo;
use Net::Async::WebSearch::Provider::SearxNG;
use Net::Async::WebSearch::Provider::Serper;

my $loop = IO::Async::Loop->new;
my $ws = Net::Async::WebSearch->new(
  providers => [
    Net::Async::WebSearch::Provider::DuckDuckGo->new(
      tags => ['free'],
    ),
    Net::Async::WebSearch::Provider::SearxNG->new(
      endpoint => 'https://searxng.example.org',
      tags     => ['free', 'private'],
    ),
    Net::Async::WebSearch::Provider::Serper->new(
      api_key => $ENV{SERPER_API_KEY},
      tags    => ['paid'],
    ),
  ],
);
$loop->add($ws);

# Collect mode: fan out, dedup by URL, rank by Reciprocal Rank Fusion.
my $out = $ws->search(
  query   => 'handyintelligence AI consulting',
  limit   => 20,
  exclude => ['paid'],    # skip Serper (and any other 'paid' provider)
)->get;
# $out->{results}  — arrayref of Net::Async::WebSearch::Result, ranked
# $out->{errors}   — [{ provider, error }, ...]
# $out->{stats}    — { providers, providers_ok, providers_error, merged }

# Stream mode: per-result callback as soon as each provider finishes.
$ws->search_stream(
  query     => 'handy intelligence local AI infrastructure',
  on_result => sub { my $r = shift; say $r->title, ' — ', $r->url },
  on_provider_done  => sub {
    my ($name, $results) = @_;
    # $name has settled; $results holds its hits
  },
  on_provider_error => sub {
    my ($name, $err) = @_;
    warn "$name failed: $err\n";
  },
)->get;

# Race mode: resolve with whichever provider returns first.
my $fast = $ws->search( query => 'handyintelligence', mode => 'race', only => ['free'] )->get;
# $fast->{provider} — name of winning provider

DESCRIPTION

IO::Async-based aggregator that fans a single query out to multiple web search providers in parallel and combines their results. Each provider is an instance of Net::Async::WebSearch::Provider; they all share the same Net::Async::HTTP client parented to this notifier.

Three modes:

  • collect (default) — wait for every selected provider, dedup results by normalized URL, score with Reciprocal Rank Fusion (RRF), and return the top limit results as a ranked list, plus a per-provider error list.

  • stream — fire the on_result coderef as soon as each provider's results arrive, in provider-finish order. Deduplicated — the first provider to surface a given URL wins. Returns a Future that resolves once all providers have settled.

  • race — resolve with the first provider to return successfully. Good for latency-sensitive UIs that just want something.
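The collect-mode merge can be pictured with a few lines of core Perl. This is a sketch of Reciprocal Rank Fusion, not the module's own code: the helpers normalize_url and rrf_merge are hypothetical, plain hashrefs stand in for Net::Async::WebSearch::Result objects, ranks are assumed to start at 1, and the normalization rules are only a guess at what "normalized URL" means here. Each provider adds 1 / (rrf_k + rank) to every URL it returned, so a URL that several providers rank well can beat one that a single provider ranks first.

```perl
use strict;
use warnings;

# Hypothetical normalizer (an assumption, not the module's rules): lowercase
# scheme and host, drop any #fragment, strip one trailing slash at the end.
sub normalize_url {
    my ($url) = @_;
    $url =~ s/#.*\z//s;
    if ($url =~ m{\A([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)\z}s) {
        my ($scheme, $host, $rest) = (lc $1, lc $2, $3);
        $rest =~ s{/\z}{};
        return "$scheme://$host$rest";
    }
    return $url;
}

# $lists maps provider name => arrayref of { url => ... } hashes in that
# provider's own rank order (rank 1 = first element).
sub rrf_merge {
    my ($lists, %opt) = @_;
    my $k     = $opt{k}     // 60;
    my $limit = $opt{limit} // 10;
    my (%by_url, @order);
    # Sorted names keep the run deterministic; the first copy of a URL seen
    # is the one that is kept.
    for my $name (sort keys %$lists) {
        my $rank = 0;
        for my $r (@{ $lists->{$name} }) {
            $rank++;
            my $key = normalize_url($r->{url});
            unless ($by_url{$key}) {
                $by_url{$key} = { %$r, score => 0, providers => {} };
                push @order, $key;
            }
            my $entry = $by_url{$key};
            next if exists $entry->{providers}{$name};    # count each provider once per URL
            $entry->{providers}{$name} = $rank;
            $entry->{score} += 1 / ($k + $rank);
        }
    }
    # Highest fused score first; ties keep first-seen order.
    my @ranked = map  { $by_url{ $order[$_] } }
                 sort { $by_url{ $order[$b] }{score} <=> $by_url{ $order[$a] }{score} || $a <=> $b }
                 0 .. $#order;
    splice @ranked, $limit if @ranked > $limit;
    return \@ranked;
}

my $merged = rrf_merge(
    {
        ddg    => [ { url => 'https://a.example/' }, { url => 'https://b.example/x' }, { url => 'https://c.example/' } ],
        serper => [ { url => 'https://B.example/x/' }, { url => 'https://d.example/' } ],
    },
    k     => 60,
    limit => 3,
);
printf "%.5f %s\n", $_->{score}, $_->{url} for @$merged;
```

With k = 60, https://b.example/x (returned by both providers) scores 1/62 + 1/61 and comes first, ahead of https://a.example/, which only one provider ranked first and which therefore scores 1/61.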

Provider selection per call via only => [...] (allow-list) or exclude => [...] (deny-list). Disabled providers ($p->enabled(0)) are skipped regardless.
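Selection can be read as a filter over the registered providers. The sketch below uses plain hashrefs with name, class, tags and an optional enabled flag in place of provider objects, and matches and select_providers are hypothetical names. It follows the rules stated here and under "Stacking providers": a selector hits a provider's name, its lowercased class leaf, or one of its tags; exclude wins over only; disabled providers never run.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $provider->matches($selector): a selector hits the
# provider's name, its lowercased class leaf, or any of its tags.
sub matches {
    my ($p, $sel) = @_;
    my ($leaf) = $p->{class} =~ /(\w+)\z/;
    return 1 if $sel eq $p->{name} || $sel eq lc $leaf;
    return scalar grep { $_ eq $sel } @{ $p->{tags} || [] };
}

# Disabled providers never run; exclude is checked last so it beats only.
sub select_providers {
    my ($providers, %args) = @_;
    my @only    = @{ $args{only}    || [] };
    my @exclude = @{ $args{exclude} || [] };
    return grep {
        my $p = $_;
        ($p->{enabled} // 1)
            && (!@only || grep { matches($p, $_) } @only)
            && !grep { matches($p, $_) } @exclude;
    } @$providers;
}

my @providers = (
    { name => 'duckduckgo', class => 'Net::Async::WebSearch::Provider::DuckDuckGo', tags => ['free'] },
    { name => 'searxng',    class => 'Net::Async::WebSearch::Provider::SearxNG',    tags => [ 'free', 'private' ] },
    { name => 'serper',     class => 'Net::Async::WebSearch::Provider::Serper',     tags => ['paid'] },
    { name => 'serper#2',   class => 'Net::Async::WebSearch::Provider::Serper',     tags => ['paid'], enabled => 0 },
);
my @picked = map { $_->{name} }
    select_providers(\@providers, only => [ 'free', 'serper' ], exclude => ['private']);
print "@picked\n";
```

Here only keeps duckduckgo, searxng and serper (serper#2 is disabled), and exclude then drops searxng because it is tagged private.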

GETTING API KEYS

Quick reference for every built-in provider — where to sign up, how much it costs, and whether you need a credit card to start. Prices and free-tier allowances are as of early 2026 — upstream can move the goalposts at any time, so verify before you plan your quota.

  • DuckDuckGo (https://duckduckgo.com/)

    No key, no sign-up. The provider scrapes the no-JS HTML endpoint html.duckduckgo.com. Free and unlimited but inherently fragile — DDG can change the markup, and they rate-limit aggressively if you hammer them. Don't build a crawler on top.

  • SearxNG — self-hosted, https://docs.searxng.org/

    Free, but you run it. The trick people trip on: the default settings.yml doesn't enable the JSON format. See ex/docker-compose.searxng.yml and ex/searxng/settings.yml in this distribution for a working config. Public instances also exist (https://searx.space) but most block automated JSON queries.

  • Brave Search (https://brave.com/search/api/)

    Brave restructured pricing — there is no more "2000 free queries a month" tier. You now get $5 in free credits every month, automatically applied. At the Search plan rate of $5 / 1000 requests that's about 1000 queries/month. You must pick a plan on signup even to use the free credits, and a credit card is required as an anti-fraud check (not charged while you stay within the credit allowance). API key is minted at https://api.search.brave.com/app/dashboard.

  • Serper.dev (https://serper.dev)

    Best free-tier deal of the paid providers: 2500 free queries on signup, no credit card required. After that, paid plans in the ~$1 / 1000 range. Google results behind a proxy, very fast. Sign up on the homepage; the API key is shown in the dashboard afterward (there is no standalone /api-key URL).

  • Google Programmable Search (Custom Search JSON API) — https://programmablesearchengine.google.com

    Two things to set up and both are free at low volume:

    1. Create a Programmable Search Engine at the URL above. That gives you the cx value ("Search engine ID"). By default the PSE is scoped to specific sites you list — to get full web results, open Search features for that engine and turn Search the entire web on. (Google has been steadily burying this toggle but it's still there.)
    2. Enable the Custom Search API in a Google Cloud project (https://console.cloud.google.com/apis/library/customsearch.googleapis.com) and create an API key under Credentials. No credit card needed at the free tier.

    Quota: 100 free queries/day. Paid: $5 / 1000, capped at 10,000/day. Results per call capped at 10.

  • Yandex Search API

    Signup: https://console.yandex.cloud/link/search-api/
    Docs: https://yandex.cloud/en/docs/search-api/

    Requires a Yandex Cloud account and a "folder" (their project-scope concept — the folder id is your folderid). Pricing is via Yandex Cloud credits; a free trial exists via the standard Cloud welcome credits. API key: create a service account in the Cloud Console, grant it the search-api.executor role, then generate an API key (apikey) or IAM token — that's your api_key.

  • Reddit (public JSON) — no key

    Works out of the box but rate-limited aggressively with generic UAs. Fine for low-volume use; for anything serious use OAuth (below).

  • Reddit OAuth (https://www.reddit.com/prefs/apps)

    Free. You need a Reddit account and a working User-Agent string (Reddit insists on the form app/1.0 by /u/yourname). At the bottom of https://www.reddit.com/prefs/apps click create app, pick type script (for client_credentials/password) or installed (for installed) or web (for the full authorization_code consent flow). The short string under the app name is client_id; secret is shown once on creation. Rate limit is 100 QPM per OAuth identity. See "SETUP" in Net::Async::WebSearch::Provider::Reddit::OAuth for the full walkthrough.

Summary table:

Provider         Free tier                        CC?  Key source
---------------- -------------------------------- ---- --------------------------------------
DuckDuckGo       unlimited (HTML scrape)          no   (no key)
SearxNG          self-hosted, unlimited           no   (self-host; see ex/docker-compose.*)
Brave            $5/month credits (~1000 q)       yes  api.search.brave.com/app/dashboard
Serper           2500 / signup                    no   serper.dev (dashboard after signup)
Google CSE       100 / day                        no   Cloud Console + programmablesearchengine.google.com
Yandex           Cloud trial credits              no   console.yandex.cloud/link/search-api/
Reddit           keyless (rate-limited)           no   (no key)
Reddit OAuth     100 QPM per client_id            no   reddit.com/prefs/apps

Fetching result bodies

Pass fetch => N to any of the search modes to additionally GET the top N result URLs and attach the response to each Result under $r->fetched (see "fetched" in Net::Async::WebSearch::Result for the hash shape). You still get the full search result list — fetch is additive.

Semantics per mode:

  • collect — fetches the top N URLs after RRF dedup/ranking, so every URL is hit at most once no matter how many providers surfaced it.

  • stream — fetches the first N unique URLs in arrival order, kicked off the moment on_result fires for each. An optional on_fetch coderef fires per result once its fetch settles. The outer Future resolves after every search and every fetch is done.

  • race — fetches the top N of the winning provider's list.
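The difference between the modes is mostly about which list the first N entries come from. A rough sketch with hypothetical helper names, where $start_fetch stands in for the real HTTP GET: collect and race take a prefix of an already-ranked list, while stream counts results as they are delivered and stops starting fetches once N have begun. It assumes the stream hook sees each unique result exactly once, as on_result does.

```perl
use strict;
use warnings;

# collect / race: the list is already ranked (and, in collect, already
# deduplicated), so the fetch targets are just its first N entries.
sub fetch_targets_ranked {
    my ($ranked, $n) = @_;
    my $count = $n < @$ranked ? $n : scalar @$ranked;
    return @{$ranked}[ 0 .. $count - 1 ];
}

# stream: returns a hook called once per unique result as it arrives; it starts
# a fetch for each of the first $n results and ignores the rest.
sub make_stream_fetcher {
    my ($n, $start_fetch) = @_;
    my $started = 0;
    return sub {
        my ($url) = @_;
        return if $started >= $n;
        $started++;
        $start_fetch->($url);
    };
}

my @fetched;
my $hook = make_stream_fetcher(2, sub { push @fetched, $_[0] });
$hook->($_) for qw(https://a.example/ https://b.example/ https://c.example/);
print "stream fetched: @fetched\n";
print "collect fetches: ", join(' ', fetch_targets_ranked([qw(u1 u2 u3)], 2)), "\n";
```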

Knobs (constructor defaults, all overridable per call):

  • fetch_concurrency — global cap on parallel in-flight fetches (default 100). In collect/race this is the concurrent arg to "fmap_void" in Future::Utils. In stream it's the ceiling for fetches queued on result arrival.

  • fetch_concurrency_per_target_ip — per-host cap (default 5). Wired to Net::Async::HTTP's max_connections_per_host on the shared HTTP client. Keeps you from hammering a single origin even when the global pool has headroom. Currently this is per-hostname, not per-resolved-IP; different names pointing at the same CDN edge are counted separately.

  • fetch_timeout — seconds per request, passed straight to Net::Async::HTTP.

  • fetch_max_bytes — truncate the response body to this many bytes.

  • fetch_user_agent — User-Agent for fetch requests. Default is the library's own UA; set it to something representative if you care about politeness.

  • fetch_accept — per-call Accept header (e.g. text/html).
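To see how the two caps interact, the sketch below groups a URL queue into waves: a URL joins the current wave only while the wave is under the global cap (fetch_concurrency, default 100) and its hostname is under the per-host cap (fetch_concurrency_per_target_ip, default 5). This is a deliberately simplified, hypothetical model: the real client refills a slot as soon as any request completes instead of waiting for a whole wave, and plan_waves is not part of the module.

```perl
use strict;
use warnings;

# Simplified model: every fetch in a wave finishes before the next wave starts.
# Only meant to show how the global and per-hostname caps interact.
sub plan_waves {
    my ($urls, %cap) = @_;
    my $global   = $cap{global}   // 100;
    my $per_host = $cap{per_host} // 5;
    die "caps must be >= 1\n" if $global < 1 || $per_host < 1;

    my @queue = @$urls;
    my @waves;
    while (@queue) {
        my (@wave, @later, %in_flight);
        for my $url (@queue) {
            my ($host) = $url =~ m{\A[A-Za-z][A-Za-z0-9+.-]*://([^/?#:]+)};
            $host = lc($host // '');    # keyed by hostname, not resolved IP
            if (@wave < $global && ($in_flight{$host} // 0) < $per_host) {
                push @wave, $url;
                $in_flight{$host}++;
            }
            else {
                push @later, $url;      # waits for a later wave
            }
        }
        push @waves, \@wave;
        @queue = @later;
    }
    return \@waves;
}

my $waves = plan_waves(
    [ (map { "https://a.example/$_" } 1 .. 3), 'https://b.example/' ],
    global   => 3,
    per_host => 2,
);
print join(' | ', map { join(' ', @$_) } @$waves), "\n";
```

With a global cap of 3 and a per-host cap of 2, the third a.example URL is held back by the per-host cap, and b.example takes the remaining global slot in the first wave.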

This feature is deliberately separate from the provider plumbing — providers hand back search results only. Fetching is for use-cases like RAG, crawling, and summarization where you want the actual page bodies, and is optional for MCP-style consumers that only care about the search hits themselves.

Stacking providers

You can register multiple instances of the same provider class — five SearxNG mirrors, two Serper API keys, a private DuckDuckGo clone alongside the public one. add_provider auto-renames colliding instances (serper, serper#2, serper#3...) so every one stays individually addressable. Give them explicit names via new(name => ...) when you care about the exact identifier (for logs, only/exclude, etc.).

Selectors — in only, exclude, and provider_opts keys — match against three things on each provider: its name, its class leaf (lowercased), and any of its tags. So:

my $ws = Net::Async::WebSearch->new(
  providers => [
    Net::Async::WebSearch::Provider::SearxNG->new(
      name     => 'searx-eu',
      endpoint => 'https://searx.eu.example',
      tags     => ['private', 'eu'],
    ),
    Net::Async::WebSearch::Provider::Serper->new(
      name    => 'serper-primary',
      api_key => $KEY1,
      tags    => ['paid', 'google-backed'],
    ),
    Net::Async::WebSearch::Provider::Serper->new(
      name    => 'serper-backup',
      api_key => $KEY2,
      tags    => ['paid', 'google-backed'],
    ),
  ],
);

$ws->search( query => $q, exclude => ['paid'] );        # both Serpers skipped
$ws->search( query => $q, only    => ['eu'] );          # only searx-eu
$ws->search( query => $q, only    => ['searxng'] );     # every SearxNG instance
$ws->search( query => $q,
  provider_opts => {
    paid              => { limit => 5 },                # applies to all tagged 'paid'
    'serper-primary'  => { tbs   => 'qdr:w' },          # exact name wins
  },
);
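One way to read the provider_opts precedence: apply every class-leaf or tag key that matches, then the exact-name key last so it overrides them. The sketch assumes options from several matching keys are merged key by key (which the paid / serper-primary example suggests) and settles conflicts between two non-name matches by sorted key order; that tie-break is an arbitrary assumption, and opts_for is a hypothetical helper working on plain hashrefs.

```perl
use strict;
use warnings;

sub opts_for {
    my ($p, $provider_opts) = @_;
    my ($leaf) = $p->{class} =~ /(\w+)\z/;
    my %loose = map { ($_ => 1) } lc($leaf), @{ $p->{tags} || [] };
    my %merged;
    # Class-leaf and tag keys first, in sorted order (an arbitrary tie-break).
    for my $sel (sort keys %$provider_opts) {
        next if $sel eq $p->{name} || !$loose{$sel};
        %merged = (%merged, %{ $provider_opts->{$sel} });
    }
    # The exact-name key goes last, so its values override the looser matches.
    if (my $exact = $provider_opts->{ $p->{name} }) {
        %merged = (%merged, %$exact);
    }
    return \%merged;
}

my $opts = opts_for(
    { name => 'serper-primary', class => 'Net::Async::WebSearch::Provider::Serper', tags => [ 'paid', 'google-backed' ] },
    {
        paid             => { limit => 5, tbs => 'qdr:m' },
        'serper-primary' => { tbs => 'qdr:w' },
        searxng          => { limit => 1 },
    },
);
print join(', ', map { "$_=$opts->{$_}" } sort keys %$opts), "\n";
```

serper-primary picks up limit from its paid tag, while its own name overrides the tag's tbs; the searxng key does not match it at all.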

CONSTRUCTOR PARAMETERS

  • providers — arrayref of Net::Async::WebSearch::Provider instances.

  • http — optional pre-built Net::Async::HTTP. One is created and parented otherwise.

  • default_limit — top-N cap on aggregated results (default 10).

  • per_provider_limit — how many results to ask each provider for (default 10).

  • rrf_k — the RRF constant (default 60, as in Cormack et al.).

search(%args)

The main entry point. %args:

  • query — required, the search string.

  • mode: collect (default), stream, or race. stream and race delegate to search_stream / search_race.

  • limit — top-N merged results. Defaults to default_limit.

  • per_provider_limit — how many results to ask each provider for.

  • only — arrayref of selectors; restrict dispatch to providers matching any of them. A selector is a provider name, a class leaf (searxng, serper, ...) or a tag. See "Stacking providers".

  • exclude — arrayref of selectors; drop providers matching any of them. Takes precedence over only.

  • language, region, safesearch — generic hints, mapped per-provider.

  • provider_opts — hashref keyed by selector (name / class leaf / tag), each value a hashref of per-provider option overrides, e.g. { serper => { tbs => 'qdr:w' }, paid => { limit => 5 } }. When multiple keys match the same provider, exact-name matches win over class-leaf / tag matches.

Resolves to { results, errors, stats }. results is an arrayref of Net::Async::WebSearch::Result objects, each with score set to its RRF score and extra->{providers} holding { name => rank, ... } for every provider that returned it.

search_stream(%args)

Same argument shape as search, but requires an on_result coderef invoked once per unique result as it arrives. on_provider_done and on_provider_error coderefs are optional. Returns a Future that resolves once every provider has settled, to { results, errors, stats } (results is the accumulated dedup list in arrival order, not RRF-ranked).
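The stream-mode bookkeeping is essentially a seen-set consulted in provider-finish order. In the sketch below the finish order is simulated with a plain list instead of Futures, deduplication uses the raw URL string (the module may normalize first), and stream_dedup and the provider field are hypothetical.

```perl
use strict;
use warnings;

# $finished: [ [ provider_name, [ { url => ... }, ... ] ], ... ] in finish order.
sub stream_dedup {
    my ($finished, $on_result) = @_;
    my (%seen, @accumulated);
    for my $done (@$finished) {
        my ($name, $results) = @$done;
        for my $r (@$results) {
            next if $seen{ $r->{url} }++;    # first provider to surface a URL wins
            my $hit = { %$r, provider => $name };
            push @accumulated, $hit;         # arrival order, not RRF order
            $on_result->($hit);              # fires once per unique result
        }
    }
    return \@accumulated;
}

my @calls;
my $all = stream_dedup(
    [
        [ searxng => [ { url => 'u1' }, { url => 'u2' } ] ],
        [ serper  => [ { url => 'u2' }, { url => 'u3' } ] ],
    ],
    sub { push @calls, "$_[0]{provider}:$_[0]{url}" },
);
print "$_\n" for @calls;
```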

search_race(%args)

Same argument shape. Resolves with the first provider to succeed: { provider, results, errors, stats }. If every provider fails, provider is undef and results is empty.
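The race resolution rule can be sketched over a simulated list of outcomes in finish order. Assumptions: race_pick is hypothetical, each outcome is [name, ok, payload], and only failures that settle before the winner are reported in errors; the module may also account for providers that finish later.

```perl
use strict;
use warnings;

# $outcomes: [ [ name, ok, payload ], ... ] in finish order; payload is the
# result list on success and the error message on failure.
sub race_pick {
    my ($outcomes) = @_;
    my @errors;
    for my $o (@$outcomes) {
        my ($name, $ok, $payload) = @$o;
        return { provider => $name, results => $payload, errors => \@errors } if $ok;
        push @errors, { provider => $name, error => $payload };
    }
    # Every provider failed: no winner, empty results.
    return { provider => undef, results => [], errors => \@errors };
}

my $won = race_pick([
    [ brave      => 0, 'HTTP 429' ],
    [ duckduckgo => 1, [ { url => 'u1' } ] ],
    [ serper     => 1, [ { url => 'u9' } ] ],
]);
printf "winner=%s errors=%d\n", $won->{provider}, scalar @{ $won->{errors} };
```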

add_provider($provider)

Register a provider instance. If its name collides with an already-registered provider, the new one is renamed by appending #2, #3... Returns the provider (useful to pick up the final name).
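The renaming rule amounts to trying name, name#2, name#3, and so on until an unused name turns up. A minimal sketch with a hypothetical register_provider that works on plain hashrefs rather than provider objects:

```perl
use strict;
use warnings;

# Hypothetical registry mirroring the documented rule: the first instance keeps
# its name, later collisions become name#2, name#3, ...
sub register_provider {
    my ($registry, $p) = @_;
    my $base = $p->{name};
    my $name = $base;
    my $n    = 1;
    $name = $base . '#' . ++$n while grep { $_->{name} eq $name } @$registry;
    $p->{name} = $name;
    push @$registry, $p;
    return $p;    # caller reads the final name from the returned provider
}

my @registry;
my @names = map { register_provider(\@registry, { name => 'serper' })->{name} } 1 .. 3;
push @names, register_provider(\@registry, { name => 'brave' })->{name};
print "@names\n";
```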

provider($name)

Look up a registered provider by exact name. Returns undef if none match.

providers_matching($selector)

Returns every provider whose matches($selector) is true. $selector is a name, class leaf, or tag.

providers

Returns the list of registered providers.

http

The shared Net::Async::HTTP (lazily built, parented to this notifier).

SEE ALSO

Net::Async::WebSearch::Provider, Net::Async::WebSearch::Result, IO::Async, Net::Async::HTTP, Future

SUPPORT

Issues

Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-net-async-websearch/issues.

CONTRIBUTING

Contributions are welcome! Please fork the repository and submit a pull request.

AUTHOR

Torsten Raudssus <torsten@raudssus.de> https://raudss.us/

COPYRIGHT AND LICENSE

This software is copyright (c) 2026 by Torsten Raudssus.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.