NAME
Net::Async::WebSearch - IO::Async multi-provider web search aggregator
VERSION
version 0.002
SYNOPSIS
    use feature 'say';
    use IO::Async::Loop;
    use Net::Async::WebSearch;
    use Net::Async::WebSearch::Provider::DuckDuckGo;
    use Net::Async::WebSearch::Provider::SearxNG;
    use Net::Async::WebSearch::Provider::Serper;

    my $loop = IO::Async::Loop->new;
    my $ws = Net::Async::WebSearch->new(
        providers => [
            Net::Async::WebSearch::Provider::DuckDuckGo->new(
                tags => ['free'],
            ),
            Net::Async::WebSearch::Provider::SearxNG->new(
                endpoint => 'https://searxng.example.org',
                tags     => ['free', 'private'],
            ),
            Net::Async::WebSearch::Provider::Serper->new(
                api_key => $ENV{SERPER_API_KEY},
                tags    => ['paid'],
            ),
        ],
    );
    $loop->add($ws);

    # Collect mode: fan out, dedup by URL, rank by Reciprocal Rank Fusion.
    my $out = $ws->search(
        query   => 'handyintelligence AI consulting',
        limit   => 20,
        exclude => ['paid'],   # skip Serper (and any other 'paid' provider)
    )->get;
    # $out->{results}: arrayref of Net::Async::WebSearch::Result, ranked
    # $out->{errors}:  [{ provider, error }, ...]
    # $out->{stats}:   { providers, providers_ok, providers_error, merged }

    # Stream mode: per-result callback as soon as each provider finishes.
    $ws->search_stream(
        query             => 'handy intelligence local AI infrastructure',
        on_result         => sub { my $r = shift; say $r->title, ' - ', $r->url },
        on_provider_done  => sub { my ($name, $results) = @_; ... },
        on_provider_error => sub { my ($name, $err) = @_; ... },
    )->get;

    # Race mode: resolve with whichever provider returns first.
    my $fast = $ws->search(
        query => 'handyintelligence',
        mode  => 'race',
        only  => ['free'],
    )->get;
    # $fast->{provider}: name of winning provider
DESCRIPTION
IO::Async-based aggregator that fans a single query out to multiple web search providers in parallel and combines their results. Each provider is an instance of Net::Async::WebSearch::Provider; they all share the same Net::Async::HTTP client parented to this notifier.
Three modes:
- collect (default): wait for every selected provider, dedup results by normalized URL, score with Reciprocal Rank Fusion (RRF), and return the top limit ranked list plus a per-provider error list.
- stream: fire the on_result coderef as soon as each provider's results arrive, in provider-finish order. Results are deduplicated: the first provider to surface a given URL wins. Returns a Future that resolves once all providers have settled.
- race: resolve with the first provider to return successfully. Good for latency-sensitive UIs that just want something.
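To make the collect-mode scoring concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Perl. It is not the module's implementation: rrf_merge and its argument layout are hypothetical, URLs are compared as exact strings (the module normalizes them in a way not described here), and ties are broken alphabetically purely to keep the output deterministic. Each URL scores the sum of 1 / (k + rank) over the providers that returned it, with rank starting at 1.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# $lists: { provider_name => [ url, url, ... ] } in provider rank order.
sub rrf_merge {
    my ($lists, $k, $limit) = @_;
    my (%score, %ranks);
    for my $name (sort keys %$lists) {
        my $rank = 0;
        my %seen;
        for my $url (@{ $lists->{$name} }) {
            $rank++;
            next if $seen{$url}++;            # count a URL once per provider
            $score{$url} += 1 / ($k + $rank); # RRF contribution
            $ranks{$url}{$name} = $rank;      # remember who ranked it where
        }
    }
    # Highest score first; alphabetical tie-break is an assumption.
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    splice @ranked, $limit if @ranked > $limit;
    return map { +{ url => $_, score => $score{$_}, providers => $ranks{$_} } } @ranked;
}

my @demo = rrf_merge(
    { searxng => [qw(u1 u2 u3)], serper => [qw(u2 u4)] },
    60,   # k, matching the documented rrf_k default
    3,    # keep the top 3
);
printf "%s %.6f\n", $_->{url}, $_->{score} for @demo;
```

A URL returned by several providers accumulates one term per provider, which is why u2 outranks u1 here even though u1 is first in one list.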
Select providers per call with only => [...] (allow-list) or exclude => [...] (deny-list). Disabled providers ($p->enabled(0)) are skipped regardless.
GETTING API KEYS
Quick reference for every built-in provider — where to sign up, how much it costs, and whether you need a credit card to start. Prices and free-tier allowances are as of early 2026 — upstream can move the goalposts at any time, so verify before you plan your quota.
DuckDuckGo - https://duckduckgo.com/
No key, no sign-up. The provider scrapes the no-JS HTML endpoint html.duckduckgo.com. Free and unlimited but inherently fragile: DDG can change the markup, and they rate-limit aggressively if you hammer them. Don't build a crawler on top.
SearxNG - self-hosted, https://docs.searxng.org/
Free, but you run it. The trick people trip on: the default settings.yml doesn't enable the JSON format. See ex/docker-compose.searxng.yml and ex/searxng/settings.yml in this distribution for a working config. Public instances also exist (https://searx.space) but most block automated JSON queries.
Brave Search - https://brave.com/search/api/
Brave restructured pricing: there is no more "2000 free queries a month" tier. You now get $5 in free credits every month, automatically applied. At the Search plan rate of $5 / 1000 requests that's about 1000 queries/month. You must pick a plan on signup even to use the free credits, and a credit card is required as an anti-fraud check (not charged while you stay within the credit allowance). The API key is minted at https://api.search.brave.com/app/dashboard.
Serper.dev - https://serper.dev
Best free-tier deal of the paid providers: 2500 free queries on signup, no credit card required. After that, paid plans in the ~$1 / 1000 range. Google results behind a proxy, very fast. Sign up on the homepage; the API key is shown in the dashboard afterward (there is no standalone /api-key URL).
Google Programmable Search (Custom Search JSON API) - https://programmablesearchengine.google.com
Two things to set up, and both are free at low volume:
1. Create a Programmable Search Engine at the URL above. That gives you the cx value ("Search engine ID"). By default the PSE is scoped to specific sites you list; to get full web results, open Search features for that engine and turn Search the entire web on. (Google has been steadily burying this toggle but it's still there.)
2. Enable the Custom Search API in a Google Cloud project (https://console.cloud.google.com/apis/library/customsearch.googleapis.com) and create an API key under Credentials. No credit card needed at the free tier.
Quota: 100 free queries/day. Paid: $5 / 1000, capped at 10,000/day. Results per call capped at 10.
Yandex Search API
- Signup: https://console.yandex.cloud/link/search-api/
- Docs: https://yandex.cloud/en/docs/search-api/
Requires a Yandex Cloud account and a "folder" (their project-scope concept; the folder id is your folderid). Pricing is via Yandex Cloud credits; a free trial exists via the standard Cloud welcome credits. API key: create a service account in the Cloud Console, grant it the search-api.executor role, then generate an API key (apikey) or IAM token; that's your api_key.
Reddit (public JSON) - no key
Works out of the box but rate-limited aggressively with generic UAs. Fine for low-volume use; for anything serious use OAuth (below).
Reddit OAuth - https://www.reddit.com/prefs/apps
Free. You need a Reddit account and a working User-Agent string (Reddit insists on the form app/1.0 by /u/yourname). At the bottom of https://www.reddit.com/prefs/apps click create app, pick type script (for client_credentials / password) or installed (for installed) or web (for the full authorization_code consent flow). The short string under the app name is client_id; secret is shown once on creation. Rate limit is 100 QPM per OAuth identity. See "SETUP" in Net::Async::WebSearch::Provider::Reddit::OAuth for the full walkthrough.
Summary table:
    Provider      Free tier                   CC?  Key source
    ------------  --------------------------  ---  ---------------------------------------------------
    DuckDuckGo    unlimited (HTML scrape)     no   (no key)
    SearxNG       self-hosted, unlimited      no   (self-host; see ex/docker-compose.*)
    Brave         $5/month credits (~1000 q)  yes  api.search.brave.com/app/dashboard
    Serper        2500 / signup               no   serper.dev (dashboard after signup)
    Google CSE    100 / day                   no   Cloud Console + programmablesearchengine.google.com
    Yandex        Cloud trial credits         no   console.yandex.cloud/link/search-api/
    Reddit        keyless (rate-limited)      no   (no key)
    Reddit OAuth  100 QPM per client_id       no   reddit.com/prefs/apps
Fetching result bodies
Pass fetch => N to any of the search modes to additionally GET the top N result URLs and attach the response to each Result under $r->fetched (see "fetched" in Net::Async::WebSearch::Result for the hash shape). You still get the full search result list — fetch is additive.
Semantics per mode:
- collect: fetches the top N URLs after RRF dedup/ranking, so every URL is hit at most once no matter how many providers surfaced it.
- stream: fetches the first N unique URLs in arrival order, kicked off the moment on_result fires for each. An optional on_fetch coderef fires per result once its fetch settles. The outer Future resolves after every search and every fetch is done.
- race: fetches the top N of the winning provider's list.
Knobs (constructor defaults, all overridable per call):
- fetch_concurrency: global cap on parallel in-flight fetches (default 100). In collect / race this is the concurrent arg to "fmap_void" in Future::Utils. In stream it's the ceiling for fetches queued on result arrival.
- fetch_concurrency_per_target_ip: per-host cap (default 5). Wired to Net::Async::HTTP's max_connections_per_host on the shared HTTP client. Keeps you from hammering a single origin even when the global pool has headroom. Currently this is per-hostname, not per-resolved-IP; different names pointing at the same CDN edge are counted separately.
- fetch_timeout: seconds per request, passed straight to Net::Async::HTTP.
- fetch_max_bytes: truncate the response body to this many bytes.
- fetch_user_agent: User-Agent for fetch requests. Default is the library's own UA; set it to something representative if you care about politeness.
- fetch_accept: per-call Accept header (e.g. text/html).
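How the global cap and the per-host cap interact can be illustrated with a small simulation. This is only a sketch: in the module the real limiting is done by Future::Utils and Net::Async::HTTP, and the startable function, its arguments, and the regex-based hostname extraction are all assumptions made for the example. Given a queue of URLs and a count of requests already running per hostname, it returns the URLs that could start right now.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# $queue:     arrayref of URLs waiting to be fetched
# $in_flight: { hostname => number of requests currently running }
sub startable {
    my ($queue, $in_flight, $global_cap, $per_host_cap) = @_;
    my $running = 0;
    $running += $_ for values %$in_flight;
    my %count = %$in_flight;    # copy, so the caller's hash is untouched
    my @start;
    for my $url (@$queue) {
        last if $running >= $global_cap;              # global pool is full
        my ($host) = $url =~ m{^https?://([^/:]+)}i or next;
        $host = lc $host;                             # counted per hostname
        next if ($count{$host} // 0) >= $per_host_cap; # this origin is saturated
        $count{$host}++;
        $running++;
        push @start, $url;
    }
    return @start;
}

my @queue = map { "https://a.example/$_" } 1 .. 3;
push @queue, 'https://b.example/1';
print "$_\n" for startable(\@queue, {}, 10, 2);
```

With a global cap of 10 and a per-host cap of 2, the third a.example URL waits even though the global pool has headroom, while b.example starts immediately. Two different hostnames served by the same IP would be counted separately here, matching the per-hostname behaviour described above.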
This feature is deliberately separate from the provider plumbing — providers hand back search results only. Fetching is for use-cases like RAG, crawling, and summarization where you want the actual page bodies, and is optional for MCP-style consumers that only care about the search hits themselves.
Stacking providers
You can register multiple instances of the same provider class — five SearxNG mirrors, two Serper API keys, a private DuckDuckGo clone alongside the public one. add_provider auto-renames colliding instances (serper, serper#2, serper#3...) so every one stays individually addressable. Give them explicit names via new(name => ...) when you care about the exact identifier (for logs, only/exclude, etc.).
Selectors — in only, exclude, and provider_opts keys — match against three things on each provider: its name, its class leaf (lowercased), and any of its tags. So:
    my $ws = Net::Async::WebSearch->new(
        providers => [
            Net::Async::WebSearch::Provider::SearxNG->new(
                name     => 'searx-eu',
                endpoint => 'https://searx.eu.example',
                tags     => ['private', 'eu'],
            ),
            Net::Async::WebSearch::Provider::Serper->new(
                name    => 'serper-primary',
                api_key => $KEY1,
                tags    => ['paid', 'google-backed'],
            ),
            Net::Async::WebSearch::Provider::Serper->new(
                name    => 'serper-backup',
                api_key => $KEY2,
                tags    => ['paid', 'google-backed'],
            ),
        ],
    );

    $ws->search( query => $q, exclude => ['paid'] );     # both Serpers skipped
    $ws->search( query => $q, only    => ['eu'] );       # only searx-eu
    $ws->search( query => $q, only    => ['searxng'] );  # every SearxNG instance
    $ws->search( query => $q,
        provider_opts => {
            paid             => { limit => 5 },        # applies to all tagged 'paid'
            'serper-primary' => { tbs => 'qdr:w' },    # exact name wins
        },
    );
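The selector rules above can be sketched without the module at all. In this illustration providers are plain hashes rather than Net::Async::WebSearch::Provider objects, and matches, select_providers, and opts_for are hypothetical helpers, not the module's methods. The merge in opts_for is one plausible reading of "exact name wins": options from class-leaf and tag keys are applied first, then the exact-name key on top.

```perl
use strict;
use warnings;

# Hypothetical model: a provider is { name, class, tags }.
sub matches {
    my ($p, $selector) = @_;
    my ($leaf) = $p->{class} =~ /([^:]+)\z/;    # last component of the class name
    return 1 if $selector eq $p->{name};
    return 1 if $selector eq lc $leaf;
    return 1 if grep { $_ eq $selector } @{ $p->{tags} || [] };
    return 0;
}

sub select_providers {
    my ($providers, %args) = @_;
    my @only    = @{ $args{only}    || [] };
    my @exclude = @{ $args{exclude} || [] };
    my @picked;
    for my $p (@$providers) {
        next if @only    && !grep { matches($p, $_) } @only;
        next if @exclude &&  grep { matches($p, $_) } @exclude;  # exclude beats only
        push @picked, $p;
    }
    return @picked;
}

sub opts_for {
    my ($p, $provider_opts) = @_;
    my %merged;
    my @other_keys = grep { $_ ne $p->{name} } keys %$provider_opts;
    for my $key (sort @other_keys) {            # class-leaf and tag keys first
        next unless matches($p, $key);
        %merged = (%merged, %{ $provider_opts->{$key} });
    }
    %merged = (%merged, %{ $provider_opts->{ $p->{name} } || {} });  # exact name last
    return \%merged;
}

my @providers = (
    { name => 'searx-eu',       class => 'Net::Async::WebSearch::Provider::SearxNG', tags => ['private', 'eu'] },
    { name => 'serper-primary', class => 'Net::Async::WebSearch::Provider::Serper',  tags => ['paid', 'google-backed'] },
    { name => 'serper-backup',  class => 'Net::Async::WebSearch::Provider::Serper',  tags => ['paid', 'google-backed'] },
);
print join(', ', map { $_->{name} } select_providers(\@providers, exclude => ['paid'])), "\n";
```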
CONSTRUCTOR PARAMETERS
- providers: arrayref of Net::Async::WebSearch::Provider instances.
- http: optional pre-built Net::Async::HTTP. One is created and parented otherwise.
- default_limit: top-N cap on aggregated results (default 10).
- per_provider_limit: how many results to ask each provider for (default 10).
- rrf_k: the RRF constant (default 60, as in Cormack et al.).
search(%args)
The main entry point. %args:
- query: required, the search string.
- mode: collect (default), stream, or race. stream and race delegate to search_stream / search_race.
- limit: top-N merged results. Defaults to default_limit.
- per_provider_limit: how many results to ask each provider for.
- only: arrayref of selectors; restrict dispatch to providers matching any of them. A selector is a provider name, a class leaf (searxng, serper, ...) or a tag. See "Stacking providers".
- exclude: arrayref of selectors; drop providers matching any of them. Takes precedence over only.
- language, region, safesearch: generic hints, mapped per-provider.
- provider_opts: hashref keyed by selector (name / class leaf / tag), each value a hashref of per-provider option overrides, e.g. { serper => { tbs => 'qdr:w' }, paid => { limit => 5 } }. When multiple keys match the same provider, exact-name matches win over class-leaf / tag matches.
Resolves to { results, errors, stats }. results is an arrayref of Net::Async::WebSearch::Result, with score set to its RRF score and extra->{providers} carrying { name => rank, ... }.
search_stream(%args)
Same argument shape as search, but requires an on_result coderef invoked once per unique result as it arrives. on_provider_done and on_provider_error coderefs are optional. Returns a Future that resolves once every provider has settled, to { results, errors, stats } (results is the accumulated dedup list in arrival order, not RRF-ranked).
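The arrival-order deduplication described for stream mode can be sketched in a few lines. This is an illustration under stated assumptions, not the module's code: make_stream_dedup is a hypothetical helper, results are plain hashes with a url key instead of Net::Async::WebSearch::Result objects, and URLs are compared as exact strings rather than normalized.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Returns a feeder closure plus the list it accumulates into.
sub make_stream_dedup {
    my ($on_result) = @_;
    my (%seen, @accumulated);
    my $feed = sub {
        my ($provider, @results) = @_;
        for my $r (@results) {
            next if $seen{ $r->{url} }++;     # an earlier provider already won
            push @accumulated, { %$r, provider => $provider };
            $on_result->($accumulated[-1]);   # fire once per unique result
        }
    };
    return ($feed, \@accumulated);
}

my ($feed, $all) = make_stream_dedup(sub { print "$_[0]{provider}: $_[0]{url}\n" });
$feed->('serper',  { url => 'https://example.org/b' }, { url => 'https://example.org/a' });
$feed->('searxng', { url => 'https://example.org/a' }, { url => 'https://example.org/c' });
```

Because the batches are fed in the order the providers finish, the accumulated list reflects arrival order and carries no fused score, which is the difference from collect mode noted above.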
search_race(%args)
Same argument shape. Resolves with the first provider to succeed: { provider, results, errors, stats }. If every provider fails, provider is undef and results is empty.
add_provider($provider)
Register a provider instance. If its name collides with an already-registered provider, the new one is renamed by appending #2, #3... Returns the provider (useful to pick up the final name).
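The renaming rule can be pictured with a short sketch. unique_name below is a hypothetical stand-in, not the module's internals; it assumes the registry is a simple hash of names already in use and that the suffix counter starts at 2, as the serper, serper#2, serper#3 example suggests.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# $taken: hashref of names already registered; updated in place.
sub unique_name {
    my ($taken, $name) = @_;
    my $candidate = $name;
    my $n = 1;
    # Try name, name#2, name#3, ... until one is free.
    $candidate = $name . '#' . ++$n while $taken->{$candidate};
    $taken->{$candidate} = 1;
    return $candidate;
}

my %registry;
print unique_name(\%registry, 'serper'), "\n" for 1 .. 3;
```

Since the final name may differ from the one requested, reading it back from the returned provider is the reliable way to learn which identifier to use in only / exclude.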
provider($name)
Look up a registered provider by exact name. Returns undef if none match.
providers_matching($selector)
Returns every provider whose matches($selector) is true. $selector is a name, class leaf, or tag.
providers
Returns the list of registered providers.
http
The shared Net::Async::HTTP (lazily built, parented to this notifier).
SEE ALSO
Net::Async::WebSearch::Provider, Net::Async::WebSearch::Result, IO::Async, Net::Async::HTTP, Future
SUPPORT
Issues
Please report bugs and feature requests on GitHub at https://github.com/Getty/p5-net-async-websearch/issues.
CONTRIBUTING
Contributions are welcome! Please fork the repository and submit a pull request.
AUTHOR
Torsten Raudssus <torsten@raudssus.de> https://raudss.us/
COPYRIGHT AND LICENSE
This software is copyright (c) 2026 by Torsten Raudssus.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.