NAME
web-search - Open web search page in browser
VERSION
This document describes version 0.001 of web-search (from Perl distribution App-WebSearchUtils), released on 2022-10-10.
SYNOPSIS
web-search [--action=str|--open-url|--print-html-link|--print-org-link|--print-result-html-link|--print-result-link|--print-result-org-link|--print-url|--save-html] [--append=str] [--config-path=path|-c|--no-config|-C] [--config-profile=profile|-P] [--debug|--log-level=level|--quiet|--trace|--verbose] [--delay=duration] [--engine=str] [--format=name|--json] [--max-delay=duration] [--min-delay=duration] [--(no)naked-res] [--no-env] [--num=posint] [--page-result[=program]|--view-result[=program]] [--prepend=str] [--queries-from=filename] [(--query=str)+|--queries-json=json] [--time-end=date] [--time-past=str] [--time-start=date] -- [query] ...
See examples in the "EXAMPLES" section.
DESCRIPTION
This utility can save you time when you want to open multiple queries (with common prefix/suffix words added) or specify options such as a time limit. It formulates the search URL(s), then opens them for you in the browser. You can also print the URLs instead of opening them.
Aside from standard web search, you can also generate/open other searches like image, video, news, or map.
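As a rough illustration of what "formulating the search URL(s)" involves, the following Python sketch applies a common prefix/suffix to each query and URL-encodes the result. The URL templates and the function name build_search_urls are assumptions made for this example; the utility itself may build its URLs differently.

```python
from urllib.parse import urlencode

# Hypothetical URL templates, chosen for illustration only.
# The real utility may use different URLs or extra parameters.
ENGINE_URLS = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "brave": "https://search.brave.com/search",
    "ddg": "https://duckduckgo.com/",
}


def build_search_urls(queries, engine="google", prepend="", append=""):
    """Return one search URL per query, with the common prefix/suffix applied."""
    base = ENGINE_URLS[engine]
    return [base + "?" + urlencode({"q": prepend + q + append}) for q in queries]


if __name__ == "__main__":
    for url in build_search_urls(["carrie", "hocus pocus"], engine="brave", prepend="imdb "):
        print(url)
```

Note that the prefix and suffix are concatenated as given, so any separating space has to be part of the string (as in "imdb " above).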
OPTIONS
* marks required options.
Main options
- --action=s
-
What to do with the URLs.
Default value:
"open_url"
Valid values:
["open_url","print_url","print_html_link","print_org_link","save_html","print_result_link","print_result_html_link","print_result_org_link"]
Instead of opening the queries in the browser (open_url), you can choose another action.
Printing search URLs: print_url prints the search URL. print_html_link prints the HTML link (the <a> tag). print_org_link prints the Org-mode link, e.g. [[url...][query]].
Saving search result HTMLs: save_html first visits each search URL (currently using Firefox::Marionette), then saves each result page to a file named <num>-<query>.html in the current directory. Existing files are not overwritten; the utility saves to *.html.1, *.html.2, and so on instead.
Extracting search result links: print_result_link first visits each search URL (currently using Firefox::Marionette), then extracts the result links and prints them. print_result_html_link and print_result_org_link are similar but format each link as an HTML link and an Org link, respectively.
The print_result_*link actions are not very useful for some search engines like Google, because the result HTML page is obfuscated. For those engines the utility can only extract all links in each page rather than selecting (via DOM) only the actual search result entry links.
If you want to filter the links further by domain, path, etc., you can use grep-url.
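The no-overwrite rule described for save_html can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name pick_save_path is hypothetical, and how the utility sanitizes the query for use in a filename is not documented here, so the query is used verbatim.

```python
import os
import tempfile


def pick_save_path(directory, num, query):
    """Return a path that does not exist yet.

    Tries <num>-<query>.html first, then appends .1, .2, ... until a free
    name is found. Assumption: the query is used verbatim in the filename.
    """
    base = os.path.join(directory, f"{num}-{query}.html")
    if not os.path.exists(base):
        return base
    suffix = 1
    while os.path.exists(f"{base}.{suffix}"):
        suffix += 1
    return f"{base}.{suffix}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for _ in range(3):
            path = pick_save_path(tmp, 1, "lee mack")
            with open(path, "w") as fh:
                fh.write("<html></html>")
            print(os.path.basename(path))
```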
- --append=s
-
String to add at the end of each query.
- --delay=s
-
Delay between opening each query.
As an alternative to the --delay option, you can also use --min-delay and --max-delay to set a random delay between a minimum and maximum value.
- --engine=s
-
Search engine to use.
Default value:
"google"
Valid values:
["google","google_image","google_video","google_news","google_map","bing","brave","ddg"]
- --max-delay=s
-
Maximum delay between opening each query.
- --min-delay=s
-
Minimum delay between opening each query.
As an alternative to the --min-delay and --max-delay options, you can also use --delay to set a constant delay between requests.
- --num=s
-
Number of results per page.
- --open-url
-
Alias for --action=open_url.
See --action.
- --prepend=s
-
String to add at the beginning of each query.
- --print-html-link
-
Alias for --action=print_html_link.
See --action.
- --print-org-link
-
Alias for --action=print_org_link.
See --action.
- --print-result-html-link
-
Alias for --action=print_result_html_link.
See --action.
- --print-result-link
-
Alias for --action=print_result_link.
See --action.
- --print-result-org-link
-
Alias for --action=print_result_org_link.
See --action.
- --print-url
-
Alias for --action=print_url.
See --action.
- --queries-from=s
-
Supply queries from lines of text file (specify "-" for stdin).
- --queries-json=s
-
See --query.
Can also be specified as the 1st command-line argument and onwards.
- --query=s@
-
Can also be specified as the 1st command-line argument and onwards.
Can be specified multiple times.
- --save-html
-
Alias for --action=save_html.
See --action.
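The relationship between --delay, --min-delay, and --max-delay described above can be sketched in Python as follows. This is a minimal sketch, not the utility's code: the function names parse_duration and pick_delay are hypothetical, and the duration parser only handles plain seconds and the suffixes s, m, and h, which is an assumption (the utility may accept other forms).

```python
import random


def parse_duration(text):
    """Parse a simple duration such as '3s', '2m', or '1.5' (plain seconds).

    Assumption: only the suffixes s, m, and h are handled here.
    """
    units = {"s": 1, "m": 60, "h": 3600}
    text = text.strip()
    if text and text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text)


def pick_delay(delay=None, min_delay=None, max_delay=None, rng=random):
    """Return the number of seconds to wait before opening the next query."""
    if delay is not None:
        # Constant delay between queries.
        return parse_duration(delay)
    if min_delay is not None and max_delay is not None:
        # Random delay drawn uniformly between the two bounds.
        return rng.uniform(parse_duration(min_delay), parse_duration(max_delay))
    return 0.0


if __name__ == "__main__":
    print(pick_delay(delay="3s"))
    print(pick_delay(min_delay="1s", max_delay="5s"))
```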
Configuration options
- --config-path=s, -c
-
Set path to configuration file.
Can actually be specified multiple times to instruct application to read from multiple configuration files (and merge them).
- --config-profile=s, -P
-
Set configuration profile to use.
A single configuration file can contain profiles, i.e. alternative sets of values that can be selected. For example:
[profile=dev]
username=foo
pass=beaver

[profile=production]
username=bar
pass=honey

When you specify --config-profile=dev, username will be set to foo and pass to beaver. When you specify --config-profile=production, username will be set to bar and pass to honey.
- --no-config, -C
-
Do not use any configuration file.
If you specify --no-config, the application will not read any configuration file.
Environment options
- --no-env
-
Do not read environment for default options.
If you specify --no-env, the application will not read any environment variable.
Logging options
- --debug
-
Shortcut for --log-level=debug.
- --log-level=s
-
Set log level.
By default, these log levels are available (in order of increasing level of importance, from least important to most): trace, debug, info, warn/warning, error, fatal. By default, the level is usually set to warn, which means that log statements with level info and less important levels will not be shown. To increase verbosity, choose info, debug, or trace.
For more details on log level and logging, as well as how new logging levels can be defined or existing ones modified, see Log::ger.
- --quiet
-
Shortcut for --log-level=error.
- --trace
-
Shortcut for --log-level=trace.
- --verbose
-
Shortcut for --log-level=info.
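The level ordering described under --log-level can be illustrated with a short Python sketch. It only models the rule "statements less important than the configured level are hidden"; the function name is_shown is hypothetical and this is not how Log::ger is implemented.

```python
# Default levels, ordered from least to most important.
LEVELS = ["trace", "debug", "info", "warn", "error", "fatal"]
ALIASES = {"warning": "warn"}


def is_shown(statement_level, configured_level="warn"):
    """Return True if a log statement at statement_level would be shown."""
    statement = ALIASES.get(statement_level, statement_level)
    configured = ALIASES.get(configured_level, configured_level)
    return LEVELS.index(statement) >= LEVELS.index(configured)


if __name__ == "__main__":
    for level in LEVELS:
        print(level, is_shown(level), is_shown(level, "info"))
```

With the default level (warn), info statements are hidden; after --verbose (info) they are shown, while debug and trace statements remain hidden.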
Output options
- --format=s
-
Choose output format, e.g. json, text.
Default value:
undef
Output can be displayed in multiple formats, and a suitable default format is chosen depending on the application and/or whether output destination is interactive terminal (i.e. whether output is piped). This option specifically chooses an output format.
- --json
-
Set output format to json.
- --naked-res
-
When outputting as JSON, strip result envelope.
Default value:
0
By default, when outputting as JSON, the full enveloped result is returned, e.g.:
[200,"OK",[1,2,3],{"func.extra":4}]
The reason is so you can get the status (1st element), status message (2nd element) as well as result metadata/extra result (4th element) instead of just the result (3rd element). However, sometimes you want just the result, e.g. when you want to pipe the result for more post-processing. In this case you can use --naked-res so you just get:
[1,2,3]
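If a downstream script receives the enveloped JSON, the same stripping can be done after the fact. The following Python sketch assumes the four-element envelope layout described above; the function name strip_envelope is hypothetical.

```python
import json


def strip_envelope(json_text):
    """Return only the result (3rd element) of an enveloped JSON array.

    The envelope is assumed to be [status, message, result, metadata].
    """
    envelope = json.loads(json_text)
    return envelope[2]


if __name__ == "__main__":
    print(strip_envelope('[200,"OK",[1,2,3],{"func.extra":4}]'))
```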
- --page-result
-
Filter output through a pager.
This option will pipe the output to a specified pager program. If a pager program is not specified, a suitable default, e.g. less, is chosen.
- --view-result
-
View output using a viewer.
This option will first save the output to a temporary file, then open a viewer program to view the temporary file. If a viewer program is not chosen, a suitable default, e.g. the browser, is chosen.
Time period criteria options
- --time-end=s
- --time-past=s
-
Limit time period to the past hour/24hour/week/month/year.
Valid values:
["hour","24hour","day","week","month","year"]
- --time-start=s
Other options
COMPLETION
This script has shell tab completion capability with support for several shells.
bash
To activate bash completion for this script, put:
complete -C web-search web-search
in your bash startup (e.g. ~/.bashrc). Your next shell session will then recognize tab completion for the command. Or, you can also directly execute the line above in your shell to activate immediately.
It is recommended, however, that you install modules using cpanm-shcompgen which can activate shell completion for scripts immediately.
tcsh
To activate tcsh completion for this script, put:
complete web-search 'p/*/`web-search`/'
in your tcsh startup (e.g. ~/.tcshrc). Your next shell session will then recognize tab completion for the command. Or, you can also directly execute the line above in your shell to activate immediately.
It is also recommended to install shcompgen (see above).
other shells
For fish and zsh, install shcompgen as described above.
CONFIGURATION FILE
This script can read configuration files. Configuration files are in the format of IOD, which is basically INI with some extra features.
By default, these names are searched for configuration filenames (can be changed using --config-path): /home/u1/.config/web-search.conf, /home/u1/web-search.conf, or /etc/web-search.conf.
All found files will be read and merged.
To disable searching for configuration files, pass --no-config.
You can put multiple profiles in a single file by using section names like [profile=SOMENAME] or [SOMESECTION profile=SOMENAME]. Those sections will only be read if you specify the matching --config-profile SOMENAME.
You can also put configuration for multiple programs inside a single file, and use the filter program=NAME in section names, e.g. [program=NAME ...] or [SOMESECTION program=NAME]. The section will then only be used when the reading program matches.
You can also filter a section by environment variable using the filter env=CONDITION in section names. For example, if you only want a section to be read if a certain environment variable is true: [env=SOMEVAR ...] or [SOMESECTION env=SOMEVAR ...]. If you only want a section to be read when the value of an environment variable equals some string: [env=HOSTNAME=blink ...] or [SOMESECTION env=HOSTNAME=blink ...]. If you only want a section to be read when the value of an environment variable does not equal some string: [env=HOSTNAME!=blink ...] or [SOMESECTION env=HOSTNAME!=blink ...]. If you only want a section to be read when the value of an environment variable includes some string: [env=HOSTNAME*=server ...] or [SOMESECTION env=HOSTNAME*=server ...]. If you only want a section to be read when the value of an environment variable does not include some string: [env=HOSTNAME!*=server ...] or [SOMESECTION env=HOSTNAME!*=server ...]. Note that currently due to simplistic parsing, there must not be any whitespace in the value being compared because it marks the beginning of a new section filter or section name.
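The five env=CONDITION forms above can be summarized with a small Python sketch that evaluates a condition against a dictionary of environment variables. This is an illustration, not the actual configuration reader: the function name env_filter_matches is hypothetical, "true" is assumed to mean Perl-style truthiness (set, non-empty, and not "0"), and an unset variable is assumed to compare as an empty string.

```python
import re

# NAME, optionally followed by one of the operators !*=, *=, !=, = and a value.
_CONDITION_RE = re.compile(r"(\w+)(!\*=|\*=|!=|=)?(.*)", re.S)


def env_filter_matches(condition, env):
    """Evaluate one env=CONDITION section filter against a dict of variables."""
    m = _CONDITION_RE.fullmatch(condition)
    if m is None:
        raise ValueError(f"invalid condition: {condition!r}")
    name, op, value = m.groups()
    actual = env.get(name)
    if op is None:
        if value:
            raise ValueError(f"invalid condition: {condition!r}")
        # Assumption: "true" follows Perl truthiness.
        return actual not in (None, "", "0")
    actual = actual or ""  # assumption: unset compares as empty string
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "*=":
        return value in actual
    return value not in actual  # op == "!*="


if __name__ == "__main__":
    env = {"HOSTNAME": "blink", "DEBUG": "1"}
    for cond in ["DEBUG", "HOSTNAME=blink", "HOSTNAME!=blink",
                 "HOSTNAME*=server", "HOSTNAME!*=server"]:
        print(cond, env_filter_matches(cond, env))
```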
To load and configure plugins, you can use either the -plugins parameter (e.g. -plugins=DumpArgs or -plugins=DumpArgs@before_validate_args), or use the [plugin=NAME ...] sections, for example:
[plugin=DumpArgs]
-event=before_validate_args
-prio=99
[plugin=Foo]
-event=after_validate_args
arg1=val1
arg2=val2
which is equivalent to setting -plugins=-DumpArgs@before_validate_args@99,-Foo@after_validate_args,arg1,val1,arg2,val2.
List of available configuration parameters:
action (see --action)
append (see --append)
delay (see --delay)
engine (see --engine)
format (see --format)
log_level (see --log-level)
max_delay (see --max-delay)
min_delay (see --min-delay)
naked_res (see --naked-res)
num (see --num)
prepend (see --prepend)
queries (see --query)
queries_from (see --queries-from)
time_end (see --time-end)
time_past (see --time-past)
time_start (see --time-start)
ENVIRONMENT
WEB_SEARCH_OPT
String. Specify additional command-line options.
FILES
/home/u1/.config/web-search.conf
/home/u1/web-search.conf
/etc/web-search.conf
EXAMPLES
Open a single query, show 100 results
% web-search "a query" -n 100
Open several queries, limit the time period of all searches to the past month
% web-search "query one" query2 "query number three" --time-past month
Open queries from each line of a file, add a 3s delay after each query (e.g. to avoid getting rate-limited)
% web-search --queries-from phrases.txt --delay 3s
Open queries from each line of stdin
% prog-that-produces-lines-of-phrases | web-search --queries-from -
Use a custom browser
% BROWSER=lynx web-search "a query"
Use with firefox-container
% BROWSER="firefox-container mycontainer" web-search "query one" query2
Show image search URLs instead of opening them in browser
% web-search --engine google_image --print-url "query one" query2
Print map search URLs as Org links
% web-search --engine google_map --print-org-link "jakarta selatan" "kebun raya bogor"
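For reference, the two link formats mentioned for --print-html-link and --print-org-link can be produced with a few lines of Python. This sketch only shows the general shape of an HTML <a> tag and an Org-mode link; the function names are hypothetical and the utility's exact output (escaping, link text) may differ.

```python
from html import escape


def html_link(url, text):
    """Format an HTML link, escaping special characters in URL and text."""
    return f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'


def org_link(url, text):
    """Format an Org-mode link: [[url][text]]."""
    return f"[[{url}][{text}]]"


if __name__ == "__main__":
    url = "https://example.com/search?q=kebun+raya+bogor"
    print(html_link(url, "kebun raya bogor"))
    print(org_link(url, "kebun raya bogor"))
```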
Prepend prefix words to each query, use Brave Search instead of the default Google (I'm sick of CAPTCHAs)
% web-search --engine brave --prepend "imdb " "carrie" "hocus pocus" "raya"
Append suffix words to each query
% web-search --append " net worth" "lewis capaldi" "beyonce" "lee mack" "mariah carey"
Visit the search URL for each query using Firefox::Marionette then extract and print the links
% web-search "lee mack" --print-result-link
This is currently not very useful with some search engines like Google, because the result HTML page is obfuscated; the utility can only extract all links in each page instead of selecting (via DOM) only the result links.
If you want to filter the links further by domain, path, etc. you can use grep-url.
Get the IMDB URL for Lee Mack
% web-search "lee mack imdb" --print-result-link | grep-url --host-contains imdb.com | head -n1
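The same kind of post-filtering can be done in a script that reads the printed links. The Python sketch below picks the first link whose host contains a given string, roughly mirroring the pipeline above; it assumes that the host filter is a plain substring match, and the function name and sample URLs are placeholders.

```python
from urllib.parse import urlsplit


def first_link_on_host(links, host_fragment):
    """Return the first link whose host contains host_fragment, or None."""
    for link in links:
        host = urlsplit(link).hostname or ""
        if host_fragment in host:
            return link
    return None


if __name__ == "__main__":
    # Placeholder links standing in for the output of --print-result-link.
    links = [
        "https://example.org/wiki/EXAMPLE",
        "https://www.imdb.com/name/EXAMPLE/",
        "https://m.imdb.com/name/EXAMPLE/",
    ]
    print(first_link_on_host(links, "imdb.com"))
```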
HOMEPAGE
Please visit the project's homepage at https://metacpan.org/release/App-WebSearchUtils.
SOURCE
Source repository is at https://github.com/perlancar/perl-App-WebSearchUtils.
SEE ALSO
App::FirefoxMultiAccountContainersUtils.
AUTHOR
perlancar <perlancar@cpan.org>
CONTRIBUTING
To contribute, you can send patches by email/via RT, or send pull requests on GitHub.
Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:
% prove -l
If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.
COPYRIGHT AND LICENSE
This software is copyright (c) 2022 by perlancar <perlancar@cpan.org>.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
BUGS
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-WebSearchUtils
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.