NAME
WWW::Mechanize::Chrome - automate the Chrome browser
SYNOPSIS
use Log::Log4perl qw(:easy);
use WWW::Mechanize::Chrome;
Log::Log4perl->easy_init($ERROR); # Set priority of root logger to ERROR
my $mech = WWW::Mechanize::Chrome->new();
$mech->get('https://google.com');
$mech->eval_in_page('alert("Hello Chrome")');
my $png= $mech->content_as_png();
DESCRIPTION
This module provides a scriptable web client in a Perl object. In contrast to WWW::Mechanize it delegates fetching web pages and rendering them to the Chrome (or Chromium) browser: It starts an instance of the browser and controls it using Chrome DevTools.
Use Case
This module gives access to features of today's web applications which are not (yet) available with WWW::Mechanize:
Page content created or modified by JavaScript. You can also run your own JavaScript code on the page content.
Selection of page content using CSS selectors.
Render a page into a screenshot image or a PDF file.
A collection of Examples should give you a start.
It comes with a price: You need to have a Chrome compatible browser installed, and you need to live with the quirks and error messages issued by this browser when used with DevTools.
METHODS
WWW::Mechanize::Chrome->new %options
my $mech = WWW::Mechanize::Chrome->new();
- autodie
Control whether HTTP errors are fatal.
autodie => 0, # make HTTP errors non-fatal
HTTP errors are fatal by default, as that makes debugging much easier than expecting you to check the result of every action.
- host
Specify the host where Chrome listens
host => 'localhost'
Most likely you don't want to have Chrome listening on an outside port on a machine connected to the internet.
- port
Specify the port of Chrome to connect to
port => 9222
- tab
Specify which tab to connect to
tab => 'current'
If you want to connect to a tab by title, you can pass in a regular expression matching that title. If you want to create a new tab, pass in a false value.
- log
A premade Log::Log4perl object
- launch_exe
Specify the path to the Chrome executable.
The default is chrome on Windows and google-chrome elsewhere, as found via $ENV{PATH}. If you want to use Chromium, you need to specify that explicitly via:
launch_exe => 'chromium-browser', # if Chromium is named chromium-browser on your OS
You can also provide this information from the outside to the class by setting $ENV{CHROME_BIN}.
- start_url
Launch Chrome with the given URL. Normally you would use the ->get method instead.
- launch_arg
Specify additional parameters to the Chrome executable.
launch_arg => [ "--some-new-parameter=foo" ],
Interesting parameters might be
'--start-maximized', '--window-size=1280,1696', '--ignore-certificate-errors', '--disable-web-security', '--allow-running-insecure-content', '--load-extension', '--no-sandbox'
- profile
Profile directory for this session. If not given, Chrome will use your current user profile.
- incognito
Launch Chrome in incognito mode.
- data_directory
The base data directory for this session. If not given, Chrome will use your current base directory.
use File::Temp 'tempdir';
# create a fresh Chrome every time
my $mech = WWW::Mechanize::Chrome->new(
    data_directory => tempdir( CLEANUP => 1 ),
);
- startup_timeout
startup_timeout => 20,
The maximum number of seconds to wait until Chrome is ready. This helps on slow systems where Chrome takes some time starting up. The process will try every second to connect to Chrome.
- listen_host
listen_host => 'myhostname'
Specify the interface where a launched Chrome process should listen. This is usually not needed but available if you want to connect to the launched Chrome process from other machines as well.
- driver
A premade Chrome::DevToolsProtocol object.
- report_js_errors
If set to 1, checks for Javascript errors after each request and warns about them. Useful for testing with use warnings qw(fatal).
- mute_audio
Mutes the audio output. This setting is enabled by default.
- background_networking
Enable "background networking".
Disabled by default.
- client_side_phishing_detection
Enable "client side phishing detection".
Disabled by default.
- component_update
Enable "component update".
Disabled by default.
- default_apps
Enable "default apps".
Disabled by default.
- hang_monitor
Enable "hang monitor".
Disabled by default.
- hide_scrollbars
Hide scrollbars.
Disabled by default.
- infobars
Enable "infobars".
Disabled by default.
- popup_blocking
Enable "popup blocking".
Disabled by default.
- prompt_on_repost
Enable "prompt on repost".
Disabled by default.
- save_password_bubble
Enable the "save password" bubble.
Disabled by default.
- sync
Enable "sync".
Disabled by default.
- web_resources
Enable "Web resources".
Disabled by default.
You can override the class to implement the transport from the outside by setting $ENV{WWW_MECHANIZE_CHROME_TRANSPORT} to the transport class. This is mostly used for testing but can be useful to exclude the underlying websocket implementation(s) as source of bugs.
WWW::Mechanize::Chrome->find_executable
my $chrome = WWW::Mechanize::Chrome->find_executable();
my $chrome = WWW::Mechanize::Chrome->find_executable(
'chromium.exe',
'.\\my-chrome-66\\',
);
my( $chrome, $diagnosis ) = WWW::Mechanize::Chrome->find_executable(
['chromium-browser','google-chrome'],
'./my-chrome-66/',
);
die $diagnosis if ! $chrome;
Finds the first Chrome executable in the path ($ENV{PATH}). For Windows, it also looks in $ENV{ProgramFiles}, $ENV{"ProgramFiles(x86)"} and $ENV{"ProgramFilesW6432"}. For OSX it also looks in the user home directory as given through $ENV{HOME}.
This is used to find the default Chrome executable if none was given through the launch_exe option, or if the given executable does not exist and does not contain a directory separator.
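The $ENV{PATH} part of this search can be illustrated with a small pure-Perl sketch. The helper first_executable_in_path is hypothetical and not part of WWW::Mechanize::Chrome; it ignores the Windows and OSX fallback directories and simply returns the first name that exists as an executable file:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of WWW::Mechanize::Chrome): return the full
# path of the first name in @$names that exists as an executable file in one
# of @dirs. Without explicit directories, search the entries of $ENV{PATH}.
sub first_executable_in_path {
    my ( $names, @dirs ) = @_;
    @dirs = File::Spec->path() unless @dirs;
    for my $name (@$names) {
        for my $dir (@dirs) {
            my $candidate = File::Spec->catfile( $dir, $name );
            # -x _ reuses the stat() result of the preceding -f test
            return $candidate if -f $candidate && -x _;
        }
    }
    return;    # nothing found
}
```

For example, first_executable_in_path( [ 'google-chrome', 'chromium-browser' ] ) returns the first of those names found on $ENV{PATH}, or nothing.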
$mech->chrome_version
print $mech->chrome_version;
Returns the version of the Chrome executable being used. Obtaining this information requires launching the browser and asking it for the version via the network.
$mech->chrome_version_info
print $mech->chrome_version_info->{Browser};
Returns the version information of the Chrome executable and various other APIs of Chrome that the object is connected to.
$mech->driver
my $driver = $mech->driver;
Access the Chrome::DevToolsProtocol instance connecting to Chrome.
$mech->tab
my $tab = $mech->tab;
Access the tab hash of the Chrome::DevToolsProtocol instance connecting to Chrome. This represents the tab we control.
$mech->allow( %options )
$mech->allow( javascript => 1 );
Allow or disallow execution of Javascript
$mech->emulateNetworkConditions( %options )
# Go offline
$mech->emulateNetworkConditions(
offline => JSON::true,
latency => 10, # ms ping
downloadThroughput => 0, # bytes/s
uploadThroughput => 0, # bytes/s
connectionType => 'offline', # cellular2g, cellular3g, cellular4g, bluetooth, ethernet, wifi, wimax, other.
);
$mech->setRequestInterception( @patterns )
$mech->setRequestInterception(
{ urlPattern => '*', resourceType => 'Document', interceptionStage => 'Request'},
{ urlPattern => '*', resourceType => 'Media', interceptionStage => 'Response'},
);
Sets the list of request patterns and resource types for which the interception callback will be invoked.
$mech->add_listener
my $url_loaded = $mech->add_listener('Network.responseReceived', sub {
my( $info ) = @_;
warn "Loaded URL "
. $info->{params}->{response}->{url}
. ": "
. $info->{params}->{response}->{status};
warn "Resource timing: " . Dumper $info->{params}->{response}->{timing};
});
Returns a listener object. If that object is discarded, the listener callback will be removed.
Calling this method in void context croaks.
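The rule that a discarded listener object removes its callback follows the usual Perl guard-object pattern, and the void-context check prevents creating a listener that would vanish immediately. The following sketch shows the idea with a hypothetical My::ListenerGuard class and add_listener_sketch function; it keeps callbacks in a plain hash instead of talking to Chrome:

```perl
use strict;
use warnings;

package My::ListenerGuard;    # hypothetical class, for illustration only

# Store the code that unregisters the callback
sub new {
    my ( $class, $on_remove ) = @_;
    return bless { on_remove => $on_remove }, $class;
}

# Perl calls DESTROY when the last reference to the guard goes away
sub DESTROY {
    my ($self) = @_;
    $self->{on_remove}->() if $self->{on_remove};
}

package main;
use Carp qw(croak);

my %listeners;    # event name => { id => callback }
my $next_id = 0;

sub add_listener_sketch {
    my ( $event, $cb ) = @_;
    # Without a caller keeping the guard, the listener would vanish at once
    croak 'add_listener_sketch() called in void context' unless defined wantarray;
    my $id = ++$next_id;
    $listeners{$event}{$id} = $cb;
    return My::ListenerGuard->new( sub { delete $listeners{$event}{$id} } );
}
```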
To see the browser console live from your Perl script, use the following:
my $console = $mech->add_listener('Runtime.consoleAPICalled', sub {
warn join ", ",
map { $_->{value} // $_->{description} }
@{ $_[0]->{params}->{args} };
});
$mech->on_request_intercepted( $cb )
$mech->on_request_intercepted( sub {
my( $mech, $info ) = @_;
warn $info->{request}->{url};
$mech->continueInterceptedRequest_future(
interceptionId => $info->{interceptionId}
)
});
A callback for intercepted requests that match the patterns set up via setRequestInterception.
If you return a future from this callback, it will not be discarded but kept in a safe place.
$mech->searchInResponseBody( $id, %options )
my $request_id = ...;
my @matches = $mech->searchInResponseBody(
requestId => $request_id,
query => 'rumpelstiltskin',
caseSensitive => JSON::true,
isRegex => JSON::false,
);
for( @matches ) {
print $_->{lineNumber}, ":", $_->{lineContent}, "\n";
};
Returns the matches (if any) for a string or regular expression within a response.
$mech->on_dialog( $cb )
$mech->on_dialog( sub {
my( $mech, $dialog ) = @_;
warn $dialog->{message};
$mech->handle_dialog( 1 ); # click "OK" / "yes" instead of "cancel"
});
A callback for Javascript dialogs (alert(), prompt(), ...).
$mech->handle_dialog( $accept, $prompt = undef )
$mech->on_dialog( sub {
my( $mech, $dialog ) = @_;
warn "[Javascript $dialog->{type}]: $dialog->{message}";
$mech->handle_dialog( 1 ); # click "OK" / "yes" instead of "cancel"
});
Closes the current Javascript dialog. Depending on $accept, the dialog is confirmed ("OK" / "yes") or cancelled.
$mech->js_console_entries()
print $_->{type}, " ", $_->{message}, "\n"
for $mech->js_console_entries();
An interface to the Javascript Error Console (JEC).
Returns the list of entries in the JEC.
$mech->js_errors()
print "JS error: ", $_->{message}, "\n"
for $mech->js_errors();
Returns the list of errors in the JEC
$mech->clear_js_errors()
$mech->clear_js_errors();
Clears all Javascript messages from the console
$mech->eval_in_page( $str )
$mech->eval( $str )
my ($value, $type) = $mech->eval( '2+2' );
Evaluates the given Javascript fragment in the context of the web page. Returns a pair of value and Javascript type.
This allows access to variables and functions declared "globally" on the web page.
This method is special to WWW::Mechanize::Chrome.
$mech->eval_in_chrome $code, @args
$mech->eval_in_chrome(<<'JS', "Foobar/1.0");
this.settings.userAgent= arguments[0]
JS
Evaluates Javascript code in the context of Chrome.
This allows you to modify properties of Chrome.
This is currently not implemented.
$mech->callFunctionOn( $function, @arguments )
my ($value, $type) = $mech->callFunctionOn( 'function(greeting) { alert(greeting)}', 'Hello World' );
Runs the given function with the specified arguments.
This method is special to WWW::Mechanize::Chrome.
$mech->highlight_node( @nodes )
my @links = $mech->selector('a');
$mech->highlight_node(@links);
print $mech->content_as_png();
Convenience method that marks all nodes in the arguments with a red frame.
This is convenient if you need visual verification that you've got the right nodes.
NAVIGATION METHODS
$mech->get( $url, %options )
my $response = $mech->get( $url );
Retrieves the URL $url.
It returns a HTTP::Response object for interface compatibility with WWW::Mechanize.
Note that Chrome does not support download of files.
$mech->_collectEvents
my $events = $mech->_collectEvents(
sub { $_[0]->{method} eq 'Page.loadEventFired' }
);
my( $e,$r) = Future->wait_all( $events, $mech->driver->send_message(...));
Internal method to create a Future that waits for an event that is sent by Chrome.
The subroutine is the predicate to check to see if the current event is the event we have been waiting for.
The result is a Future that will return all captured events.
$mech->get_local( $filename , %options )
$mech->get_local('test.html');
Shorthand method to construct the appropriate file:// URI and load it into Chrome. Relative paths will be interpreted as relative to $0 or the basedir option.
This method accepts the same options as ->get().
This method is special to WWW::Mechanize::Chrome but could also exist in WWW::Mechanize through a plugin.
Warning: Chrome does not handle local files well. Especially subframes do not get loaded properly.
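The path-to-URI step can be sketched in plain Perl. local_file_url is a hypothetical helper, not the module's code; it resolves relative names against $basedir or the directory of $0 as described above, percent-encodes unsafe bytes, and does not cover the options that are passed on to ->get:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: build a file:// URL for $filename. Relative names are
# resolved against $basedir, or against the directory of the running script ($0).
sub local_file_url {
    my ( $filename, $basedir ) = @_;
    $basedir //= dirname( File::Spec->rel2abs($0) );
    my $path = File::Spec->rel2abs( $filename, $basedir );
    $path =~ s{\\}{/}g;                        # Windows: use forward slashes
    $path = "/$path" unless $path =~ m{^/};    # C:/x.html becomes /C:/x.html
    # Percent-encode bytes that are not safe in a URL path
    $path =~ s{([^A-Za-z0-9\-._~/:])}{sprintf '%%%02X', ord $1}ge;
    return "file://$path";
}
```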
$mech->getRequestPostData
if( $info->{params}->{response}->{requestHeaders}->{":method"} eq 'POST' ) {
$req->{postBody} = $mech->getRequestPostData( $id );
};
Retrieves the data sent with a POST request
$mech->post( $url, %options )
This method is currently not implemented.
$mech->post( 'http://example.com',
params => { param => "Hello World" },
headers => {
"Content-Type" => 'application/x-www-form-urlencoded',
},
charset => 'utf-8',
);
Sends a POST request to $url.
A Content-Length header will be automatically calculated if it is not given.
The following options are recognized:
headers - a hash of HTTP headers to send. If not given, the content type will be generated automatically.
data - the raw data to send, if you've encoded it already.
$mech->reload( %options )
$mech->reload( ignoreCache => 1 )
Acts like the reload button in a browser: repeats the current request. The history (as per the "back" method) is not altered.
Returns the HTTP::Response object from the reload, or undef if there's no current request.
$mech->set_download_directory( $dir )
my $downloads = tempdir();
$mech->set_download_directory( $downloads );
Enables automatic file downloads and sets the directory where the files will be downloaded to. Setting this to undef will disable downloads again.
The directory in $dir must be an absolute path, since Chrome does not know about the current directory of your Perl script.
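Because Chrome cannot resolve a path relative to your script, it is convenient to make the path absolute in Perl before handing it over. A minimal sketch with a hypothetical helper prepare_download_dir; creating the directory is an extra precaution of this sketch, not something the method is documented to require:

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: resolve $dir against the current working directory of
# this Perl process (Chrome cannot do that for us) and create it if needed.
sub prepare_download_dir {
    my ($dir) = @_;
    my $abs = File::Spec->rel2abs($dir);
    make_path($abs) unless -d $abs;
    -d $abs or die "Couldn't create download directory '$abs'";
    return $abs;
}
```

Usage: $mech->set_download_directory( prepare_download_dir('downloads') );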
$mech->add_header( $name => $value, ... )
$mech->add_header(
'X-WWW-Mechanize-Chrome' => "I'm using it",
Encoding => 'text/klingon',
);
This method sets up custom headers that will be sent with every HTTP(S) request that Chrome makes.
Note that currently, we only support one value per header.
$mech->delete_header( $name , $name2... )
$mech->delete_header( 'User-Agent' );
Removes HTTP headers from the agent's list of special headers. Note that Chrome may still send a header with its default value.
$mech->reset_headers
$mech->reset_headers();
Removes all custom headers and makes Chrome send its defaults again.
$mech->block_urls()
$mech->block_urls( '//facebook.com/js/conversions/tracking.js' );
Sets the list of blocked URLs. These URLs will not be retrieved by Chrome when loading a page. This is useful to eliminate tracking images or to test resilience in face of bad network conditions.
$mech->res() / $mech->response(%options)
my $response = $mech->response(headers => 0);
Returns the current response as a HTTP::Response object.
$mech->success()
$mech->get('http://google.com');
print "Yay"
if $mech->success();
Returns a boolean telling whether the last request was successful. If there hasn't been an operation yet, returns false.
This is a convenience function that wraps $mech->res->is_success.
$mech->status()
$mech->get('http://google.com');
print $mech->status();
# 200
Returns the HTTP status code of the response. This is a 3-digit number like 200 for OK, 404 for not found, and so on.
$mech->back()
$mech->back();
Goes one page back in the page history.
Returns the (new) response.
$mech->forward()
$mech->forward();
Goes one page forward in the page history.
Returns the (new) response.
$mech->stop()
$mech->stop();
Stops all loading in Chrome, as if you pressed ESC.
This function is mostly of use in callbacks or in a timer callback from your event loop.
$mech->uri()
print "We are at " . $mech->uri;
Returns the current document URI.
$mech->infinite_scroll( [$wait_time_in_seconds] )
$new_content_found = $mech->infinite_scroll(3);
Loads content into pages that have "infinite scroll" capabilities by scrolling to the bottom of the web page and waiting up to the number of seconds, as set by the optional $wait_time_in_seconds argument, for the browser to load more content. The default is to wait up to 20 seconds. For reasonably fast sites, the wait time can be set much lower.
The method returns a boolean true if new content is loaded, false otherwise. You can scroll to the end (if there is one) of an infinitely scrolling page like so:
my $count = 0;
while( $mech->infinite_scroll ) {
# Tests for exiting the loop earlier
last if $count++ >= 10;
}
CONTENT METHODS
$mech->document_future()
$mech->document()
print $mech->document->{nodeId};
This is WWW::Mechanize::Chrome specific.
$mech->content( %options )
print $mech->content;
print $mech->content( format => 'html' ); # default
print $mech->content( format => 'text' ); # identical to ->text
This always returns the content as a Unicode string. It tries to decode the raw content according to its input encoding. This currently only works for HTML pages, not for images etc.
Recognized options:
format - the format to return. The allowed values are html and text. The default is html.
$mech->text()
print $mech->text();
Returns the text of the current HTML content. If the content isn't HTML, $mech will die.
$mech->content_encoding()
print "The content is encoded as ", $mech->content_encoding;
Returns the encoding that the content is in. This can be used to convert the content from UTF-8 back to its native encoding.
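Since ->content returns a decoded Unicode string, converting it back to the native encoding is a job for the core Encode module. A minimal sketch; content_as_native_bytes is a hypothetical helper name, and it assumes Encode recognizes the name returned by ->content_encoding:

```perl
use strict;
use warnings;
use Encode qw(encode);

# $content is the Unicode string from $mech->content, $encoding the name
# reported by $mech->content_encoding (e.g. "ISO-8859-1").
sub content_as_native_bytes {
    my ( $content, $encoding ) = @_;
    return encode( $encoding, $content );
}
```

Usage: my $bytes = content_as_native_bytes( $mech->content, $mech->content_encoding );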
$mech->update_html( $html )
$mech->update_html($html);
Writes $html into the current document. This is mostly implemented as a convenience method for HTML::Display::MozRepl.
$mech->base()
print $mech->base;
Returns the URL base for the current page.
The base is either specified through a base tag or is the current URL.
This method is specific to WWW::Mechanize::Chrome.
$mech->content_type()
$mech->ct()
print $mech->content_type;
Returns the content type of the currently loaded document
$mech->is_html()
print $mech->is_html();
Returns true/false on whether our content is HTML, according to the HTTP headers.
$mech->title()
print "We are on page " . $mech->title;
Returns the current document title.
EXTRACTION METHODS
$mech->links()
print $_->text . " -> " . $_->url . "\n"
for $mech->links;
Returns all links in the document as WWW::Mechanize::Link objects.
Currently accepts no parameters. See ->xpath or ->selector when you want more control.
$mech->selector( $css_selector, %options )
my @text = $mech->selector('p.content');
Returns all nodes matching the given CSS selector. If $css_selector is an array reference, it returns all nodes matched by any of the CSS selectors in the array.
This takes the same options that ->xpath does.
This method is implemented via WWW::Mechanize::Plugin::Selector.
$mech->find_link_dom( %options )
print $_->{innerHTML} . "\n"
for $mech->find_link_dom( text_contains => 'CPAN' );
A method to find links, like WWW::Mechanize's ->find_link method. This method returns DOM objects from Chrome instead of WWW::Mechanize::Link objects.
Note that Chrome might have reordered the links or frame links in the document so the absolute numbers passed via n might not be the same between WWW::Mechanize and WWW::Mechanize::Chrome.
The supported options are:
text and text_contains and text_regex
Match the text of the link as a complete string, substring or regular expression.
Matching as a complete string or substring is a bit faster, as it is done in the XPath engine of Chrome.
id and id_contains and id_regex
Matches the id attribute of the link completely or as part.
name and name_contains and name_regex
Matches the name attribute of the link.
url and url_regex
Matches the URL attribute of the link (href, src or content).
class - the class attribute of the link
n - the (1-based) index. Defaults to returning the first link.
single - If true, ensure that only one element is found. Otherwise croak or carp, depending on the autodie parameter.
one - If true, ensure that at least one element is found. Otherwise croak or carp, depending on the autodie parameter.
The method croaks if no link is found. If the single option is true, it also croaks when more than one link is found.
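The text, url and n options combine as a filter followed by a 1-based pick. The sketch below applies those rules to plain hashes instead of DOM nodes fetched from Chrome; match_link is a hypothetical helper, it covers only a subset of the options, and unlike ->find_link_dom it returns undef instead of croaking:

```perl
use strict;
use warnings;

# Hypothetical sketch of the matching rules above, applied to plain hashes
# ({ text => ..., url => ... }) instead of DOM nodes fetched from Chrome.
sub match_link {
    my ( $links, %opt ) = @_;
    my @found = grep {
        my $text = $_->{text} // '';
        my $url  = $_->{url}  // '';
               ( !defined $opt{text}          || $text eq $opt{text} )
            && ( !defined $opt{text_contains} || index( $text, $opt{text_contains} ) >= 0 )
            && ( !defined $opt{text_regex}    || $text =~ $opt{text_regex} )
            && ( !defined $opt{url}           || $url eq $opt{url} )
            && ( !defined $opt{url_regex}     || $url =~ $opt{url_regex} )
    } @$links;
    my $n = $opt{n} // 1;    # 1-based index, defaults to the first match
    return if $n < 1;
    return $found[ $n - 1 ];
}
```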
$mech->find_link( %options )
print $_->text . "\n"
for $mech->find_link( text_contains => 'CPAN' );
A method quite similar to WWW::Mechanize's ->find_link method. The options are documented in ->find_link_dom.
Returns a WWW::Mechanize::Link object.
This defaults to not look through child frames.
$mech->find_all_links( %options )
print $_->text . "\n"
for $mech->find_all_links( text_regex => qr/google/i );
Finds all links in the document. The options are documented in ->find_link_dom.
Returns them as list or an array reference, depending on context.
This defaults to not look through child frames.
$mech->find_all_links_dom %options
print $_->{innerHTML} . "\n"
for $mech->find_all_links_dom( text_regex => qr/google/i );
Finds all matching link-like DOM nodes in the document. The options are documented in ->find_link_dom.
Returns them as list or an array reference, depending on context.
This defaults to not look through child frames.
$mech->follow_link( $link )
$mech->follow_link( %options )
$mech->follow_link( xpath => '//a[text() = "Click here!"]' );
Follows the given link. Takes the same parameters that find_link_dom uses.
Note that ->follow_link will only try to follow link-like things like A tags.
$mech->xpath( $query, %options )
my $link = $mech->xpath('//a[@id="clickme"]', single => 1);
# croaks if there is no link or more than one link found
my @para = $mech->xpath('//p');
# Collects all paragraphs
my @para_text = $mech->xpath('//p/text()', type => $mech->xpathResult('STRING_TYPE'));
# Collects all paragraphs as text
Runs an XPath query in Chrome against the current document.
If you need more information about the returned results, use the ->xpathEx() function.
The options allow the following keys:
document - document in which the query is to be executed. Use this to search a node within a specific subframe of $mech->document.
frames - if true, search all documents in all frames and iframes. This may or may not conflict with node. This will default to the frames setting of the WWW::Mechanize::Chrome object.
node - node relative to which the query is to be executed. Note that you will have to use a relative XPath expression as well. Use .//foo instead of //foo.
single - If true, ensure that only one element is found. Otherwise croak or carp, depending on the autodie parameter.
one - If true, ensure that at least one element is found. Otherwise croak or carp, depending on the autodie parameter.
maybe - If true, ensure that at most one element is found. Otherwise croak or carp, depending on the autodie parameter.
all - If true, return all elements found. This is the default. You can use this option if you want to use ->xpath in scalar context to count the number of matched elements, as it will otherwise emit a warning for each usage in scalar context without any of the above restricting options.
any - no error is raised, no matter if an item is found or not.
Returns the matched results as WWW::Mechanize::Chrome::Node objects.
You can pass in a list of queries as an array reference for the first parameter. The result will then be the list of all elements matching any of the queries.
This is a method that is not implemented in WWW::Mechanize.
In the long run, this should go into a general plugin for WWW::Mechanize.
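The single, one and maybe options amount to a count check on the result list, with autodie choosing between croak and carp. The following plain-Perl sketch shows that logic; check_count is a hypothetical helper and its messages are made up, not the module's own:

```perl
use strict;
use warnings;
use Carp qw(croak carp);

# Hypothetical sketch of the single/one/maybe checks described above.
# With autodie set, a violation croaks; otherwise it only carps.
sub check_count {
    my ( $found, %opt ) = @_;
    my $count = @$found;
    my $problem =
          $opt{single} && $count != 1 ? "expected exactly one element, found $count"
        : $opt{one}    && $count < 1  ? "expected at least one element, found none"
        : $opt{maybe}  && $count > 1  ? "expected at most one element, found $count"
        :                               undef;
    if ( defined $problem ) {
        if   ( $opt{autodie} ) { croak $problem }
        else                   { carp $problem }
    }
    return @$found;
}
```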
$mech->by_id( $id, %options )
my @text = $mech->by_id('_foo:bar');
Returns all nodes matching the given ids. If $id is an array reference, it returns all nodes matched by any of the ids in the array.
This method is equivalent to calling ->xpath:
$mech->xpath(qq{//*[\@id="$_"]}, %options)
It is convenient when your element ids get mistaken for CSS selectors.
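The qq{} form above breaks if an id itself contains a double quote. XPath 1.0 has no escape character, so a robust query has to switch quote types or fall back to concat(). A sketch with hypothetical helpers xpath_literal and id_xpath; whether ->by_id handles this case internally is not stated here:

```perl
use strict;
use warnings;

# Hypothetical helpers: quote a string as an XPath 1.0 literal. XPath 1.0 has
# no escape character, so a value containing both quote types is split and
# joined again with concat().
sub xpath_literal {
    my ($s) = @_;
    return qq{"$s"} unless $s =~ /"/;
    return qq{'$s'} unless $s =~ /'/;
    my @parts = split /"/, $s, -1;
    return 'concat(' . join( q{, '"', }, map { qq{"$_"} } @parts ) . ')';
}

sub id_xpath {
    my ($id) = @_;
    return '//*[@id=' . xpath_literal($id) . ']';
}
```

Usage: $mech->xpath( id_xpath($id), single => 1 );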
$mech->click( $name [,$x ,$y] )
$mech->click( 'go' );
$mech->click({ xpath => '//button[@name="go"]' });
Has the effect of clicking a button (or other element) on the current form. The first argument is the name of the button to be clicked. The second and third arguments (optional) allow you to specify the (x,y) coordinates of the click.
If there is only one button on the form, $mech->click() with no arguments simply clicks that one button.
If you pass in a hash reference instead of a name, the following keys are recognized:
text - Find the element to click by its contained text
selector - Find the element to click by the CSS selector
xpath - Find the element to click by the XPath query
dom - Click on the passed DOM element. You can use this to click on arbitrary page elements. There is no convenient way to pass x/y co-ordinates with this method.
id - Click on the element with the given id. This is useful if your document ids contain characters that do look like CSS selectors. It is equivalent to
xpath => qq{//*[\@id="$id"]}
Returns a HTTP::Response object.
As a deviation from the WWW::Mechanize API, you can also pass a hash reference as the first parameter. In it, you can specify the parameters to search much like for the find_link calls.
$mech->click_button( ... )
$mech->click_button( name => 'go' );
$mech->click_button( input => $mybutton );
Has the effect of clicking a button on the current form by specifying its name, value, or index. Its arguments are a list of key/value pairs. Only one of name, value, input, id or number must be specified in the keys.
name - name of the button
value - value of the button
input - DOM node
id - id of the button
number - number of the button
If you find yourself wanting to specify a button through its selector or xpath, consider using ->click instead.
FORM METHODS
$mech->current_form()
print $mech->current_form->{name};
Returns the current form.
This method is incompatible with WWW::Mechanize. It returns the DOM <form> object and not a HTML::Form instance.
The current form will be reset by WWW::Mechanize::Chrome on calls to ->get() and ->get_local(), and on calls to ->submit() and ->submit_with_fields.
$mech->dump_forms( [$fh] )
open my $fh, '>', 'form-log.txt'
or die "Couldn't open logfile 'form-log.txt': $!";
$mech->dump_forms( $fh );
Prints a dump of the forms on the current page to the filehandle $fh. If $fh is not specified or is undef, it dumps to STDOUT.
$mech->form_name( $name [, %options] )
$mech->form_name( 'search' );
Selects the current form by its name. The options are identical to those accepted by the "$mech->xpath" method.
$mech->form_id( $id [, %options] )
$mech->form_id( 'login' );
Selects the current form by its id attribute. The options are identical to those accepted by the "$mech->xpath" method.
This is equivalent to calling
$mech->by_id($id,single => 1,%options)
$mech->form_number( $number [, %options] )
$mech->form_number( 2 );
Selects the $number-th form. The options are identical to those accepted by the "$mech->xpath" method.
$mech->form_with_fields( [$options], @fields )
$mech->form_with_fields(
'user', 'password'
);
Find the form which has the listed fields.
If the first argument is a hash reference, it's taken as options to ->xpath.
See also "$mech->submit_form".
$mech->forms( %options )
my @forms = $mech->forms();
When called in a list context, returns a list of the forms found in the last fetched page. In a scalar context, returns a reference to an array with those forms.
The options are identical to those accepted by the "$mech->selector" method.
The returned elements are the DOM <form> elements.
$mech->field( $selector, $value, [,\@pre_events [,\@post_events]] )
$mech->field( user => 'joe' );
$mech->field( not_empty => '', [], [] ); # bypass JS validation
Sets the field with the name given in $selector to the given value. Returns the value.
The method understands very basic CSS selectors in the value for $selector, like the HTML::Form find_input() method.
A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
By passing the array reference @pre_events, you can indicate which Javascript events you want to be triggered before setting the value. @post_events contains the events you want to be triggered after setting the value.
By default, the events set in the constructor for pre_events and post_events are triggered.
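The prefix rules for $selector can be expressed as a translation to an XPath attribute test. The following sketch is a hypothetical helper selector_to_xpath, not the module's implementation; it matches '.' against one whole token of the class attribute, as CSS does, and does not handle values containing a double quote:

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's implementation: translate the basic
# selector syntax described above into an XPath query. Values containing a
# double quote are not handled.
sub selector_to_xpath {
    my ($selector) = @_;
    my ( $prefix, $value ) = $selector =~ /\A([#.^]?)(.*)\z/s;
    my %attribute_for = ( '#' => 'id', '.' => 'class', '^' => 'name', '' => 'name' );
    my $attr = $attribute_for{$prefix};
    # class holds a space-separated list, so match one whole token of it
    return qq{//*[contains(concat(" ", normalize-space(\@class), " "), " $value ")]}
        if $attr eq 'class';
    return qq{//*[\@$attr="$value"]};
}
```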
$mech->sendkeys( %options )
$mech->sendkeys( string => "Hello World" );
Sends a series of keystrokes. The keystrokes can be either a string or a reference to an array containing the detailed data as hashes.
- string - the string to send as keystrokes
- keys - reference of the array to send as keystrokes
- delay - delay in ms to sleep between keys
$mech->upload( $selector, $value )
$mech->upload( user_picture => 'C:/Users/Joe/face.png' );
Sets the file upload field with the name given in $selector to the given file. The filename must be an absolute path and filename in the local filesystem.
The method understands very basic CSS selectors in the value for $selector, like the ->field method.
$mech->value( $selector_or_element, [%options] )
print $mech->value( 'user' );
Returns the value of the field given by the selector in $selector_or_element, or of the DOM element passed in.
The legacy form of
$mech->value( name => value );
is also still supported but will likely be deprecated in favour of the ->field method.
For fields that can have multiple values, like a select field, the method is context sensitive and returns the first selected value in scalar context and all values in list context.
Note that this method does not support file uploads. See the ->upload method for that.
$mech->get_set_value( %options )
Allows fine-grained access to getting/setting a value with a different API. Supported keys are name, value, pre and post, in addition to all keys that $mech->xpath supports.
$mech->submit( $form )
$mech->submit;
Submits the form. Note that this does not fire the onClick event and thus also does not fire any Javascript handlers. Maybe you want to use $mech->click instead.
The default is to submit the current form as returned by $mech->current_form.
$mech->submit_form( %options )
$mech->submit_form(
with_fields => {
user => 'me',
pass => 'secret',
}
);
This method lets you select a form from the previously fetched page, fill in its fields, and submit it. It combines the form_number/form_name, set_fields and click methods into one higher level call. Its arguments are a list of key/value pairs, all of which are optional.
form => $mech->current_form()
Specifies the form to be filled and submitted. Defaults to the current form.
fields => \%fields
Specifies the fields to be filled in the current form
with_fields => \%fields
Probably all you need for the common case. It combines a smart form selector and data setting in one operation. It selects the first form that contains all fields mentioned in \%fields. This is nice because you don't need to know the name or number of the form to do this.
(calls "$mech->form_with_fields()" and "$mech->set_fields()").
If you choose this, the form_number, form_name, form_id and fields options will be ignored.
$mech->set_fields( $name => $value, ... )
$mech->set_fields(
user => 'me',
pass => 'secret',
);
This method sets multiple fields of the current form. It takes a list of field name and value pairs. If there is more than one field with the same name, the first one found is set. If you want to select which of the duplicate fields to set, use a value which is an anonymous array which has the field value and its number as the 2 elements.
CONTENT MONITORING METHODS
$mech->is_visible( $element )
$mech->is_visible( %options )
if ($mech->is_visible( selector => '#login' )) {
print "You can log in now.";
};
Returns true if the element is visible, that is, it is a member of the DOM and neither it nor its ancestors have a CSS visibility property of hidden or a display property of none.
You can either pass in a DOM element or a set of key/value pairs to search the document for the element you want.
xpath - the XPath query
selector - the CSS selector
dom - a DOM node
The remaining options are passed through to either the $mech->xpath or $mech->selector method.
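The visibility rule walks from the element up through its ancestors. The sketch below models that walk with plain hashes standing in for DOM nodes; node_is_visible and the { parent, style } layout are assumptions for illustration, and the "member of the DOM" check is left out:

```perl
use strict;
use warnings;

# Hypothetical sketch of the visibility rule above, using plain hashes as
# stand-ins for DOM nodes: { parent => $node, style => { display => ... } }.
sub node_is_visible {
    my ($node) = @_;
    for ( my $n = $node ; $n ; $n = $n->{parent} ) {
        my $style = $n->{style} || {};
        return 0 if ( $style->{visibility} // '' ) eq 'hidden';
        return 0 if ( $style->{display}    // '' ) eq 'none';
    }
    return 1;
}
```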
$mech->wait_until_invisible( $element )
$mech->wait_until_invisible( %options )
$mech->wait_until_invisible( $please_wait );
Waits until an element is not visible anymore.
Takes the same options as "->is_visible" in $mech->is_visible.
In addition, the following options are accepted:
timeout - the timeout after which the function will croak. To catch the condition and handle it in your calling program, use an eval block. A timeout of 0 means to never time out.
sleep - the interval in seconds used to sleep. Subsecond intervals are possible.
Note that when passing in a selector, that selector is requeried on every poll, so the following query will work as expected:
xpath => '//*[contains(text(),"stand by")]'
This also means that if your selector query relies on finding a changing text, you need to pass the node explicitly instead of passing the selector.
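The timeout and sleep options describe an ordinary polling loop. The following pure-Perl sketch illustrates those semantics (croak on timeout, a timeout of 0 never expiring, fractional sleep intervals) with an arbitrary check standing in for the visibility test. wait_for is a hypothetical name, the defaults and the assumption that timeout is given in seconds are ours, and the module's own implementation may differ.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Time::HiRes qw(time sleep);

# Hypothetical sketch of the polling behind wait_until_(in)visible.
# $check is re-run on every iteration, just as a selector is re-queried
# on every poll. Defaults below are assumptions, not the module's.
sub wait_for {
    my ( $check, %options ) = @_;
    my $timeout  = $options{timeout} // 0;     # 0 means: never time out
    my $interval = $options{sleep}   // 0.3;   # seconds, may be fractional
    my $deadline = time() + $timeout;          # timeout assumed in seconds

    until ( $check->() ) {
        if ( $timeout and time() >= $deadline ) {
            croak "Timeout of $timeout seconds reached";
        }
        sleep $interval;                       # Time::HiRes sleep
    }
    return 1;
}
```

A caller that wants to continue after a timeout wraps the call in eval, exactly as the option description suggests.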
$mech->wait_until_visible( %options )
$mech->wait_until_visible( selector => 'a.download' );
Waits until a query returns a visible element.
Takes the same options as $mech->is_visible.
In addition, the following options are accepted:
timeout - the timeout after which the function will croak. To catch the condition and handle it in your calling program, use an eval block. A timeout of 0 means to never time out.
sleep - the interval in seconds used to sleep. Subsecond intervals are possible.
Note that when passing in a selector, that selector is requeried on every poll, so a query for an element that is not yet present in the page will work as expected.
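Since a timeout makes the method croak, a caller that wants to carry on can trap it with eval. A minimal sketch, assuming the timeout is given in seconds; download_link_appeared is a hypothetical helper, and it treats any error as "not visible", so real code may want to inspect $@ as well.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if 'a.download' became visible within
# $seconds, 0 if wait_until_visible died (for example on timeout).
sub download_link_appeared {
    my ( $mech, $seconds ) = @_;
    my $ok = eval {
        $mech->wait_until_visible(
            selector => 'a.download',
            timeout  => $seconds,
            sleep    => 0.5,    # poll twice a second
        );
        1;
    };
    return $ok ? 1 : 0;
}
```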
CONTENT RENDERING METHODS
$mech->content_as_png()
my $png_data = $mech->content_as_png();
# Create scaled-down 480px wide preview
my $preview_png = $mech->content_as_png(undef, { width => 480 });
Returns the given tab or the current page rendered as a PNG image.
All parameters are optional.
This method is specific to WWW::Mechanize::Chrome.
$mech->saveResources_future
my $file_map = $mech->saveResources_future(
target_file => 'this_page.html',
target_dir => 'this_page_files/',
)->get();
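A rough prototype of a "Save Complete Page" feature: it saves the current page to target_file and its resources into target_dir. The method returns a Future; call ->get() on it to wait for the resulting file map.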
$mech->viewport_size
use Data::Dumper;
print Dumper $mech->viewport_size;
$mech->viewport_size({ width => 1388, height => 792 });
Returns (or sets) the size of the viewport (the "window").
The recognized keys are:
width
height
deviceScaleFactor
mobile
screenWidth
screenHeight
positionX
positionY
$mech->element_as_png( $element )
my $shiny = $mech->selector('#shiny', single => 1);
my $i_want_this = $mech->element_as_png($shiny);
Returns PNG image data for a single element.
$mech->render_element( %options )
my $shiny = $mech->selector('#shiny', single => 1);
my $i_want_this= $mech->render_element(
element => $shiny,
format => 'png',
);
Returns the data for a single element or writes it to a file. It accepts all options of ->render_content.
Note that while the image will have the node in the upper left corner, the width and height of the resulting image will still be the size of the browser window. Cut the image using element_coordinates if you need exactly the element.
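Cutting the image comes down to clipping the element's width and height, as reported by element_coordinates, to the size of the rendered image. The pure-Perl sketch below only computes that crop box; crop_box is a hypothetical helper, and the actual cutting is left to an image library of your choice.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: crop rectangle for an element whose top-left
# corner sits at ($x, $y) in an image of $img_w x $img_h pixels.
# render_element puts the node in the upper left, so ($x, $y) defaults
# to (0, 0). $pos is the hash reference from element_coordinates.
sub crop_box {
    my ( $pos, $img_w, $img_h, $x, $y ) = @_;
    $x //= 0;
    $y //= 0;
    my $right  = min( $x + $pos->{width},  $img_w );    # clip to image
    my $bottom = min( $y + $pos->{height}, $img_h );
    return {
        left   => $x,
        top    => $y,
        width  => max( 0, $right - $x ),
        height => max( 0, $bottom - $y ),
    };
}
```

With Imager installed, for example, `$img->crop( %{ crop_box( $pos, $img->getwidth, $img->getheight ) } )` would cut the image, since Imager's crop method accepts left, top, width and height.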
$mech->element_coordinates( $element )
my $shiny = $mech->selector('#shiny', single => 1);
my ($pos) = $mech->element_coordinates($shiny);
print $pos->{left},',', $pos->{top};
Returns the page coordinates of the $element in pixels as a hash reference with four entries: left, top, width and height.
This function might get moved into another module more geared towards rendering HTML.
$mech->render_content(%options)
my $pdf_data = $mech->render_content( format => 'pdf' );
Returns the current page rendered as PDF or PNG as a bytestring.
Note that the PDF format will only be successful with headless Chrome. At least on Windows, when launching Chrome with a UI, printing to PDF will be unavailable.
This method is specific to WWW::Mechanize::Chrome.
$mech->content_as_pdf(%options)
my $pdf_data = $mech->content_as_pdf();
my $pdf_a4 = $mech->content_as_pdf( format => 'A4' );
my $pdf_custom = $mech->content_as_pdf( paperWidth => 8, paperHeight => 11 );
Returns the current page rendered in PDF format as a bytestring. The page format can be specified through the format option.
Note that this method will only be successful with headless Chrome. At least on Windows, when launching Chrome with a UI, printing to PDF will be unavailable.
This method is specific to WWW::Mechanize::Chrome.
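Because content_as_png and content_as_pdf return raw bytes, they should be written to disk without any encoding or newline translation. A minimal sketch; save_bytes and the file names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: writes a bytestring (PNG or PDF data) to $filename.
# The :raw layer keeps newline translation and character encoding from
# corrupting the binary data.
sub save_bytes {
    my ( $filename, $bytes ) = @_;
    open my $fh, '>:raw', $filename
        or die "Couldn't write '$filename': $!";
    print {$fh} $bytes;
    close $fh
        or die "Couldn't close '$filename': $!";
    return length $bytes;
}

# Usage with a live $mech:
#   save_bytes( 'page.pdf', $mech->content_as_pdf( format => 'A4' ) );
#   save_bytes( 'page.png', $mech->content_as_png() );
```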
INTERNAL METHODS
These are methods that are available but exist mostly as internal helper methods. Use of these is discouraged.
$mech->element_query( \@elements, \%attributes )
my $query = $mech->element_query(['input', 'select', 'textarea'],
{ name => 'foo' });
Returns the XPath query that searches for all elements whose tagName is in @elements and that have the attributes %attributes. The @elements form an or condition, while the attributes form an and condition.
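The or/and structure can be sketched in pure Perl as follows. This illustrates the shape of such a query only, assuming simple tag names and attribute values without double quotes; element_query_sketch is a hypothetical name, and the string the module actually produces may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: matches any of the given tag names (joined with
# "or") and all of the given attributes (joined with "and").
# Values must not contain double quotes in this simple version.
sub element_query_sketch {
    my ( $elements, $attributes ) = @_;
    my $tags = join ' or ',
        map { sprintf 'local-name(.)="%s"', $_ } @$elements;
    my $attrs = join ' and ',
        map { sprintf '@%s="%s"', $_, $attributes->{$_} }
        sort keys %$attributes;    # sorted for a stable result
    my $cond = $attrs ? "($tags) and $attrs" : "($tags)";
    return "//*[$cond]";
}
```

For the example above, this sketch yields `//*[(local-name(.)="input" or local-name(.)="select" or local-name(.)="textarea") and @name="foo"]`.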
DEBUGGING METHODS
This module can collect the screencasts that Chrome can produce. The screencast frames are sent to your callback, which can either feed them to ffmpeg to create a video or dump them to disk as sequential images.
sub saveFrame {
my( $mech, $framePNG ) = @_;
print $framePNG->{data};
}
$mech->setScreenFrameCallback( \&saveFrame );
# ... do stuff ...
$mech->setScreenFrameCallback( undef ); # stop recording
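To dump the frames to disk as sequential images instead of printing them, the callback can keep a counter. A sketch, assuming (as the example above does) that $frame->{data} holds the image bytes; if your version passes the DevTools Page.screencastFrame payload through undecoded, that data is base64-encoded and must be decoded first. make_frame_saver and the file name pattern are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use MIME::Base64 qw(decode_base64);

# Hypothetical factory: returns a callback for setScreenFrameCallback that
# writes each frame to frame-00001.png, frame-00002.png, ... in $dir.
# Pass a true $is_base64 if the frame data arrives base64-encoded.
sub make_frame_saver {
    my ( $dir, $is_base64 ) = @_;
    my $count = 0;
    return sub {
        my ( $mech, $frame ) = @_;
        my $bytes = $is_base64 ? decode_base64( $frame->{data} ) : $frame->{data};
        my $file  = File::Spec->catfile( $dir, sprintf 'frame-%05d.png', ++$count );
        open my $fh, '>:raw', $file or die "Couldn't write '$file': $!";
        print {$fh} $bytes;
        close $fh or die "Couldn't close '$file': $!";
        return;
    };
}

# Usage:
#   $mech->setScreenFrameCallback( make_frame_saver('frames') );
```

Frames named this way can later be combined with ffmpeg's image sequence input, for example `ffmpeg -framerate 10 -i frames/frame-%05d.png out.mp4`; since Chrome sends frames irregularly, the resulting playback speed is only approximate.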
$mech->sleep
$mech->sleep( 2 ); # wait for things to settle down
Suspends the progress of the program while still handling messages from Chrome.
The main use of this method is to give Chrome enough time to send all its screencast frames and to catch up before shutting down the connection.
INCOMPATIBILITIES WITH WWW::Mechanize
As this module is in a very early stage of development, there are many incompatibilities. The main difference is that only the most needed WWW::Mechanize methods have been implemented so far.
Unsupported Methods
At least the following methods are unsupported:
->find_all_inputs
This function is likely best implemented through $mech->selector.
->find_all_submits
This function is likely best implemented through $mech->selector.
->images
This function is likely best implemented through $mech->selector.
->find_image
This function is likely best implemented through $mech->selector.
->find_all_images
This function is likely best implemented through $mech->selector.
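Until these methods exist, similar results can be had by calling $mech->selector with a CSS selector. A sketch; the helper names are hypothetical, and unlike their WWW::Mechanize counterparts they return whatever $mech->selector returns (DOM node objects), not HTML::Form inputs or WWW::Mechanize::Image objects, so calling code needs adapting.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the unsupported WWW::Mechanize methods,
# each built on a single CSS selector query.
sub all_inputs {
    my ($mech) = @_;
    return $mech->selector('input, select, textarea');
}

# Note: <button> elements without a type attribute also submit a form,
# but are not matched here.
sub all_submits {
    my ($mech) = @_;
    return $mech->selector('input[type="submit"], input[type="image"], button[type="submit"]');
}

sub all_images {
    my ($mech) = @_;
    return $mech->selector('img');
}
```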
Functions that will likely never be implemented
These functions are unlikely to be implemented because they make little sense in the context of Chrome.
->clone
->credentials( $username, $password )
->get_basic_credentials( $realm, $uri, $isproxy )
->clear_credentials()
->put
I have no use for it.
->post
This module does not yet support POST requests.
INSTALLING
See WWW::Mechanize::Chrome::Install
SEE ALSO
https://developer.chrome.com/devtools/docs/debugging-clients - the Chrome DevTools homepage
https://github.com/GoogleChrome/lighthouse - Google Lighthouse, the main client of the Chrome API
WWW::Mechanize - the module whose API grandfathered this module
WWW::Mechanize::Chrome::Node - objects representing HTML in Chrome
WWW::Mechanize::Firefox - a similar module with a visible application automating Firefox
WWW::Mechanize::PhantomJS - a similar module without a visible application automating PhantomJS
REPOSITORY
The public repository of this module is https://github.com/Corion/www-mechanize-chrome.
SUPPORT
The public support forum of this module is https://perlmonks.org/.
TALKS
I've given a German talk at GPW 2017, see http://act.yapc.eu/gpw2017/talk/7027 and https://corion.net/talks for the slides.
At The Perl Conference 2017 in Amsterdam, I also presented a talk, see http://act.perlconference.org/tpc-2017-amsterdam/talk/7022. The slides for the English presentation at TPCiA 2017 are at https://corion.net/talks/WWW-Mechanize-Chrome/www-mechanize-chrome.en.html.
At the London Perl Workshop 2017, I also presented a talk; a recording is available on YouTube, and the slides for that talk are online as well.
BUG TRACKER
Please report bugs in this module via the RT CPAN bug queue at https://rt.cpan.org/Public/Dist/Display.html?Name=WWW-Mechanize-Chrome or via mail to www-mechanize-Chrome-Bugs@rt.cpan.org.
CONTRIBUTING
Please see WWW::Mechanize::Chrome::Contributing.
KNOWN ISSUES
When Chrome is run in headless mode, Chrome throws a Lost UI shared context error. This error can be ignored and does not affect the operation of this module.
AUTHOR
Max Maischein corion@cpan.org
CONTRIBUTORS
Andreas König andk@cpan.org
Tobias Leich froggs@cpan.org
Steven Dondley s@dondley.org
COPYRIGHT (c)
Copyright 2010-2018 by Max Maischein corion@cpan.org.
LICENSE
This module is released under the same terms as Perl itself.