NAME

AnyEvent::WebDriver - control browsers using the W3C WebDriver protocol

SYNOPSIS

# start geckodriver, chromedriver or any other webdriver via the shell
$ geckodriver -b myfirefox/firefox --log trace --port 4444
# chromedriver --port=4444

# then use it
use AnyEvent::WebDriver;

# create a new webdriver object
my $wd = new AnyEvent::WebDriver;

# create a new session with default capabilities.
$wd->new_session ({});

$wd->navigate_to ("https://duckduckgo.com/html");
my $searchbox = $wd->find_element (css => 'input[type="text"]');

$wd->element_send_keys ($searchbox => "free software");
$wd->element_click ($wd->find_element (css => 'input[type="submit"]'));

# session gets autodeleted by default, so wait a bit
sleep 10;

# this is an example of an action sequence
$wd->actions
   ->move ($wd->find_element (...), 40, 5)
   ->click
   ->type ("some text")
   ->key ("{Enter}")
   ->perform;

DESCRIPTION

This module aims to implement the W3C WebDriver specification, the standardised equivalent of the Selenium WebDriver API, which in turn aims at remotely controlling web browsers such as Firefox or Chromium.

One of the design goals of this module was to stay very close to the language and words used in the WebDriver specification itself, so to make the most of this module, or, in fact, to make any reasonable use of it, you need to refer to the W3C WebDriver recommendation, which can be found here:

https://www.w3.org/TR/webdriver1/

Mozilla's geckodriver has had webdriver support for a long time, while chromedriver only has basic and mostly undocumented webdriver support as of release 77.

In Debian GNU/Linux, you can install the chromedriver for chromium via the chromium-driver package. Unfortunately, there is no (working) package for geckodriver, but you can download it from GitHub.

CONVENTIONS

Unless otherwise stated, all delays and time differences in this module are represented as an integer number of milliseconds, which is perhaps surprising to users of my other modules but is what the WebDriver spec uses.
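
Since the timeout constructor argument is given in seconds while values such as those passed to set_timeouts are in milliseconds, a small conversion helper can help avoid mix-ups. The following is only a sketch: seconds_to_ms is a hypothetical helper and not part of this module.

```perl
use strict;
use warnings;

# hypothetical helper, not provided by AnyEvent::WebDriver:
# convert (fractional) seconds to an integer number of milliseconds
sub seconds_to_ms {
   my ($seconds) = @_;

   int ($seconds * 1000 + 0.5)
}

# timeouts in the form expected by set_timeouts (milliseconds)
my %timeouts = (
   script   => seconds_to_ms (60),
   pageLoad => seconds_to_ms (2.5),
);

# with a session established, this would be passed on as:
#    $wd->set_timeouts (\%timeouts);
```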

WEBDRIVER OBJECTS

new AnyEvent::WebDriver key => value...

Create a new WebDriver object. Example for a remote WebDriver connection (the only type supported at the moment):

my $wd = new AnyEvent::WebDriver endpoint => "http://localhost:4444";

Supported keys are:

endpoint => $string

For remote connections, the endpoint to connect to (defaults to http://localhost:4444).

proxy => $proxyspec

The proxy to use (same as the proxy argument used by AnyEvent::HTTP). The default is undef, which disables proxies. To use the system-provided proxy (e.g. http_proxy environment variable), specify the string default.

autodelete => $boolean

If true (the default), then automatically execute delete_session when the WebDriver object is destroyed with an active session. If set to a false value, then the session will continue to exist.

Note that due to bugs in perl that are unlikely to get fixed, autodelete is likely ineffective during global destruction and might even crash your process, so you should ensure objects go out of scope before that, or explicitly call delete_session, if you want the session to be cleaned up.

timeout => $seconds

The HTTP timeout, in (fractional) seconds (default: 300). This timeout is reset on any activity, so it is not an overall request timeout. Also, individual requests might extend this timeout if they are known to take longer.

persistent => 1 | undef

If true (the default) then persistent connections will be used for all requests, which assumes you have a reasonably stable connection (such as to localhost :) and that the WebDriver has a persistent timeout much higher than what AnyEvent::HTTP uses.

You can force connections to be closed for non-idempotent requests (the safe default of AnyEvent::HTTP) by setting this to undef.

$al = $wd->actions

Creates an action list associated with this WebDriver. See "ACTION LISTS", below, for full details.

$sessionstring = $wd->save_session

Save the current session in a string so it can be restored later with load_session. Note that only the session data itself is stored (currently the session id and capabilities), not the endpoint information itself.

The main use of this function is in conjunction with disabled autodelete, to save a session (e.g. to a file) and restore it later. It could presumably be used for other applications, such as using the same session from multiple processes and so on.

$wd->load_session ($sessionstring)
$wd->set_session ($sessionid, $capabilities)

Starts using the given session, as identified by $sessionid. $capabilities should be the original session capabilities, although the current version of this module does not make any use of it.

The $sessionid is stored in $wd->{sid} (and could be fetched from there for later use), while the capabilities are stored in $wd->{capabilities}.
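
As a rough sketch of how save_session and load_session might be combined with a disabled autodelete to keep a session alive across program runs: the helper names and the file name below are made up for illustration, and the session string is simply treated as opaque data.

```perl
use strict;
use warnings;

# hypothetical helper: write the string returned by save_session to a file
sub stash_session {
   my ($wd, $path) = @_;

   open my $fh, ">:raw", $path or die "$path: $!\n";
   print {$fh} $wd->save_session;
   close $fh or die "$path: $!\n";
}

# hypothetical helper: read the string back and hand it to load_session
sub resume_session {
   my ($wd, $path) = @_;

   open my $fh, "<:raw", $path or die "$path: $!\n";
   my $sessionstring = do { local $/; <$fh> };
   close $fh;

   $wd->load_session ($sessionstring);
}

# usage sketch, assuming a WebDriver listens on the default endpoint:
#
#    my $wd = new AnyEvent::WebDriver autodelete => 0;
#    $wd->new_session ({});
#    stash_session ($wd, "session.dat");
#
# and later, in another process talking to the same endpoint:
#
#    my $wd = new AnyEvent::WebDriver autodelete => 0;
#    resume_session ($wd, "session.dat");
```

Because the endpoint is not part of the saved string, the restoring process has to be configured with the same endpoint itself.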

SIMPLIFIED API

This section documents the simplified API, which is really just a very thin wrapper around the WebDriver protocol commands. They all block the caller until the result is available (using AnyEvent condvars), so must not be called from an event loop callback - see "EVENT BASED API" for an alternative.

The method names are pretty much taken directly from the W3C WebDriver specification, e.g. the request documented in the "Get All Cookies" section is implemented via the get_all_cookies method.

The order is the same as in the WebDriver draft at the time of this writing, and only minimal massaging is done to request parameters and results.

SESSIONS

$wd->new_session ({ key => value... })

Try to connect to the WebDriver and initialize a new session with a "new session" command, passing the given key-value pairs as value (e.g. capabilities).

No session-dependent methods must be called before this function returns successfully, and only one session can be created per WebDriver object.

On success, $wd->{sid} is set to the session ID, and $wd->{capabilities} is set to the returned capabilities.

Simple example of creating a WebDriver object and a new session:

my $wd = new AnyEvent::WebDriver endpoint => "http://localhost:4444";
$wd->new_session ({});

Real-world example with capability negotiation:

$wd->new_session ({
   capabilities => {
      alwaysMatch => {
         pageLoadStrategy        => "eager",
         unhandledPromptBehavior => "dismiss",
         # proxy => { proxyType => "manual", httpProxy => "1.2.3.4:56", sslProxy => "1.2.3.4:56" },
      },
      firstMatch => [
         {
            browserName => "firefox",
            "moz:firefoxOptions" => {
               binary => "firefox/firefox",
               args => ["-devtools", "-headless"],
               prefs => {
                  "dom.webnotifications.enabled" => \0,
                  "dom.push.enabled" => \0,
                  "dom.disable_beforeunload" => \1,
                  "browser.link.open_newwindow" => 3,
                  "browser.link.open_newwindow.restrictions" => 0,
                  "dom.popup_allowed_events" => "",
                  "dom.disable_open_during_load" => \1,
               },
            },
         },
         {
            browserName => "chrome",
            "goog:chromeOptions" => {
               binary => "/bin/chromium",
               args => ["--no-sandbox", "--headless"],
               prefs => {
                  # ...
               },
            },
         },
         {
            # generic fallback
         },
      ],

   },
});

Firefox-specific capability documentation can be found on MDN. Chrome-specific capability documentation might be found here, but the latest release at the time of this writing (chromedriver 77) has essentially no documentation about webdriver capabilities (even MDN has better documentation about chromedriver!).

If you have URLs for Safari/IE/Edge etc. capabilities, feel free to tell me about them.

$wd->delete_session

Deletes the session - the WebDriver object must not be used after this call (except for calling this method).

This method is always safe to call and will not do anything if there is no active session.

$timeouts = $wd->get_timeouts

Get the current timeouts, e.g.:

my $timeouts = $wd->get_timeouts;
=> { implicit => 0, pageLoad => 300000, script => 30000 }

$wd->set_timeouts ($timeouts)

Sets one or more timeouts, e.g.:

$wd->set_timeouts ({ script => 60000 });

$wd->navigate_to ($url)

Navigates to the specified URL.

$url = $wd->get_current_url

Queries the current page URL as set by navigate_to.

$wd->back

The equivalent of pressing "back" in the browser.

$wd->forward

The equivalent of pressing "forward" in the browser.

$wd->refresh

The equivalent of pressing "refresh" in the browser.

$title = $wd->get_title

Returns the current document title.

COMMAND CONTEXTS

$handle = $wd->get_window_handle

Returns the current window handle.

$wd->close_window

Closes the current browsing context.

$wd->switch_to_window ($handle)

Changes the current browsing context to the given window.

$handles = $wd->get_window_handles

Return the current window handles as an array-ref of handle IDs.
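
One way these calls might be combined is to move over to a window that was opened by the page, for example after clicking a link that opens a new tab. This is a minimal sketch; switch_to_other_window is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# hypothetical helper: switch to the first window that is not the current
# one and return its handle, or return nothing if there is no other window
sub switch_to_other_window {
   my ($wd) = @_;

   my $current = $wd->get_window_handle;

   for my $handle (@{ $wd->get_window_handles }) {
      next if $handle eq $current;

      $wd->switch_to_window ($handle);
      return $handle;
   }

   return;
}

# usage sketch, with an established session in $wd:
#
#    my $new = switch_to_other_window ($wd)
#       or die "no other window found\n";
```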

$handles = $wd->switch_to_frame ($frame)

Switch to the given frame identified by $frame, which must be either undef to go back to the top-level browsing context, an integer to select the nth subframe, or an element object.

$handles = $wd->switch_to_parent_frame

Switch to the parent frame.

$rect = $wd->get_window_rect

Return the current window rect(angle), e.g.:

$rect = $wd->get_window_rect
=> { height => 1040, width => 540, x => 0, y => 0 }

$wd->set_window_rect ($rect)

Sets the window rect(angle), e.g.:

$wd->set_window_rect ({ width => 780, height => 560 });
$wd->set_window_rect ({ x => 0, y => 0, width => 780, height => 560 });

$wd->maximize_window
$wd->minimize_window
$wd->fullscreen_window

Changes the window size by either maximising, minimising or making it fullscreen. In my experience, this will time out if no window manager is running.

ELEMENT RETRIEVAL

To reduce typing and memory strain, the element finding functions accept some shorter and hopefully easier to remember aliases for the standard locator strategy values, as follows:

Alias   Locator Strategy
css     css selector
link    link text
substr  partial link text
tag     tag name

$element = $wd->find_element ($locator_strategy, $selector)

Finds the first element specified by the given selector and returns its element object. Raises an error when no element was found.

Examples showing all standard locator strategies:

$element = $wd->find_element ("css selector" => "body a");
$element = $wd->find_element ("link text" => "Click Here For Porn");
$element = $wd->find_element ("partial link text" => "orn");
$element = $wd->find_element ("tag name" => "input");
$element = $wd->find_element ("xpath" => '//input[@type="text"]');
=> e.g. { "element-6066-11e4-a52e-4f735466cecf" => "decddca8-5986-4e1d-8c93-efe952505a5f" }

Same examples using aliases provided by this module:

$element = $wd->find_element (css => "body a");
$element = $wd->find_element (link => "Click Here For Porn");
$element = $wd->find_element (substr => "orn");
$element = $wd->find_element (tag => "input");

$elements = $wd->find_elements ($locator_strategy, $selector)

As above, but returns an arrayref of all found element objects.

$element = $wd->find_element_from_element ($element, $locator_strategy, $selector)

Like find_element, but looks only inside the specified $element.

$elements = $wd->find_elements_from_element ($element, $locator_strategy, $selector)

Like find_elements, but looks only inside the specified $element.

my $head = $wd->find_element ("tag name" => "head");
my $links = $wd->find_elements_from_element ($head, "tag name", "link");

$element = $wd->get_active_element

Returns the active element.

ELEMENT STATE

$bool = $wd->is_element_selected ($element)

Returns whether the given input or option element is selected or not.

$string = $wd->get_element_attribute ($element, $name)

Returns the value of the given attribute.

$string = $wd->get_element_property ($element, $name)

Returns the value of the given property.

$string = $wd->get_element_css_value ($element, $name)

Returns the value of the given CSS property.

$string = $wd->get_element_text ($element)

Returns the (rendered) text content of the given element.

$string = $wd->get_element_tag_name ($element)

Returns the tag of the given element.

$rect = $wd->get_element_rect ($element)

Returns the element rect(angle) of the given element.

$bool = $wd->is_element_enabled ($element)

Returns whether the element is enabled or not.
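
To illustrate how the retrieval and state methods fit together, the following sketch collects the text and target of every link in the current document. The helper name link_targets is an assumption made for this example only.

```perl
use strict;
use warnings;

# hypothetical helper: return a list of { text => ..., href => ... }
# hashrefs, one for each link element in the current document
sub link_targets {
   my ($wd) = @_;

   # only consider anchors that actually have an href attribute
   my $links = $wd->find_elements (css => "a[href]");

   map {
      +{
         text => $wd->get_element_text ($_),
         href => $wd->get_element_attribute ($_, "href"),
      }
   } @$links
}

# usage sketch, with an established session in $wd:
#
#    print "$_->{text} => $_->{href}\n"
#       for link_targets ($wd);
```

Note that this makes two requests per link, so it can be slow on large documents.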

ELEMENT INTERACTION

$wd->element_click ($element)

Clicks the given element.

$wd->element_clear ($element)

Clear the contents of the given element.

$wd->element_send_keys ($element, $text)

Sends the given text as key events to the given element. Key input state can be cleared by embedding \x{e000} in $text. Presumably, you can embed modifiers using their Unicode codepoints, but the specification is less than clear to me in this area.

DOCUMENT HANDLING

$source = $wd->get_page_source

Returns the (HTML/XML) page source of the current document.

$results = $wd->execute_script ($javascript, $args)

Synchronously execute the given script with given arguments and return its results ($args can be undef if no arguments are wanted/needed).

$ten = $wd->execute_script ("return arguments[0]+arguments[1]", [3, 7]);

$results = $wd->execute_async_script ($javascript, $args)

Similar to execute_script, but instead of waiting for the script to return, it waits for the script to call its last argument, which is added to $args automatically.

$twenty = $wd->execute_async_script ("arguments[0](20)", undef);

COOKIES

$cookies = $wd->get_all_cookies

Returns all cookies, as an arrayref of hashrefs.

# google surely sets a lot of cookies without my consent
$wd->navigate_to ("http://google.com");
use Data::Dump;
ddx $wd->get_all_cookies;

$cookie = $wd->get_named_cookie ($name)

Returns a single cookie as a hashref.

$wd->add_cookie ($cookie)

Adds the given cookie hashref.

$wd->delete_cookie ($name)

Delete the named cookie.

$wd->delete_all_cookies

Delete all cookies.

ACTIONS

$wd->perform_actions ($actions)

Perform the given actions (an arrayref of action specifications simulating user activity, or an AnyEvent::WebDriver::Actions object). For further details, read the spec or the section "ACTION LISTS", below.

An example to get you started (see the next example for a mostly equivalent example using the AnyEvent::WebDriver::Actions helper API):

$wd->navigate_to ("https://duckduckgo.com/html");
my $input = $wd->find_element ("css selector", 'input[type="text"]');
$wd->perform_actions ([
   {
      id => "myfatfinger",
      type => "pointer",
      pointerType => "touch",
      actions => [
         { type => "pointerMove", duration => 100, origin => $input, x => 40, y => 5 },
         { type => "pointerDown", button => 0 },
         { type => "pause", duration => 40 },
         { type => "pointerUp", button => 0 },
      ],
   },
   {
      id => "mykeyboard",
      type => "key",
      actions => [
         { type => "pause" },
         { type => "pause" },
         { type => "pause" },
         { type => "pause" },
         { type => "keyDown", value => "a" },
         { type => "pause", duration => 100 },
         { type => "keyUp", value => "a" },
         { type => "pause", duration => 100 },
         { type => "keyDown", value => "b" },
         { type => "pause", duration => 100 },
         { type => "keyUp", value => "b" },
         { type => "pause", duration => 2000 },
         { type => "keyDown", value => "\x{E007}" }, # enter
         { type => "pause", duration => 100 },
         { type => "keyUp", value => "\x{E007}" }, # enter
         { type => "pause", duration => 5000 },
      ],
   },
]);

And here is essentially the same (except for fewer pauses) example as above, using the much simpler AnyEvent::WebDriver::Actions API:

$wd->navigate_to ("https://duckduckgo.com/html");
my $input = $wd->find_element ("css selector", 'input[type="text"]');
$wd->actions
   ->move ($input, 40, 5, 100, "touch1")
   ->click
   ->key ("a")
   ->key ("b")
   ->pause (2000) # so you can watch leisurely
   ->key ("{Enter}")
   ->pause (5000) # so you can see the result
   ->perform;

$wd->release_actions

Release all keys and pointer buttons currently depressed.

USER PROMPTS

$wd->dismiss_alert

Dismiss a simple dialog, if present.

$wd->accept_alert

Accept a simple dialog, if present.

$text = $wd->get_alert_text

Returns the text of any simple dialog.

$wd->send_alert_text ($text)

Fills in the user prompt with the given text.

SCREEN CAPTURE

$wd->take_screenshot

Create a screenshot, returning it as a base64-encoded PNG image. To decode and save, you could do something like:

use MIME::Base64 ();

my $screenshot = $wd->take_screenshot;

# open in raw mode, as PNG data is binary
open my $fh, ">:raw", "screenshot.png" or die "screenshot.png: $!\n";

syswrite $fh, MIME::Base64::decode_base64 $screenshot;

$wd->take_element_screenshot ($element)

Similar to take_screenshot, but only takes a screenshot of the bounding box of a single element.

Compatibility note: As of Chrome version 80, I found that the screenshot scaling is often wrong (the screenshot is much smaller than the element normally displays) unless Chrome runs in headless mode. The spec does allow for any form of scaling, so this is not strictly a bug in Chrome, but of course it diminishes the screenshot functionality.

PRINT

$wd->print_page (key => value...)

Create a printed version of the document, returning it as a PDF document encoded as base64. See take_screenshot for an example on how to decode and save such a string.

This command takes a lot of optional parameters, see the print section of the WebDriver specification for details.

This command is taken from a draft document, so it might change in the future.
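
Since both take_screenshot and print_page return base64-encoded data, decoding and saving can be shared. The following is a small sketch under that assumption; save_base64 and the file names are hypothetical and not part of this module.

```perl
use strict;
use warnings;

use MIME::Base64 ();

# hypothetical helper: decode a base64 string and write the resulting
# binary data to the given file
sub save_base64 {
   my ($path, $base64) = @_;

   open my $fh, ">:raw", $path or die "$path: $!\n";
   print {$fh} MIME::Base64::decode_base64 ($base64);
   close $fh or die "$path: $!\n";
}

# usage sketch, with an established session in $wd:
#
#    save_base64 ("page.pdf"      , $wd->print_page);
#    save_base64 ("screenshot.png", $wd->take_screenshot);
```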

ACTION LISTS

Action lists can be quite complicated. Or at least it took a while for me to wrap my head around them. Basically, an action list consists of a number of sources representing devices (such as a finger, a mouse, a pen or a keyboard) and a list of actions for each source, in a timeline.

An action can be a key press, a pointer move or a pause (time delay).

While you can provide these action lists manually, it is (hopefully) less cumbersome to use the API described in this section to create them.

The basic process of creating and performing actions is to create a new action list, adding action sources, followed by adding actions. Finally you would perform those actions on the WebDriver.

Most methods here are designed to chain, i.e. they return the web actions object, to simplify multiple calls.

Also, while actions from different sources can happen "at the same time" in the WebDriver protocol, this class by default ensures that actions will execute in the order specified.

For example, to simulate a mouse click to an input element, followed by entering some text and pressing enter, you can use this:

$wd->actions
   ->click (0, 100)
   ->type ("some text")
   ->key ("{Enter}")
   ->perform;

By default, keyboard and mouse input sources are provided and used. You can create your own sources and use them when adding events. The above example could be more verbosely written like this:

$wd->actions
   ->source ("mouse", "pointer", pointerType => "mouse")
   ->source ("kbd", "key")
   ->click (0, 100, "mouse")
   ->type ("some text", "kbd")
   ->key ("{Enter}", "kbd")
   ->perform;

When you specify the event source explicitly it will switch the current "focus" for this class of device (all keyboards are in one class, all pointer-like devices such as mice/fingers/pens are in one class), so you don't have to specify the source for subsequent actions that are on the same class.

When you use the sources keyboard, mouse, touch1..touch3, pen without defining them, then a suitable default source will be created for them.

$al = new AnyEvent::WebDriver::Actions

Create a new empty action list object. More often you would use the $wd->actions method to create one that is already associated with a given web driver.

$al = $al->source ($id, $type, key => value...)

The first time you call this with a given ID, this defines the event source using the extra parameters. Subsequent calls merely switch the current source for its event class.

It's not an error to define built-in sources (such as keyboard or touch1) differently than the defaults.

Example: define a new touch device called fatfinger.

$al->source (fatfinger => "pointer", pointerType => "touch");

Example: switch default keyboard source to kbd1, assuming it is of key class.

$al->source ("kbd1");

$al = $al->pause ($duration)

Creates a pause with the given duration. Makes sure that time progresses in any case, even when $duration is 0.

$al = $al->pointer_down ($button, $source)
$al = $al->pointer_up ($button, $source)

Press or release the given button. $button defaults to 0.

$al = $al->click ($button, $source)

Convenience function that creates a button press and release action without any delay between them. $button defaults to 0.

$al = $al->doubleclick ($button, $source)

Convenience function that creates two button press and release action pairs in a row, with no unnecessary delay between them. $button defaults to 0.

$al = $al->move ($origin, $x, $y, $duration, $source)

Moves a pointer to the given position, relative to origin (either "viewport", "pointer" or an element object). The coordinates will be truncated to integer values.

$al = $al->cancel ($source)

Executes a pointer cancel action.

$al = $al->key_down ($key, $source)
$al = $al->key_up ($key, $source)

Press or release the given key.

$al = $al->key ($key, $source)

Press and release the given key in one go, without unnecessary delay.

A special syntax, {keyname}, can be used for special keys - all the special key names from the second table in section 17.4.2 of the WebDriver recommendation can be used - prefix with Shift- to get the shifted version, as in Shift-Space.

Example: press and release "a".

$al->key ("a");

Example: press and release the "Enter" key:

$al->key ("\x{e007}");

Example: press and release the "enter" key using the special key name syntax:

$al->key ("{Enter}");

$al = $al->type ($string, $source)

Convenience method to simulate a series of key press and release events for the keys in $string, one pair per extended unicode grapheme cluster. There is no syntax for special keys, everything will be typed "as-is" if possible.

$al->perform ($wd)

Finalises and compiles the list, if not done yet, and calls $wd->perform_actions with it.

If $wd is undef, and the action list was created using the $wd->actions method, then perform it against that WebDriver object.

There is no underscore variant - call the perform_actions_ method with the action object instead.

$al->perform_release ($wd)

Exactly like perform, but additionally call release_actions afterwards.

($actions, $duration) = $al->compile

Finalises and compiles the list, if not done yet, and returns an actions object suitable for calls to $wd->perform_actions. When called in list context, additionally returns the total duration of the action list.

Since building large action lists can take nontrivial amounts of time, it can make sense to build an action list only once and then perform it multiple times.

No additional actions must be added after compiling an action list.
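
The build-once, perform-many pattern described above might look like the following sketch. The helper submit_repeatedly is hypothetical, and the explicit compile call is optional, as perform compiles the list on first use anyway.

```perl
use strict;
use warnings;

# hypothetical helper: type some text followed by Enter, $times times,
# reusing a single compiled action list
sub submit_repeatedly {
   my ($wd, $text, $times) = @_;

   # build the list once ...
   my $al = $wd->actions
      ->type ($text)
      ->key ("{Enter}");

   $al->compile;

   # ... then perform it as often as needed, without adding more actions
   $al->perform for 1 .. $times;
}

# usage sketch, with an established session in $wd and a focused input:
#
#    submit_repeatedly ($wd, "some text", 3);
```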

EVENT BASED API

This module wouldn't be a good AnyEvent citizen if it didn't have a true event-based API.

In fact, the simplified API, as documented above, is emulated via the event-based API and an AUTOLOAD function that automatically provides blocking wrappers around the callback-based API.

Every method documented in the "SIMPLIFIED API" section has an equivalent event-based method that is formed by appending an underscore (_) to the method name, and appending a callback to the argument list (mnemonic: the underscore indicates that "the action is not yet finished" after the call returns).

For example, instead of blocking calls to new_session, navigate_to and back, you can make callback-based ones:

my $cv = AE::cv;

$wd->new_session_ ({}, sub {
   my ($status, $value) = @_;

   die "error $value->{error}" if $status ne "200";

   $wd->navigate_to_ ("http://www.nethype.de", sub {

      $wd->back_ (sub {
         print "all done\n";
         $cv->send;
      });

   });
});

$cv->recv;

While the blocking methods croak on errors, the callback-based ones all pass two values to the callback, $status and $res, where $status is the HTTP status code (200 for successful requests, typically 4xx or 5xx for errors), and $res is the value of the value key in the JSON response object.
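
Because every callback has to inspect $status itself, it can be convenient to factor the check out. The sketch below assumes that error values are hashes with error and message members, as described in the WebDriver specification; the helper name checked is made up for this example.

```perl
use strict;
use warnings;

# hypothetical helper: wrap a success callback so that error responses
# are routed to a separate error callback instead
sub checked {
   my ($on_success, $on_error) = @_;

   sub {
      my ($status, $value) = @_;

      return $on_success->($value)
         if $status == 200;

      # error values are normally hashes with "error" and "message" members,
      # but be careful in case no usable response was received at all
      my ($error, $message) = ref $value eq "HASH"
         ? @$value{qw(error message)}
         : ("unknown error", "");

      $on_error->($status, $error // "unknown error", $message // "");
   }
}

# usage sketch, with an established session in $wd:
#
#    $wd->get_title_ (checked (
#       sub { print "title: $_[0]\n" },
#       sub { warn "request failed ($_[0]): $_[1]\n" },
#    ));
```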

Other than that, the underscore variants and the blocking variants are identical.

LOW LEVEL API

All the simplified API methods are very thin wrappers around WebDriver commands of the same name. They are all implemented in terms of the low-level methods (req, get, post and delete), which exist in blocking and callback-based variants (req_, get_, post_ and delete_).

Examples are after the function descriptions.

$wd->req_ ($method, $uri, $body, $cb->($status, $value))
$value = $wd->req ($method, $uri, $body)

Appends the $uri to the endpoint/session/{sessionid}/ URL and makes an HTTP $method request (GET, POST etc.). POST requests can provide a UTF-8-encoded JSON text as HTTP request body, or the empty string to indicate no body is used.

For the callback version, the callback gets passed the HTTP status code (200 for every successful request), and the value of the value key in the JSON response object as second argument.

$wd->get_ ($uri, $cb->($status, $value))
$value = $wd->get ($uri)

Simply a call to req_ with $method set to GET and an empty body.

$wd->post_ ($uri, $data, $cb->($status, $value))
$value = $wd->post ($uri, $data)

Simply a call to req_ with $method set to POST - if $data is undef, then an empty object is sent, otherwise $data must be a valid request object, which gets encoded into JSON for you.

$wd->delete_ ($uri, $cb->($status, $value))
$value = $wd->delete ($uri)

Simply a call to req_ with $method set to DELETE and an empty body.

Example: implement get_all_cookies, which is a simple GET request without any parameters:

$cookies = $wd->get ("cookie");

Example: implement execute_script, which needs some parameters:

$results = $wd->post ("execute/sync" => { script => "$javascript", args => [] });

Example: call find_elements to find all IMG elements:

$elems = $wd->post (elements => { using => "css selector", value => "img" });

HISTORY

This module was unintentionally created (it started inside some quickly hacked-together script) simply because I couldn't get the existing Selenium::Remote::Driver module to work reliably, ever, despite multiple attempts over the years and trying to report multiple bugs, which have been completely ignored. It's also not event-based, so, yeah...

AUTHOR

Marc Lehmann <schmorp@schmorp.de>
http://anyevent.schmorp.de
