NAME
WebService::ReutersConnect - Use the ReutersConnect Live News API
VERSION
Version 0.05
INSTALLATION
Debian based
This module depends only on packages distributed by Debian. If you're using a Debian-based system, run:
$ sudo apt-get install perl-modules libtest-fatal-perl perl-base libdbd-sqlite3-perl libdbix-class-perl libdatetime-perl \
libdatetime-format-iso8601-perl libdevel-repl-perl libfile-sharedir-perl libwww-perl liblog-log4perl-perl libmoose-perl \
libterm-readkey-perl liburi-perl libxml-libxml-perl
$ sudo cpan -i WebService::ReutersConnect ## or anything you like.
Other OSs
Use your favorite Perl package installation method.
$ sudo cpan -i WebService::ReutersConnect ## Should do the job on *NIX systems
SYNOPSIS
This module allows access to Reuters Connect APIs as described here:
It is based on the REST APIs.
You WILL have to contact Reuters to get API credentials if you want to use this module. This is out of the scope of this distribution. However, some demo credentials are supplied by this module for your convenience.
Those demo credentials change from time to time, so have a look at http://reutersconnect.com/docs/Demo_Login_Page if you get authentication errors.
This module will also try to scrape the demo credentials from this page if you don't feel like looking at it yourself :)
Shell
This module provides a 'reutersconnect' shell so you can interactively play with the API.
Example:
$ reutersconnect
2013/03/20 16:45:05 Will try to use the demo account. Use '/usr/local/bin/reutersconnect -u <username>' to login as a specific user
2013/03/20 16:45:05 No username/password given. Trying to scrape the demo ones
2013/03/20 16:45:07 Found 'demo.user/vYkLo4Lv' credentials
2013/03/20 16:45:08 Granted access to 6 channels
2013/03/20 16:45:08 Starting shell. ReutersConnect object is '$rc'
demo.user@reutersconnect.com> map{ $_->alias().' '.$_->description()."\n" } $rc->channels()
FES376 US Online Report Top News
QTZ240 NVO
STK567 Reuters World Service
mkc191 Unique-Product-For-User-26440
txb889 Unique-Product-For-Account-26439
xHO143 Italy Picture Service
demo.user@reutersconnect.com> [CTRL-D] to quit
See the rest of this module doc and WebService::ReutersConnect::Channel and WebService::ReutersConnect::Item for a detailed API description.
Perl
Example:
use WebService::ReutersConnect qw/:demo/;
my $reuters = WebService::ReutersConnect->new({ username => REUTERS_DEMOUSER,
password => REUTERS_DEMOPASSWORD });
my @channels = $reuters->channels();
my @items = $reuters->items( $channels[0] );
my $full_xml_doc = $reuters->fetch_item_xdoc({ item => $items[0] });
Additionally, a very basic demo page scraping mechanism is provided, so you can build an API object without any credentials at all if you feel lucky:
my $reuters = WebService::ReutersConnect->new();
my @channels = $reuters->channels();
EXAMPLES
Here are some usage examples to get you started quickly:
Fetch the latest news about Britain from all your channels
my $res = $reuters->search({ q => 'headline:britain' ,
sort => 'date'
});
say("Size: ".$res->size());
say("Num Found: ".$res->num_found());
say("Start: ".$res->start());
foreach my $item ( @{ $res->items() } ){
say($item->headline());
}
Fetch the last 5 pictures across all your channels
my @items = $reuters->search({ limit => 5 , media_types => [ 'P' ] });
foreach my $item ( @items ){
print "\n".$item->date_created().' : '.$item->headline()."\n\n";
print " CLICK: ".$item->preview_url()."\n\n";
}
Get the freshest version of the rich NewsML-G2 document about a news item:
my $xdoc = $reuters->item({ guid => $item->guid() , channel => $item->channel_alias() });
say $xdoc->asString(); ## That will help you :)
my ($body_node) = $xdoc->xml_xpath->findnodes('//x:html/x:body'); ## Find the HTML content (in case of article).
say $body_node->toString(1); ## Print the whole html.
## You can also print only the content of the body:
my @body_parts = $xdoc->get_html_body();
map { say $_->toString(1) } @body_parts;
## Find the subjects:
my @subjects = $xdoc->get_subjects();
foreach my $subject ( @subjects ){
say "This is about: ".$subject->name_main();
}
AUTHENTICATION
If you supply a ReutersConnect username and password, this module will fetch an authentication token from the service and use it in all subsequent requests.
The basic usage involves giving a plain username and password, as demonstrated in the SYNOPSIS section.
You can read the authentication token with $this->authToken(), for diagnostics or external storage.
You can also build an instance of this module using an authentication token that you stored somewhere:
my $reuters = WebService::ReutersConnect->new( { authToken => $authToken } );
Beware that ReutersConnect authentication tokens are only valid for 24 hours. It is advisable to renew the token more often than that, for instance every 12 hours, to avoid expiry issues.
This module does NOT contain any mechanism to renew authentication tokens at regular intervals. If you keep long-lived instances of this module, it is your responsibility to renew their tokens often enough.
However, for very simple cases, where there's no concurrent access to the token storage, or when you have only one longstanding instance, the options refresh_token and after_refresh_token can be useful.
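The renewal policy described above can be sketched as a small check on a stored token. This is only an illustration: the helper name token_is_stale and the shape of the stored record are hypothetical, and only the 24-hour validity and the 12-hour renewal interval come from the text.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a stored token should be replaced.
# $record is a hash ref like { token => '...', fetched_at => <epoch seconds> }.
# $now is optional and defaults to the current time.
sub token_is_stale {
    my ($record, $now) = @_;
    $now = time() unless defined $now;

    # Renew well before the 24-hour expiry; 12 hours is the interval suggested above.
    my $renew_after = 12 * 60 * 60;

    # No record or no token at all: a fresh login is needed.
    return 1 unless $record && defined $record->{token};

    my $age = $now - $record->{fetched_at};
    return $age >= $renew_after ? 1 : 0;
}
```

When the check returns true, build a new instance from the username and password and store its authToken(); otherwise pass the stored value as authToken to the constructor.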
DEMO AUTHENTICATION
Reuters provides a demo account so you can try out this API without holding an account with them. The demo credentials live on this page: http://reutersconnect.com/docs/Demo_Login_Page
They change every month, but this module provides a very basic method to scrape them if no username/password is given in the constructor. See the SYNOPSIS section.
LOGGING & DEBUGGING
This module uses Log::Log4perl and is automatically initialized to the ERROR level. Feel free to initialize Log::Log4perl to your taste in your application.
Additionally, there is the debug option, which outputs very verbose information (HTTP traffic) at the INFO level.
ATTRIBUTES
Most attributes are read only and have a default value. Set them at construction time if necessary.
entry_point
Get/Set the ReutersConnect entry URL. Default should work.
login_entry_point
Get/Set the ReutersConnect login entry URL. Default should work.
username
ReutersConnect API username.
password
ReutersConnect API password.
authToken
ReutersConnect authentication token. If not set, this will try to get a new one using the username/password.
refresh_token
Option. When true, the module will attempt ONCE to fetch a fresh authentication token from ReutersConnect if the token it holds is invalid or has expired.
Of course, turning that on only makes sense if you give the username and password at instantiation time.
If you want to be notified of the new token in your client code, you can register a callback with after_refresh_token.
after_refresh_token
This is a callback called after this module has fetched a new authentication token from ReutersConnect. It's normally used in combination with refresh_token.
Usage:
my $reuters = WebService::ReutersConnect->new({ username => ...,
password => ....,
after_refresh_token => sub{
my ($new_token) = @_;
## Store new token somewhere
}
});
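As an illustration of the "store new token somewhere" step, here is a minimal sketch of a callback that writes the token to a file. The helper name make_token_saver and the file-based storage are hypothetical; the only thing taken from the usage above is that the callback receives the new token as its first argument.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback that saves each new token to $path.
sub make_token_saver {
    my ($path) = @_;
    return sub {
        my ($new_token) = @_;

        # Write to a temporary file first so readers never see a partial token.
        my $tmp = "$path.tmp";
        open( my $fh, '>', $tmp ) or die "Cannot write $tmp: $!";
        print {$fh} $new_token;
        close($fh) or die "Cannot close $tmp: $!";

        # rename() swaps the new file into place in a single step on POSIX file systems.
        rename( $tmp, $path ) or die "Cannot rename $tmp to $path: $!";
        return;
    };
}
```

The returned code reference can then be given as the callback value in the constructor. As the AUTHENTICATION section notes, this kind of storage is only suitable when nothing else writes to the token store concurrently.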
user_agent
An LWP::UserAgent. There's a default instance, but feel free to replace it with your application's own.
debug
Switches on/off extra debugging (especially HTTP requests and responses).
date_created
DateTime at which this instance was created.
METHODS
scrape_demo_credentials
Quick and very dirty method to scrape demo credentials from http://reutersconnect.com/docs/Demo_Login_Page. This is used automatically when no credentials at all are provided in the constructor. You shouldn't have to use it yourself. Returns 1 for success, 0 for failure.
Usage:
unless( $this->scrape_demo_credentials() ){
## Woopsy
}
channels
Alias for fetch_channels
items
Alias for fetch_items
packages
Alias for fetch_packages
search
Alias for fetch_search
olr
Alias for fetch_olr
item
Alias for fetch_item_xdoc.
fetch_channels
Fetch the WebService::ReutersConnect::Channel's according to the given options (or not).
Usage:
my @all_channels = $this->fetch_channels();
my ( $channel ) = $this->fetch_channels({ channel => [ '56HD' ] });
## Filter on channel alias(s)
my @specific_channels = $this->fetch_channels({ channel => [ '567', '7654' ,... ] });
## Filter on channel Category(s) ID(s)
my @channels = $this->fetch_channels({ channelCategory => [ 'JDJD' , 'JDJD' ] });
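A common follow-up is to build a lookup table of the channels returned. The sketch below assumes only what this document shows: a client with a fetch_channels() method, and channel objects that answer alias() and description() as in the shell example. The helper name channel_index is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: map each channel alias to its description.
# $rc is any object with a fetch_channels() method; $options is the
# optional filter hash ref described above.
sub channel_index {
    my ($rc, $options) = @_;
    my @channels = defined $options
        ? $rc->fetch_channels($options)
        : $rc->fetch_channels();

    my %index;
    for my $channel (@channels) {
        $index{ $channel->alias() } = $channel->description();
    }
    return \%index;
}
```

With a real instance, channel_index($reuters) would return a hash ref keyed by alias, which is convenient when you later need to pass an alias to fetch_items.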
fetch_items
Fetch WebService::ReutersConnect::Item news items from Reuters Connect. This is the core method. You MUST specify ONE channel (get the list using the fetch_channels method). You can pass either a channel object or a channel alias.
This method returns REAL TIME items.
Usage:
my @items = $this->fetch_items($channel->alias, { %options });
Options:
media_types: An Array of media types to compose from the following options: T (text), P (pictures), V (video), G (graphics) and C (composite)
date_from: YYYY-MM-DD or DateTime object. Defaults to now - 24h. This is INCLUSIVE
date_to: Same format as date_from, but cannot be specified without date_from. Defaults to now. Note that this date is NOT INCLUSIVE
limit: Number of items to fetch. Defaults to $this->default_limit()
sort: Sort by 'date' (newest first) or by 'score' (more relevant first).
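Because date_from is inclusive and date_to is not, covering whole days up to and including today means passing tomorrow's date as date_to. The following sketch builds such an options hash with core Perl only. The helper name last_days_options is hypothetical, and treating the dates as UTC calendar days is an assumption that the text above does not confirm.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: options covering the last $days calendar days,
# today included. $now is optional (epoch seconds) and defaults to time().
sub last_days_options {
    my ($days, $now) = @_;
    $now = time() unless defined $now;
    my $one_day = 24 * 60 * 60;

    return {
        # Inclusive lower bound: ($days - 1) days before today.
        date_from => strftime( '%Y-%m-%d', gmtime( $now - ( $days - 1 ) * $one_day ) ),
        # Exclusive upper bound: tomorrow, so that today is covered.
        date_to   => strftime( '%Y-%m-%d', gmtime( $now + $one_day ) ),
        sort      => 'date',
    };
}
```

The result can be passed as the options argument, for example $this->fetch_items($channel->alias, last_days_options(2)).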
fetch_search
Search for WebService::ReutersConnect::Item's in all Reuters news (from the channels you have access to).
Items found through this method can suffer from a slight delay compared to the live 'items' method.
Options:
q: Free Text Style query string. See search method in http://reutersconnect.com/files/Reuters_Connect_Web_Services_Developer_Guide.pdf
for an extended specification
channels : An Array Ref of restriction Channels (Or channel Aliases)
categories : An Array Ref of restriction Categories (Or category IDs)
media_types: An Array of media types to compose from the following options: T (text), P (pictures), V (video), G (graphics) and C (composite)
limit: Number of items to fetch. Defaults to $this->default_limit()
sort: Sort by 'date' (newest first) or by 'score' (more relevant first).
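Since media types are passed as one-letter codes, a small checker can catch typos before a request is sent. This is a sketch only: the helper name media_type_codes is hypothetical, and the code table simply restates the media_types option above.

```perl
use strict;
use warnings;

# Hypothetical helper: accept codes ('P') or names ('pictures') and
# return an array ref of valid one-letter codes, dying on anything else.
sub media_type_codes {
    my @wanted = @_;

    # Codes and names as listed for the media_types option.
    my %name_for = ( T => 'text', P => 'pictures', V => 'video', G => 'graphics', C => 'composite' );
    my %code_for = reverse %name_for;    # name => code

    my @codes;
    for my $wanted (@wanted) {
        my $code = exists $name_for{ uc($wanted) } ? uc($wanted) : $code_for{ lc($wanted) };
        die "Unknown media type '$wanted'\n" unless defined $code;
        push @codes, $code;
    }
    return \@codes;
}
```

For example, media_type_codes('pictures', 'v') yields an array ref holding 'P' and 'V', suitable as the media_types value.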
Usage:
my @items = $this->fetch_search();
## Only videos
my @items = $this->fetch_search({ media_types => [ 'V' ] });
## Only pictures or videos about Britney Spears
my @items = $this->fetch_search({ q => 'britney spears' , media_types => [ 'P' , 'V' ] });
## Additionally, if you want a L<WebService::ReutersConnect::ResultSet>, use the scalar version of this method:
my $res = $this->fetch_search({ media_types => [ 'V' ] });
print $res->num_found().' results in total';
print $res->size().' results effectively returned (because of limit)';
print $res->start().' offset in the total result space';
my @items = @{$res->items()};
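The three counters of the scalar form can be combined into a one-line summary. The sketch below relies only on the num_found(), size() and start() methods shown above. The helper name describe_page is hypothetical, and it assumes start() is a zero-based offset, which this document does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: describe which slice of the total a result set holds.
# Assumes start() is a zero-based offset into the total result space.
sub describe_page {
    my ($res) = @_;
    my $size  = $res->size();
    my $total = $res->num_found();

    # Nothing returned in this slice.
    return sprintf( 'No results returned (%d found in total)', $total ) unless $size;

    my $first = $res->start() + 1;        # 1-based position of the first item
    my $last  = $res->start() + $size;    # 1-based position of the last item
    return sprintf( 'Results %d-%d of %d', $first, $last, $total );
}
```

With a real result set, say describe_page($res) gives a quick view of how much of the total the limit option let through.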
fetch_olr
Fetches OnLine Reports (SNI, NEPs, SNEPs, ...) from the channels you have access to. You can optionally filter by channel(s).
Options:
channels: An array ref of channel restriction.
fetch_packages
Fetches the edited NEPs (News Event Packages) from a specific Reuters Channel. NEPs come as WebService::ReutersConnect::Item's with added 'main links' sub items and 'supplemental links' sub items. You can view them as editorially assembled news items.
Options:
use_snep: Use editor Super NEPs. Defaults to false (just returns the latest ones).
limit: Fetch a limited number of NEPs, defaults to $this->default_limit()
Usage:
my @items = $reuters->fetch_packages( $channel );
my @items = $reuters->fetch_packages( $channel->alias() , { options .. } );
fetch_package
Fetches a richer version of some specific NEPs (News Event Package). Despite the name of this method, you can actually specify multiple NEPs:
Usage:
my @nep_items = $this->fetch_package($channel->alias(), [ $item1->id() , $item2->id() ] );
fetch_item_xdoc
Fetches one WebService::ReutersConnect::XMLDocument from Reuters, given the Item or the item ID.
This document is a NewsML-G2 document as specified here: http://reutersconnect.com/files/NewsML-G2_Quick_Reference_Guide.pdf
You can view a NewsML-G2 document as a 'full view' of a simple WebService::ReutersConnect::Item (Simple News Item).
Implementing a full NewsML-G2 object from such a document is out of the scope of this module. HOWEVER, for your convenience and enjoyment, the returned object comes with an already instantiated XML::LibXML::XPathContext object on which you can query things of interest.
You are also strongly encouraged to read the 'item' method section of http://reutersconnect.com/files/Reuters_Connect_Web_Services_Developer_Guide.pdf.
Options:
item: An item
OR
guid: GUID of an item
channel: Combined with guid to get the freshest version of the news item.
OR
item_id: Get the specific version of the item by item ID.
------
company_markup: 0 or 1 (default 0). If set, the content will be marked up with company names. See the Reuters documentation.
Usage:
my $xml_doc = $this->fetch_item_xdoc({ guid => $item->guid() , channel => $item->channel()->alias() });
my $xml_doc = $this->fetch_item_xdoc({ item_id => $item->id() });
my $xml_doc = $this->fetch_item_xdoc( { item => $item_object } );
print $xml_doc->toString(); ## Print the whole document.
print $xml_doc->xml_xpath->findvalue('//rcx:description'); ## The default namespace for xpath is 'rcx'
print $xml_doc->xml_xpath->findvalue('//rcx:headline');
my ($body_node) = $xml_doc->xml_xpath->findnodes('//x:html/x:body'); ## Find the HTML content (in case of article).
print $body_node->toString(); ## Print the whole html.
REUTERS_DEMOUSER
Returns the username for the demo account. This is exportable:
use WebService::ReutersConnect qw/:demo/;
print REUTERS_DEMOUSER;
REUTERS_DEMOPASSWORD
Returns the password for the demo account. This is exportable:
use WebService::ReutersConnect qw/:demo/;
print REUTERS_DEMOPASSWORD;
AUTHOR
Jerome Eteve, <jerome at eteve.net>
KNOWN ISSUES
This module is known to be correct, but not to be complete.
Some ReutersConnect method options and some object properties might not be implemented.
Also, it lacks the preference methods and the OpenCalais method of the ReutersConnect API (for now).
Please file any feature you might be missing in the issue tracking system. See BUGS section.
BUGS
Please report any bugs or feature requests to bug-webservice-reutersconnect at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WebService-ReutersConnect. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc WebService::ReutersConnect
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=WebService-ReutersConnect
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
Thanks to C. Gevrey from Reuters for his guidance and inspiration in writing this module.
LICENSE AND COPYRIGHT
Copyright 2012 Jerome Eteve.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.