NAME
Finance::QuoteHist::Generic - Base class for retrieving historical stock quotes.
SYNOPSIS
package Finance::QuoteHist::MyFavoriteSite;
use strict;
use vars qw(@ISA);
use Finance::QuoteHist::Generic;
@ISA = qw(Finance::QuoteHist::Generic);
sub quote_urls {
    # This method should return the set of URLs necessary to extract
    # the quotes from this particular site given the list of symbols
    # and date range provided during instantiation. See
    # Finance::QuoteHist::SiliconInvestor for a basic example of how to do
    # this, or Finance::QuoteHist::Yahoo for a more complicated
    # example.
}
DESCRIPTION
This is the base class for retrieving historical stock quotes. It is built around LWP::UserAgent, and by default it expects the returned data to be in HTML format, in which case the quotes are gathered using HTML::TableExtract. Support for CSV (Comma Separated Value) data is included as well.
In order to actually retrieve historical stock quotes, this class should be subclassed and tailored to a particular web site. In particular, the quote_urls() method should be overridden, and provide however many URLs are necessary to retrieve the data over a list of symbols within the given date range. Different sites have different limitations on how many quotes are returned for each query. See Finance::QuoteHist::WallStreetCity, Finance::QuoteHist::SiliconInvestor, and Finance::QuoteHist::Yahoo for some examples of how to do this.
For more complicated sites, such as Yahoo, more methods are available for overriding that deal with things such as splits and dividends.
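As a rough illustration of what an overridden quote_urls() has to produce, the following standalone sketch builds one URL per symbol per date window, for a site that limits how many quotes it returns per query. The endpoint, the query parameter names, the build_quote_urls() helper, and the ISO date format are all hypothetical; a real subclass would take its symbols and dates from the object and follow the URL layout of its target site.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical endpoint; every real site has its own URL layout.
my $BASE = 'http://quotes.example.com/hist';

# Convert 'YYYY-MM-DD' to epoch seconds at UTC midnight.
sub _epoch {
    my ($y, $m, $d) = split /-/, shift;
    return timegm(0, 0, 0, $d, $m - 1, $y);
}

# Return one URL per symbol per window of at most $max_days days.
# $symbols may be a single string or an array reference.
sub build_quote_urls {
    my ($symbols, $start, $end, $max_days) = @_;
    my @symbols = ref $symbols ? @$symbols : ($symbols);
    my ($lo, $hi) = (_epoch($start), _epoch($end));
    my $day = 24 * 60 * 60;
    my @urls;
    for my $sym (@symbols) {
        for (my $from = $lo; $from <= $hi; $from += $max_days * $day) {
            my $to = $from + ($max_days - 1) * $day;
            $to = $hi if $to > $hi;    # clip the last window
            push @urls, sprintf('%s?s=%s&from=%s&to=%s', $BASE, $sym,
                strftime('%Y-%m-%d', gmtime($from)),
                strftime('%Y-%m-%d', gmtime($to)));
        }
    }
    return @urls;
}

print "$_\n" for build_quote_urls([qw(IBM MSFT)], '2001-01-01', '2001-01-10', 7);
```

Windows are computed in UTC so that adding whole days never crosses a daylight saving boundary.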
METHODS
- new()
-
Returns a new Finance::QuoteHist::Generic object. Valid attributes are:
- start_date
- end_date
-
Specify the date range from which you would like historical quotes. These dates get parsed by the ParseDate() method in Date::Manip, so see Date::Manip(3) for more information on valid date strings. They are quite flexible, and include such strings as '1 year ago'. Date boundaries can also be dynamically set with methods of the same name.
- symbols
-
Indicates which ticker symbols to include in the search for historical quotes. Passed either as a string (for single ticker) or an array ref for multiple tickers.
- reverse
-
Indicates whether each batch of rows from each URL provided in quote_urls() should be reversed from top to bottom. Some sites present historical quotes with the newest quotes on the top. Since the rows from each URL are eventually catenated, if the overall order of your rows is important you might want to pay attention to this flag. If the overall order is not that important, then ignore this flag. Typically, site-specific subclasses of this module will take care of setting this appropriately. The default is 0.
- attempts
-
Sets how persistently the module tries to retrieve the quotes. There are two places this will manifest. First, if there appear to be network errors, up to this many network connections are attempted for that URL. Second, for quotes only, if a document was successfully retrieved but contained no quotes, up to this many attempts are made to retrieve a document with data. Sometimes sites will report a temporary internal error via HTML, and if it is truly transitory this will usually get around it. The default is 3.
- lineup
-
Passed as an array reference (or scalar for a single site), this list indicates which Finance::QuoteHist::Generic subclasses should be invoked in the event this class fails in its attempt to retrieve historical quotes. In the event of failure, the first class in this list is invoked with the same parameters as the original class, and the remaining classes are passed as the lineup to the new class. This sets up a daisy chain of redundancy in the event a particular site is hosed. See Finance::QuoteHist(3) to see an example of how this is done in a top level invocation of these modules. This list is empty by default.
- quote_precision
-
Sets the number of decimal places to which quote values are rounded. This might be of particular significance if there is auto-adjustment taking place (which currently happens only under particular circumstances; see Finance::QuoteHist::Yahoo). Setting this to 0 will disable the rounding behavior, returning the quote values as they appear on the sites (assuming no auto-adjustment has taken place). The default is 4.
- env_proxy
-
When set, instructs the underlying LWP::UserAgent to load proxy configuration information from environment variables. See the ua() method and LWP::UserAgent for more information.
- verbose
-
When set, many status messages are printed to STDERR indicating progression through URLs and lineup invocations.
- quiet
-
When set, certain failure messages are suppressed from appearing on STDERR. These messages would normally appear regardless of the setting of the verbose flag.
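Taken together, the attempts and lineup attributes above describe a retry-then-fall-back policy. The following standalone sketch shows that policy in miniature; it is not the module's implementation. Here a "source" is a hypothetical code reference standing in for a site-specific subclass, and try_source() and fetch_from_lineup() are names invented for this illustration.

```perl
use strict;
use warnings;

# Try one source up to $attempts times. A source is a code ref that
# returns an array ref of rows and may die on a network error.
sub try_source {
    my ($source, $attempts) = @_;
    for my $n (1 .. $attempts) {
        my $rows = eval { $source->() };
        next unless defined $rows;    # died: treat as a network error
        return $rows if @$rows;       # document contained data
    }
    return;                           # gave up on this source
}

# Walk the lineup in order. Return the name of the first source that
# yields rows, together with those rows, or an empty list on failure.
sub fetch_from_lineup {
    my ($lineup, $attempts) = @_;
    for my $entry (@$lineup) {
        my ($name, $source) = @$entry;
        my $rows = try_source($source, $attempts);
        return ($name, $rows) if $rows;
    }
    return;
}
```

Returning the source name alongside the rows mirrors the idea of being able to ask afterwards which member of the lineup fulfilled a request.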
The following methods are the primary user interface methods; developers wishing to make their own site-specific subclass of this module will find information on overriding methods further below.
- quotes()
-
Retrieves historical quotes for all provided symbols over the specified date range. Depending on context, returns either a list of rows or an array reference to the same list of rows.
- dividends()
- splits()
-
If available, retrieves dividend or split information for all provided symbols over the specified date range. If there are no site-specific subclassed modules in the lineup capable of getting dividends or splits, the user will be notified on STDERR unless the quiet flag was requested during object creation.
- start_date(date_string)
- end_date(date_string)
-
Set the date boundaries of all queries. The date_string is interpreted by the Date::Manip module.
- clear_cache()
-
When results are gathered for a particular date range, whether they be via direct query or incidental extraction, they are cached. This cache is cleared by invoking this method directly, by resetting the boundary dates of the query, or by changing the adjusted() setting.
- quote_source(ticker_symbol)
- dividend_source(ticker_symbol)
- split_source(ticker_symbol)
-
After a query, these methods can be used to find out which particular subclass in the lineup fulfilled the corresponding request.
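Two behaviors described above are easy to misread: quotes() answers differently in list and scalar context, and resetting a boundary date clears the cache. The following toy class sketches both. Sketch::QuoteCache, its fetches counter, and the made-up rows are purely illustrative and are not part of Finance::QuoteHist.

```perl
use strict;
use warnings;

{
    package Sketch::QuoteCache;    # hypothetical class, illustration only

    sub new {
        my ($class, %args) = @_;
        return bless { %args, cache => {}, fetches => 0 }, $class;
    }

    # Resetting a boundary date invalidates cached results.
    sub start_date {
        my $self = shift;
        if (@_) {
            $self->{start_date} = shift;
            $self->clear_cache;
        }
        return $self->{start_date};
    }

    sub clear_cache { $_[0]{cache} = {}; return }

    # Stand-in for the real network query; the rows are made up.
    sub _fetch {
        my $self = shift;
        $self->{fetches}++;
        return [ [ 'XYZ', '2001/01/02', 10 ], [ 'XYZ', '2001/01/03', 11 ] ];
    }

    # List of rows in list context, array reference otherwise.
    sub quotes {
        my $self = shift;
        my $rows = $self->{cache}{quote} ||= $self->_fetch;
        return wantarray ? @$rows : $rows;
    }
}

my $q    = Sketch::QuoteCache->new(start_date => '2001/01/01');
my @rows = $q->quotes;    # list of row references
my $ref  = $q->quotes;    # array reference, served from the cache
```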
The following methods are the primary methods of interest for developers wishing to make a site-specific subclass. For simple quote retrievals, the quote_urls() method is typically all that is necessary. For splits, dividends, and more complicated data parsing conditions beyond HTML tables, the other methods could be of interest (see the Finance::QuoteHist::Yahoo module as an example of the more complicated behavior). If a new target type is ever defined in addition to quote, split, and dividend, then corresponding methods (TARGET_urls(), TARGET_get(), TARGET_symbols()) should be provided when appropriate.
- quote_urls()
-
When a site supports historical stock quote queries, this method should return the list of URLs necessary to retrieve all historical quotes from a particular site for the symbols and date ranges provided.
- dividend_urls()
-
If a site supports direct dividend queries, this method should provide the list of URLs necessary for the symbol and date range involved. Currently this is only implemented by the Yahoo subclass.
- split_urls()
-
If a site supports direct split queries, this method should provide the list of URLs necessary. Currently no sites support this type of query (splits are gathered from the regular quote output from Yahoo).
- quote_get()
- dividend_get()
- split_get()
-
All three of these methods invoke target_get() with the relevant target information. The analogous methods, quotes(), dividends(), and splits(), should automatically take care of finding these based on the presence of the corresponding TARGET_urls() method. If the TARGET_urls() method is not available, then they will look for ways to utilize the TARGET_extract() method.
- split_extract()
- dividend_extract()
-
These extraction methods are not provided by default. When present in a site-specific subclass, they are invoked on a per-row basis during direct (i.e., via URLs provided by a TARGET_urls() method) queries of other target types; each is passed an array reference representing a table row, and should return another array reference representing successfully extracted dividend/split information. When a successful extraction occurs, that row is filtered from the target query results. See the Yahoo subclass for an example of its use. Theoretically there could be a quote_extract() method as well, but it is redundant at this point and therefore never used.
- adjusted($boolean)
-
Return or set whether results should be returned as adjusted or non-adjusted values. Adjusted means that the quotes have been retroactively adjusted in terms of the current share price, such as for splits. The sites represented so far by site-specific subclassing all offer pre-adjusted data by default, and most offer nothing else. One significant exception is Yahoo, which provides non-adjusted quotes in HTML, but adjusted for CSV, the default mode of transmission for the Yahoo module. Pre-adjusted quote values can be requested from capable sites by providing a true value to this method. By default, adjusted values are always returned.
If non-adjusted values have been requested, and a site in the lineup that does not provide non-adjusted values ends up fulfilling the request, a warning is issued to STDERR (unless quiet was specified as a parameter to new()). Currently, Yahoo is the only supported site that provides non-adjusted values, but they have to be specifically requested.
There are a couple of points to note that could be significant; QuoteHist will automatically notice if a quote source has an "Adj" column -- one that represents an adjusted closing value. If present, all other values, including volume, will be adjusted based on the ratio of the represented closing value and the adjusted value. This might actually occur with the Yahoo module if, for example, you request splits() before you request quotes(). The split data is only available in HTML mode; QuoteHist caches initial queries and will gather the quote information represented in the HTML. It will notice the adjusted close column, and automatically normalize the rest of the quote information. If non-adjusted data is desired, you must pass 0 to this method. The justification for this is that there will be a common expectation for quote data returned from different sites in the lineup, even if there are small deviations due to things such as Yahoo adjusting for dividends as well as splits, so there could be slight variations across sites.
- ua()
-
Accessor method for the LWP::UserAgent object used to process HTTP::Request for individual URLs. This can be handy for such things as configuring proxy access for the underlying user agent. Example:
# Manual configuration
$qh1->ua->proxy(['http'], 'http://proxy.sn.no:8001/');
# Load from environment variables
$qh2->ua->env_proxy();
See LWP::UserAgent for more information on the capabilities of that module.
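The ratio-based normalization described under adjusted(), combined with the rounding controlled by quote_precision, can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: the row layout is hypothetical, adjust_row() is an invented name, and the direction of the adjustment (prices multiplied by the ratio of adjusted close to close, volume divided by it) is the usual split convention, which the text above does not spell out.

```perl
use strict;
use warnings;

# Assumed row layout (hypothetical, for illustration only):
# [date, open, high, low, close, volume, adjusted_close]
sub adjust_row {
    my ($row, $precision) = @_;
    my ($date, $open, $high, $low, $close, $volume, $adj) = @$row;

    # Ratio of adjusted close to reported close; 1 means "no change".
    my $ratio = $close ? $adj / $close : 1;

    # Assumption: prices scale with the ratio, volume scales inversely.
    my @prices = map { $_ * $ratio } ($open, $high, $low, $close);
    my $vol = $ratio ? int($volume / $ratio + 0.5) : $volume;

    # A precision of 0 leaves the values unrounded.
    if ($precision) {
        @prices = map { sprintf("%.${precision}f", $_) + 0 } @prices;
    }
    return [ $date, @prices, $vol ];
}
```

For a row recorded before a 2-for-1 split, where the adjusted close is half the reported close, this halves each price and doubles the volume.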
Most of the methods below are utilized during a call to target_get(). Average subclasses will probably have little need of them, but they are included here just in case.
- target_get($target)
-
Returns an array reference to the rows that result from a particular TARGET query; this is where the network transaction and data extraction take place. It will gather the results from each URL provided in the corresponding TARGET_urls() method, perform the primary and secondary data extraction, and return the catenated results as a list. For example, the quote_get() method will call this method with 'quote' as the TARGET; during its execution, the methods quote_urls() and quote_labels() will be invoked to tailor the quote-specific retrieval and extraction.
- fetch($mode, $url, @new_request_args)
-
Returns the web page located at $url, using request method $mode (i.e., GET or POST). The @new_request_args list gets passed as arguments to the HTTP::Request::new method that handles the request at the behest of the LWP::UserAgent accessible via the ua() method.
- method
-
Returns the method under which HTTP::Request objects are created for use by the LWP::UserAgent. By default, this returns 'GET'.
- has_non_adjusted($boolean)
-
Indicator method that specifies whether a particular site subclass is capable of providing non-adjusted quote values. This is assumed to be false by default; Yahoo is a significant exception.
- rows($data_string)
-
Given a data string, returns the extracted rows as either an array or array reference, depending on context. The data string is parsed based on the type of parser registered for the data type; currently this is either HTML via HTML::TableExtract, or CSV using an internal parser. If parsing HTML, the corresponding target labels are passed along to the HTML::TableExtract class. Rows falling outside of the date range specified for the object are discarded.
- parse_method($mode, $parse_sub)
-
Retrieve or set the reference or name of the parsing routine for the specified TARGET. Currently parse methods are registered by default for the 'html' and 'csv' modes.
- parse_mode($mode)
-
Retrieve or set the current parse mode.
- html_table_parser($data_string, @column_labels)
-
HTML table parser routine registered by default. column_labels are optional, and will default to the labels provided by the TARGET_labels() method, where TARGET is the current target mode.
- csv_parser($data_string, @column_labels)
-
CSV parser routine registered by default. column_labels are optional, but when present represent the labels that might appear in the beginning of the CSV data. They are reordered based on the default column_labels specified for HTML output.
- date_in_range($date)
-
Given a date string, tests whether it is within the range specified by the current start_date and end_date.
- dates($start_date, $end_date)
-
Returns a list of business days between and including the provided boundary dates. If no arguments are provided, start_date and end_date default to the currently specified date range.
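The row handling described for rows(), csv_parser(), and date_in_range() can be pictured with a small standalone sketch: parse CSV text, skip the label line, discard rows outside the date range, optionally reverse the batch, and return a list or an array reference depending on context. This is not the module's parser. The csv_rows() name is invented, fields are assumed to contain no quoted commas, and dates are assumed to be ISO 'YYYY-MM-DD' strings so that plain string comparison orders them correctly (the module itself relies on Date::Manip for date handling).

```perl
use strict;
use warnings;

# Parse simple CSV text into rows whose first field is an ISO date,
# keeping only rows with $start <= date <= $end.
sub csv_rows {
    my ($data, $start, $end, $reverse) = @_;
    my @rows;
    for my $line (split /\r?\n/, $data) {
        next unless $line =~ /\S/;                  # skip blank lines
        my @fields = split /,/, $line;
        # Lines that do not start with a date (such as labels) are skipped.
        next unless $fields[0] =~ /^\d{4}-\d{2}-\d{2}$/;
        next if $fields[0] lt $start || $fields[0] gt $end;
        push @rows, \@fields;
    }
    # Sites that list the newest quotes first can be flipped here.
    @rows = reverse @rows if $reverse;
    return wantarray ? @rows : \@rows;
}
```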
DISCLAIMER
The data returned from these modules is in no way guaranteed, nor are the developers responsible in any way for how this data (or lack thereof) is used. The interface is based on URLs and page layouts that might change at any time. Even though these modules are designed to be adaptive under these circumstances, they will at some point probably be unable to retrieve data unless fixed or provided with new parameters. Furthermore, the data from these web sites is usually not even guaranteed by the web sites themselves, and oftentimes is acquired elsewhere. See the documentation for each site-specific module for more information regarding the disclaimer for that site.
Above all, play nice.
AUTHOR
Matthew P. Sisk, <sisk@mojotoad.com>
COPYRIGHT
Copyright (c) 2000-2002 Matthew P. Sisk. All rights reserved. All wrongs revenged. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
Finance::QuoteHist(3), HTML::TableExtract(3), Date::Manip(3), perlmodlib(1), perl(1).