NAME
DataWarehouse::Dimension - a data warehouse dimension meta-information
VERSION
Version 0.01
SYNOPSIS
use DataWarehouse::Dimension;
# NOTE: "table" and "attributes" were not implemented yet
my $dimension = DataWarehouse::Dimension->new(
dbh => $dbh,
name => 'product',
table => 'product',
attributes => [
{
name => 'id',
PRIMARY_KEY => 1,
},
{
name => 'product_id',
NATURAL_KEY => 1,
},
{
name => 'name',
KEEP_HISTORY => 1,
AGGREGATION_POINT => 1,
},
{
name => 'brand'
KEEP_HISTORY => 1,
AGGREGATION_POINT => 1,
},
{
name => 'price'
KEEP_HISTORY => 1,
},
],
);
DESCRIPTION
Every dimensional model is made of Facts and Dimensions.
Facts are the measurements; for example, sales.
But the facts, per se, provide incomplete information: what is that amount related to? Dimensions are give context to the facts. For instance: "sales by country" or "sales by product", or "sales by product, by day"
In relational databases, facts and dimensions will be stored in separate tables; the result is the star schema:
+-----+ +-------------+
! day !----- -----! salesperson !
+-----+ \ +-------+ / +-------------+
! sales !
+---------+ / +-------+ \ +----------+
! product !---- -----! customer !
+---------+ +----------+
PRIMARY KEY
Every dimension table must have a primary key that is different from the primary key used in the source systems.
The primary key should be meaningless; we call it a "surrogate key".
NATURAL KEYS
You should also store the "natural key", used to identify the records in the source systems: for example, product_id, customer_id, or salesperson_id. You can identify the natural keys with NATURAL_KEY => 1
.
SLOWLY CHANGING DIMENSIONS
One of the major goals of the data warehouse is to preserve history. For instance: a product could be priced at $259 in one year, and have a price drop to $189 in the next year.
You don't want to overwrite the price in the product dimension, because that would cause you to loose information -- the old price was correct in the past.
The strategy to deal with this, is to create a new record with the updated information. That's why the data warehouse must use a surrogate key and should identify the natural keys.
You can identify the attributes for which you want to preserve history with (KEEP_HISTORY => 1
).
If you don't identify an attribute with <KEEP_HISTORY
>, we'll assume that you don't want to preserve history for that information.
Finally, if you think that one attribute will change too often, you should consider storing it in a separate dimension.
AGGREGATION POINTS
Some attributes are typically used as aggregation points, in many queries.
For instance: "month" and "year" are typical aggregate points in the "day" dimension; "brand" and "type" are typical aggregation points in the "product" dimension.
When you create a dimension, you can indicate that an attribute is an aggregation point; this information may be used to generate aggregate tables.
SEE ALSO
AUTHOR
Nelson Ferraz, <nferraz at gmail.com>
BUGS
Please report any bugs or feature requests to bug-dw at rt.cpan.org
, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DataWarehouse. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc DataWarehouse::Dimension
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2010 Nelson Ferraz.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.