NAME
AI::Classifier::Text::Analyzer - computing feature vectors from documents
VERSION
version 0.03
SYNOPSIS
use AI::Classifier::Text::Analyzer;
my $analyzer = AI::Classifier::Text::Analyzer->new();
my $features = $analyzer->analyze( 'aaaa http://www.example.com/bbb?xx=yy&bb=cc;dd=ff' );
DESCRIPTION
Computes feature vectors of text using some heuristics and adds word counts (using Text::WordCounter by default).
The object is immutable, but some methods take a second parameter that serves as an accumulator for the features found in the given text.
The heuristics rely on specific values and methods that work for our use case but are not guaranteed to give good results universally; see the source for details.
ATTRIBUTES
word_counter
An object with a word_count method that calculates the frequency of words in a text document. Defaults to Text::WordCounter.
global_feature_weight
The weight assigned to the computed features of the text document. Defaults to 2.
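As a rough illustration of the word_counter attribute, the sketch below defines a replacement counter. The package name My::LowercaseCounter is hypothetical, and the ($text, $features) calling convention with a hashref accumulator is an assumption based on the description above, not something this document confirms.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Text::WordCounter. The only requirement stated
# in this document is a word_count method; the ($text, $features) argument
# order and the hashref accumulator are assumptions.
package My::LowercaseCounter;

sub new { my ($class) = @_; return bless {}, $class }

sub word_count {
    my ( $self, $text, $features ) = @_;
    $features ||= {};    # start a fresh vector if none was passed in
    $features->{ lc $1 }++ while $text =~ /(\w+)/g;
    return $features;
}

package main;

my $counter = My::LowercaseCounter->new;
my $counts  = $counter->word_count('Spam spam eggs');

# Passing the same hashref again accumulates into it.
$counter->word_count( 'more eggs', $counts );
# $counts is now { spam => 2, eggs => 2, more => 1 }

# With the module installed, the counter would be handed over like this:
# my $analyzer = AI::Classifier::Text::Analyzer->new( word_counter => $counter );
```

Lowercasing before counting is just one possible design choice; whether it helps depends on the classifier and the corpus.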
METHODS
new(word_counter => $foo, global_feature_weight => 3)
Creates a new AI::Classifier::Text::Analyzer object. Both arguments are optional.
analyze($document, $features)
Computes the feature vector of the given document and adds it to the optional initial vector $features.
analyze_urls($document, $features)
Computes a vector of special URL-related features for the given text. The features currently used are NO_URLS, MANY_URLS and REPEATED_URLS.
filter($document)
Removes HTML-related parts from the text.
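The URL features named under analyze_urls can be pictured with the following standalone sketch. It is not the module's implementation: the function name sketch_url_features, the URL pattern and the MANY_URLS threshold are all assumptions made for illustration, and the way the weight is applied is a guess based on the global_feature_weight description. The actual rules are in the module source.

```perl
use strict;
use warnings;

# Illustrative only: the URL pattern and the threshold are assumptions,
# not the values used by AI::Classifier::Text::Analyzer.
my $MANY_URLS_THRESHOLD = 3;    # hypothetical: more than 3 URLs counts as "many"
my $WEIGHT              = 2;    # mirrors the documented default global_feature_weight

sub sketch_url_features {
    my ( $document, $features ) = @_;
    $features ||= {};           # accumulator is optional

    # Naive URL extraction; a real implementation would be more careful.
    my @urls = $document =~ m{(https?://\S+)}g;

    my %seen;
    $seen{$_}++ for @urls;

    $features->{NO_URLS}       = $WEIGHT if !@urls;
    $features->{MANY_URLS}     = $WEIGHT if @urls > $MANY_URLS_THRESHOLD;
    $features->{REPEATED_URLS} = $WEIGHT if grep { $_ > 1 } values %seen;

    return $features;
}

my $plain    = sketch_url_features('no links here');
my $repeated = sketch_url_features(
    'see http://example.com/a and http://example.com/a');
# $plain    is { NO_URLS => 2 }
# $repeated is { REPEATED_URLS => 2 }
```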
SEE ALSO
AI::NaiveBayes(3), AI::Classifier::Text(3)
AUTHOR
Zbigniew Lukasiak <zlukasiak@opera.com>, Tadeusz Sośnierz <tsosnierz@opera.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Opera Software ASA.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.