NAME
aie - Automatic Information Extraction
DESCRIPTION
AIE attempts to extract regularly structured information from text (non-binary) files. It accepts any non-binary file as input, looks for a repeating sequence in the file, and then generalizes a regular expression that extracts the information that varies within the repeating structure.
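The generalization step can be illustrated with a small sketch. This is not AIE's implementation (AIE is a Perl module and its internals are not documented here); it is a Python illustration of the general idea, assuming that two instances of the repeating structure are already available as strings. The function name `generalize` and the `min_len` parameter are hypothetical. Substrings shared by both instances are kept as literals, and every stretch that differs becomes a `(.*)` capture group.

```python
import re
from difflib import SequenceMatcher


def generalize(a: str, b: str, min_len: int = 2) -> str:
    """Build a regex that matches both a and b.

    Shared substrings of at least min_len characters are kept as
    escaped literals; everything else is replaced by a (.*) group.
    Hypothetical helper for illustration, not part of AIE.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    pos_a = pos_b = 0
    for block in matcher.get_matching_blocks():
        if block.size < min_len:
            # Too short to be a useful literal (this also skips the
            # zero-length terminal block); treat it as varying text.
            continue
        if block.a > pos_a or block.b > pos_b:
            # Something differs before this shared block.
            parts.append("(.*)")
        parts.append(re.escape(a[block.a:block.a + block.size]))
        pos_a = block.a + block.size
        pos_b = block.b + block.size
    if pos_a < len(a) or pos_b < len(b):
        # Trailing text that is not shared by both inputs.
        parts.append("(.*)")
    return "".join(parts)


if __name__ == "__main__":
    first = '<h2 id="Chimera">Chimera</h2>'
    second = '<h2 id="CRISP">CRISP</h2>'
    pattern = generalize(first, second)
    print(pattern)
    print(re.fullmatch(pattern, first).groups())   # ('himera', 'himera')
    print(re.fullmatch(pattern, second).groups())  # ('RISP', 'RISP')
```

Note that the leading "C" shared by both headings is absorbed into the literal part of the pattern, so the captured values lose it. The truncated fields in the sample output under SYNOPSIS (for example 'mw-headlin') appear to be the same kind of effect.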
SYNOPSIS
$ aie "./Downloadable NLG systems - ACL Wiki.html"
Extracting major patterns
Length: 40136
.
........................................
Extracting most useful terms
Chose token:
$VAR1 = ' class="';
Selected instance 133 of 185
$VAR1 = [
  '(.*) class\\=\\"(.*)ree\\" (.*)re(.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*)re(.*) \\<\\/p\\>\\<p\\>\\<(.*)re(.*)\\=\\"(.*)fo(.*)\\"',
  '(.*) class\\=\\"(.*)e\\" (.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*)\\>\\<\\/(.*) \\<p\\>\\<(.*)re(.*)\\=\\"(.*)fo(.*)\\"',
  '(.*) class\\=\\"(.*)ree\\" (.*)re(.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*) \\<\\/p\\>\\<p\\>\\<(.*)re(.*)\\=\\"(.*)fo(.*)\\"',
  '(.*) class\\=\\"(.*)ree\\" (.*)re(.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*) \\<\\/p\\>\\<p\\>(.*)fo(.*)cl(.*)as(.*)la(.*)as(.*)re(.*)as(.*)re(.*)re(.*) c(.*)re(.*) \\<\\/p\\> \\<(.*)\\>',
  '(.*) class\\=\\"(.*)ree\\" (.*)re(.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*) \\<\\/p\\>\\<p\\>(.*)as(.*)re(.*) c(.*)re(.*)rela(.*)as(.*)fo(.*)as(.*) c(.*)la(.*)re(.*)re(.*)\\" (.*)la(.*)as(.*)fo(.*)la(.*)re(.*)cl(.*)re(.*)\\=\\"(.*)fo(.*)\\"',
  '(.*) class\\=\\"(.*)e\\" (.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*)\\>\\<\\/(.*) \\<p\\>\\<(.*)re(.*)\\=\\"(.*)fo(.*)\\"',
  ' class\\=\\"(.*)ree\\" (.*)re(.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*) \\<\\/p\\>\\<p\\>(.*)fo(.*) \\<\\/p\\> \\<(.*)\\>',
  '(.*) class\\=\\"(.*)e\\" (.*)\\=\\"(.*)\\"\\>(.*)\\<\\/(.*)\\>\\<\\/(.*) \\<p\\>\\<(.*)re(.*)\\=\\"(.*)fo(.*)\\"'
];
$VAR1 = ' class="(.*)e" (.*)="(.*)">(.*)</(.*) <p><';
Extracted 23 records
$VAR1 = [
  [ 'mw-headlin', 'id', 'ASTROGEN', 'ASTROGE/span', 'h2>' ],
  [ 'mw-headlin', 'id', 'Chimera', 'Chimera</span>', 'h2>' ],
  [ 'mw-headlin', 'id', 'CRISP', 'CRIS/span', 'h2>' ],
...
AUTHOR
Andrew John Dougherty
LICENSE
GPLv3
INSTALLATION
Using cpan:
$ cpanm Org::FRDCSA::AIE
Manual install:
$ perl Makefile.PL
$ make
$ make install