NAME
Lingua::FR::Ladl::Table - An object representing a Ladl Table
VERSION
This document describes Lingua::FR::Ladl::Table version 0.0.1
SYNOPSIS
use Lingua::FR::Ladl::Table;
my $table = Lingua::FR::Ladl::Table->new({ name => $table_ref->{name} });
# load table data from an excel file:
$table->load({ format => 'xls', file => '1.xls' });
# load table data from a gnumeric xml file:
$table->load({ format => 'xml', file => '1.xml' });
$table->set_name('1');
my $name = $table->get_name();
my $verbCol = $table->get_verb_column(); # which column contains the verb
my $col = $table->get_col_for_header('aux =: avoir'); # which column's header is 'aux =: avoir'?
my $header = $table->get_header_for_col(4); # what is the column header of column 4?
my $dbh = $table->create_db_table( { col_names => 'col_numbers' } ); # get a db handle with column numbers as column names
# Query the table using SQL::Statement: for which verbs is column 8 empty and column 19 = '+'?
my $query = "SELECT col_$verbCol FROM table_$name where col_8 = NULL AND col_19 = '+'";
my $sth = $dbh->prepare($query);
$sth->execute();
DESCRIPTION
This module provides a data structure representing a Ladl table. The Ladl tables are the digitized representation of Maurice Gross's Grammar Lexicon, a very large scale, high precision, French linguistic resource, developed over several years by a group of skilled linguists and according to well defined linguistic criteria. The grammar lexicon describes syntactic and semantic properties of French (basic) sentences.
A table gathers together predicative items (verbs in this case) with comparable syntactico-semantic behaviour.
In a table, columns further specify the syntactico-semantic properties of each verb in that table.
Example
N0 =: Nhum | N0 =: Nnc | | 1 | aux =: avoir | aux =: être | N0 est Upp W | N0 U | | N1 =: Qu P | N1 =: Qu Psubj | Tp = Tc | Tc =: passé | Tc =: présent | Tc =: futur | Vc =: devoir | Vc =: pouvoir | Vc =: savoir | V-inf0 W = Ppv | N0 U Prép N1 | N0 U Prép Nhum | N0 U Prép N-hum | Prép N1 = Ppv | N0 U dans N1 | N0 U N1 | N0 U Nhum | N0 U N-hum | |
+ | - | <E> | achever | + | - | - | - | de | - | - | + | - | - | - | - | - | - | - | - | - | - | - | - | + | + | + | Max achève de peindre le mur |
+ | + | <E> | aller | - | - | - | - | <E> | - | - | - | - | - | - | + | + | + | - | - | - | - | - | - | - | - | - | Max va partir |
+ | - | <E> | aller | - | + | - | - | jusqu'à | - | - | + | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | La pluie va tomber |
+ | + | ne | aller Nég | - | + | - | - | sans | - | + | + | - | - | - | + | + | + | - | + | - | + | - | - | - | - | - | Cette mesure n'ira pas sans créer des troubles |
The tables are available as a set of excel spreadsheets from http://ladl.univ-mlv.fr/
This module represents a table as a Ladl::Table object and lets you inspect and query:
- which verbs belong to the table,
- what the headers of the table are,
- which column corresponds to which header,
- which verb corresponds to which row(s),
- the value of a column in a given row,
- the value in a given row for a given header.
It is also possible to formulate more complex queries using the SQL dialect implemented in SQL::Statement (see SQL::Statement).
INTERFACE
Methods
For all following methods, column and row numbering starts at 0.
- new - build a new Table object
-
my $table = Lingua::FR::Ladl::Table->new({ name => 'test_table' });
There's one optional initial argument: name, the table's name
- set/get_name
- load
-
$table->load( {format => 'xls', file=>'file_name'} );
$table->load( {format => 'xml', file=>'file_name'} );
Load table data from a file in the given format. Format may be one of:
- xls - file is an excel file
-
in this case Spreadsheet::ParseExcel is used to parse the file.
- xml - an xml file in gnumeric xml format
-
the file is parsed using XML::LibXML
The table name is also set to a value inferred from the file name by removing the suffix. The table name matters because the get_verb_column method relies on it being correct.
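The suffix removal described here can be pictured with the core File::Basename module. This is a minimal sketch, not the module's actual code; the helper name table_name_from_file and the file paths are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: derive a name from a file path by dropping
# the directory part and the suffix (everything from the last dot).
sub table_name_from_file {
    my ($file) = @_;
    my ($name) = fileparse($file, qr/\.[^.]*/);
    return $name;
}

my $name_xls = table_name_from_file('tables/1.xls');     # '1'
my $name_xml = table_name_from_file('test_table.xml');   # 'test_table'
```

A file named 1.xls therefore yields the name '1', which is the header that get_verb_column looks for.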
- get/set_maxCol, get/set_maxRow
-
get/set maximum row or column value
- get_headers
-
Return a hash with the column headers as keys and the corresponding column numbers as values. When a column has no header, col_<column number> is used as the key.
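As an illustration of the shape of this hash, here is a standalone sketch in plain Perl. The header row is a shortened, made-up version of the example table, and the variable names are assumptions rather than anything the module exposes.

```perl
use strict;
use warnings;

# Header row of a small made-up table; column 2 has no header.
my @header_row = ('N0 =: Nhum', 'N0 =: Nnc', '', '1', 'aux =: avoir');

# Map each header to its column number, falling back to col_<n>
# when the header is empty.
my %col_for_header;
for my $col (0 .. $#header_row) {
    my $header = $header_row[$col];
    my $key    = length($header) ? $header : "col_$col";
    $col_for_header{$key} = $col;
}

# Reverse lookup: column number to header.
my %header_for_col = reverse %col_for_header;

print "$col_for_header{'aux =: avoir'}\n";   # prints 4
print "$header_for_col{2}\n";                # prints col_2
```

The reverse hash corresponds to the kind of lookup that get_header_for_col performs, while the forward hash matches get_col_for_header.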
- get_value_at($row, $col)
- get_col_for_header( $header )
-
Return the column number for a given column header, or undef if $header doesn't match any header. For the table in the example:
$table->get_col_for_header('aux =: avoir') returns 4
- get_header_for_col($col)
-
return the header for a given column.
Example:
$table->get_header_for_col(4) returns 'aux =: avoir'
- get_verb_column
-
Return the column (by number) containing the verb. The verb column is assumed to be the column whose header equals the table name.
For the table in the example
$table->get_verb_column(); returns 3
- get_verbs
-
return the list of verbs of the table (as an array).
- get_particle_column
-
The particle column contains entries such as ne, n', se, s', occurring in front of the verb. We assume it to be the column right before the verb column.
Example:
$table->get_particle_column() returns 2
- get_example_column
-
The example column contains example phrases with the verb of the row. We assume it's the last column of the table.
For the table in the example above:
my $col = $table->get_example_column()
would set $col to 27
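The three positional assumptions (verb column, particle column, example column) can be summarised in a few lines of plain Perl. This is only a sketch over a made-up six-column header row; it does not call the module and the variable names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(first);

my $table_name = '1';
my @header_row = ('N0 =: Nhum', 'N0 =: Nnc', '', '1', 'aux =: avoir', '');

# Verb column: the first column whose header equals the table name.
my $verb_col = first { $header_row[$_] eq $table_name } 0 .. $#header_row;

# Particle column: the column right before the verb column.
my $particle_col = $verb_col - 1;

# Example column: the last column of the table.
my $example_col = $#header_row;
```

With this header row the sketch yields 3, 2 and 5. A wrong table name would leave $verb_col undefined, which is why the name inferred at load time matters.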
- get_column_types
-
Columns may contain either text or one of '+', '-' and '~'. The method returns a reference to a hash that maps each column number to 'text' if the column has text content, or to '+-~' otherwise.
For the table in the example:
$table->get_column_types()
returns the hash:
{ '0' => '+-~', '1' => '+-~', '2' => 'text', '3' => 'text', '4' => '+-~', '5' => '+-~', '6' => '+-~',
  '7' => '+-~', '8' => 'text', '9' => '+-~', '10' => '+-~', '11' => '+-~', '12' => '+-~', '13' => '+-~',
  '14' => '+-~', '15' => '+-~', '16' => '+-~', '17' => '+-~', '18' => '+-~', '19' => '+-~', '20' => '+-~',
  '21' => '+-~', '22' => '+-~', '23' => '+-~', '24' => '+-~', '25' => '+-~', '26' => '+-~', '27' => 'text', };
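One way to picture how such a hash could be derived is the following plain-Perl sketch. It assumes a column counts as '+-~' only when every cell is one of the three marks. The two rows are made up and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Rows of a small made-up table (no header row).
my @rows = (
    [ '+', '<E>', 'achever', '-', 'Max va partir' ],
    [ '~', 'ne',  'aller',   '+', 'La pluie va tomber' ],
);

# A column is '+-~' if every cell is one of '+', '-', '~'; otherwise 'text'.
my %type_for_col;
for my $col (0 .. $#{ $rows[0] }) {
    my $only_marks = !grep { $_->[$col] !~ /\A[-+~]\z/ } @rows;
    $type_for_col{$col} = $only_marks ? '+-~' : 'text';
}
```

Here columns 0 and 3 come out as '+-~' and columns 1, 2 and 4 as 'text'.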
- get_column_type_for_col
-
Return the column type for a given column. The column type is either +-~ if the column contains only `+', `-' or `~', or text if the column contains some other text content.
Throws an exception when the column does not exist.
Example:
$table->get_column_type_for_col(2) returns `text';
- is_tilda_row($row)
-
A row is a tilda row if all its columns of type '+-~' contain '~', i.e. the row carries no specific information about this verb.
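The test can be sketched in plain Perl as follows. The function is_tilde_row_sketch is a hypothetical stand-in that takes the row and the column types directly. It is not the module's method, and the five-column layout is made up.

```perl
use strict;
use warnings;

# Column types of a made-up five-column table, keyed by column number.
my %type_for_col = (0 => '+-~', 1 => 'text', 2 => 'text', 3 => '+-~', 4 => '+-~');

# Hypothetical stand-in for is_tilda_row: true if every '+-~' column holds '~'.
sub is_tilde_row_sketch {
    my ($row, $types) = @_;
    my @mark_cols = grep { $types->{$_} eq '+-~' } keys %{$types};
    return !grep { $row->[$_] ne '~' } @mark_cols;
}

my $tilde  = is_tilde_row_sketch([ '~', '<E>', 'aller', '~', '~' ], \%type_for_col);
my $normal = is_tilde_row_sketch([ '+', '<E>', 'aller', '~', '-' ], \%type_for_col);
```

The first row is a tilda row because columns 0, 3 and 4 all hold '~'; the second is not, since it has '+' and '-' in mark columns. Text columns play no part in the decision.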
- get_verb_for_row($row)
-
Return the verb for a given row. For the table in the example above:
my $verb = $table->get_verb_for_row(3) returns `aller'
- get_rows_for_verb($verb)
-
Returns the rows the verb occurs in (there may be more than 1). Example:
my @rows = $table->get_rows_for_verb('devoir');
# @rows is (27, 28, 29)
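A verb can occupy several rows because each row describes one of its constructions. The lookup amounts to collecting matching row numbers, as in this standalone sketch. The verb list and the function rows_for_verb_sketch are made up for illustration and do not use the module.

```perl
use strict;
use warnings;

# Verb column of a made-up table, indexed by row number.
my @verb_in_row = ('achever', 'aller', 'aller', 'devoir');

# Collect every row number whose verb equals the one asked for.
sub rows_for_verb_sketch {
    my ($verb) = @_;
    return grep { $verb_in_row[$_] eq $verb } 0 .. $#verb_in_row;
}

my @aller_rows   = rows_for_verb_sketch('aller');    # (1, 2)
my @unknown_rows = rows_for_verb_sketch('partir');   # ()
```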
- is_column_set($row, $col)
-
A column may be of type text or +-~.
- a text column of a row is set if it's different from the `Empty mark', which by default is <E>.
- a +-~ column of a row is set if it's +.
The empty_string_mark can be set via the "Parameters" accessors.
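The two rules can be written down as a small plain-Perl function. This is a sketch under the assumption that the cell value and its column type are already known. The function is_set_sketch is hypothetical and takes them as arguments instead of a row and column number.

```perl
use strict;
use warnings;

my $empty_string_mark = '<E>';   # the default mark described above

# Hypothetical stand-in for is_column_set, taking the cell value and
# the column type directly.
sub is_set_sketch {
    my ($value, $type) = @_;
    return $type eq 'text'
        ? $value ne $empty_string_mark   # text cell: set unless it is the mark
        : $value eq '+';                 # +-~ cell: set only if it is '+'
}

my $text_set    = is_set_sketch('de',  'text');   # true
my $text_empty  = is_set_sketch('<E>', 'text');   # false
my $mark_plus   = is_set_sketch('+',   '+-~');    # true
my $mark_tilde  = is_set_sketch('~',   '+-~');    # false
```

Note that under these rules both '-' and '~' count as not set in a +-~ column.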
- has_verb($verb)
-
Returns true if the verb is contained in the table.
- has_verb_matching($regexp)
-
Returns true if a verb of the table matches $regexp.
- create_db_table( { col_names => 'col_numbers' } )
-
Provides a DB interface using DBI and returns a db handle to an in-memory table created using DBD::AnyData. The table name is table_$table_name. The column names are either
- col_$col_numbers,
-
when the argument is { col_names => 'col_numbers' } (the default).
- The column headers
-
when another argument is given. When a header is empty, col_<column number> is used instead.
Example:
# get a db handle with columns named col_<column number>
# default: { col_names => 'col_numbers' }
my $dbh = $table->create_db_table();
# get a db handle using the column headers as column names
$dbh = $table->create_db_table( { col_names => 'headers' } );
Once you have a db handle you can start querying the table using SQL::Statement (see SQL::Statement and DBD::AnyData for which SQL statements are supported).
Example:
my $query = "SELECT col_$verbCol FROM table_$name where col_8 = NULL AND col_19 = '+'";
my $sth = $dbh->prepare($query);
$sth->execute();
Note: The empty string marks (`<E>' by default) are replaced by empty strings, equivalent to NULL.
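To make the intent of such a query concrete, the following plain-Perl sketch answers the same kind of question without any database layer: which verbs have an empty text column and a '+' in a given mark column? The three-column rows are made up. The sketch only mirrors the replacement of the empty string mark described in the note and is not how the module builds its DB table.

```perl
use strict;
use warnings;

my $empty_string_mark = '<E>';

# Made-up rows: [ verb, preposition, mark ].
my @rows = (
    [ 'achever', 'de',  '+' ],
    [ 'aller',   '<E>', '+' ],
    [ 'devoir',  '<E>', '-' ],
);

# Replace the empty string mark by an empty string in every cell.
my @clean = map { [ map { $_ eq $empty_string_mark ? '' : $_ } @{$_} ] } @rows;

# Verbs whose preposition column is empty and whose mark column is '+'.
my @verbs = map { $_->[0] } grep { $_->[1] eq '' && $_->[2] eq '+' } @clean;
```

Only 'aller' satisfies both conditions here. For larger tables or more involved conditions the SQL interface is likely the more convenient route.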
Parameters
The class is parametrized by a Parametrizer object (see Lingua::FR::Ladl::Parametrizer), which can be accessed through the get/set_parameters methods. A parametrizer object provides accessors for its customization items. Currently the most important item is the empty_string_mark, which defaults to `<E>'. You could change the empty_string_mark like so:
my $par_object = $table->get_parameters();
$par_object->set_empty_string_mark('EMPTY');
$table->set_parameters($par_object);
DIAGNOSTICS
Format must be one of xls, xml not $format
-
Exception thrown when trying to load the table from a format that is not supported currently.
The only supported formats are:
- xls
-
excel table
- xml
-
gnumeric xml format
Could not create file parser context for file "unknown"
-
Thrown by LibXML, the xml parser: Ladl::Table wants to load table data by parsing an xml file, but the xml parser throws an exception. Maybe the file is not accessible?
Couldn't load table data: error parsing file
-
Ladl::Table wants to load table data by parsing an excel file, but Spreadsheet::ParseExcel returned invalid data. Maybe the file is not accessible?
Need table data for table_name, maybe you should call the load method first?
-
Most methods only work and make sense if table data is loaded.
col/row must be less or equal max_row/max_col
-
A method was called with a row or column number outside the table's bounds.
CONFIGURATION AND ENVIRONMENT
Lingua::FR::Ladl::Table requires no configuration files or environment variables.
DEPENDENCIES
- Class::Std
- Readonly
- List::Util
- List::MoreUtils
- XML::LibXML
-
if you want to load table data from a gnumeric XML file.
- Spreadsheet::ParseExcel
-
if you want to load table data from an excel file.
- DBI and DBD::AnyData
-
if you want to use a DB interface.
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
No bugs have been reported.
Please report any bugs or feature requests to bug-lingua-fr-ladl-table@rt.cpan.org, or through the web interface at http://rt.cpan.org.
SEE ALSO
http://ladl.univ-mlv.fr/, where the Ladl tables have been developed and where they can be obtained.
Some publications on this project:
- Maurice Gross' grammar lexicon and Natural Language Processing
-
by Claire Gardent, Bruno Guillaume, Guy Perrier, Ingrid Falk
- Extracting subcategorisation information from Maurice Gross' grammar lexicon,
-
by Claire Gardent, Bruno Guillaume, Guy Perrier, Ingrid Falk in Archives of Control Sciences (2005) 289--300
- A talk at the French Perl Workshop 2006 (in French ;-)
-
http://conferences.mongueurs.net/fpw2006/slides/lexique-syntaxique.pdf
AUTHOR
Ingrid Falk <ingrid dot falk at loria dot fr>
LICENCE AND COPYRIGHT
Copyright (c) 2007, Ingrid Falk <ingrid dot falk at loria dot fr>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.