NAME

Bio::Community::IO - Read and write files that describe communities

SYNOPSIS

use Bio::Community::IO;

# Read communities from a file, one by one
my $in = Bio::Community::IO->new(
   -file   => 'otu_table.qiime',
   -format => 'qiime', # format is optional
);
my $community1 = $in->next_community(); # a Bio::Community object
my $community2 = $in->next_community();
$in->close;

# Write communities in another file
my $out = Bio::Community::IO->new(
   -file   => '>new_otu_table.generic',
   -format => 'generic',
);
$out->write_community($community1);
$out->close;

# Re-read communities, but all at once
$in = Bio::Community::IO->new( -file => 'new_otu_table.generic' );
my $meta = $in->next_metacommunity(); # a Bio::Community::Meta object
$in->close;

DESCRIPTION

A Bio::Community::IO object implements methods to read and write communities in formats used by popular programs such as BIOM, GAAS, QIIME, Unifrac, or as generic tab-separated tables. The format is normally detected automatically, though it can be specified manually. This module can also convert community member abundance between counts, absolute abundance, percentages and fractions.

When reading communities, the next_member() method is called by next_community(), which itself is called by next_metacommunity(). Similarly, when writing, write_member() is called by write_community(), which is called by write_metacommunity().

DRIVER IMPLEMENTATION

Bio::Community::IO provides the higher-level organisation to read and write community files, but it is the modules located in the Bio::Community::IO::Driver::* namespace that do the low-level, format-specific work.

All drivers are expected to implement specific methods, e.g. for reading:

_next_metacommunity_init()

A private hook called at the beginning of next_metacommunity() that returns the name of the metacommunity (if applicable). It also allows drivers to do an action before the metacommunity is read.

_next_community_init()

A private hook called at the beginning of next_community() that returns the name of the community. It also allows drivers to do an action before the current community is read.

next_member()

A public method that returns a Bio::Community::Member and its count in the community being read.

_next_community_finish()

A private hook called at the end of next_community(). It allows drivers to do an action after the current community has been read.

_next_metacommunity_finish()

A private hook called at the end of next_metacommunity(). It allows drivers to do an action after the metacommunity has been read.
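
The calling order of the reading hooks can be sketched with a toy class. This is only an illustration of the nesting described above, not the actual Bio::Community::IO code: the ToyReader package, its in-memory data layout and the log array are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical toy reader: mimics the hook order only, not the real module.
package ToyReader;

sub new {
    my ($class, %args) = @_;
    # communities: list of [ name, [ [member_id, count], ... ] ]
    my $self = { communities => $args{communities}, log => [] };
    return bless $self, $class;
}

# Driver-side hooks (a real driver would parse a file here)
sub _next_metacommunity_init   { push @{ $_[0]{log} }, 'meta_init';   return 'toy_meta' }
sub _next_metacommunity_finish { push @{ $_[0]{log} }, 'meta_finish'; return 1 }
sub _next_community_finish     { push @{ $_[0]{log} }, 'community_finish'; return 1 }

sub _next_community_init {
    my ($self) = @_;
    push @{ $self->{log} }, 'community_init';
    $self->{current} = shift @{ $self->{communities} };
    return defined $self->{current} ? $self->{current}[0] : undef;
}

sub next_member {
    my ($self) = @_;
    my $pair = shift @{ $self->{current}[1] };
    return $pair ? @$pair : (undef, undef);
}

# Generic layer: next_metacommunity() calls next_community(),
# which in turn calls next_member()
sub next_community {
    my ($self) = @_;
    my $name = $self->_next_community_init;
    return unless defined $name;    # no community left
    my %community = ( name => $name, members => {} );
    while ( my ($member, $count) = $self->next_member ) {
        last unless defined $member;
        $community{members}{$member} = $count;
    }
    $self->_next_community_finish;
    return \%community;
}

sub next_metacommunity {
    my ($self) = @_;
    my $name = $self->_next_metacommunity_init;
    my @communities;
    while ( my $community = $self->next_community ) {
        push @communities, $community;
    }
    $self->_next_metacommunity_finish;
    return { name => $name, communities => \@communities };
}

package main;

my $reader = ToyReader->new(
    communities => [ [ 'gut', [ [ 'otu1', 3 ], [ 'otu2', 7 ] ] ] ],
);
my $meta = $reader->next_metacommunity;
print join( ' > ', @{ $reader->{log} } ), "\n";
```

In this toy version the last call to _next_community_init() finds nothing left to read, so no matching _next_community_finish() is logged for it; whether the real module behaves the same way is not stated here.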

Similarly, for a driver to write community information to a file or stream, it should implement these methods:

_write_metacommunity_init()

A private hook called at the beginning of write_metacommunity() and that accepts a Bio::Community::Meta as argument. It allows drivers to do an action before the metacommunity is written.

_write_community_init()

A private hook called at the beginning of write_community() and that accepts a Bio::Community as argument. It allows drivers to do an action before the current community is written.

write_member()

A public method that accepts as arguments a Bio::Community::Member and its count in the community being written, and processes them.

_write_community_finish()

A private hook called at the end of write_community() and that accepts a Bio::Community as argument. It allows drivers to do an action after the current community has been written.

_write_metacommunity_finish()

A private hook called at the end of write_metacommunity() and that accepts a Bio::Community::Meta as argument. It allows drivers to do an action after the metacommunity has been written.

AUTHOR

Florent Angly <florent.angly@gmail.com>

SUPPORT AND BUGS

User feedback is an integral part of the evolution of this and other Bioperl modules. Please direct usage questions or support issues to the mailing list, bioperl-l@bioperl.org, rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

If you have found a bug, please report it on the BioPerl bug tracking system to help us keep track of the bugs and their resolution: https://redmine.open-bio.org/projects/bioperl/

COPYRIGHT

Copyright 2011-2014 by Florent Angly <florent.angly@gmail.com>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded by an underscore (_).

new

Function: Create a new Bio::Community::IO object
Usage   : # Reading a file
          my $in = Bio::Community::IO->new( -file => 'community.txt' );
          # Writing a file
          my $out = Bio::Community::IO->new( -file => '>community.txt',
                                             -format => 'generic'       );
Args    : -file : Path of a community file. See file() in Bio::Root::IO.
          -format : Format of the file, either 'generic', 'biom', 'gaas',
              'qiime' or 'unifrac'. This is optional when reading a community
              file because the format is automatically detected by the
              Bio::Community::IO::FormatGuesser module. See also format() in
              Bio::Root::IO.
          -weight_files : Arrayref of files (or filehandles) that contain
              weights to assign to members. See weight_files().
          -weight_assign : When using files of weights, define what to do for
              community members that do not have weights. See weight_assign().
          -taxonomy: Given a Bio::DB::Taxonomy object, try to place the community
              members in this taxonomy. See taxonomy().
          -skip_empty_communities: Skip communities with no members. See
              skip_empty_communities().
          See the documentation for _initialize_io() in Bio::Root::IO for other
          accepted constructors like -fh, -string, -input, or -url.
Returns : A Bio::Community::IO object

next_member

Usage   : my ($member, $count) = $in->next_member;
Function: Get the next member from the community and its abundance. This
          function is implemented by the Bio::Community::IO::Driver used to
          parse the given file format.
Args    : None
Returns : An array containing:
            A Bio::Community::Member object (or undef)
            A positive number (or undef)

next_community

Usage   : my $community = $in->next_community;
Function: Get the next community. Note that communities without members are
          skipped.
Args    : None
Returns : A Bio::Community object
            or
          undef if there were no communities left

next_metacommunity

Usage   : my $meta = $in->next_metacommunity;
Function: Get the next metacommunity. It may contain one or several communities
          depending on the format of the file read.
Args    : None
Returns : A Bio::Community::Meta object
            or
          undef after the metacommunity has been read

write_member

Usage   : $out->write_member($member, $abundance);
Function: Write the next member from the community and its count or relative
          abundance. This function is implemented by a Bio::Community::IO::Driver
          specific to the given file format.
Args    : A Bio::Community::Member object
          A positive number
Returns : 1 for success

write_community

Usage   : $out->write_community($community);
Function: Write the next community.
Args    : A Bio::Community object
Returns : 1 for success

write_metacommunity

Usage   : $out->write_metacommunity($meta);
Function: Write a metacommunity.
Args    : A Bio::Community::Meta object
Returns : 1 for success

skip_empty_communities

Usage   : $in->skip_empty_communities;
Function: Get or set whether empty communities (with no members) should be
          read/written or skipped.
Args    : 0 or 1
Returns : 0 or 1

sort_members

Usage   : $in->sort_members();
Function: When writing a community to a file, sort the community members based
          on their abundance: 0 (off), 1 (by increasing abundance), -1 (by 
          decreasing abundance). The default is specific to each driver used.
Args    : 0, 1 or -1
Returns : 0, 1 or -1
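
The three sorting modes can be illustrated with a few lines of plain Perl. This is a minimal sketch under assumptions: sorted_member_ids() is a hypothetical helper, members are reduced to ID strings, and the alphabetical tie-break is a choice made for the example rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: order member IDs by abundance.
# $mode follows the documented values: 0 (off), 1 (increasing), -1 (decreasing).
sub sorted_member_ids {
    my ($abundance, $mode) = @_;
    my @ids = sort keys %$abundance;    # alphabetical baseline
    return @ids if !$mode;
    # Multiplying the comparison by $mode flips the order for -1;
    # ties are broken alphabetically (an assumption of this sketch)
    return sort { $mode * ( $abundance->{$a} <=> $abundance->{$b} ) || $a cmp $b } @ids;
}

my %abundance = ( otu1 => 3, otu2 => 7, otu3 => 1 );
print join( ',', sorted_member_ids( \%abundance, -1 ) ), "\n";
```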

abundance_type

Usage   : $in->abundance_type();
Function: When writing a community to a file, report member abundance in one
          of four possible representations:
           * count     : observed count
           * absolute  : absolute abundance
           * percentage: relative abundance, in percent (0-100%)
           * fraction  : relative abundance, as a fractional number (0-1)
          The default is specific to each driver.
Args    : count, absolute, percentage or fraction
Returns : count, absolute, percentage or fraction
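
The relationship between the count, percentage and fraction representations can be shown with a short calculation. This is a sketch, not the module's conversion code: convert_counts() is a hypothetical helper working on a plain hash of counts, and the 'absolute' type is left out of the example.

```perl
use strict;
use warnings;

# Scale applied to a relative abundance for each representation
my %scale_for = ( percentage => 100, fraction => 1 );

# Hypothetical helper: convert raw counts to a relative representation
sub convert_counts {
    my ($counts, $type) = @_;
    my %converted = %$counts;
    return \%converted if $type eq 'count';
    die "Unsupported abundance type '$type'\n" unless exists $scale_for{$type};
    my $total = 0;
    $total += $_ for values %$counts;
    for my $id ( keys %converted ) {
        # An empty community yields 0 rather than a division by zero
        $converted{$id} = $total ? $counts->{$id} * $scale_for{$type} / $total : 0;
    }
    return \%converted;
}

my %counts = ( otu1 => 1, otu2 => 3 );
for my $type (qw(count percentage fraction)) {
    my $values = convert_counts( \%counts, $type );
    print "$type: ", join( ', ', map { $_ . '=' . $values->{$_} } sort keys %$values ), "\n";
}
```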

missing_string

Usage   : $in->missing_string();
Function: When writing a community to a file, specify what abundance string to
          use for members that are not present in the community. The default is
          specific to each driver used.
Args    : string e.g. '', '0', 'n/a', '-'
Returns : string
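
The role of the missing string is easiest to see in a table holding several communities, where some members are absent from some columns. The following is a rough sketch only: format_table() is a hypothetical helper, and the tab-separated layout and the 'Species' header label are assumptions, not the exact output of any driver.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out several communities as a tab-separated table.
# $communities: { community name => { member ID => count } }
sub format_table {
    my ($communities, $missing_string) = @_;
    my @names = sort keys %$communities;
    my %seen;
    $seen{$_} = 1 for map { keys %$_ } values %$communities;
    my @rows = ( join "\t", 'Species', @names );
    for my $member ( sort keys %seen ) {
        # A member absent from a community gets the missing string
        push @rows, join "\t", $member,
            map { $communities->{$_}{$member} // $missing_string } @names;
    }
    return join '', map { $_ . "\n" } @rows;
}

my %communities = (
    gut  => { otu1 => 3 },
    soil => { otu1 => 1, otu2 => 5 },
);
print format_table( \%communities, '-' );
```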

multiple_communities

Usage   : $in->multiple_communities();
Function: Return whether or not the file format can represent multiple
          communities in a single file.
Args    : 0 or 1
Returns : 0 or 1

explicit_ids

Usage   : $in->explicit_ids();
Function: Return whether or not the file format explicitly records member IDs.
Args    : 0 or 1
Returns : 0 or 1

weight_files

Usage   : $in->weight_files();
Function: When reading a community, specify files (or filehandles opened in
          read mode) containing weights to assign to the community members.
          Each file can contain a different type of weight to add. The file
          should contain at least two tab-delimited columns: the first one
          should contain the ID, description or string lineage of the member
          and the second one the weight to assign to this member. Other columns
          are ignored. A tab-delimited header line starting with '#' and
          containing the name of the weight can be included.
Args    : arrayref of file names (or filehandles)
Returns : arrayref of filehandles
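
To make the expected layout of a weight file concrete, here is a small parser sketch for it. It is not the module's own parser: read_weights() is a hypothetical helper, the sample data is invented, and taking the weight name from the second header column is an assumption.

```perl
use strict;
use warnings;

# Hypothetical parser for the weight file layout described above
sub read_weights {
    my ($fh) = @_;
    my ( $name, %weight_for );
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my @cols = split /\t/, $line;
        if ( $line =~ /^#/ ) {
            $name = $cols[1];    # assumption: weight name is in the second header column
            next;
        }
        $weight_for{ $cols[0] } = $cols[1];    # columns beyond the second are ignored
    }
    return ( $name, \%weight_for );
}

# Invented sample data, read through an in-memory filehandle
my $text = join "\n",
    "#taxon\tgenome_length\tcomment",
    "Escherichia coli\t4.6\tignored",
    "Bacillus subtilis\t4.2",
    '';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my ( $name, $weights ) = read_weights($fh);
close $fh;
print "$name: ", join( ', ', map { $_ . '=' . $weights->{$_} } sort keys %$weights ), "\n";
```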

weight_names

Usage   : $in->weight_names();
Function: After weight files have been read, you can get the name of the
          weights using this method. You can also set them manually.
Args    : arrayref of weight names
Returns : arrayref of weight names

weight_identifier

Usage   : $in->weight_identifier('id');
Function: Get or set whether to look up and assign weights to community members
          based on the member description or their ID.
Args    : 'desc' (default), or 'id'
Returns : 'desc' or 'id'

weight_assign

Usage   : $in->weight_assign();
Function: When using weights, specify what value to assign to the members for
          which no weight is found in the provided weight file:
           * $num : Check the member description against each file of weights.
                If no weight is found in a file, assign the arbitrary weight
                provided as argument to the member.
           * file_average : Check the member description against each file of
                weights. If no weight is found in a file, assign the average
                weight in this file to the member.
           * community_average : Check the member description against each file
                of weights. If no weight is found in a file, the weight given
                to the member is the average weight of all the other members in
                this community. If none of the community members have
                weights, the weight assignment method defaults to 'file_average'
                for this community. Note that because the assigned weight is
                the average weight in this community, this means that the same
                members will have different weights in different communities.
                Note also that the processing of members with no explicit
                weights can only be done after all other members have been
                added and is effective only if the community is built using the
                next_community() method.
           * ancestor : Provided the member has a taxonomic assignment, check
                the taxonomic lineage of this member against each file of
                weights. When no weight is found for this taxonomic lineage in
                a weight file, go up the taxonomic lineage of the member and
                assign to it the weight of the first ancestor that has a
                weight in the weights file. Fall back to the 'community_average'
                method if no taxonomic information is available for this member
                (for example a member with no BLAST hit), or if none of the
                ancestors have a specified weight.
Args    : 'file_average', 'community_average', 'ancestor' or a number
Returns : 'file_average', 'community_average', 'ancestor' or a number
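
The fallback rules for the numeric, 'file_average' and 'community_average' modes can be sketched as follows. This is an illustration under assumptions, not the module's implementation: assign_weights() is a hypothetical helper, members are reduced to description strings, a single non-empty weight file is assumed, and the 'ancestor' mode is not covered.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: pick a weight for each member of one community.
# $file_weights: weights read from one weight file (description => weight)
# $members     : descriptions of the members of the community
# $assign      : a number, 'file_average' or 'community_average'
sub assign_weights {
    my ($file_weights, $members, $assign) = @_;
    # Members with an explicit weight in the file keep that weight
    my %assigned = map { $_ => $file_weights->{$_} }
                   grep { exists $file_weights->{$_} } @$members;
    my $fallback;
    if ( $assign eq 'community_average' && %assigned ) {
        $fallback = sum( values %assigned ) / scalar( keys %assigned );
    }
    elsif ( $assign eq 'file_average' || $assign eq 'community_average' ) {
        # community_average defaults to file_average when no member has a weight
        $fallback = sum( values %$file_weights ) / scalar( keys %$file_weights );
    }
    else {
        $fallback = $assign;    # arbitrary number supplied by the caller
    }
    $assigned{$_} //= $fallback for @$members;
    return \%assigned;
}

my %file_weights = ( A => 2, B => 4, C => 9 );
my @members      = qw(A B X);    # X has no weight in the file
for my $assign ( 'community_average', 'file_average', 1 ) {
    my $weights = assign_weights( \%file_weights, \@members, $assign );
    print "$assign: X gets $weights->{X}\n";
}
```

With these invented numbers, member X receives the mean of A and B under 'community_average' and the mean of A, B and C under 'file_average', which shows why the same member can end up with different weights in different communities.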

_attach_weights

Usage   : $in->_attach_weights($member);
Function: Once a member has been created, a driver should call this method
          to attach the proper weights (read from the user-provided weight
          files) to a member. If no member is provided, this method will not
          complain and will do nothing.
Args    : a Bio::Community::Member or nothing
Returns : 1 for success

taxonomy

Usage   : $in->taxonomy();
Function: When reading communities, try to place the community members on the
          provided taxonomy (provided taxonomic assignments are specified in
          the input). Make sure that you use the same taxonomy as in the
          community file to ensure that members are placed.
          
          As an alternative to using a full-fledged taxonomy, if you provide a
          Bio::DB::Taxonomy::list object containing no taxa, the taxonomy will
          be constructed on the fly from the taxonomic information provided in
          the community file. The advantages are that you build an arbitrary
          taxonomy, and this taxonomy contains only the taxa present in your
          samples, which is fast and memory efficient. A drawback is that
          unfortunately, you can only do this with community file formats that
          report full lineages (e.g. the qiime and generic formats).

          A basic curation is done on the taxonomy strings, so that a GreenGenes
          lineage such as:
             k__Archaea;p__Euryarchaeota;c__Thermoplasmata;o__E2;f__Marine group II;g__;s__
          becomes:
             k__Archaea;p__Euryarchaeota;c__Thermoplasmata;o__E2;f__Marine group II
          Or a Silva lineage such as:
             Bacteria; Cyanobacteria; Chloroplast; uncultured; Other; Other
          becomes:
             Bacteria; Cyanobacteria; Chloroplast; uncultured

Args    : Bio::DB::Taxonomy
Returns : Bio::DB::Taxonomy
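
The lineage curation shown above can be approximated in a few lines. This sketch only reproduces the two documented examples: curate_lineage() is a hypothetical helper, and the rule it applies (dropping trailing ranks that are empty, a bare rank prefix such as 'g__', or 'Other') is an assumption about the curation, not its actual definition.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the curation shown above
sub curate_lineage {
    my ($lineage) = @_;
    # Keep the separator style of the input ('; ' for Silva, ';' for GreenGenes)
    my $sep  = $lineage =~ /;\s/ ? '; ' : ';';
    my @taxa = split /;\s*/, $lineage;
    # Drop trailing ranks that carry no name: 'g__', 's__', 'Other' or empty
    pop @taxa while @taxa && $taxa[-1] =~ /^(?:[a-z]__|Other|)$/;
    return join $sep, @taxa;
}

print curate_lineage('k__Archaea;p__Euryarchaeota;c__Thermoplasmata;o__E2;f__Marine group II;g__;s__'), "\n";
print curate_lineage('Bacteria; Cyanobacteria; Chloroplast; uncultured; Other; Other'), "\n";
```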

_attach_taxon

Usage   : $in->_attach_taxon($member, $taxonomy_string, $is_name);
Function: Once a member has been created, a driver should call this method
          to attach the proper taxon object to the member. If no member is
          provided, this method will not complain and will do nothing.
Args    : * a Bio::Community::Member or nothing
          * the taxonomic string
          * whether the taxonomic string is a taxon name (1) or taxon ID (0)
Returns : 1 for success
