NAME
Bio::MUST::Core::Ali::Temporary - Thin wrapper for a temporary mapped Ali written on disk
VERSION
version 0.243430
SYNOPSIS
#!/usr/bin/env perl
use Modern::Perl '2011';
# same as:
# use strict;
# use warnings;
# use feature qw(say);
use Bio::MUST::Core;
use aliased 'Bio::MUST::Core::Ali::Temporary';
# build Ali::Temporary object from existing ALI file
my $temp_db = Temporary->new( seqs => 'database.ali' );
# get properties
my $db = $temp_db->filename;
my $dbtype = $temp_db->type;
# pass it to external program
system("makeblastdb -in $db -dbtype $dbtype");
# alternative constructor call
# build Ali::Temporary object from existing Ali object
use aliased 'Bio::MUST::Core::Ali';
my $ali = Ali->load('queries.ali');
my $temp_qu = Temporary->new( seqs => $ali );
# pass it to external program
use File::Temp;
my $query = $temp_qu->filename;
my $out = File::Temp->new( UNLINK => 0, SUFFIX => '.blastp' );
system("blastp -query $query -db $db -out $out");
say "report: $out";
# later... when parsing the BLAST report
# let's say $id is a BLAST hit in database.ali
my $id = 'seq2';
my $long_id = $temp_db->long_id_for($id);
say "hit id: $long_id";
# ...
# more alternative constructor calls
# build Ali::Temporary object from list of Seq objects
my @seqs = $ali->filter_seqs( sub { $_->seq_len >= 500 } );
my $temp_ls = Temporary->new( seqs => \@seqs );
# build Ali::Temporary object preserving gaps in Seq objects
# (and persistent associated FASTA file)
my $temp_gp = Temporary->new(
    seqs => \@seqs,
    args => { degap => 0, persistent => 1 }
);
my $filename = $temp_gp->filename;
# later...
unlink $filename;
DESCRIPTION
This module implements a class representing a temporary FASTA file where sequence ids are automatically abbreviated (seq1, seq2...) for maximum compatibility with external programs. To this end, it combines an internal Bio::MUST::Core::Ali object and a Bio::MUST::Core::IdMapper object.
An Ali::Temporary can be built from an existing ALI (or FASTA) file or on-the-fly from a list (ArrayRef) of Bio::MUST::Core::Seq objects (see the SYNOPSIS for examples).
Its sequences can be aligned or not, but by default they are degapped before the associated temporary FASTA file is written. If gaps are to be preserved, this behavior can be altered via the optional args attribute.
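The id abbreviation described above can be illustrated with a small, self-contained sketch. It does not use Bio::MUST::Core and is not the module's actual implementation; the sequence records and the two hash names are hypothetical, chosen only to mirror the long_id_for and abbr_id_for methods of the mapper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Temp;

# hypothetical input: long ids containing spaces and special characters
my @records = (
    [ 'Homo sapiens_9606@NP_000509.1',  'MVHLTPEEKS' ],
    [ 'Mus musculus_10090@NP_032246.2', 'MVHLTDAEKA' ],
);

# write abbreviated ids (seq1, seq2...) and remember both directions
my ( %long_id_for, %abbr_id_for );
my $fh = File::Temp->new( SUFFIX => '.fasta' );    # removed on object destruction
my $n  = 0;
for my $rec (@records) {
    my ( $long_id, $seq ) = @{$rec};
    my $abbr_id = 'seq' . ++$n;
    $long_id_for{$abbr_id} = $long_id;
    $abbr_id_for{$long_id} = $abbr_id;
    print {$fh} ">$abbr_id\n$seq\n";
}
close $fh or die "cannot close temporary file: $!";

# an external program would only ever see seq1, seq2...
say 'FASTA file: ', $fh->filename;
say 'seq2 is:    ', $long_id_for{'seq2'};
```

With Ali::Temporary itself, the same lookup is done through the long_id_for method, as shown in the SYNOPSIS.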
ATTRIBUTES
seqs
Bio::MUST::Core::Ali object (required)
This required attribute contains the Bio::MUST::Core::Seq objects that are written to the associated temporary FASTA file. It can be specified as a path to an ALI/FASTA file, as an Ali object, or as an ArrayRef of Seq objects (see the SYNOPSIS for examples).
For now, it provides the following methods: count_comments, all_comments, get_comment, guessing, all_seq_ids, has_uniq_ids, is_protein, is_aligned, get_seq, get_seq_with_id, first_seq, all_seqs, filter_seqs and count_seqs (see Bio::MUST::Core::Ali).
args
HashRef (optional)
When specified, this optional attribute is passed to the temp_fasta method of the internal Ali object. Its purpose is to allow the fine-tuning of the format of the associated temporary FASTA file.
By default, its contents are clean => 1 and degap => 1, so as to generate a FASTA file of degapped sequences where ambiguous and missing states are replaced by X.
Additionally, if you want to keep your temporary files around for debugging purposes, you can pass the option persistent => 1. This disables the automatic removal of the file on object destruction.
file
Path::Class::File object (auto)
This attribute is automatically initialized with the path of the associated temporary FASTA file. Thus, it cannot be user-specified.
It provides the following methods: remove and filename (see below).
mapper
Bio::MUST::Core::IdMapper object (auto)
This attribute is automatically initialized with the mapper associating the long ids of the internal Ali object to the abbreviated ids used in the associated temporary FASTA file. Thus, it cannot be user-specified.
It provides the following methods: all_long_ids, all_abbr_ids, long_id_for and abbr_id_for (see Bio::MUST::Core::IdMapper).
ACCESSORS
filename
Returns the stringified filename of the associated temporary FASTA file.
This method does not accept any arguments.
type
Returns the type of the sequences in the internal Ali object using BLAST denomination (prot or nucl). See Bio::MUST::Core::Seq::is_protein for the exact test performed.
This method does not accept any arguments.
MISC METHODS
remove
Remove (unlink) the associated temporary FASTA file.
Since this method is in principle automatically invoked on object destruction, users should not need it. Note that persistent temporary files (see object constructor) have to be removed manually, which requires getting and storing their filename before object destruction.
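The difference between automatic and manual removal can be sketched with File::Temp alone. This is only an analogy: it assumes nothing about how Ali::Temporary manages its file internally, and the variable names are hypothetical. The UNLINK option plays the role that persistent plays here, with the opposite polarity.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Temp;

my ( $auto_path, $kept_path );
{
    # UNLINK is true by default in the object interface
    my $auto = File::Temp->new( SUFFIX => '.fasta' );
    # UNLINK => 0 is comparable to persistent => 1
    my $kept = File::Temp->new( SUFFIX => '.fasta', UNLINK => 0 );

    # store the filenames before the objects are destroyed
    ( $auto_path, $kept_path ) = ( $auto->filename, $kept->filename );
}    # both objects go out of scope here

my $auto_exists = -e $auto_path ? 1 : 0;
my $kept_exists = -e $kept_path ? 1 : 0;
say "auto-removed file still exists: $auto_exists";
say "persistent file still exists:   $kept_exists";

# a persistent file has to be removed by hand
unlink $kept_path or die "cannot remove $kept_path: $!";
```

The key point is the same as for persistent Ali::Temporary files: once the object is gone, the stored filename is the only handle left on the file.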
AUTHOR
Denis BAURAIN <denis.baurain@uliege.be>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.