NAME
Bio::BPWrapper::AlnManipulations - Functions for bioaln
SYNOPSIS
use Bio::BPWrapper::AlnManipulations;
# Set options hash ...
initialize(\%opts);
write_out(\%opts);
SUBROUTINES
initialize()
Sets up most of the actions to be performed on an alignment.
Call this right after setting up an options hash.
Sets package variables: $in_format, $binary, $out_format, and $out.
write_out_paml()
Writes output in PAML format.
write_out()
Performs the bulk of the alignment actions set via initialize(\%opts) and calls $AlignIO->write_aln() or write_out_paml().
Call this after calling initialize(\%opts).
print_avp_id()
Print the average percent identity of an alignment.
Wraps Bio::SimpleAlign->average_percentage_identity().
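For intuition only, here is a minimal core-Perl sketch of a mean pairwise percent identity. It assumes the alignment is held as a list of equal-length strings and, as an assumption, skips any column where either sequence of a pair has a gap ('-'). It is not BioPerl's implementation, and its gap handling may differ from average_percentage_identity(); the name mean_pairwise_identity is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: mean of pairwise percent identities over all pairs.
# Assumption: a column is compared for a pair only if neither has a gap.
sub mean_pairwise_identity {
    my @seqs = @_;
    my ($sum, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            my ($same, $compared) = (0, 0);
            for my $k (0 .. length($seqs[$i]) - 1) {
                my $x = substr $seqs[$i], $k, 1;
                my $y = substr $seqs[$j], $k, 1;
                next if $x eq '-' || $y eq '-';
                $compared++;
                $same++ if uc $x eq uc $y;
            }
            next unless $compared;    # pair has no comparable columns
            $sum += 100 * $same / $compared;
            $pairs++;
        }
    }
    return $pairs ? $sum / $pairs : 0;
}

printf "%.2f\n", mean_pairwise_identity('ACGT', 'ACGA', 'ACGT');    # prints 83.33
```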
bootstrap()
Produce a bootstrapped alignment. Wraps Bio::Align::Utilities->bootstrap_replicates().
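Bootstrapping resamples alignment columns with replacement. A minimal core-Perl sketch of one replicate, assuming the alignment is a list of equal-length strings; this illustrates the technique and is not the BioPerl code, and bootstrap_columns is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: one bootstrap replicate. Draws as many column
# indices as the alignment is long, uniformly with replacement, and
# rebuilds every sequence from the same drawn columns.
sub bootstrap_columns {
    my @seqs = @_;
    my $len  = length $seqs[0];
    my @cols = map { int rand $len } 1 .. $len;
    return map { my $s = $_; join '', map { substr($s, $_, 1) } @cols } @seqs;
}

srand 42;    # fixed seed so the replicate is reproducible
my @rep = bootstrap_columns('ACGTAC', 'ACGAAC', 'TCGTAA');
print "$_\n" for @rep;
```

Because the same column indices are applied to every row, each replicate column is a copy of some original column, so site patterns are kept intact.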
draw_codon_view()
Print a CLUSTALW-like alignment, but separated by codons. Intended for use with DNA sequences. Block-final position numbers are printed at the end of every alignment block at the point of wrapping, and block-initial counts appear over the first nucleotide in a block.
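The core of the codon view is splitting each aligned DNA string into triplets. A minimal sketch, assuming the alignment starts in reading frame 1; it leaves out the block wrapping and position numbering that draw_codon_view() also performs, and codon_split is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: separate an aligned DNA string into codons.
sub codon_split {
    my ($seq) = @_;
    return join ' ', unpack '(a3)*', $seq;    # groups of three characters
}

print codon_split('ATGAAA---TAG'), "\n";    # prints "ATG AAA --- TAG"
```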
del_seqs()
Delete sequences based on their id. Option takes a comma-separated list of ids. The list of sequences to delete is in $opts{"delete"}, which is set via initialize(\%opts).
remove_gaps()
Remove gaps and return a de-gapped alignment. Wraps Bio::SimpleAlign->remove_gaps().
print_length()
Print alignment length. Wraps Bio::SimpleAlign->length().
print_match()
Go through all columns and replace residues identical to the reference sequence with the match character, '.'. Wraps Bio::SimpleAlign->match().
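The substitution can be sketched in core Perl, assuming the first string is the reference and that gap characters are never replaced (an assumption; BioPerl's match() may treat gaps differently). mark_matches is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace residues identical to the reference (the
# first sequence) with '.'. Assumption: '-' is always left as is.
sub mark_matches {
    my ($ref, @others) = @_;
    my @out = ($ref);    # the reference itself is printed unchanged
    for my $seq (@others) {
        my $masked = '';
        for my $k (0 .. length($seq) - 1) {
            my $c = substr $seq, $k, 1;
            my $r = substr $ref, $k, 1;
            $masked .= ($c eq $r && $c ne '-') ? '.' : $c;
        }
        push @out, $masked;
    }
    return @out;
}

print "$_\n" for mark_matches('ACGT-A', 'ACCT-A', 'TCGTTA');
```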
print_num_seq()
Print number of sequences in alignment.
pick_seqs()
Pick sequences based on their id. Option takes a comma-separated list of ids. The sequences to pick are set in $opts{"pick"}, which is set via initialize(\%opts).
change_ref()
Change the reference sequence to be what is in $opts{"refseq"}, which is set via initialize(\%opts). Wraps Bio::SimpleAlign->set_new_reference().
aln_slice()
Get a slice of the alignment. The slice is specified in $opts{"slice"}, which is set via initialize(\%opts).
Wraps Bio::SimpleAlign->slice() with improvements.
get_unique()
Extract the alignment of unique sequences. Wraps Bio::SimpleAlign->uniq_seq().
binary_informative()
Extract binary, informative sites (for clique analysis): discard constant sites, sites with three or four states, and non-informative sites.
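A minimal sketch of this site filter, assuming equal-length strings and, as an extra assumption, dropping any column that contains a gap. A column is kept only if it has exactly two states and each state occurs in at least two sequences; binary_informative_columns is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return 0-based indices of binary, informative columns.
sub binary_informative_columns {
    my @seqs = @_;
    my @keep;
    COLUMN: for my $k (0 .. length($seqs[0]) - 1) {
        my %count;
        for my $s (@seqs) {
            my $c = uc substr $s, $k, 1;
            next COLUMN if $c eq '-';    # assumption: gapped columns dropped
            $count{$c}++;
        }
        next COLUMN unless scalar(keys %count) == 2;    # constant or 3/4 states
        next COLUMN if grep { $_ < 2 } values %count;   # a singleton state
        push @keep, $k;
    }
    return @keep;
}

print join(',', binary_informative_columns('AACGT', 'AACGA', 'ATTGA', 'ATTCC')), "\n";
```

In this example column 0 is constant, column 3 has a singleton C, and column 4 has three states, so only columns 1 and 2 survive.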
variable_sites()
Extracts variable sites.
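Variable-site extraction amounts to keeping every column with more than one distinct character. A minimal sketch, assuming equal-length strings and treating a gap as just another character (the module's own gap handling may differ); variable_columns is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the alignment reduced to its variable columns.
sub variable_columns {
    my @seqs = @_;
    my @var;
    for my $k (0 .. length($seqs[0]) - 1) {
        my %seen;
        $seen{ uc substr($_, $k, 1) } = 1 for @seqs;
        push @var, $k if keys(%seen) > 1;
    }
    return map { my $s = $_; join '', map { substr($s, $_, 1) } @var } @seqs;
}

print "$_\n" for variable_columns('ACGTA', 'ACCTA', 'ACGTT');
```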
avg_id_by_win()
Calculate the pairwise average sequence difference in overlapping windows that advance with a fixed step of 1. The window size is set in $opts{"window"}, which is set via initialize(\%opts).
concat()
Concatenate multiple alignments sharing the same set of unique IDs. This is normally used for concatenating individual gene alignments of the same set of samples into a single alignment for making a "supertree". Wraps Bio::Align::Utilities->cat().
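The idea behind concatenation by shared IDs can be sketched with hashes, assuming each alignment is a hash of ID => aligned string and that all alignments contain exactly the same IDs (the sketch dies otherwise). This is an illustration, not Bio::Align::Utilities' cat(); concat_by_id is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: append each alignment's row to the matching ID.
sub concat_by_id {
    my @alns = @_;
    my @ids  = sort keys %{ $alns[0] };
    my %out  = map { $_ => '' } @ids;
    for my $aln (@alns) {
        die "alignments do not share the same IDs\n"
            unless join(',', sort keys %$aln) eq join(',', @ids);
        $out{$_} .= $aln->{$_} for @ids;
    }
    return \%out;
}

my $super = concat_by_id({ s1 => 'ACG', s2 => 'ACT' }, { s1 => 'TT-', s2 => 'TTA' });
print "$_ $super->{$_}\n" for sort keys %$super;
```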
dna_to_protein()
Translate a CDS alignment into the corresponding protein alignment. Wraps Bio::Align::Utilities->dna_to_aa_aln().
list_ids()
List all sequence ids.
protein_to_dna()
Align CDS sequences according to their corresponding protein alignment. Wraps Bio::Align::Utilities->aa_to_dna_aln() (https://metacpan.org/pod/Bio::Align::Utilities#aa_to_dna_aln).
sample_seqs()
Picks n random sequences from input alignment and produces a new alignment consisting of those sequences.
If n is not given, default is the number of sequences in alignment divided by 2, rounded down.
This functionality uses an implementation of Reservoir Sampling, based on the algorithm found here: http://blogs.msdn.com/b/spt/archive/2008/02/05/reservoir-sampling.aspx
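A minimal sketch of reservoir sampling (Algorithm R) over a list of sequence ids, including the floor(N/2) default described above. It assumes the items fit in a Perl list and is not the module's own code; reservoir_sample is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: pick $n items uniformly at random, without
# replacement, in a single pass (Algorithm R).
sub reservoir_sample {
    my ($n, @items) = @_;
    $n = int(@items / 2) unless defined $n;    # default: half, rounded down
    $n = @items if $n > @items;
    my @reservoir = @items[0 .. $n - 1];
    for my $i ($n .. $#items) {
        my $j = int rand($i + 1);              # uniform in 0 .. $i
        $reservoir[$j] = $items[$i] if $j < $n;
    }
    return @reservoir;
}

srand 7;
my @picked = reservoir_sample(undef, qw(seq1 seq2 seq3 seq4 seq5));
print "@picked\n";
```

Each later item replaces a reservoir slot with probability n/(i+1), which is what keeps every item equally likely to end up in the sample.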
shuffle_sites()
Make a shuffled (not bootstrapped) alignment. This operation randomizes the order of alignment columns. It is used for testing the significance of long runs of conserved sites in an alignment (e.g., conserved intergenic spacers [IGSs]).
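Unlike bootstrapping, shuffling permutes columns without replacement, so every original column appears exactly once. A minimal sketch using shuffle() from the core List::Util module, assuming equal-length strings; shuffle_columns is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: apply one random column permutation to every row.
sub shuffle_columns {
    my @seqs  = @_;
    my @order = shuffle(0 .. length($seqs[0]) - 1);
    return map { my $s = $_; join '', map { substr($s, $_, 1) } @order } @seqs;
}

my @shuffled = shuffle_columns('AACCGT', 'AACCGA');
print "$_\n" for @shuffled;
```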
EXTENDING THIS MODULE
We encourage BioPerl developers to add command-line interfaces to their BioPerl methods here.
Here is how to extend. We'll use option --avpid as an example.
1. Create a new method like one of those in the previous section.
2. Document your method in pod using =head2. For example:
    =head2 print_avp_id

    Print the average percent identity of an alignment.
    Wraps L<Bio::SimpleAlign-E<gt>average_percentage_identity()|https://metacpan.org/pod/Bio::SimpleAlign#average_percentage_identity>.

    =cut
   See print_avp_id() for how this gets rendered.
3. Add the method to the @EXPORT list in AlnManipulations.pm.
4. Add the option to %opt_dispatch, which maps the option used in bioaln to the subroutine that gets called here. For example:
    "avpid" => \&print_avp_id,
5. Add the option to the bioaln script. See the code that starts:
    GetOptions( ... "avpid|a", ...
   This option has the short option name a and takes no additional argument.
6. Write a test for the option. See the file t/10test-bioaln.t and Testing.
7. Share back. Create a pull request to the GitHub repository and contact Weigang Qiu, City University of New York, Hunter College (mailto:weigang@genectr.hunter.cuny.edu).
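The option-to-subroutine mapping described above is a dispatch table. A minimal, self-contained sketch of how such a table is typically consumed; the stand-in subroutines, the @log array, and the run_actions driver are hypothetical and are not the module's actual code.

```perl
use strict;
use warnings;

my @log;    # records which actions ran, for illustration only

# Hypothetical stand-ins for real alignment actions.
sub print_avp_id { push @log, 'avpid' }
sub print_length { push @log, 'length' }

# Maps command-line option names to the subroutine that implements them.
my %opt_dispatch = (
    'avpid'  => \&print_avp_id,
    'length' => \&print_length,
);

# Hypothetical driver: call the handler for every option the user set,
# ignoring options (such as input format) that have no handler.
sub run_actions {
    my ($opts) = @_;
    for my $opt (sort keys %$opts) {
        next unless exists $opt_dispatch{$opt};
        $opt_dispatch{$opt}->($opts);
    }
}

run_actions({ avpid => 1, length => 1, input => 'clustalw' });
print "@log\n";    # prints "avpid length"
```

Keeping the mapping in one hash means a new option needs only a new subroutine and one new table entry, with no change to the driver loop.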
SEE ALSO
bioaln: command-line tool for using this module
CONTRIBUTORS
William McCaig <wmccaig at gmail dot com>
Girish Ramrattan <gramratt at gmail dot com>
Che Martin <che dot l dot martin at gmail dot com>
Yözen Hernández <yzhernand at gmail dot com>
Levy Vargas <levy dot vargas at gmail dot com>
Weigang Qiu (Maintainer)
Rocky Bernstein