NAME

Bio::BPWrapper::AlnManipulations - Functions for bioaln

SYNOPSIS

use Bio::BPWrapper::AlnManipulations;
# Set options hash ...
initialize(\%opts);
write_out(\%opts);

SUBROUTINES

initialize()

Sets up most of the actions to be performed on an alignment.

Call this right after setting up an options hash.

Sets package variables: $in_format, $binary, $out_format, and $out.

write_out_paml()

Writes output in PAML format.

write_out()

Performs the bulk of the alignment actions set via initialize(\%opts) and calls $AlignIO->write_aln() or write_out_paml().

Call this after calling initialize(\%opts).

print_avpid()

Print the average percent identity of an alignment.

Wraps Bio::SimpleAlign->average_percentage_identity().

bootstrap()

Produce a bootstrapped alignment. Wraps Bio::Align::Utilities->bootstrap().
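As an illustration of what a bootstrapped alignment is, here is a minimal pure-Perl sketch that resamples alignment columns with replacement. It does not use BioPerl and is not this module's implementation; the helper name bootstrap_columns and the plain-string representation of aligned rows are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): build one bootstrap
# replicate by drawing alignment columns at random, with replacement.
# $rows is an array ref of aligned sequences of equal length.
sub bootstrap_columns {
    my ($rows) = @_;
    my $len = length $rows->[0];

    # Pick $len column indices uniformly at random; repeats are allowed.
    my @cols = map { int(rand($len)) } 1 .. $len;

    # Apply the same column choice to every row so columns stay intact.
    return [
        map {
            my $seq = $_;
            join '', map { substr($seq, $_, 1) } @cols;
        } @$rows
    ];
}

my $boot = bootstrap_columns([ 'ACGTAC', 'ACGAAC', 'TCGTAC' ]);
print "$_\n" for @$boot;
```

Every replicate has the same dimensions as the input, but some columns appear more than once and others not at all.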

draw_codon_view()

Print a CLUSTALW-like alignment, but separated by codons. Intended for use with DNA sequences. Block-final position numbers are printed at the end of every alignment block at the point of wrapping, and block-initial counts appear over the first nucleotide in a block.

del_seqs()

Delete sequences based on their id. Option takes a comma-separated list of ids. The list of sequences to delete is in $opts{"delete"}, which is set via initialize(\%opts).

remove_gaps()

Remove gaps and return a de-gapped alignment. Wraps Bio::SimpleAlign->remove_gaps().

Print alignment length. Wraps Bio::SimpleAlign->length().

Go through all columns and change residues identical to the reference sequence to the match character, '.'. Wraps Bio::SimpleAlign->match().

Print number of sequences in alignment.

pick_seqs()

Pick sequences based on their id. Option takes a comma-separated list of ids. The sequences to pick are listed in $opts{"pick"}, which is set via initialize(\%opts).

change_ref()

Change the reference sequence to the one named in $opts{"refseq"}, which is set via initialize(\%opts). Wraps Bio::SimpleAlign->set_new_reference().

aln_slice()

Get a slice of the alignment. The slice is specified in $opts{"slice"}, which is set via initialize(\%opts).

Wraps Bio::SimpleAlign->slice() with improvements.

get_unique()

Extract the alignment of unique sequences. Wraps Bio::SimpleAlign->uniq_seq().

binary_informative()

Extract binary and informative sites (for clique): discard constant, three- or four-state, and non-informative sites.

variable_sites()

Extracts variable sites.
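To make "variable sites" concrete, here is a minimal pure-Perl sketch that keeps only the columns containing more than one distinct character. It is not this module's code: the helper name variable_columns is hypothetical, rows are plain strings, and gap characters are counted as an ordinary state, which may differ from how the real subroutine treats gaps.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): keep only alignment
# columns in which at least two different characters occur.
# Assumption: a gap character counts as just another state.
sub variable_columns {
    my ($rows) = @_;
    my $len = length $rows->[0];

    my @keep;
    for my $i (0 .. $len - 1) {
        my %seen;
        $seen{ substr($_, $i, 1) }++ for @$rows;
        push @keep, $i if scalar(keys %seen) > 1;
    }

    # Rebuild each row from the retained column indices.
    return [
        map {
            my $seq = $_;
            join '', map { substr($seq, $_, 1) } @keep;
        } @$rows
    ];
}

my $variable = variable_columns([ 'ACGT', 'ACGA', 'TCGT' ]);
print "$_\n" for @$variable;   # prints AT, AA, TT on separate lines
```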

avg_id_by_win()

Calculate pairwise average sequence difference by windows (overlapping windows with a fixed step of 1). The window size is set in $opts{"window"}, which is set via initialize(\%opts).
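The windowed calculation can be sketched in pure Perl as follows. This is an illustration under stated assumptions, not the module's implementation: the helper name avg_diff_by_window is hypothetical, rows are plain strings, difference is the fraction of mismatching positions in a window, and gaps are compared like any other character.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): slide a window of
# $window columns along the alignment one column at a time and report,
# for each window, the mean fraction of mismatches over all row pairs.
sub avg_diff_by_window {
    my ($rows, $window) = @_;
    my $len = length $rows->[0];

    my @result;
    for my $start (0 .. $len - $window) {
        my @win = map { substr($_, $start, $window) } @$rows;
        my ($diff, $pairs) = (0, 0);

        for my $i (0 .. $#win - 1) {
            for my $j ($i + 1 .. $#win) {
                # grep in scalar context counts mismatching positions.
                my $mismatch = grep {
                    substr($win[$i], $_, 1) ne substr($win[$j], $_, 1)
                } 0 .. $window - 1;
                $diff += $mismatch / $window;
                $pairs++;
            }
        }
        push @result, $pairs ? $diff / $pairs : 0;
    }
    return \@result;
}

my $by_win = avg_diff_by_window([ 'AAAA', 'AAAT' ], 2);
print join(' ', @$by_win), "\n";   # prints: 0 0 0.5
```

An alignment of length L yields L - window + 1 values, one per window start.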

concat()

Concatenate multiple alignments sharing the same set of unique IDs. This is normally used for concatenating individual gene alignments of the same set of samples into a single alignment for making a "supertree". Wraps Bio::Align::Utilities->cat().

dna_to_protein()

Align CDS sequences according to their corresponding protein alignment. Wraps Bio::Align::Utilities->aa_to_dna_aln().

list_ids()

List all sequence ids.

protein_to_dna()

Align CDS sequences according to their corresponding protein alignment. Wraps Bio::Align::Utilities->aa_to_dna_aln() (https://metacpan.org/pod/Bio::Align::Utilities#aa_to_dna_aln).

sample_seqs()

Picks n random sequences from the input alignment and produces a new alignment consisting of those sequences.

If n is not given, the default is the number of sequences in the alignment divided by 2, rounded down.

This functionality uses an implementation of Reservoir Sampling, based on the algorithm found here: http://blogs.msdn.com/b/spt/archive/2008/02/05/reservoir-sampling.aspx
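For readers unfamiliar with reservoir sampling, here is a minimal pure-Perl sketch of the classic single-pass variant (often called Algorithm R). It is not copied from this module: the helper name reservoir_sample is hypothetical and it samples from a plain list of ids rather than from an alignment object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): choose $n items
# uniformly at random from a list in a single pass.
# If $n is omitted, default to half the list size, rounded down.
sub reservoir_sample {
    my ($items, $n) = @_;
    $n = int(@$items / 2) unless defined $n;

    my @reservoir;
    my $seen = 0;
    for my $item (@$items) {
        $seen++;
        if (@reservoir < $n) {
            # Fill the reservoir with the first $n items.
            push @reservoir, $item;
        }
        else {
            # Replace a random slot with probability $n / $seen.
            my $j = int(rand($seen));
            $reservoir[$j] = $item if $j < $n;
        }
    }
    return \@reservoir;
}

my $picked = reservoir_sample([ qw(s1 s2 s3 s4 s5 s6 s7 s8 s9 s10) ], 3);
print join(' ', @$picked), "\n";
```

The appeal of this approach is that the total number of items does not need to be known in advance, although here it is known.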

shuffle_sites()

Make a shuffled (not bootstrapped) alignment. This operation randomizes the order of alignment columns. It is used for testing the significance of long runs of conserved sites in an alignment (e.g., conserved intergenic spacers [IGSs]).
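The difference from bootstrapping is that shuffling is a permutation: every column is used exactly once. A minimal pure-Perl sketch, assuming rows are plain strings and using the hypothetical helper name shuffle_columns (not this module's code):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (not part of this module): permute alignment
# columns at random. Each column is kept intact and used exactly once.
sub shuffle_columns {
    my ($rows) = @_;
    my $len = length $rows->[0];

    # One random ordering of column indices, shared by all rows.
    my @order = shuffle(0 .. $len - 1);

    return [
        map {
            my $seq = $_;
            join '', map { substr($seq, $_, 1) } @order;
        } @$rows
    ];
}

my $shuffled = shuffle_columns([ 'ACGTAC', 'ACGAAC', 'TCGTAC' ]);
print "$_\n" for @$shuffled;
```

Because the same permutation is applied to every row, per-column composition is preserved while runs of adjacent conserved columns are broken up.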

EXTENDING THIS MODULE

We encourage BioPerl developers to add command-line interfaces to their BioPerl methods here.

Here is how to extend it. We'll use option --avpid as an example.

  • Create a new method like those in the previous section.

  • Document your method in pod using =head2. For example:

    =head2 print_avpid
    
    Print the average percent identity of an alignment.
    
    Wraps
    L<Bio::SimpleAlign-E<gt>average_percentage_identity()|https://metacpan.org/pod/Bio::SimpleAlign#average_percentage_identity>.
    
    =cut

    See print_avpid() for how this gets rendered.

  • Add the method to the @EXPORT list in AlnManipulations.pm.

  • Add the option to %opt_dispatch, which maps the option used in bioaln to the subroutine that gets called here. For example:

    "avpid" => \&print_avpid,

  • Add the option to the bioaln script. See the code that starts:

    GetOptions(
    ...
    "avpid|a",
    ...

    This option has a short option name, a, and takes no additional argument.

  • Write a test for the option. See the file t/10test-bioaln.t and Testing.

  • Share back. Create a pull request to the GitHub repository and contact Weigang Qiu, City University of New York, Hunter College (mailto:weigang@genectr.hunter.cuny.edu).
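The option-to-subroutine mapping described in the steps above can be sketched as a self-contained script. This is only an illustration of the dispatch-table pattern, not the actual bioaln or AlnManipulations.pm code: the run_options helper is hypothetical, and print_avpid here is a stand-in that returns a string instead of reporting on an alignment.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the real method; it only reports that it was called.
sub print_avpid { return 'print_avpid called' }

# Dispatch table: option name (as used on the command line) => code ref.
my %opt_dispatch = (
    'avpid' => \&print_avpid,
);

# Hypothetical driver: parse the arguments, then call the subroutine
# registered for each option that was given.
sub run_options {
    my (@args) = @_;
    my %opts;
    GetOptionsFromArray(\@args, \%opts, 'avpid|a')
        or die "Could not parse options\n";

    my @results;
    for my $name (sort keys %opts) {
        my $action = $opt_dispatch{$name} or next;
        push @results, $action->();
    }
    return \@results;
}

print "$_\n" for @{ run_options('--avpid') };
```

With this layout, adding a new option means adding one entry to the dispatch hash and one option specification, which is the same pair of changes the steps above ask for.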

SEE ALSO

CONTRIBUTORS

  • William McCaig <wmccaig at gmail dot com>

  • Girish Ramrattan <gramratt at gmail dot com>

  • Che Martin <che dot l dot martin at gmail dot com>

  • Yözen Hernández <yzhernand at gmail dot com>

  • Levy Vargas <levy dot vargas at gmail dot com>

  • Weigang Qiu (Maintainer)

  • Rocky Bernstein