NAME
Bio::BPWrapper::SeqManipulations - Functions for bioseq
SYNOPSIS
use Bio::BPWrapper::SeqManipulations;
# Set options hash ...
initialize(\%opts);
write_out(\%opts);
SUBROUTINES
initialize()
Sets up most of the actions to be performed on sequences.
Call this right after setting up an options hash.
Sets package variables: $in, $in_format, $filename, $out_format, and $out.
write_out()
Writes out the sequence file.
Call this after calling initialize(\%opts) and processing those options.
retrieve_seqs()
Retrieves a sequence from GenBank using the provided accession number. A wrapper for Bio::DB::GenBank->get_Seq_by_acc.
remove_gaps()
Remove gaps from the sequences.
print_lengths()
Print all sequence lengths. Wraps Bio::Seq->length.
print_seq_count()
Print the number of sequences.
make_revcom()
Reverse complement. Wraps Bio::Seq->revcom().
print_subseq()
Select substring (of the 1st sequence). Wraps Bio::Seq->subseq().
reading_frame_ops()
Translate in 1, 3, or 6 frames based on the value of $opts set via initialize(\%opts). Wraps Bio::Seq->translate(), Bio::SeqUtils->translate_3frames(), and Bio::SeqUtils->translate_6frames().
restrict_coord()
Finds the digestion coordinates for the restriction enzyme specified in $opts{restrinct}, set via initialize(\%opts).
An input file with sequences is expected. Wraps Bio::Restriction::Analysis->cut().
Outputs the coordinates of overhangs in BED format.
restrict_digest()
Predicts the fragments produced by digestion with the restriction enzyme specified in $opts{restrinct}, set via initialize(\%opts).
An input file with sequences is expected. Wraps Bio::Restriction::Analysis->cut().
anonymize()
Replace sequence IDs with serial IDs n characters long, as specified in $opts{'anonymize'}, set via initialize(\%opts). For example, depending on the value of $opts{'anonymize'}, the first ID will be S0001, with a leading 'S' followed by the serial number.
A sed script file is produced with a .sed suffix that may be used with sed's '-f' argument. If the filename is '-', the sed file is named STDOUT.sed instead. A message containing the sed filename is written to STDERR.
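The ID scheme described above can be sketched in plain Perl. This is only an illustration, not the module's code: serial_id() and sed_lines() are hypothetical helpers, the length 5 is an assumed option value, the assumption that the leading 'S' counts toward the n characters comes from the S0001 example, and the direction of the sed substitution (serial ID back to original ID) is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::BPWrapper): build a serial ID whose
# total length, including the leading 'S', is $len characters (assumption).
sub serial_id {
    my ($len, $n) = @_;
    return sprintf "S%0*d", $len - 1, $n;
}

# Hypothetical helper: one sed substitution per original ID, mapping the
# serial ID back to the original ID (the direction is an assumption).
sub sed_lines {
    my ($len, @ids) = @_;
    my $n = 0;
    return map { sprintf "s/%s/%s/", serial_id($len, ++$n), $_ } @ids;
}

# Assumed option value of 5; the IDs are made up for illustration.
print "$_\n" for sed_lines(5, "seqA", "seqB");
```

Lines in this form could be saved to a file and passed to sed with '-f'.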
shred_seq()
Break the input into individual sequences, writing a FASTA file for each sequence.
count_codons()
Count codons for coding sequences (e.g., a genome file consisting of CDS sequences). Wraps Bio::Tools::SeqStats->count_codons().
print_gb_gene_feats()
Print gene sequences in FASTA format from a GenBank file of a bacterial genome. This won't work for a eukaryotic GenBank file.
count_leading_gaps()
Count and print the number of leading gaps in each sequence.
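As a rough sketch of the idea, assuming gaps are written as '-' and sequences are available as plain strings (the module itself works on BioPerl sequence objects), leading gaps can be counted with a regular expression. The helper name leading_gaps(), the sample data, and the tab-separated output are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count '-' characters at the start of a sequence string.
sub leading_gaps {
    my ($seq) = @_;
    my ($gaps) = $seq =~ /^(-*)/;   # always matches, possibly the empty string
    return length $gaps;
}

# Made-up aligned sequences for illustration.
my %seqs = (seq1 => "--ATG-C", seq2 => "ATGC--");
for my $id (sort keys %seqs) {
    print "$id\t", leading_gaps($seqs{$id}), "\n";
}
```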
hydroB()
Return the mean Kyte-Doolittle hydropathicity for protein sequences. Wraps Bio::Tools::SeqStats->hydropathicity().
linearize()
Linearize FASTA: print one sequence per line.
reloop_at()
Re-circularize a bacterial genome by starting at a specified position given in $opts{"reloop"}, set via initialize(\%opts).
For example, for the sequence "ABCDE", bioseq -R'2' .. would generate "BCDEA".
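The rotation in this example can be sketched on a plain string. The helper reloop() below is hypothetical and assumes a 1-based start position, which is what the "ABCDE" example implies; the module itself operates on BioPerl sequence objects.

```perl
use strict;
use warnings;

# Hypothetical helper: rotate a circular sequence so that it starts at the
# 1-based position $start.
sub reloop {
    my ($seq, $start) = @_;
    die "start out of range\n" if $start < 1 || $start > length($seq);
    return substr($seq, $start - 1) . substr($seq, 0, $start - 1);
}

print reloop("ABCDE", 2), "\n";   # prints BCDEA
```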
remove_stop()
Remove stop codons.
EXTENDING THIS MODULE
We encourage BioPerl developers to add command-line interfaces to their BioPerl methods here.
Here is how to extend the module. We'll use the option --count-codons as an example.
Create a new method like one of those in the previous section.
Document your method in pod using =head2. For example:
    =head2 count_codons()

    Count codons for coding sequences (e.g., a genome file consisting of CDS sequences). Wraps L<Bio::Tools::SeqStats-E<gt>count_codons()|https://metacpan.org/pod/Bio::Tools::SeqStats#count_codons>.

    =cut
See count_codons() for how this gets rendered.
Add the method to the @EXPORT list in SeqManipulations.pm.
Add the option to %opt_dispatch, which maps the option used in bioseq to the subroutine that gets called here. For example:
    'count-codons' => \&count_codons,
Add the option to the bioseq script. See the code that starts:
    GetOptions( ... "count-codons|C", ...
This option has a short option name C and takes no additional argument. See Getopt::Long for how to specify options.
Write a test for the option. See the file t/10test-bioseq.t and Testing.
Share back. Create a pull request to the GitHub repository and contact Weigang Qiu, City University of New York, Hunter College (mailto:weigang@genectr.hunter.cuny.edu)
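The option-to-subroutine wiring described in these steps can be sketched as a small standalone script. This is an illustration under assumptions, not the actual bioseq code: the count_codons() here is a stand-in that only returns a string, the argument list is made up, and the dispatch loop is a guess at how such a table might be consumed. GetOptionsFromArray is the Getopt::Long function for parsing an array other than @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in handler; the real count_codons() operates on sequences.
sub count_codons { return "count_codons called" }

# Dispatch table: option name => code reference.
my %opt_dispatch = ( 'count-codons' => \&count_codons );

# Parse a sample argument list; "count-codons|C" is a flag with short name C
# that takes no additional argument.
my %opts;
my @args = ('-C');
GetOptionsFromArray(\@args, \%opts, "count-codons|C") or die "bad options\n";

# Run the handler for every option that was set (hypothetical loop).
for my $name (sort keys %opts) {
    my $handler = $opt_dispatch{$name} or next;
    print $handler->(), "\n";
}
```

Keeping the option name identical in the GetOptions specification and in the dispatch table is what lets the lookup succeed.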
SEE ALSO
bioseq: command-line tool for using this module
CONTRIBUTORS
Yözen Hernández yzhernand at gmail dot com
Pedro Pegan
Weigang Qiu (Maintainer)
Rocky Bernstein