NAME
BioX::Seq - a (very) basic biological sequence object
SYNOPSIS
use BioX::Seq;
my $seq = BioX::Seq->new();
for (qw/AATG TAGG CCAT TTGA/) {
    $seq .= $_;
}
$seq->id( 'test_seq' );
my $rc = $seq->rev_com(); # original untouched
print $seq->as_fasta();
# >test_seq
# AATGTAGGCCATTTGA
$seq->rev_com(); # original modified in-place
print $seq->as_fastq(22);
# @test_seq
# TCAAATGGCCTACATT
# +
# 7777777777777777
print $seq->range(3,6)->as_fasta();
# >test_seq
# AAAT
DESCRIPTION
BioX::Seq is a simple sequence class that can be used to represent biological sequences. It was designed as a compromise between using simple strings and hashes to hold sequences and using the rather bloated objects of BioPerl. Features (or, depending on your viewpoint, bugs) include auto-stringification and context-dependent transformations. It is meant to be used primarily as the return object of the BioX::Seq::Stream and BioX::Seq::Fetch parsers, but there may be occasions where it is useful in its own right.
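To illustrate what auto-stringification means in practice, here is a minimal sketch using Perl's overload pragma. The class MiniSeq is hypothetical and is not BioX::Seq's actual implementation; it only shows how overloading "" and .= lets an object be printed and appended to like a plain string, as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical minimal class (MiniSeq), NOT BioX::Seq's actual code.
# It only illustrates how operator overloading can make an object
# behave like its sequence string.
package MiniSeq;

use overload
    '""' => sub { $_[0]->{seq} },    # stringify to the sequence
    '.=' => sub {                    # append in place, keeping the object
        my ($self, $other) = @_;
        $self->{seq} .= $other;
        return $self;
    },
    fallback => 1;                   # other string ops use stringification

sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq // '' }, $class;
}

package main;

my $seq = MiniSeq->new();
$seq .= $_ for qw/AATG TAGG/;
print "$seq\n";              # prints AATGTAGG
print length($seq), "\n";    # prints 8
```

Because .= is overloaded to mutate and return the object itself, $seq remains a MiniSeq object after the loop rather than decaying into a plain string.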
BioX::Seq currently implements a small subset of the transformations most commonly used by the author (reverse complement, translation, and subrange extraction). More methods may be added in the future as use suggests and time permits, but the core object will be kept as simple as possible and should be limited to the four current properties (sequence, ID, description, and quality) that satisfy 99% of the author's needs.
Some design decisions have been made for the sake of speed over ease of use. For instance, there is no sanity-checking of the object properties upon creation of a new object or use of the accessor methods. Parameters to the constructor are positional rather than named (testing indicates that this reduces execution time by ~40%).
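A sketch of what a positional, unvalidated constructor looks like, using a hypothetical MiniSeq class rather than BioX::Seq itself. Because arguments are positional, setting a later property (such as the description) still requires passing the earlier ones, with undef as a placeholder.

```perl
use strict;
use warnings;

# Hypothetical sketch (MiniSeq is not BioX::Seq): a constructor that
# takes positional arguments and performs no validation.
package MiniSeq;

sub new {
    my ($class, $seq, $id, $desc, $qual) = @_;
    # No sanity checks: values are stored exactly as given.
    return bless {
        seq  => $seq,
        id   => $id,
        desc => $desc,
        qual => $qual,
    }, $class;
}

package main;

# Arguments must appear in order: SEQ, ID, DESC, QUALITY.
my $s = MiniSeq->new('AATG', 'seq1', 'example read');

# To set only the ID, the sequence slot is filled with undef.
my $t = MiniSeq->new(undef, 'empty_seq');

print "$s->{id}: $s->{desc}\n";   # prints "seq1: example read"
```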
METHODS
- new
- new SEQUENCE
- new SEQUENCE ID
- new SEQUENCE ID DESCRIPTION
- new SEQUENCE ID DESCRIPTION QUALITY
Create a new BioX::Seq object (empty by default). All arguments are optional but are positional and, if provided, must be given in order:
$seq = BioX::Seq->new( SEQ, ID, DESC, QUALITY );
Returns a new BioX::Seq object.
- seq, id, desc, qual
Accessors to the object properties named accordingly. Each takes zero or one argument: if an argument is given, it is assigned to the property in question. Returns the current value of the property.
Properties can also be accessed directly as hash keys. This is probably frowned upon by some, but it can be useful at times, e.g., to perform a substitution on a property in place:
$seq->{id} =~ s/^Unnecessary_prefix//;
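The get/set behavior of these accessors can be sketched as follows. MiniSeq and the loop that installs the subs are hypothetical illustrations of the pattern, not BioX::Seq's code; the last lines show direct hash access alongside the accessors.

```perl
use strict;
use warnings;

# Hypothetical sketch of the zero-or-one-argument accessor pattern.
package MiniSeq;

sub new {
    my ($class, $seq, $id) = @_;
    return bless { seq => $seq, id => $id }, $class;
}

# Install seq(), id(), desc() and qual() accessors.
for my $prop (qw/seq id desc qual/) {
    no strict 'refs';
    *{"MiniSeq::$prop"} = sub {
        my $self = shift;
        $self->{$prop} = shift if @_;   # one argument: set the property
        return $self->{$prop};          # always: return the current value
    };
}

package main;

my $s = MiniSeq->new('AATG', 'Unnecessary_prefix_seq1');
$s->{id} =~ s/^Unnecessary_prefix_//;   # direct hash access, in place
$s->desc('trimmed');                    # setter
print join(' ', $s->id, $s->desc), "\n";   # prints "seq1 trimmed"
```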
- range START END
Extract a subsequence from START to END. Coordinates are 1-based and inclusive.
Returns a new BioX::Seq object, or undef if the coordinates are outside the limits of the parent sequence.
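As a sketch of the coordinate convention, the hypothetical helper subrange below maps 1-based, inclusive START/END onto Perl's 0-based substr. It works on a plain string rather than a BioX::Seq object, and treating START > END as out of range is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioX::Seq's code): 1-based, inclusive
# subsequence extraction returning undef for out-of-range coordinates.
sub subrange {
    my ($seq, $start, $end) = @_;
    return undef if $start < 1 || $end > length($seq) || $start > $end;
    # 1-based inclusive [start, end] -> 0-based offset and length.
    return substr($seq, $start - 1, $end - $start + 1);
}

my $s = 'TCAAATGGCCTACATT';
print subrange($s, 3, 6), "\n";   # prints AAAT, as in the SYNOPSIS
```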
- rev_com
Reverse complement the sequence.
Behavior is context-dependent. In scalar or list context, returns a new BioX::Seq object containing the reverse-complemented sequence, leaving the original sequence untouched. In void context, updates the original sequence in-place and returns TRUE if successful.
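Context-dependent behavior like this is typically built on Perl's wantarray, which returns undef in void context. The sketch below uses a hypothetical MiniSeq class, not BioX::Seq; complementing the IUPAC ambiguity codes is this sketch's own choice.

```perl
use strict;
use warnings;

# Hypothetical sketch (MiniSeq is not BioX::Seq) of a context-dependent
# reverse complement: return a new object when the caller uses the
# result, otherwise modify the invocant in place.
package MiniSeq;

sub new {
    my ($class, $seq, $id) = @_;
    return bless { seq => $seq, id => $id }, $class;
}

sub rev_com {
    my ($self) = @_;
    my $rc = reverse $self->{seq};   # scalar context: reverse the string
    # Complement, including IUPAC ambiguity codes, preserving case.
    $rc =~ tr/ACGTMRWSYKVHDBNacgtmrwsykvhdbn/TGCAKYWSRMBDHVNtgcakywsrmbdhvn/;
    if (defined wantarray) {         # scalar or list context
        return MiniSeq->new($rc, $self->{id});
    }
    $self->{seq} = $rc;              # void context: update in place
    return 1;
}

package main;

my $seq    = MiniSeq->new('AATGTAGGCCATTTGA', 'test_seq');
my $rc     = $seq->rev_com();    # new object; $seq untouched
my $before = $seq->{seq};        # still AATGTAGGCCATTTGA
$seq->rev_com();                 # void context: $seq modified in place
print "$rc->{seq}\n";            # prints TCAAATGGCCTACATT
```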
- translate
- translate FRAME
Translate a nucleic acid sequence to a peptide sequence.
FRAME specifies the reading frame to translate (default: 0). Values of 0-2 select the three forward reading frames, respectively, while values of 3-5 select the three reverse reading frames, respectively.
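The frame semantics can be sketched with a small, self-contained translator using the standard genetic code (NCBI translation table 1). The helper translate_frame is hypothetical. Reading frames 3-5 as frames 0-2 of the reverse complement is this sketch's interpretation of the description above, and emitting X for codons containing ambiguous bases is an assumption, not documented BioX::Seq behavior.

```perl
use strict;
use warnings;

# Standard genetic code as a 64-character string; codon index is
# 16*b1 + 4*b2 + b3 with bases ordered T, C, A, G.
my $CODE = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %IDX  = (T => 0, C => 1, A => 2, G => 3, U => 0);

# Hypothetical helper, not BioX::Seq's code.
sub translate_frame {
    my ($seq, $frame) = @_;
    $frame //= 0;
    $seq = uc $seq;
    # Assumption: frames 3-5 are frames 0-2 of the reverse complement.
    if ($frame > 2) {
        $seq = reverse $seq;
        $seq =~ tr/ACGTU/TGCAA/;
        $frame -= 3;
    }
    my $pep = '';
    for (my $i = $frame; $i + 3 <= length($seq); $i += 3) {
        my @b = split //, substr($seq, $i, 3);
        if (grep { !exists $IDX{$_} } @b) {
            $pep .= 'X';    # codon with ambiguous or unknown bases
            next;
        }
        $pep .= substr($CODE, 16 * $IDX{$b[0]} + 4 * $IDX{$b[1]} + $IDX{$b[2]}, 1);
    }
    return $pep;
}

print translate_frame('ATGGCCTAA', 0), "\n";   # prints MA*
print translate_frame('ATGGCCTAA', 3), "\n";   # prints LGH
```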
- as_fasta
- as_fasta LINE_LENGTH
Returns a string representation of the sequence in FASTA format. Requires that, at a minimum, the seq and id properties be defined. LINE_LENGTH, if given, specifies the line length for wrapping purposes (default: 60).
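Line wrapping can be sketched as below. The helper to_fasta is hypothetical and operates on plain values; appending a defined description to the header after a single space is an assumption of this sketch, since the text above does not specify the header layout.

```perl
use strict;
use warnings;

# Hypothetical FASTA formatter, not BioX::Seq's code.
sub to_fasta {
    my ($seq, $id, $desc, $line_length) = @_;
    $line_length //= 60;
    my $out = ">$id" . (defined $desc ? " $desc" : '') . "\n";
    # Emit the sequence in chunks of at most $line_length characters.
    for (my $i = 0; $i < length($seq); $i += $line_length) {
        $out .= substr($seq, $i, $line_length) . "\n";
    }
    return $out;
}

print to_fasta('AATGTAGGCCATTTGA', 'test_seq', undef, 10);
# >test_seq
# AATGTAGGCC
# ATTTGA
```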
- as_fastq
- as_fastq DEFAULT_QUALITY
Returns a string representation of the sequence in FASTQ format. Requires that, at a minimum, the seq and id properties be defined. DEFAULT_QUALITY, if given, specifies the Phred quality score assigned to each base when quality values are missing, for instance when converting from FASTA to FASTQ (default: 20).
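The default-quality fill can be sketched with the Phred+33 encoding, which matches the SYNOPSIS (a score of 22 is encoded as "7"). The helper to_fastq is hypothetical, works on plain values, and omits the description from the header for brevity.

```perl
use strict;
use warnings;

# Hypothetical FASTQ formatter, not BioX::Seq's code.
sub to_fastq {
    my ($seq, $id, $qual, $default_q) = @_;
    $default_q //= 20;
    # No quality string (e.g. sequence read from FASTA): use a uniform
    # quality, encoded as the character with code point score + 33.
    $qual //= chr($default_q + 33) x length($seq);
    return "\@$id\n$seq\n+\n$qual\n";
}

print to_fastq('TCAAATGGCCTACATT', 'test_seq', undef, 22);
# @test_seq
# TCAAATGGCCTACATT
# +
# 7777777777777777
```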
- as_input
- as_input ARGUMENT
If the sequence object comes from a BioX::Seq::Stream instance, this method formats the sequence to match the input format, calling either BioX::Seq::as_fasta or BioX::Seq::as_fastq as appropriate. The optional argument, if given, is passed on to the chosen method and interpreted accordingly (as LINE_LENGTH for as_fasta or DEFAULT_QUALITY for as_fastq). Throws an error if the input format cannot be deduced (probably because the object was not created by a BioX::Seq::Stream parser).
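One way such format dispatch could work is sketched below. Everything here is hypothetical: MiniSeq is not BioX::Seq, and the _input_format key (which a parser would set) is invented for illustration. The point is only that the optional argument is forwarded unchanged to whichever formatter matches.

```perl
use strict;
use warnings;

package MiniSeq;

# Hypothetical: a parser would record the input format in _input_format.
sub new {
    my ($class, $seq, $id, $fmt) = @_;
    return bless { seq => $seq, id => $id, _input_format => $fmt }, $class;
}

# Minimal formatters: the argument is a line length (FASTA) or a
# default quality score (FASTQ).
sub as_fasta {
    my ($self, $len) = @_;
    $len //= 60;
    my $out = ">$self->{id}\n";
    $out .= "$1\n" while $self->{seq} =~ /(.{1,$len})/g;
    return $out;
}

sub as_fastq {
    my ($self, $q) = @_;
    $q //= 20;
    my $qual = $self->{qual} // chr($q + 33) x length($self->{seq});
    return "\@$self->{id}\n$self->{seq}\n+\n$qual\n";
}

my %FORMATTER = (fasta => \&as_fasta, fastq => \&as_fastq);

sub as_input {
    my ($self, @args) = @_;
    my $fmt = $self->{_input_format};
    die "Input format could not be determined\n"
        unless defined $fmt && exists $FORMATTER{$fmt};
    return $FORMATTER{$fmt}->($self, @args);   # forward the argument
}

package main;

my $read = MiniSeq->new('AATG', 'r1', 'fastq');
print $read->as_input(22);   # FASTQ record with quality string 7777
```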
CAVEATS AND BUGS
No input validation is performed during construction or modification of the object properties.
Performing certain operations (for instance, s///) on a BioX::Seq object relying on auto-stringification may convert the object into a simple unblessed scalar containing the sequence string. You will likely know if this happens (you are using strict and warnings, right?) because your script will throw an error when you try to call an object method on the (now) unblessed scalar.
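The caveat can be reproduced with any class that overloads only stringification. MiniSeq below is hypothetical, not BioX::Seq; the sketch shows that a successful s/// on the object variable replaces the reference with a plain string, while applying s/// to the property itself leaves the object intact.

```perl
use strict;
use warnings;

# Hypothetical class with only stringification overloaded.
package MiniSeq;
use overload '""' => sub { $_[0]->{seq} }, fallback => 1;
sub new { my ($class, $seq) = @_; return bless { seq => $seq }, $class }

package main;

my $seq = MiniSeq->new('AATGTAGG');
$seq =~ s/^AATG//;   # works on the stringified value and stores a string
print ref($seq) ? "object\n" : "plain string: $seq\n";   # plain string: TAGG

# Safer: modify the property directly so the object survives.
my $obj = MiniSeq->new('AATGTAGG');
$obj->{seq} =~ s/^AATG//;
print ref($obj), " $obj\n";   # prints "MiniSeq TAGG"
```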
Please report bugs or feature requests through the issue tracker at https://github.com/jvolkening/p5-BioX-Seq/issues.
AUTHOR
Jeremy Volkening <jeremy.volkening *at* base2bio.com>
COPYRIGHT AND LICENSE
Copyright 2014-2022 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.