NAME
FASTX::Reader - A simple module to parse FASTA and FASTQ files, supporting compressed files and paired-ends.
VERSION
version 1.5.1
SYNOPSIS
use FASTX::Reader;
my $filepath = '/path/to/assembly.fastq';
die "Input file not found: $filepath\n" unless (-e "$filepath");
my $fasta_reader = FASTX::Reader->new({ filename => "$filepath" });
while (my $seq = $fasta_reader->getRead() ) {
  print $seq->{name}, "\t", $seq->{seq}, "\t", $seq->{qual}, "\n";
}
BUILD TEST
The GitHub repository is tested with a GitHub Action. Every CPAN release is tested by the CPAN testers grid.
METHODS
new()
Initializes a new FASTX::Reader object from the 'filename' argument and opens a filehandle, stored as $object->{fh}.
my $seq_from_file = FASTX::Reader->new({ filename => "$file" });
To read from STDIN, either pass {{STDIN}} as filename, or don't pass a filename at all:
my $seq_from_stdin = FASTX::Reader->new();
The parameter loadseqs will preload all sequences into a hash, with the sequence name as key and the sequence as value.
my $seq_from_file = FASTX::Reader->new({
  filename => "$file",
  loadseqs => 1,
});
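As a rough illustration of what such a name-to-sequence hash looks like, the pure-Perl sketch below builds one from FASTA text held in memory. It does not use FASTX::Reader: the helper name load_seqs_sketch and the sample data are hypothetical, and where the module stores its own preloaded hash is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FASTX::Reader: collect every FASTA
# record into a hash keyed by sequence name (header text up to the first
# whitespace), with the concatenated sequence as value.
sub load_seqs_sketch {
    my ($fh) = @_;
    my (%seqs, $name);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>(\S+)/) {
            $name = $1;
            $seqs{$name} = '';
        } elsif (defined $name) {
            $seqs{$name} .= $line;    # join wrapped sequence lines
        }
    }
    return \%seqs;
}

# Sample data (made up) read through an in-memory filehandle.
my $fasta = ">seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory FASTA: $!";
my $seqs = load_seqs_sketch($fh);
close $fh;

print "$_\t$seqs->{$_}\n" for sort keys %$seqs;
```

Keeping every sequence in memory is convenient for small references, but for large read files iterating with getRead() is likely the safer choice.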
getRead()
Will return the next sequence in the FASTA / FASTQ file using Heng Li's implementation of the readfq() algorithm. The returned object has these attributes:
- name: header of the sequence (identifier)
- comment: any string after the first whitespace in the header
- seq: actual sequence
- qual: quality if the file is FASTQ
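The split between name and comment can be sketched in a few lines of plain Perl. This is only an illustration of the rule stated above (everything up to the first whitespace is the name, the rest is the comment); split_header_sketch is a hypothetical helper, not a function of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA/FASTQ header line into name and comment.
sub split_header_sketch {
    my ($header) = @_;
    $header =~ s/^[>@]//;                            # drop the record marker
    my ($name, $comment) = split /\s+/, $header, 2;  # comment is undef if absent
    return { name => $name, comment => $comment };
}

my $rec = split_header_sketch('>contig_1 length=1200 cov=35.2');
my $comment = defined $rec->{comment} ? $rec->{comment} : '';
print "name=$rec->{name}\n";       # name=contig_1
print "comment=$comment\n";        # comment=length=1200 cov=35.2
```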
next()
Get the next sequence as a blessed object with the same attributes as the regular hash returned by getRead(): name, comment, seq, qual. The class of this object is FASTX::Seq.
getFastqRead()
If the file is FASTQ, this method returns the same read object as getRead() but uses a simpler, FASTQ-specific parser. Attributes of the returned object are name, comment, seq, qual (as for getRead()). It will alter the status attribute of the reader object if the FASTQ format looks terribly wrong.
use FASTX::Reader;
my $filepath = '/path/to/assembly.fastq';
my $fasta_reader = FASTX::Reader->new({ filename => "$filepath" });
while (my $seq = $fasta_reader->getFastqRead() ) {
  die "Error parsing $filepath: " . $fasta_reader->{message} if ($fasta_reader->{status} != 1);
  print $seq->{name}, "\t", $seq->{seq}, "\t", $seq->{qual}, "\n";
}
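To give an idea of what a FASTQ-specific parser can look like, here is a minimal sketch that assumes strictly four lines per record (no wrapped sequences). It is not the module's code: next_fastq_sketch is a hypothetical name, and it dies on malformed input, whereas the method above is described as changing the reader's status attribute.

```perl
use strict;
use warnings;

# Hypothetical sketch of a strict four-line FASTQ parser. Returns a hash
# reference per record and nothing at end of input; dies on malformed records.
sub next_fastq_sketch {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    die "Header does not start with '\@': $header\n" unless $header =~ /^\@/;
    die "Separator line does not start with '+'\n"   unless $plus   =~ /^\+/;
    die "Sequence and quality lengths differ\n"
        unless length($seq) == length($qual);
    my ($name, $comment) = split /\s+/, substr($header, 1), 2;
    return { name => $name, comment => $comment, seq => $seq, qual => $qual };
}

# Sample data (made up) read through an in-memory filehandle.
my $fastq = "\@r1 1:N:0:ACGT\nACGTAC\n+\nIIIIII\n\@r2\nTTGA\n+\nFFFF\n";
open(my $fh, '<', \$fastq) or die "Cannot open in-memory FASTQ: $!";
while (my $read = next_fastq_sketch($fh)) {
    print join("\t", $read->{name}, $read->{seq}, $read->{qual}), "\n";
}
close $fh;
```

The four-line assumption is what makes this kind of parser simpler than a general readfq()-style one, at the cost of rejecting multi-line FASTQ records.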
getIlluminaRead()
If the file is FASTQ, this method returns the same read object as getRead() but uses a simpler parser. Attributes of the returned object are name, comment, seq, qual (as for getRead()). In addition, it parses the read name and comment, populating these properties from the read name: instrument, run, flowcell, lane, tile, x, y, umi.
If the comment is also present, the following will also be populated: read (1 for R1, and 2 for R2), index (barcode of the current read), paired_index (barcode of the other read) and filtered (true if the read is to be discarded, false otherwise).
It will alter the status attribute of the reader object if the FASTQ format looks terribly wrong.
while (my $seq = $fasta_reader->getIlluminaRead() ) {
  print $seq->{name}, "\t", $seq->{instrument}, ',', $seq->{index1}, "\n";
}
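The field extraction described above can be approximated with two split calls. The sketch below assumes the common Illumina layout, with a colon-separated read name (instrument:run:flowcell:lane:tile:x:y, optionally followed by a UMI) and a comment of the form read:filtered:control:index. The helper parse_illumina_sketch and the sample read name are hypothetical, paired_index is deliberately left out, and the module's actual parsing may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: derive Illumina fields from a read name and comment.
sub parse_illumina_sketch {
    my ($name, $comment) = @_;
    my %read;
    # Hash slice: fields missing from the name (e.g. umi) are left undef.
    @read{qw(instrument run flowcell lane tile x y umi)} = split /:/, $name;
    if (defined $comment) {
        my ($read_number, $is_filtered, $control, $index) = split /:/, $comment;
        $read{read}     = $read_number;
        # Assumption: 'Y' in the second comment field marks a filtered read.
        $read{filtered} = (defined $is_filtered && $is_filtered eq 'Y') ? 1 : 0;
        $read{index}    = $index;
    }
    return \%read;
}

# Made-up read name and comment for illustration.
my $info = parse_illumina_sketch(
    'M01234:55:000000000-ABCDE:1:1101:15589:1332',
    '1:N:0:ATCACG',
);
print join("\t", $info->{instrument}, $info->{lane}, $info->{read}, $info->{index}), "\n";
```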
getFileFormat(filename)
This subroutine returns 'fasta', 'fastq' or undef for a given file path (it is not a method of the instantiated object).
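One common way to tell the two formats apart is to look at the first character of the first non-empty line ('>' for FASTA, '@' for FASTQ). The sketch below follows that idea for uncompressed files only. Whether getFileFormat() works this way is an assumption, and guess_format_sketch is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: guess the format from the first non-empty line of an
# uncompressed file. Returns 'fasta', 'fastq', or undef (scalar context).
sub guess_format_sketch {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;
        return 'fasta' if $line =~ /^>/;
        return 'fastq' if $line =~ /^\@/;
        return;    # first content line is neither: unknown format
    }
    return;        # empty file
}

# Write a small temporary FASTA file (made-up content) and test it.
my ($out, $tmp_path) = tempfile(UNLINK => 1);
print {$out} ">seq1\nACGT\n";
close $out;

my $format = guess_format_sketch($tmp_path);
my $label  = defined $format ? $format : 'undef';
print "$label\n";    # fasta
```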
ACKNOWLEDGEMENTS
- Heng Li's readfq(): This module was inspired by the readfq() subroutine originally written by Heng Li, which I updated to retain sequence comments. See: readfq repository
- Fabrizio Levorin: has contributed to the prototyping of this module
SEE ALSO
- BioX::Seq::Stream: The module I would have used if it had been available when I started working on this. The .gz reader implementation comes from this module.
AUTHOR
Andrea Telatin <andrea@telatin.com>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2019 by Andrea Telatin.
This is free software, licensed under:
The MIT (X11) License