Proch::N50 0.70
A simple Perl module to calculate N50 of a FASTA or FASTQ file
The updated documentation is in the Meta::CPAN page.
The module ships the n50 program to calculate the N50 of FASTA/FASTQ files (documentation).
Installation
Via CPANminus:
#If you don't have 'cpanm' already installed:
curl -L http://cpanmin.us | perl - App::cpanminus
cpanm Proch::N50
Via Miniconda:
conda install -y -c bioconda n50
n50 program
See full documentation in the CPAN page.
- Simple usage with one input file (FASTA or FASTQ):
n50 file.fasta
- Use the output in bash scripts:
MY_N50=$(n50 input.fasta -n)
- Screen friendly table (-x is a shortcut for --format screen), sorted by N50 descending (default):
n50.pl -x files/*.fa
- Screen friendly table, sorted by total contig length (--sortby max) ascending (--reverse):
n50.pl -x -o max -r files/*.fa
- Tabular (tsv) output is default:
n50 -o max -r files/*.fa
- To print data with your custom output format:
n50 data/*.fa -f custom -t '{path}{tab}N50={N50};Sum={size}{new}'
Output formats
- tsv (tab separated values)
#path seqs size N50 min max
test2.fa 8 825 189 4 256
reads.fa 5 247 100 6 102
small.fa 6 130 65 4 65
- csv (comma separated values), same as
--format tsvand--separator,:
#path,seqs,size,N50,min,max
test.fa,8,825,189,4,256
reads.fa,5,247,100,6,102
small_test.fa,6,130,65,4,65
- screen friendly,
-xas shortcut for--format screen:
.-----------------------------------------------------------.
| File | Seqs | Total bp | N50 | min | max |
+--------------------+-------+----------+------+-----+------+
| test_fasta_grep.fa | 1 | 80 | 80 | 80 | 80 |
| small_test.fa | 6 | 130 | 65 | 4 | 65 |
| rdp_16s_v16.fa | 13212 | 19098167 | 1467 | 320 | 2210 |
'--------------------+-------+----------+------+-----+------'
- json (JSON), use
-jas shortcut for--format json:
{
"small_test.fa" : {
"max" : 65,
"N50" : 65,
"seqs" : 6,
"size" : 130,
"min" : 4
},
"rdp_16s_v16.fa" : {
"seqs" : 13212,
"N50" : 1467,
"max" : 2210,
"min" : 320,
"size" : 19098167
}
}
Proch::N50 - short synopsis of the module
use Proch::N50 qw(getStats getN50);
my $filepath = '/path/to/assembly.fasta';
# Get N50 only: getN50(file) will return an integer
print "N50 only:\t", getN50($filepath), "\n";
# Full stats
my $seq_stats = getStats($filepath);
print Data::Dumper->Dump( [ $seq_stats ], [ qw(*FASTA_stats) ] );
# Will print:
# %FASTA_stats = (
# 'N50' => 65,
# 'dirname' => 'data',
# 'size' => 130,
# 'seqs' => 6,
# 'filename' => 'small_test.fa',
# 'status' => 1
# );
# Get also a JSON object
my $seq_stats_with_JSON = getStats($filepath, 'JSON');
print $seq_stats_with_JSON->{json}, "\n";
# Will print:
# {
# "seqs" : 6,
# "status" : 1,
# "filename" : "small_test.fa",
# "N50" : "65",
# "dirname" : "data",
# "size" : 130
# }