Proch::N50 0.70


A simple Perl module to calculate N50 of a FASTA or FASTQ file

Up-to-date documentation is available on the module's MetaCPAN page.

The module also ships the n50 command-line program, which calculates the N50 of FASTA/FASTQ files (see its documentation).
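
For readers unfamiliar with the metric, here is a minimal sketch of how N50 is commonly computed from a list of sequence lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the total size. The helper name n50_from_lengths is hypothetical and is not part of Proch::N50, whose own implementation may differ in details.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (NOT part of Proch::N50): N50 from a list of sequence lengths.
sub n50_from_lengths {
    my @lengths = sort { $b <=> $a } @_;    # longest sequence first
    return 0 unless @lengths;               # no sequences: report 0

    my $half    = sum(@lengths) / 2;        # half of the total size in bp
    my $running = 0;
    for my $len (@lengths) {
        $running += $len;
        # The first length at which the running total reaches half is the N50
        return $len if $running >= $half;
    }
    return 0;                               # not reached for non-empty input
}

# Total is 54 bp; 10 + 9 + 8 = 27 reaches half of it, so this prints 8
print n50_from_lengths(2, 3, 4, 5, 6, 7, 8, 9, 10), "\n";
```

For real files, getN50 from the module (shown in the synopsis) is the intended route; the sketch only illustrates the arithmetic.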

Installation

Via CPANminus:

#If you don't have 'cpanm' already installed:
curl -L http://cpanmin.us | perl - App::cpanminus

cpanm Proch::N50

Via Miniconda:

conda install -y -c bioconda n50

n50 program

See the full documentation on the CPAN page.

n50 file.fasta
MY_N50=$(n50 input.fasta -n)
n50.pl -x files/*.fa
n50.pl -x -o max -r files/*.fa
n50 -o max -r files/*.fa

n50 data/*.fa -f custom -t '{path}{tab}N50={N50};Sum={size}{new}'
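
The custom template in the last command uses placeholders in curly braces. As a rough illustration of how such a template could be expanded, the sketch below replaces {tab} and {new} with a tab and a newline and the remaining placeholders with per-file values. The function render_template is hypothetical and is not the code used by the n50 program; the values are taken from the small_test.fa example in this README.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the n50 program's own code): expand {name} placeholders.
sub render_template {
    my ($template, $values) = @_;

    # Assumption: {tab} and {new} stand for a tab and a newline
    my %lookup = (%$values, tab => "\t", new => "\n");

    # Replace each known {name}; leave unknown placeholders untouched
    $template =~ s/\{(\w+)\}/exists $lookup{$1} ? $lookup{$1} : "{$1}"/ge;
    return $template;
}

my %stats = (path => 'small_test.fa', N50 => 65, size => 130);
print render_template('{path}{tab}N50={N50};Sum={size}{new}', \%stats);
```

Leaving unknown placeholders in place makes typos in a template easy to spot in the output.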

Output formats

Tab-separated:

#path       seqs    size    N50     min     max
test2.fa    8       825     189     4       256
reads.fa    5       247     100     6       102
small.fa    6       130     65      4       65

Comma-separated:

#path,seqs,size,N50,min,max
test.fa,8,825,189,4,256
reads.fa,5,247,100,6,102
small_test.fa,6,130,65,4,65

Screen-friendly table:

    .-----------------------------------------------------------.
    | File               | Seqs  | Total bp | N50  | min | max  |
    +--------------------+-------+----------+------+-----+------+
    | test_fasta_grep.fa |     1 |       80 |   80 |  80 |   80 |
    | small_test.fa      |     6 |      130 |   65 |   4 |   65 |
    | rdp_16s_v16.fa     | 13212 | 19098167 | 1467 | 320 | 2210 |
    '--------------------+-------+----------+------+-----+------'

JSON:

    {
      "small_test.fa" : {
         "max"  : 65,
         "N50"  : 65,
         "seqs" : 6,
         "size" : 130,
         "min"  : 4
      },
      "rdp_16s_v16.fa" : {
         "seqs" : 13212,
         "N50"  : 1467,
         "max"  : 2210,
         "min"  : 320,
         "size" : 19098167
      }
    }
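
The JSON output is convenient when the statistics are consumed by another script. The following is a minimal sketch, assuming the JSON has the shape shown in the sample above (one key per file, each holding N50, seqs, size, min and max). It uses the core JSON::PP module; the helper files_by_n50 is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Sample data copied from the JSON output shown in this README
my $json_text = <<'JSON';
{
  "small_test.fa"  : { "max" : 65, "N50" : 65, "seqs" : 6, "size" : 130, "min" : 4 },
  "rdp_16s_v16.fa" : { "seqs" : 13212, "N50" : 1467, "max" : 2210, "min" : 320, "size" : 19098167 }
}
JSON

# Hypothetical helper: file names ordered from highest to lowest N50
sub files_by_n50 {
    my ($stats) = @_;
    return sort { $stats->{$b}{N50} <=> $stats->{$a}{N50} } keys %$stats;
}

my $stats = decode_json($json_text);
for my $file (files_by_n50($stats)) {
    printf "%s\tN50=%d\tsize=%d\n", $file, $stats->{$file}{N50}, $stats->{$file}{size};
}
```

In practice the JSON text would come from the program's output rather than from an embedded string.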

Proch::N50 - short synopsis of the module

use strict;
use warnings;
use Data::Dumper;    # needed for the Dump call below
use Proch::N50 qw(getStats getN50);
my $filepath = '/path/to/assembly.fasta';
 
# Get N50 only: getN50(file) will return an integer
print "N50 only:\t", getN50($filepath), "\n";
 
# Full stats
my $seq_stats = getStats($filepath);
print Data::Dumper->Dump( [ $seq_stats ], [ qw(*FASTA_stats) ] );
# Will print:
# %FASTA_stats = (
#               'N50' => 65,
#               'dirname' => 'data',
#               'size' => 130,
#               'seqs' => 6,
#               'filename' => 'small_test.fa',
#               'status' => 1
#             );
 
# Also get the stats as JSON (stored under the 'json' key)
my $seq_stats_with_JSON = getStats($filepath, 'JSON');
print $seq_stats_with_JSON->{json}, "\n";
# Will print:
# {
#    "seqs" : 6,
#    "status" : 1,
#    "filename" : "small_test.fa",
#    "N50" : "65",
#    "dirname" : "data",
#    "size" : 130
# }