NAME
biopop - SNP statistics based on BioPerl
SYNOPSIS
biopop [options] <alignment_file>
biopop [-h | --help | -V | --version | --man]
biopop -s pop.fas # number of [s]egregating sites
biopop -p pop.fas # average [p]airwise nucleotide differences
biopop -f pop.fas # [f]our-gamete test
biopop -c pop.fas # [c]oding SNPs
biopop -n pop.fas # [n]on-coding SNPs
biopop -m pop.fas # [m]ismatch distribution
biopop -b pop.fas # retain only [b]inary-informative sites
DESCRIPTION
biopop is a population-genetics utility based on BioPerl modules, including Bio::PopGen::Utilities, Bio::PopGen::Statistics, and Bio::PopGen::Population. Most of its methods are not part of BioPerl and have not been validated, so use them with caution.
OPTIONS
- --bi-part
Prints a NEWICK tree for each binary-informative SNP site. These trees can be used to test site compatibility (recombination), similarly to the four-gamete test.
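The idea behind --bi-part can be illustrated with a short Python sketch: a binary site splits the sequences into two groups, and that split can be written as a two-clade Newick tree. The function name bipartition_newick and the output layout are hypothetical; this is not biopop's own code or output format.

```python
def bipartition_newick(names, column):
    """Hypothetical sketch: group sequence names by their allele at one
    binary site and write the split as a two-clade Newick string."""
    groups = {}
    for name, base in zip(names, column):
        groups.setdefault(base, []).append(name)
    if len(groups) != 2:
        raise ValueError("site is not binary")
    # Sort the alleles so the output order is deterministic.
    clades = ["(" + ",".join(groups[allele]) + ")" for allele in sorted(groups)]
    return "(" + ",".join(clades) + ");"

names = ["s1", "s2", "s3", "s4"]
print(bipartition_newick(names, "AAGG"))  # ((s1,s2),(s3,s4));
```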
- --bi-sites, -b
Prints a FASTA alignment consisting of only binary-informative SNPs.
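A minimal Python sketch of the filtering step, assuming that a site is binary-informative when it has exactly two nucleotide states and each state occurs in at least two sequences (gaps and ambiguity codes are ignored). This definition and the function names are assumptions, not taken from biopop's source.

```python
from collections import Counter

def is_binary_informative(column):
    """Assumed definition: exactly two nucleotide states, each carried by
    at least two sequences; gaps and ambiguous bases are ignored."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    return len(counts) == 2 and min(counts.values()) >= 2

def binary_informative_alignment(seqs):
    """Reduce equal-length aligned sequences to their binary-informative columns."""
    keep = [i for i, col in enumerate(zip(*seqs))
            if is_binary_informative("".join(col))]
    return ["".join(s[i] for i in keep) for s in seqs]

seqs = ["ACGTA", "ACGTT", "GCTTA", "GCTAT"]
print(binary_informative_alignment(seqs))  # ['AGA', 'AGT', 'GTA', 'GTT']
```

Column 4 (T, T, T, A) is dropped because the A state is a singleton, which makes it uninformative under this definition.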
- --bi-sites-for-r
Prints the binary-informative SNPs of each individual as pseudo-diploid genotypes, so that the output can be imported into the R package "genetics" for further analysis.
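A pseudo-diploid genotype can be formed by writing each haploid allele twice, so that a haploid 'A' becomes the homozygous genotype 'A/A'. The Python sketch below assumes the input has already been reduced to binary-informative columns. The tab-delimited layout, the snpN column names, and the function name are hypothetical; '/' is used because it is the default allele separator of genotype() in the R package genetics.

```python
def pseudo_diploid(seqs, names, sep="/"):
    """Hypothetical sketch: write each haploid allele twice (e.g. 'A' -> 'A/A')
    so haploid SNP data look like homozygous diploid genotypes."""
    header = ["id"] + [f"snp{i + 1}" for i in range(len(seqs[0]))]
    rows = ["\t".join(header)]
    for name, seq in zip(names, seqs):
        rows.append("\t".join([name] + [f"{b}{sep}{b}" for b in seq]))
    return "\n".join(rows)

print(pseudo_diploid(["AG", "GT"], ["s1", "s2"]))
```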
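- --distance, -d 'jc|k2|uncorrected|f81|t92|f84|tajimanei'
Prints a distance matrix computed with the specified method: Jukes-Cantor (jc, the default), Kimura 2-parameter (k2), uncorrected p-distance (uncorrected), Felsenstein 1981 (f81), Tamura 1992 (t92), Felsenstein 1984 (f84), or Tajima-Nei (tajimanei).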
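For the default method, the Jukes-Cantor (1969) model converts the proportion p of differing sites between two sequences into the distance d = -(3/4) ln(1 - 4p/3). The Python sketch below builds a JC distance matrix, assuming uppercase aligned sequences and skipping gaps and ambiguous bases pairwise. It illustrates the formula and is not biopop's implementation; the function names are hypothetical.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap or ambiguous base."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)

def jukes_cantor(a, b):
    """JC69 distance: d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return float("inf")  # saturated: the log argument is no longer positive
    return -0.75 * math.log(1 - 4.0 * p / 3.0)

def distance_matrix(seqs, dist=jukes_cantor):
    """Full symmetric matrix of pairwise distances."""
    return [[dist(a, b) for b in seqs] for a in seqs]

for row in distance_matrix(["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"]):
    print(["%.4f" % d for d in row])
```

The correction always exceeds the raw proportion p, and the gap widens as p grows, which is why uncorrected distances underestimate divergence between distant sequences.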
- --four-gametes, -f
Performs the four-gamete test of recombination (Hudson & Kaplan, Genetics 1985, 111:147-164) and a test of epistasis (Wilson??). It identifies all binary-informative SNPs and prints one line per pair of SNPs, giving the site coordinates, the counts of the four possible gametes, the Shannon diversity of the haplotypes, and whether the two sites are compatible. Two SNPs are incompatible if all four possible haplotypes are present, which indicates recombination (under the infinite-sites assumption). Conversely, the presence of only two of the four possible haplotypes indicates a possible epistatic interaction.
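The per-pair computation described above can be sketched in Python. Assumptions: both columns come from binary-informative sites, Shannon diversity uses the natural logarithm, and the function names and returned fields are hypothetical rather than biopop's actual output.

```python
import math
from collections import Counter
from itertools import combinations

def four_gamete(col_i, col_j):
    """Count two-site haplotypes (gametes) at a pair of binary sites."""
    gametes = Counter(x + y for x, y in zip(col_i, col_j))
    n = sum(gametes.values())
    # Shannon diversity of the observed haplotypes (natural log assumed).
    shannon = -sum((c / n) * math.log(c / n) for c in gametes.values())
    return {
        "counts": dict(gametes),
        "shannon": shannon,
        "compatible": len(gametes) < 4,    # all four gametes => incompatible
        "two_gametes": len(gametes) == 2,  # candidate epistatic pair
    }

def four_gamete_all(sites):
    """sites: dict mapping alignment position -> column string of a
    binary-informative site; prints one line per pair of sites."""
    for i, j in combinations(sorted(sites), 2):
        r = four_gamete(sites[i], sites[j])
        print(i, j, r["counts"], round(r["shannon"], 3), r["compatible"])

four_gamete_all({5: "AAGG", 9: "CTCT", 12: "CCTT"})
```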
- --heterozygosity, -H
Prints, for each segregating site, the expected heterozygosity (gene diversity), i.e., 1 - sum(freq^2) over the allele frequencies at that site.
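A Python sketch of the per-site calculation, assuming uppercase aligned sequences and 1-based positions, with gaps and ambiguous bases left out of the allele counts; the function names are hypothetical.

```python
from collections import Counter

def site_heterozygosity(column):
    """Gene diversity at one site: 1 - sum(p_i^2) over allele frequencies."""
    counts = Counter(b for b in column if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        return 0.0  # column is all gaps or ambiguous bases
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def heterozygosity_by_site(seqs):
    """Yield (1-based position, heterozygosity) for every segregating site."""
    for pos, col in enumerate(zip(*seqs), start=1):
        h = site_heterozygosity(col)
        if h > 0:  # monomorphic sites have h == 0 and are skipped
            yield pos, h

print(list(heterozygosity_by_site(["AAC", "AGC", "AGT", "AGT"])))  # [(2, 0.375), (3, 0.5)]
```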
- --input, -i <format>
Input file format (default: 'FASTA'). biopop now tries to guess the format, so this flag is no longer needed.
- --mis-match, -m
Prints the number of mismatches for every pair of sequences. The shape of this mismatch distribution can indicate the age of the population (e.g., the time since a population expansion).
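A minimal Python sketch of a mismatch distribution: count the differing sites for every pair of sequences, then tabulate how many pairs show each count. This naive version treats every differing character, including gaps, as a mismatch; that choice and the function name are assumptions.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Map number of differing sites -> number of sequence pairs with that count."""
    diffs = (sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2))
    return dict(sorted(Counter(diffs).items()))

print(mismatch_distribution(["AAAA", "AAAT", "AATT", "ATTT"]))  # {1: 3, 2: 2, 3: 1}
```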
- --pi, -p
Prints the nucleotide diversity (pi), i.e., the average number of pairwise nucleotide differences between sequences, a measure of genetic variation.
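The average pairwise difference can be computed directly by comparing every pair of sequences, as in the Python sketch below. It assumes gap-free aligned sequences and returns the value per pair of sequences; dividing by the alignment length would give a per-site value. The function name is hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences over all n*(n-1)/2 sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)

print(nucleotide_diversity(["AAAA", "AAAT", "AATT", "ATTT"]))  # 1.6666666666666667
```

Here the six pairs differ at 1, 2, 3, 1, 2 and 1 sites, so pi = 10 / 6.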
- --seg-sites, -s
Prints number of segregating sites.
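A sketch of the count in Python, assuming a site is segregating when more than one of A, C, G, T occurs among the sequences (gaps and ambiguity codes ignored); the function name is hypothetical.

```python
def segregating_sites(seqs):
    """Count alignment columns that carry more than one nucleotide state."""
    count = 0
    for col in zip(*seqs):
        states = {b for b in col if b in "ACGT"}  # drop gaps / ambiguity codes
        if len(states) > 1:
            count += 1
    return count

print(segregating_sites(["AAAA", "AAAT", "AATT", "ATTT"]))  # 3
```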
- --snp-coding, -c
For a coding alignment, identifies and prints, for each two-state SNP, the codon position, the aligned nucleotide position, whether the change is synonymous or nonsynonymous, the frequency of each allelic state, and the Shannon diversity.
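Whether a two-state SNP is synonymous can be decided by translating the codon with each allele and comparing the amino acids. The Python sketch below assumes the standard genetic code (NCBI table 1), an in-frame alignment that starts at the first base of a codon, and a single-base change in the context of one observed codon; all names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(codon, index, alt):
    """Classify replacing codon[index] (index 0, 1 or 2) with `alt`."""
    mutant = codon[:index] + alt + codon[index + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "syn" if same else "nonsyn"

def snp_codon_position(aln_pos):
    """Map a 1-based alignment position to (codon number, position in codon)."""
    return (aln_pos - 1) // 3 + 1, (aln_pos - 1) % 3 + 1

print(snp_codon_position(4))            # (2, 1)
print(classify_snp("CTT", 2, "C"))      # syn: CTT and CTC both encode Leu
print(classify_snp("GAA", 0, "A"))      # nonsyn: Glu -> Lys
```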
- --snp-coding-long, -C
Prints the output of --snp-coding in long format.
- --snp-noncoding, -n
Identifies and prints, for each two-state SNP, the SNP position, the SNP states, the frequency of each allelic state, and the Shannon diversity.
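A Python sketch of such a report, assuming gap-free uppercase sequences, 1-based positions, and Shannon diversity computed with the natural logarithm; the function name and the tuple layout are hypothetical.

```python
import math
from collections import Counter

def snp_report(seqs):
    """For each two-state site, yield (position, states, frequencies, Shannon)."""
    for pos, col in enumerate(zip(*seqs), start=1):
        counts = Counter(b for b in col if b in "ACGT")
        if len(counts) != 2:
            continue  # keep only two-state SNPs
        n = sum(counts.values())
        freqs = {b: c / n for b, c in sorted(counts.items())}
        shannon = -sum(p * math.log(p) for p in freqs.values())
        yield pos, "".join(freqs), freqs, shannon

for row in snp_report(["ACGT", "ACGA", "TCGA", "TCCA"]):
    print(row)
```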
- --stats, -t <comma-separated list of values>
Specifies the statistics to compute from the input data (e.g., 'pi', 'theta', 'tajima_d', or per-site values). For example, "theta,pi" computes theta and pi.
The option can also be given multiple times, e.g., biopop --stats=pi --stats=theta
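For reference, the 'theta' and 'tajima_d' statistics are sketched below in Python, assuming 'theta' means Watterson's estimator S/a1 and that the alignment has no gaps or missing data. Here pi is computed from per-site diversities as n/(n-1) times the sum of 1 - sum(p^2), which equals the average pairwise difference. This is not the code that biopop or Bio::PopGen::Statistics runs, and the helper names are hypothetical.

```python
import math
from collections import Counter

def _segregating_columns(seqs):
    """Alignment columns with more than one state (no gaps assumed)."""
    return [col for col in zip(*seqs) if len(set(col)) > 1]

def watterson_theta(seqs):
    """theta_W = S / a1, with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    s = len(_segregating_columns(seqs))
    a1 = sum(1.0 / i for i in range(1, n))
    return s / a1

def pi_from_sites(seqs):
    """pi = n/(n-1) * sum over sites of (1 - sum p^2)."""
    n = len(seqs)
    total = 0.0
    for col in _segregating_columns(seqs):
        total += 1.0 - sum((c / n) ** 2 for c in Counter(col).values())
    return total * n / (n - 1)

def tajimas_d(seqs):
    """Tajima's (1989) D with the standard normalising constants."""
    n = len(seqs)
    s = len(_segregating_columns(seqs))
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi_from_sites(seqs) - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

seqs = ["AAAA", "AAAT", "AATT", "ATTT"]
print(watterson_theta(seqs), pi_from_sites(seqs), tajimas_d(seqs))
```

In this toy alignment pi (10/6) slightly exceeds theta_W (18/11), so D comes out small and positive.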
Common Options
- --help, -h
Print a brief help message and exit.
- --man
Print the manual page and exit.
- --version, -V
Print current release version and exit.
SEE ALSO
Bio::BPWrapper::PopManipulations, the underlying Perl module
bioaln: a wrapper around the BioPerl class Bio::SimpleAlign, with additional methods
CONTRIBUTORS
Yözen Hernández <yzhernand at gmail dot com> (initial design & implementation)
Weigang Qiu <weigang@genectr.hunter.cuny.edu> (Maintainer)
Rocky Bernstein (testing & release)
TO DO
Clean up and refactor the PopManipulations code (e.g., factor out shared variables and subroutines)
Move the distance methods to bioaln
Add multi-locus (population-genome) capabilities
Add outgroup-based statistics, e.g., MK, iHS
Add Ka/Ks statistics
TO CITE
Hernandez, Bernstein, Qiu, et al (2017). "BpWrappers: Command-line utilities for manipulation of sequences, alignments, and phylogenetic trees based on BioPerl". (In prep).
Stajich et al (2002). "The BioPerl Toolkit: Perl Modules for the Life Sciences". Genome Research 12(10):1611-1618.