NAME

Bio::Util::DNA - Basic DNA utilities

SYNOPSIS

use Bio::Util::DNA qw(:all);

my $seq_ref = randomDNA(100);
my $clean_ref = cleanDNA($seq_ref);
my $rev_ref = reverse_complement($seq_ref);

DESCRIPTION

Provides a set of functions and predefined variables that are handy when working with DNA sequences.

VARIABLES

BASIC VARIABLES

Basic nucleotide variables. Each variable name combines one of the prefixes below with one of the suffixes below.

Prefixes

DNA [ACGT]
RNA [ACGU]
degenerate
all_nucleotide

Suffixes

${prefix}s

String of the different nucleotides

@{prefix}s

Array of the different nucleotides

${prefix}_match

Precompiled regular expression which matches nucleotide characters

${prefix}_fail

Precompiled regular expression which matches non-nucleotide characters
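
As a rough illustration of the naming scheme, the sketch below defines local stand-ins for the four DNA-prefixed variables. The values, the case-insensitive flag, and the helper is_plain_dna are assumptions made for this example and are not copied from the module.

```perl
use strict;
use warnings;

# Local stand-ins that follow the prefix/suffix naming scheme.
# Assumption: the real module's values and regex flags may differ.
my $DNAs      = 'ACGT';                 # ${prefix}s: string of nucleotides
my @DNAs      = split //, $DNAs;        # @{prefix}s: array of nucleotides
my $DNA_match = qr/[${DNAs}]/i;         # ${prefix}_match: a nucleotide character
my $DNA_fail  = qr/[^${DNAs}]/i;        # ${prefix}_fail: a non-nucleotide character

# Hypothetical helper: true when the string has no non-DNA characters.
sub is_plain_dna {
    my ($seq) = @_;
    return $seq !~ $DNA_fail;
}
```

A _fail pattern of this kind is convenient for validation, because a single match is enough to reject a sequence.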

%degenerate2nucleotides

Hash of degenerate nucleotide definitions. Each entry contains a reference to an array of DNA nucleotides that each degenerate nucleotide stands for.

%nucleotides2degenerate

Reverse of %degenerate2nucleotides. Keys are alphabetically-sorted DNA nucleotides and values are the degenerate nucleotide that can represent those nucleotides.
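
The relationship between the two hashes can be sketched with the standard IUPAC codes. The hashes below are local copies built for illustration. The use of a joined string such as 'AG' as the reverse key, and the helper degenerate_for, are assumptions about layout rather than the module's documented structure.

```perl
use strict;
use warnings;

# Standard IUPAC degenerate codes, each mapped to the bases it stands for.
my %degenerate2nucleotides = (
    R => [qw(A G)],   Y => [qw(C T)],   S => [qw(C G)],
    W => [qw(A T)],   K => [qw(G T)],   M => [qw(A C)],
    B => [qw(C G T)], D => [qw(A G T)], H => [qw(A C T)],
    V => [qw(A C G)], N => [qw(A C G T)],
);

# Reverse lookup keyed on the alphabetically sorted bases.
# Assumption: the key is the bases joined into one string.
my %nucleotides2degenerate;
for my $code (keys %degenerate2nucleotides) {
    my $key = join '', sort @{ $degenerate2nucleotides{$code} };
    $nucleotides2degenerate{$key} = $code;
}

# Hypothetical helper: find the code that represents a set of bases.
sub degenerate_for {
    my @upper = map { uc($_) } @_;
    my $key = join '', sort @upper;
    return $nucleotides2degenerate{$key};
}
```

Sorting the bases before the lookup means the caller does not need to know the order in which they were stored.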

%degenerate_hierarchy

Contains the hierarchy of degenerate nucleotides: N contains all the other degenerates, and each of the four degenerates that stand for three different bases contains three of the two-base degenerates.
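
The containment relation can be derived from the base sets alone, as in the minimal sketch below. The hash %bases_for and the function contained_degenerates are hypothetical names for this example, and the actual layout of %degenerate_hierarchy is not assumed here.

```perl
use strict;
use warnings;

# IUPAC degenerate codes with their bases written as sorted strings.
my %bases_for = (
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: other degenerate codes whose bases all occur in $code.
sub contained_degenerates {
    my ($code) = @_;
    my @found;
    for my $other (sort keys %bases_for) {
        next if $other eq $code;
        my @missing = grep { index($bases_for{$code}, $_) < 0 }
                      split //, $bases_for{$other};
        push @found, $other unless @missing;
    }
    return @found;
}
```

With these definitions, B (C, G, or T) contains K, S, and Y, while a two-base code such as R contains no other degenerate.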

FUNCTIONS

cleanDNA

my $clean_ref = cleanDNA($seq_ref);

Cleans the sequence for use. Strips out comments (lines starting with '>') and whitespace, converts uracil to thymine, and capitalizes all characters.

Examples:

my $clean_ref = cleanDNA($seq_ref);

my $seq_ref = cleanDNA(\'actg');
my $seq_ref = cleanDNA(\'act tag cta');
my $seq_ref = cleanDNA(\'>some mRNA
                         acugauauagau
                         uauagacgaucc');
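
The cleaning steps described above can be approximated in plain Perl. This is a sketch of the described behavior under a hypothetical name, clean_dna_sketch, and not the module's implementation, which may treat other characters differently.

```perl
use strict;
use warnings;

# Approximates the documented steps: drop '>' comment lines, strip
# whitespace, convert uracil to thymine, and uppercase the result.
sub clean_dna_sketch {
    my ($seq_ref) = @_;
    my $seq = $$seq_ref;          # work on a copy of the input
    $seq =~ s/^>.*$//mg;          # remove lines starting with '>'
    $seq =~ s/\s+//g;             # remove all whitespace, including newlines
    $seq =~ tr/uU/tT/;            # uracil to thymine
    $seq = uc $seq;               # capitalize
    return \$seq;
}

my $mrna_ref = clean_dna_sketch(\">some mRNA\n  acugauauagau\n  uauagacgaucc");
print "$$mrna_ref\n";    # prints ACTGATATAGATTATAGACGATCC
```

Returning a reference mirrors the calling convention shown in the examples, where sequences are passed and returned as scalar references.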

randomDNA

my $seq_ref = randomDNA($length);

Generate random DNA for testing this module or your own scripts. Default length is 100 nucleotides.

Examples:

my $seq_ref = randomDNA();
my $seq_ref = randomDNA(600);
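
A generator with the same interface can be sketched in a few lines. The name random_dna_sketch is hypothetical, and picking each base uniformly with Perl's built-in rand is an assumption about how the sequence is drawn.

```perl
use strict;
use warnings;

# Returns a reference to a random string of A, C, G, and T.
# Assumption: each base is drawn independently and uniformly.
sub random_dna_sketch {
    my ($length) = @_;
    $length = 100 unless defined $length;    # documented default length
    my @bases = qw(A C G T);
    my $seq = join '', map { $bases[ int( rand(@bases) ) ] } 1 .. $length;
    return \$seq;
}
```

Calling srand with a fixed seed beforehand makes the output repeatable, which can help when the sequence feeds a test.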

reverse_complement

rev_comp

my $reverse_ref = reverse_complement($seq_ref);

Finds the reverse complement of the sequence and handles degenerate nucleotides.

Example:

my $reverse_ref = reverse_complement(\'act');
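
One common way to handle degenerate nucleotides is a single transliteration table in which each code maps to the code for the complementary bases (for example, R, meaning A or G, becomes Y, meaning C or T). The sketch below uses that technique under the hypothetical name reverse_complement_sketch. Preserving the input's case is a choice made for this example, and the module may behave differently.

```perl
use strict;
use warnings;

# Reverse the sequence, then complement every character with one tr table.
# S, W, and N are their own complements.
sub reverse_complement_sketch {
    my ($seq_ref) = @_;
    my $rev = reverse $$seq_ref;    # scalar assignment reverses the string
    $rev =~ tr/ACGTRYKMBVDHSWNacgtrykmbvdhswn/TGCAYRMKVBHDSWNtgcayrmkvbhdswn/;
    return \$rev;
}

my $rc_ref = reverse_complement_sketch(\'ACSTAD');
print "$$rc_ref\n";    # prints HTASGT
```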

unrollDNA

my $seq_arrayref = unrollDNA( $seq_ref );

Expands a DNA string containing degenerate nucleotides into every sequence it can stand for, including sequences that use narrower degenerate codes. The first entry of the arrayref is the original sequence.

Example:

my $seq_arrayref = unrollDNA( \'ACSTAD' );

# $seq_arrayref now holds:
#   [
#       'ACSTAD', 'ACCTAD', 'ACGTAD',
#       'ACSTAR', 'ACCTAR', 'ACGTAR',
#       'ACSTAW', 'ACCTAW', 'ACGTAW',
#       'ACSTAK', 'ACCTAK', 'ACGTAK',
#       'ACSTAA', 'ACCTAA', 'ACGTAA',
#       'ACSTAG', 'ACCTAG', 'ACGTAG',
#       'ACSTAT', 'ACCTAT', 'ACGTAT'
#   ]
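
The expansion in the example can be reproduced by listing, for each position, the code itself plus every narrower code it contains, and then taking the product across positions. The sketch below does this with hypothetical names (%bases_for, options_for, unroll_dna_sketch). It is not the module's implementation, and the ordering it produces should not be relied on to match unrollDNA in general.

```perl
use strict;
use warnings;

# Bases covered by each plain or degenerate IUPAC code.
my %bases_for = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# The code itself, then every other code whose bases are all covered by it,
# with wider codes listed before narrower ones.
sub options_for {
    my ($code) = @_;
    return ($code) unless exists $bases_for{$code};    # unknown: keep as-is
    my @narrower;
    for my $other (keys %bases_for) {
        next if $other eq $code;
        my @missing = grep { index($bases_for{$code}, $_) < 0 }
                      split //, $bases_for{$other};
        push @narrower, $other unless @missing;
    }
    @narrower = sort {
        length($bases_for{$b}) <=> length($bases_for{$a})
            or $bases_for{$a} cmp $bases_for{$b}
    } @narrower;
    return ($code, @narrower);
}

# Build the product right to left, so the leftmost degenerate position
# changes fastest in the result.
sub unroll_dna_sketch {
    my ($seq_ref) = @_;
    my @results = ('');
    for my $code (reverse split //, uc $$seq_ref) {
        my @options = options_for($code);
        @results = map { my $tail = $_; map { $_ . $tail } @options } @results;
    }
    return \@results;
}
```

For 'ACSTAD', S contributes three options (S, C, G) and D contributes seven (D, R, W, K, A, G, T), which gives the 21 entries listed in the example.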

AUTHOR

Kevin Galinsky, <first initial last name plus cpan at gmail dot com>

COPYRIGHT AND LICENSE

Copyright (c) 2010-2011, Broad Institute.

Copyright (c) 2008-2009, J. Craig Venter Institute.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.