NAME

Bio::ToolBox::utility - common utility functions for Bio::ToolBox

DESCRIPTION

These are general subroutines that don't fit in with the other modules.

REGULAR SUBROUTINES

The following subroutines are automatically exported when you use this module.

parse_list
        my $index_request = '1,2,5-7';
        my @indices = parse_list($index_request); # returns (1,2,5,6,7)

This subroutine parses a scalar value into a list of values. The scalar is a text string of numbers (usually column or dataset indices) delimited by commas, optionally including ranges written with a hyphen. For example, the string "1,2,5-7" would become the list (1,2,5,6,7).

Pass the subroutine the scalar string.

It will return the array of numbers.
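The expansion described above can be sketched in a few lines. This is only an illustration under the assumption that the input holds plain non-negative integers and simple start-end ranges; parse_list_sketch is a hypothetical name and is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not the implementation in Bio::ToolBox.
sub parse_list_sketch {
    my $string = shift;
    my @list;
    foreach my $item ( split /,/, $string ) {
        if ( $item =~ /^(\d+)-(\d+)$/ ) {

            # a range such as 5-7 expands to 5, 6, 7
            my ( $start, $end ) = ( int $1, int $2 );
            push @list, $start .. $end;
        }
        else {
            # a single value is kept as it is
            push @list, $item;
        }
    }
    return @list;
}

my @indices = parse_list_sketch('1,2,5-7');
print join( ',', @indices ), "\n";    # prints 1,2,5,6,7
```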

format_with_commas
        my $count = '4327908475';
        printf " The final count was %s\n", format_with_commas($count);

This subroutine processes a large number (e.g. 4327908475) into a human-friendly version with commas delimiting the thousands (4,327,908,475).

Pass the subroutine a scalar string with a number value.

It will return a scalar value containing the formatted number.
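One common way to produce this kind of grouping in Perl is a repeated substitution on the leading run of digits. The sketch below uses that idiom; format_with_commas_sketch is a hypothetical name, and the module itself may well do this differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not the implementation in Bio::ToolBox.
sub format_with_commas_sketch {
    my $number = shift;

    # each pass splits the last three digits off the leading digit run,
    # and the loop stops once fewer than four leading digits remain
    1 while $number =~ s/^(-?\d+)(\d{3})/$1,$2/;
    return $number;
}

printf " The final count was %s\n", format_with_commas_sketch('4327908475');
```

With the count from the example above, the sketch yields 4,327,908,475.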

ask_user_for_index
        my @answers = ask_user_for_index($Data, 'Please enter 2 or more columns   ');

This subroutine presents the list of column names from a Bio::ToolBox::Data structure, along with their numeric indexes, and prompts the user to select and enter one or more of them. The function only prints the list once (provided it hasn't changed), so as not to annoy the user with repeated lists of header names when it is called more than once. A text prompt should be provided; otherwise a generic one is used. The list of indices is validated, and a warning is printed for invalid responses. The responses are then returned as a single value or an array, depending on context.
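The validation and context-dependent return can be pictured with a small, non-interactive sketch. Everything here is an assumption made for illustration: check_indices_sketch is a hypothetical helper, the response is passed in as a string instead of being read from the terminal, indices are assumed to be comma-delimited and to start at 0, and scalar context is assumed to yield the first valid index.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not the implementation in Bio::ToolBox.
sub check_indices_sketch {
    my ( $column_count, $response ) = @_;
    my @valid;
    foreach my $i ( split /\s*,\s*/, $response ) {
        if ( $i =~ /^\d+$/ and $i < $column_count ) {
            push @valid, $i;
        }
        else {
            # assumed behavior: warn and skip anything out of range
            warn " index '$i' is not valid\n";
        }
    }

    # list context receives every valid index, scalar context only the first
    return wantarray ? @valid : $valid[0];
}

my @several = check_indices_sketch( 5, '0, 3, 9' );    # warns about 9
my $single  = check_indices_sketch( 5, '2' );
```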

simplify_dataset_name
        my $simple_name = simplify_dataset_name($dataset);

This subroutine will take a dataset name and simplify it. Dataset names may often be file names of data files, such as Bam and bigWig files. These may include a file:, http:, or ftp: prefix, one or more directory paths, and one or more file name extensions. Additionally, more than one dataset may be combined, for example two stranded bigWig files, with an ampersand. This function will safely remove the prefix, directories, and everything after the first period.
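The three removals described above can be sketched with plain substitutions. The name simplify_name_sketch, the example URLs, and the choice to rejoin the parts with an ampersand are all assumptions for illustration; the module's own function may handle more cases.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not the implementation in Bio::ToolBox.
sub simplify_name_sketch {
    my $dataset = shift;
    my @names;
    foreach my $part ( split /&/, $dataset ) {
        my $name = $part;
        $name =~ s/^(?:file|http|ftp):\/*//;    # drop the prefix
        $name =~ s/^.*\///;                     # drop leading directories
        $name =~ s/\..*$//;                     # drop everything after the first period
        push @names, $name;
    }

    # assumption: combined datasets stay joined with an ampersand
    return join '&', @names;
}

print simplify_name_sketch(
    'http://example.com/tracks/rep1_f.bw&http://example.com/tracks/rep1_r.bw'),
    "\n";    # prints rep1_f&rep1_r
```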

sane_chromo_sort
        my @chromo = $db->seq_ids;
        my @sorted = sane_chromo_sort(@chromo);

This subroutine will take a list of chromosome or sequence identifiers and sort them into a reasonably sane order: standard numeric identifiers first (in numeric order), then sex chromosomes (alphabetically), then mitochondrial, then names with text and numbers such as contigs (text first alphabetically, then numbers numerically), and finally anything else (asciibetically). Any 'chr' prefix is ignored. Roman numerals are properly handled numerically.

The provided list may be a list of SCALAR values (chromosome names) or ARRAY references, with the first element assumed to be the name, e.g. [$name, $length].
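A much simplified sketch of the grouping idea follows. It is not the module's algorithm: chromo_sort_sketch is a hypothetical name, it accepts only plain names (no ARRAY references), it does not handle Roman numerals, and it sorts every remaining name as plain text instead of splitting text from numbers. The patterns used to recognize sex and mitochondrial chromosomes are likewise assumptions.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in; not the implementation in Bio::ToolBox.
sub chromo_sort_sketch {
    my @names = @_;
    my ( @numeric, @sex, @mito, @other );
    foreach my $name (@names) {

        # compare on the name without any 'chr' prefix, but keep the original
        ( my $bare = $name ) =~ s/^chr//i;
        if    ( $bare =~ /^\d+$/ )           { push @numeric, [ $bare, $name ] }
        elsif ( $bare =~ /^[WXYZ]$/i )       { push @sex,     [ $bare, $name ] }
        elsif ( $bare =~ /^(?:M|MT|Mito)$/i ) { push @mito,    [ $bare, $name ] }
        else                                 { push @other,   [ $bare, $name ] }
    }
    return map { $_->[1] } (
        ( sort { $a->[0] <=> $b->[0] } @numeric ),
        ( sort { uc( $a->[0] ) cmp uc( $b->[0] ) } @sex ),
        ( sort { $a->[0] cmp $b->[0] } @mito ),
        ( sort { $a->[0] cmp $b->[0] } @other ),
    );
}

my @sorted = chromo_sort_sketch(qw(chr10 chrM chrX chr2 chrY chr1 scaffold_12));
print "@sorted\n";    # prints chr1 chr2 chr10 chrX chrY chrM scaffold_12
```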

AUTHOR

 Timothy J. Parnell, PhD
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.