NAME

Bio::ToolBox::big_helper

DESCRIPTION

This module helps in the conversion of wig and bed files to bigWig and bigBed files, respectively. It uses external applications to accomplish this, taking care of generating a chromosome file from a database if necessary.

Two exportable subroutines perform the wig and bed conversions.

USAGE

Load the module at the beginning of your program and include the name or names of the subroutines to export. None are automatically exported.

use Bio::ToolBox::big_helper qw(wig_to_bigwig_conversion);

wig_to_bigwig_conversion()

This subroutine will convert a wig file to a bigWig file. See the UCSC documentation regarding wig (http://genome.ucsc.edu/goldenPath/help/wiggle.html) and bigWig (http://genome.ucsc.edu/goldenPath/help/bigWig.html) file formats. It uses Jim Kent's wigToBigWig or bedGraphToBigWig utility to perform the conversion, depending on the format of the wig file. The utility must be available on the system for the conversion to succeed.

The conversion requires a list of chromosome names and sizes in a simple text file, where each line consists of two columns, "<chromosome name> <size in bases>". This file may be specified directly, or generated automatically from a Bio::DB database (preferred, to ensure genome version compatibility).
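As an illustration of the file format described above, the following is a minimal sketch that writes a chromosome sizes file by hand using only core Perl. The subroutine name write_chromo_file, the output file name, and the chromosome values are assumptions made for this example and are not part of this module. A tab is used as the column separator here, which is also an assumption.

```perl
use strict;
use warnings;

# Illustrative chromosome names and lengths only; substitute your own genome.
my @chromosomes = (
    [ 'chrI',  230218 ],
    [ 'chrII', 813184 ],
);

# Hypothetical helper (not part of Bio::ToolBox::big_helper):
# write one "<chromosome name> <size in bases>" pair per line.
sub write_chromo_file {
    my ( $path, @chroms ) = @_;
    open( my $fh, '>', $path ) or die "cannot write $path: $!";
    foreach my $c (@chroms) {
        print {$fh} join( "\t", @$c ), "\n";
    }
    close $fh or die "cannot close $path: $!";
    return $path;
}

my $chromo_file = write_chromo_file( 'example_chr_sizes.txt', @chromosomes );
```

The resulting file name could then be supplied through the chromo argument described below.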

After running the utility, the subroutine checks for a non-empty bigWig file. If one exists, the name of the file is returned. If not, an error is printed and nothing is returned.
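The return behavior described above can be sketched with Perl's -s file test, which is true only for an existing file of non-zero size. This is an illustration of the idea, not the module's actual code; the subroutine name check_output_file and the wording of the message are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ToolBox::big_helper):
# return the file name if the file exists and is not empty,
# otherwise print a message to STDERR and return nothing.
sub check_output_file {
    my $file = shift;
    return $file if defined $file and -s $file;
    warn " conversion failed: no non-empty file '" . ( $file // '' ) . "'\n";
    return;
}
```

Because nothing is returned on failure, a caller can test the returned value for truth before using it as a file name.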

Pass the function an anonymous hash of arguments, including the following:

Required:
wig         => The name of the wig source file.
db          => An opened database object from which to generate
               the chromosome sizes information. Not needed if
               chromo is provided.
Optional:
chromo      => The name of the chromosome sizes text file, described
               above, as an alternative to providing the database.
bwapppath   => The full path to Jim Kent's wigToBigWig
               utility. This parameter may instead be defined in the
               configuration file "biotoolbox.cfg".

Example

my $wig_file = 'example.wig';
my $bw_file = wig_to_bigwig_conversion( {
		'wig'   => $wig_file,
		'db'    => $database,
} );
if ($bw_file and -e $bw_file) { # nothing is returned on failure
	print " success! wrote bigwig file $bw_file\n";
	unlink $wig_file; # no longer necessary
}
else {
	print " failure! see STDERR for errors\n";
}

bed_to_bigbed_conversion

This subroutine will convert a bed file to a bigBed file. See the UCSC documentation regarding bed (http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED) and bigBed (http://genome.ucsc.edu/goldenPath/help/bigBed.html) file formats. It uses Jim Kent's bedToBigBed utility to perform the conversion. This must be present on the system for the conversion to succeed.

The conversion requires a list of chromosome names and sizes in a simple text file, where each line consists of two columns, "<chromosome name> <size in bases>". This file may be specified directly, or generated automatically from a Bio::DB database (preferred, to ensure genome version compatibility).
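When supplying your own chromosome sizes file, it may help to verify its layout before running the conversion. The following is a minimal sketch, using only core Perl, that reads such a file and rejects any line that is not a name followed by a positive integer. The subroutine name read_chromo_file is hypothetical and not part of this module, and the strictness of the check is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ToolBox::big_helper):
# parse a chromosome sizes file into a hash reference of name => length,
# dying on any line that is not "<name> <positive integer>".
sub read_chromo_file {
    my $path = shift;
    my %size_of;
    open( my $fh, '<', $path ) or die "cannot read $path: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my @fields = split ' ', $line;    # split on any whitespace
        unless ( @fields == 2 and $fields[1] =~ /^[1-9]\d*$/ ) {
            die sprintf( "malformed line %d in %s: '%s'\n", $., $path, $line );
        }
        $size_of{ $fields[0] } = $fields[1];
    }
    close $fh;
    return \%size_of;
}
```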

After running the utility, the subroutine checks for a non-empty bigBed file. If one exists, the name of the file is returned. If not, an error is printed and nothing is returned.

Pass the function an anonymous hash of arguments, including the following:

Required:
bed         => The name of the bed source file.
db          => An opened database object from which to generate
               the chromosome sizes information. Not needed if
               chromo is provided.
Optional:
chromo      => The name of the chromosome sizes text file, described
               above, as an alternative to providing the database.
bbapppath   => The full path to Jim Kent's bedToBigBed
               utility. This parameter may instead be defined in the
               configuration file "biotoolbox.cfg".

Example

my $bed_file = 'example.bed';
my $bb_file = bed_to_bigbed_conversion( {
		'bed'   => $bed_file,
		'db'    => $database,
} );
if ($bb_file) {
	print " success! wrote bigBed file $bb_file\n";
}
else {
	print " failure! see STDERR for errors\n";
}

generate_chromosome_file

This subroutine will generate a chromosome sizes file appropriate for the big file conversion utilities from an available database. It is a two-column text file: the first column is the chromosome name, and the second column is the length in bp. The file is written in the current directory with a name of "chr_sizesXXXXX", where each X is a random character as defined by File::Temp.

The chromosome names and lengths are obtained from a Bio::DB database using the Bio::ToolBox::db_helper::get_chromosome_list() subroutine.

Pass the subroutine a database name, path to a supported database file, or opened Bio::DB object.

The file will be written, closed, and the filename returned.
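The file naming and writing steps described above can be sketched with File::Temp as follows. This is only an approximation of the idea, not the module's implementation: the subroutine name write_temp_chromo_file is hypothetical, the chromosome list is passed in directly rather than obtained from a database, and the directory is taken as an argument (pass '.' for the current directory).

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical helper (not part of Bio::ToolBox::big_helper):
# write name/length pairs to a uniquely named "chr_sizesXXXXX" file
# in the given directory, close it, and return the file name.
sub write_temp_chromo_file {
    my ( $dir, @chroms ) = @_;
    my $fh = File::Temp->new(
        TEMPLATE => 'chr_sizesXXXXX',    # each X is replaced by a random character
        DIR      => $dir,
        UNLINK   => 0,                   # keep the file once the handle goes away
    );
    foreach my $c (@chroms) {
        $fh->print( join( "\t", @$c ), "\n" );
    }
    my $name = $fh->filename;
    $fh->close or die "cannot close $name: $!";
    return $name;
}
```

Since the file is not removed automatically in this sketch, the caller is responsible for deleting it once the conversion has finished.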

AUTHOR

Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.