NAME

Bio::ToolBox::big_helper

DESCRIPTION

This module helps in the conversion of wig and bed files to bigWig and bigBed files, respectively. It uses external applications to accomplish this, taking care of generating a chromosome file from a database if necessary.

For wig to bigWig conversion, see the UCSC documentation regarding wig and bigWig file formats. The module uses the UCSC wigToBigWig utility to perform the conversion. The utility must be available on the system for the conversion to succeed. It may be downloaded from the UCSC Genome utilities page.

For bed to bigBed conversion, see the UCSC documentation regarding bed and bigBed file formats. The module uses the UCSC bedToBigBed utility to perform the conversion. The utility must be available on the system for the conversion to succeed. It may be downloaded from the UCSC Genome utilities page.

In both cases, the conversion requires a list of chromosome names and sizes in a simple text file, where each line consists of two columns, "chromosome_name <size_in_bases>". This file may be specified directly, or it may be generated automatically if a Bio::DB database name is given (preferred, to ensure genome version compatibility).
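To illustrate the file format, here is a minimal sketch that reads a chromosome sizes file into a hash. The subroutine name read_chromosome_sizes is hypothetical and not part of this module, and the sketch assumes the two columns are separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::ToolBox::big_helper.
# Reads a two-column chromosome sizes file and returns a hash
# reference of chromosome name => length in bases.
sub read_chromosome_sizes {
    my ($file) = @_;
    open( my $fh, '<', $file ) or die "unable to open $file: $!\n";
    my %size_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        # split ' ' ignores leading whitespace and splits on any run of it
        my ( $name, $length ) = split ' ', $line;
        die "malformed line '$line' in $file\n"
            unless defined $length and $length =~ /^\d+$/;
        $size_for{$name} = $length;
    }
    close $fh;
    return \%size_for;
}
```

A check like this can catch a malformed chromosome file before it is handed to one of the external utilities.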

USAGE

Load the module at the beginning of your program and include the name or names of the subroutines to export. None are automatically exported.

use Bio::ToolBox::big_helper qw(wig_to_bigwig_conversion);

There are five subroutines available for export.

wig_to_bigwig_conversion

This subroutine will convert a wig file to a bigWig file.

For bedGraph format wig files, the utility bedGraphToBigWig may be substituted if desired, but wigToBigWig can handle all wig formats. When no utility is available but Bio::DB::BigFile is installed, that module may be used instead to generate the bigWig file.

After running the utility, the subroutine checks for the existence of a non-zero byte bigWig file. If the file exists, its name is returned. If not, an error is printed and nothing is returned.
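The run-then-verify pattern described above can be sketched as follows. This is an illustration only, not the module's actual code: run_and_check_output is a hypothetical name, and the command passed in is whatever external utility the caller chooses.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::ToolBox::big_helper.
# Runs an external command, then confirms that it left behind a
# non-empty output file. Returns the file name on success and
# nothing on failure.
sub run_and_check_output {
    my ( $outfile, @command ) = @_;
    system(@command);    # list form avoids passing through a shell
    if ( -e $outfile and -s $outfile ) {
        return $outfile;
    }
    warn " failed to generate $outfile\n";
    return;
}
```

Because nothing is returned on failure, the caller can test the return value directly before using it as a file name.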

Pass the function an array of key => value arguments, including the following:

wig

Pass the name of the wig source file. This is required.

chromo

Pass the path to a chromosome sizes text file, as described above. This is required unless a database object is provided instead.

db

Provide an opened database object from which to generate the chromosome sizes information. This will be passed to generate_chromosome_file to generate a chromosome file. This is a convenience option and alternative to providing an existing chromosome file.

bwapppath

Provide the full path to the UCSC wigToBigWig utility. If not provided, the default PATH will be searched for the utility. The path may also be defined in the configuration file .biotoolbox.cfg.
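For readers who want to confirm that a utility is reachable before calling the conversion, a PATH search can be sketched with core modules as below. The name find_in_path is hypothetical, and the module's own lookup (including its use of .biotoolbox.cfg) may work differently.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not part of Bio::ToolBox::big_helper.
# Returns the first executable file named $name found in the
# directories of the PATH environment variable, or nothing.
sub find_in_path {
    my ($name) = @_;
    foreach my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $name );
        return $candidate if -f $candidate and -x _;
    }
    return;
}

my $app = find_in_path('wigToBigWig');
print defined $app
    ? "found $app\n"
    : "wigToBigWig not found in PATH\n";
```

If the utility is found this way, its path could be passed explicitly as the bwapppath value.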

Example:

my $wig_file = 'example.wig';
my $bw_file = wig_to_bigwig_conversion(
	'wig'   => $wig_file,
	'db'    => $database,
);
if ($bw_file and -e $bw_file) {
	print " success! wrote bigwig file $bw_file\n";
	unlink $wig_file; # no longer necessary
}
else {
	print " failure! see STDERR for errors\n";
}

open_wig_to_bigwig_fh

This subroutine will open a forked process to the UCSC wigToBigWig utility as a file handle, allowing wig lines to be "printed" to the utility for conversion. This is useful for writing directly to a bigWig file without having to write a temporary wig file first. This is also useful when you have multiple wig files, for example individual wig files from separate forked processes, that need to be combined into a bigWig file.
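The general mechanism, printing lines to a child process through a file handle and having the child write its output once the handle is closed, can be sketched with a stand-in child. In this sketch the child is a second Perl process that only counts lines, not wigToBigWig; open_counter_fh is a hypothetical name; and the list form of a piped open assumes a Unix-like system.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

# Stand-in for the external utility: reads lines from STDIN and, at
# end of input, writes the number of lines to the file named in its
# first argument.
my $child_script = 'my $n = 0; $n++ while <STDIN>; '
    . 'open my $out, ">", $ARGV[0] or die; print $out $n; close $out;';

# Hypothetical helper: open a write pipe to the stand-in child.
sub open_counter_fh {
    my ($outfile) = @_;
    open( my $fh, '|-', $^X, '-e', $child_script, $outfile )
        or die "unable to fork: $!\n";
    return $fh;
}

my $dir        = tempdir( CLEANUP => 1 );
my $count_file = "$dir/count.txt";
my $fh         = open_counter_fh($count_file);
foreach my $line ( 'fixedStep chrom=chr1 start=1 step=1', 1.5, 2.5 ) {
    $fh->print("$line\n");
}

# closing the pipe waits for the child to finish writing its output
$fh->close or warn "child process exited abnormally\n";
```

Closing a piped handle returns false when the child exits with a non-zero status, so checking the return value of close is one way to notice a failed conversion.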

Note that the wigToBigWig utility does not handle errors gracefully and will immediately fail upon encountering errors, usually bringing down the main Perl process with it. Make sure the chromosome file is accurate and the wig lines are properly formatted and in order!

Pass the function an array of key => value arguments. An IO::File object will be returned. Upon closing the file handle, the wigToBigWig utility will generate the bigWig file.

bw

The output file name for the bigWig file. Also accepts the keys file, wig, and out. This is required.

chromo

Pass the path to a chromosome sizes text file, as described above. This is required unless a database object is provided instead.

db

Provide an opened database object from which to generate the chromosome sizes information. This will be passed to generate_chromosome_file to generate a chromosome file. This is a convenience option and alternative to providing an existing chromosome file. Note that you will need to clean up the file yourself; this function will not do it for you!

chrskip

Provide a regular-expression compatible string for any chromosomes that should be skipped when generating a chromosome sizes file from a provided database.

bwapppath

Provide the full path to the UCSC wigToBigWig utility. If not provided, the default PATH will be searched for the utility. The path may also be defined in the configuration file .biotoolbox.cfg.

Example:

my $bw_file = 'example.bw';
my $chromo_file = generate_chromosome_file($db);
my $bwfh = open_wig_to_bigwig_fh(
	bw      => $bw_file,
	chromo  => $chromo_file,
);
foreach (@wig_lines) {
	$bwfh->print("$_\n");
}
$bwfh->close;
	# this signals the forked wigToBigWig process to write 
	# the bigWig file, which may take a few seconds to minutes
unlink $chromo_file;

open_bigwig_to_wig_fh

This subroutine will open a forked process from the UCSC bigWigToWig utility as a file handle, allowing a bigWig file to be converted to standard text wig format and processed as an input stream. Note that the entire file will be converted in this manner, not specific locations. This is intended for working with the wig file as a whole.

Note that bigWigToWig will output a wig file in whichever format the bigWig was originally generated from, i.e. fixedStep, variableStep, or bedGraph. To export explicitly as a bedGraph, which may be useful in certain circumstances, the UCSC bigWigToBedGraph utility is also supported.
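Because the format of the stream is not known in advance, code reading it may need to handle all three. The following sketch converts lines from any of the three formats into 0-based, half-open intervals as used by bedGraph. The name wig_to_intervals is hypothetical, and the parsing follows the UCSC wiggle format description rather than anything in this module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::ToolBox::big_helper.
# Reads wig lines from a file handle and returns a list of
# [chromosome, start, end, value] array references, with 0-based
# start and exclusive end coordinates.
sub wig_to_intervals {
    my ($fh) = @_;
    my $mode = 'bedGraph';    # assumed until a declaration line is seen
    my ( $chrom, $pos, $step, $span );
    my @intervals;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^(?:#|track|browser)/ or $line !~ /\S/;
        if ( $line =~ /^(fixedStep|variableStep)\b/ ) {
            $mode = $1;
            my %opt = $line =~ /(\w+)=(\S+)/g;
            $chrom = $opt{chrom};
            $pos   = $opt{start};        # fixedStep only, 1-based
            $step  = $opt{step} // 1;    # defensive default
            $span  = $opt{span} // 1;
            next;
        }
        my @f = split ' ', $line;
        if ( $mode eq 'fixedStep' ) {
            # data lines carry only a value; position advances by step
            push @intervals, [ $chrom, $pos - 1, $pos - 1 + $span, $f[0] ];
            $pos += $step;
        }
        elsif ( $mode eq 'variableStep' ) {
            # data lines carry a 1-based position and a value
            push @intervals, [ $chrom, $f[0] - 1, $f[0] - 1 + $span, $f[1] ];
        }
        else {
            # bedGraph lines are already chromosome, start, end, value
            push @intervals, [ @f[ 0 .. 3 ] ];
        }
    }
    return @intervals;
}
```

The same subroutine works on any readable file handle, so it could be given a handle returned from a forked utility or an ordinary wig file opened for reading.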

Note that the UCSC utilities do not always handle errors gracefully and will immediately fail upon encountering errors, usually bringing down the main Perl process with them.

Pass the function an array of key => value arguments. An IO::File object will be returned.

bw

The name of the input bigWig file. Also accepts the keys file and wig. This is required.

bwapppath

Provide the full path to the UCSC bigWigToWig or bigWigToBedGraph utility. If not provided, the default PATH will be searched for the utility. The path may also be defined in the configuration file .biotoolbox.cfg.

Example:

my $bw_file = 'example.bw';
my $bwfh = open_bigwig_to_wig_fh(
	bw    => $bw_file,
);
while (my $line = $bwfh->getline) {
	# do something with wig line
}
$bwfh->close;

bed_to_bigbed_conversion

This subroutine will convert a bed file to a bigBed file.

After running the utility, the subroutine checks for the existence of a non-zero byte bigBed file. If the file exists, its name is returned. If not, an error is printed and nothing is returned.

Pass the function an array of key => value arguments, including the following:

bed

The path and name for the bed file. Only standard Bed files with 3-12 columns are supported. Additional columns, e.g. bed6+4 formats, are not supported. This value is required.

chromo

Pass the path to a chromosome sizes text file, as described above. This is required unless a database object is provided instead.

db

Provide an opened database object from which to generate the chromosome sizes information. This will be passed to generate_chromosome_file to generate a chromosome file. This is a convenience option and alternative to providing an existing chromosome file.

bbapppath

Provide the full path to the UCSC bedToBigBed utility. If not provided, the default PATH will be searched for the utility. The path may also be defined in the configuration file .biotoolbox.cfg.
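Since the bed argument accepts only standard files with 3-12 columns, it may be worth checking a file before attempting the conversion. The sketch below is one way to do so. The name check_bed_columns is hypothetical, and the sketch assumes tab-delimited columns and a consistent column count on every data line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::ToolBox::big_helper.
# Returns the column count of a bed file if every data line has the
# same number of columns and that number is between 3 and 12.
# Returns nothing otherwise.
sub check_bed_columns {
    my ($file) = @_;
    open( my $fh, '<', $file ) or die "unable to open $file: $!\n";
    my $expected;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^(?:#|track|browser)/ or $line !~ /\S/;
        my $count = scalar( my @fields = split /\t/, $line );
        return if $count < 3 or $count > 12;
        $expected //= $count;    # remember the first data line's count
        return if $count != $expected;
    }
    close $fh;
    return $expected;
}
```

A file that fails this check, for example a bed6+4 file with 10 or more custom columns beyond the standard layout, would need to be trimmed to standard columns first.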

Example:

my $bed_file = 'example.bed';
my $bb_file = bed_to_bigbed_conversion(
	'bed'   => $bed_file,
	'db'    => $database,
);
if ($bb_file) {
	print " success! wrote bigBed file $bb_file\n";
}
else {
	print " failure! see STDERR for errors\n";
}

generate_chromosome_file

This subroutine will generate a chromosome sizes file appropriate for the big file conversion utilities from an available database. It is a two-column text file: the first column is the chromosome name and the second column is the length in bp. The file is written in the current directory with a name of chr_sizesXXXXX, where the X characters are replaced by random characters as defined by File::Temp.

The chromosome names and lengths are obtained from a Bio::DB database using the get_chromosome_list subroutine in Bio::ToolBox::db_helper.

Pass the subroutine a database name, path to a supported database file, or opened Bio::DB object.

Optionally pass a second value, a regular expression compatible string or qr for skipping specific chromosomes or chromosome classes, such as mitochondrial or unmapped contigs. The default is to return all chromosomes.

The file will be written, closed, and the filename returned.
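The file-writing behavior described above can be sketched without a database, as shown below. This is an illustration under stated assumptions, not the module's implementation: write_chromosome_file is a hypothetical name, the chromosome list is supplied by the caller instead of being read from a Bio::DB database, and the columns are assumed to be tab-delimited.

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical helper, not part of Bio::ToolBox::big_helper.
# Writes a two-column chromosome sizes file in the current directory
# and returns its name. $chromosomes is an array reference of
# [name, length] pairs; $skip is an optional regular expression
# string or qr object for names to leave out.
sub write_chromosome_file {
    my ( $chromosomes, $skip ) = @_;
    my $fh = File::Temp->new(
        TEMPLATE => 'chr_sizesXXXXX',
        UNLINK   => 0,    # keep the file after the object is destroyed
    );
    foreach my $chr (@$chromosomes) {
        my ( $name, $length ) = @$chr;
        next if defined $skip and $name =~ /$skip/;
        $fh->print("$name\t$length\n");
    }
    $fh->close;
    return $fh->filename;
}
```

As with the file produced by generate_chromosome_file, the caller is responsible for removing the file with unlink once it is no longer needed.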

AUTHOR

Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.