NAME

CracTools::GenomeMask - A bit vector mask over the whole genome

VERSION

version 1.251

SYNOPSIS

my $genome_mask = CracTools::GenomeMask->new( genome => { "chr1" => 100000, "chr2" => 20000 } );

$genome_mask->setRegion("chr1",200,250);

$genome_mask->getNbBitsSetInRegion("chr1",190,220);

DESCRIPTION

This module defines a bit vector mask over a whole genome and provides methods to query this mask. It can read sequence names and lengths from various sources (SAM headers, a CRAC index, or user input).

SEE ALSO

See CracTools::BitVector, which is the underlying data structure of CracTools::GenomeMask.

TODO

The GenomeMask should be able to handle double-stranded DNA (as an option).

METHODS

new

There are multiple ways to create a genome mask:

One can specify an argument called genome, a hashref where keys are chromosome names and values are chromosome lengths.

my $genome_mask = CracTools::GenomeMask->new( genome => { seq_name => length,
                                                          seq_name => length,
                                                          ...} );

One can specify an argument called crac_index_conf, which is the configuration file of a CRAC index.

my $genome_mask = CracTools::GenomeMask->new(crac_index_conf => 'file.conf');

One can specify a CracTools::SAMReader object in order to read chromosome names and lengths from the SAM header.

my $genome_mask = CracTools::GenomeMask->new(sam_reader => CracTools::SAMReader->new('file.sam'));
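
To make the shape of the genome hashref concrete, here is a minimal sketch of how chromosome names and lengths could be collected from the @SQ lines of a SAM header (SN and LN tags). The helper genome_from_sam_header is hypothetical and is not part of CracTools; it only illustrates the kind of data CracTools::SAMReader is expected to provide, not how that module reads it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CracTools): collect reference names and
# lengths from the @SQ lines of a SAM header.
sub genome_from_sam_header {
    my @header_lines = @_;
    my %genome;
    for my $line (@header_lines) {
        next unless $line =~ /^\@SQ\t/;              # only sequence dictionary lines
        my ($name)   = $line =~ /\tSN:([^\t\n]+)/;   # reference sequence name
        my ($length) = $line =~ /\tLN:(\d+)/;        # reference sequence length
        $genome{$name} = $length if defined $name && defined $length;
    }
    return \%genome;
}

my $genome = genome_from_sam_header(
    "\@HD\tVN:1.6\tSO:coordinate",
    "\@SQ\tSN:chr1\tLN:100000",
    "\@SQ\tSN:chr2\tLN:20000",
);
# $genome is now { chr1 => 100000, chr2 => 20000 }
```

A hashref of this form has the same shape as the one passed through the genome argument above.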

getBitvector

Arg [1] : String - Chromosome

Description : Return the CracTools::BitVector associated with the reference name given as argument.
              If no bit vector exists for this reference, a warning is reported.
ReturnType  : CracTools::BitVector

getChrLength

Arg [1] : String - Chromosome

Description : Return the length of the chromosome
ReturnType  : Integer

setPos

Arg [1] : String - Chromosome
Arg [2] : Integer - Position

Description : Set the bit at this genomic location

setRegion

Arg [1] : String - Chromosome
Arg [2] : Integer - Position start
Arg [3] : Integer - Position end

Example     : $genome_mask->setRegion($chr,$start,$end)
Description : Set all bits to 1 for this region

getPos

Arg [1] : String - Chromosome
Arg [2] : Integer - Position

Description : Return true if the bit is set at this genomic location
ReturnType  : Boolean

getPosSetInRegion

Arg [1] : String - Chromosome
Arg [2] : Integer - Position start
Arg [3] : Integer - Position end

Example     : my @pos_set = @{$genome_mask->getPosSetInRegion($chr,$start,$end)};
Description : Return all the positions of the bits set in this genomic
              region
ReturnType  : Array(Integer)

getNbBitsSetInRegion

Arg [1] : String - Chromosome
Arg [2] : Integer - Position start
Arg [3] : Integer - Position end

Description : Return the number of bits set in this genomic region
ReturnType  : Integer
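
The region methods can be pictured with a toy model that keeps one bit string per chromosome. This is a sketch and not the CracTools implementation: the functions set_region and nb_bits_set_in_region are hypothetical stand-ins, and treating both bounds as inclusive is an assumption that the documentation above does not confirm.

```perl
use strict;
use warnings;

# Toy model, not the CracTools implementation: one bit string per chromosome,
# allocated with enough bytes to hold one bit per base.
my %genome = ( chr1 => 1000, chr2 => 200 );
my %mask   = map { $_ => "\0" x int( ( $genome{$_} + 7 ) / 8 ) } keys %genome;

# Assumption: $start and $end are both inclusive.
sub set_region {
    my ( $chr, $start, $end ) = @_;
    vec( $mask{$chr}, $_, 1 ) = 1 for $start .. $end;
}

# Count the bits set between $start and $end (inclusive, same assumption).
sub nb_bits_set_in_region {
    my ( $chr, $start, $end ) = @_;
    my $count = 0;
    $count += vec( $mask{$chr}, $_, 1 ) for $start .. $end;
    return $count;
}

set_region( "chr1", 200, 250 );
my $n = nb_bits_set_in_region( "chr1", 190, 220 );    # 21 under the inclusive assumption
```

The calls mirror the SYNOPSIS: positions 200 to 220 overlap the marked region, so 21 bits are counted in this model.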

rank

Arg [1] : String - Chromosome
Arg [2] : Integer - Position

Description : Return the number of bits set up to this genomic
              position, as if the genome were linear.
ReturnType  : Integer

select

Arg [1] : Integer - Nth bit set

Example     : my ($chr,$pos) = $genome_mask->select(12);
Description : Return an array with the (chr,pos) of the Nth bit set
ReturnType  : Array(String,Integer)
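
The idea of rank and select over a linearized genome can be sketched as follows. This is a toy model and not the CracTools implementation: rank_linear and select_linear are hypothetical names, and the chromosome order, the inclusive bound in rank, and counting the Nth bit from 1 are all assumptions.

```perl
use strict;
use warnings;

# Toy model, not the CracTools implementation: sorted set positions per
# chromosome. The order used to linearize the genome is an assumption.
my @chr_order = ( "chr1", "chr2" );
my %set_pos   = ( chr1 => [ 5, 10 ], chr2 => [ 3, 7, 9 ] );

# Number of bits set up to and including ($chr, $pos) (inclusive bound assumed).
sub rank_linear {
    my ( $chr, $pos ) = @_;
    my $rank = 0;
    for my $c (@chr_order) {
        if ( $c eq $chr ) {
            $rank += grep { $_ <= $pos } @{ $set_pos{$c} };
            return $rank;
        }
        $rank += @{ $set_pos{$c} };    # every bit of a preceding chromosome counts
    }
    die "unknown chromosome $chr";
}

# (chr, pos) of the Nth set bit, counting from 1 (an assumption).
sub select_linear {
    my ($n) = @_;
    for my $c (@chr_order) {
        my $count = @{ $set_pos{$c} };
        return ( $c, $set_pos{$c}[ $n - 1 ] ) if $n <= $count;
        $n -= $count;                  # skip this chromosome's bits
    }
    return;                            # fewer than $n bits are set
}

my $rank = rank_linear( "chr2", 7 );     # 2 bits on chr1 + 2 on chr2 = 4
my ( $chr, $pos ) = select_linear(4);    # ("chr2", 7)
```

In this model the two operations are inverses: selecting the bit whose index equals a rank returns the last set position counted by that rank.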

AUTHORS

  • Nicolas PHILIPPE <nphilippe.research@gmail.com>

  • Jérôme AUDOUX <jaudoux@cpan.org>

  • Sacha BEAUMEUNIER <sacha.beaumeunier@gmail.com>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2017 by IRMB/INSERM (Institute for Regenerative Medecine and Biotherapy / Institut National de la Santé et de la Recherche Médicale) and AxLR/SATT (Lanquedoc Roussilon / Societe d'Acceleration de Transfert de Technologie).

This is free software, licensed under:

The GNU Affero General Public License, Version 3, November 2007