LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

Please email comments or questions to the public Ensembl
developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.

Questions may also be sent to the Ensembl help desk at
<http://www.ensembl.org/Help/Contact>.

NAME

Bio::EnsEMBL::AssemblyMapper - Handles mapping between two coordinate systems using the information stored in the assembly table.

SYNOPSIS

$db          = Bio::EnsEMBL::DBSQL::DBAdaptor->new(...);
$map_adaptor = $db->get_AssemblyMapperAdaptor();
$cs_adaptor  = $db->get_CoordSystemAdaptor();

my $chr_cs = $cs_adaptor->fetch_by_name( 'chromosome', 'NCBI33' );
my $ctg_cs = $cs_adaptor->fetch_by_name('contig');

$asm_mapper = $map_adaptor->fetch_by_CoordSystems( $chr_cs, $ctg_cs );

# Map to contig coordinate system from chromosomal.
@ctg_coords =
  $asm_mapper->map( 'X', 1_000_000, 2_000_000, 1, $chr_cs );

# Map to chromosome coordinate system from contig.
@chr_coords =
  $asm_mapper->map( 'AL30421.1.200.92341', 100, 10000, -1,
  $ctg_cs );

# List contig names for a region of chromosome.
@ctg_ids =
  $asm_mapper->list_ids( '13', 1_000_000, 2_000_000, $chr_cs );

# List chromosome names for a contig region.
@chr_ids =
  $asm_mapper->list_ids( 'AL30421.1.200.92341', 1, 1000, $ctg_cs );

DESCRIPTION

The AssemblyMapper is a database-aware mapper which facilitates conversion of coordinates between any two coordinate systems with a relationship explicitly defined in the assembly table. In the future it may be possible to perform multiple-step (implicit) mapping between coordinate systems.

It is implemented using the Bio::EnsEMBL::Mapper object, which is a generic mapper object between disjoint coordinate systems.

METHODS

new

Arg [1]    : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
Arg [2]    : Bio::EnsEMBL::CoordSystem $asm_cs
Arg [3]    : Bio::EnsEMBL::CoordSystem $cmp_cs
Example    : Should use AssemblyMapperAdaptor->fetch_by_CoordSystems()
Description: Creates a new AssemblyMapper
Return type: Bio::EnsEMBL::AssemblyMapper
Exceptions : Throws if more than two coordinate systems are provided
Caller     : AssemblyMapperAdaptor
Status     : Stable

max_pair_count

Arg [1]    : (optional) int $max_pair_count
Example    : $mapper->max_pair_count(100000)
Description: Getter/Setter for the number of mapping pairs allowed
             in the internal cache.  This can be used to override
             the default value (1000) to tune the performance and
             memory usage for certain scenarios.  A higher value
             means a bigger cache and more memory used.
Return type: int
Exceptions : None
Caller     : General
Status     : Stable

register_all

Arg [1]    : None
Example    : $mapper->max_pair_count(10e6);
             $mapper->register_all();
Description: Pre-registers all assembly information in this
             mapper.  The cache size should be set to a
             sufficiently large value so that all of the
             information can be stored.  This method is useful
             when *a lot* of mapping will be done in regions
             which are distributed around the genome.  After
             registration the mapper will consume a lot of memory
             but will not have to perform any SQL and will be
             faster.
Return type: None
Exceptions : None
Caller     : Specialised programs doing a lot of mapping.
Status     : Stable

map

Arg [1]    : string $frm_seq_region
             The name of the sequence region to transform FROM.
Arg [2]    : int $frm_start
             The start of the region to transform FROM.
Arg [3]    : int $frm_end
             The end of the region to transform FROM.
Arg [4]    : int $strand
             The strand of the region to transform FROM.
Arg [5]    : Bio::EnsEMBL::CoordSystem
             The coordinate system to transform FROM
Arg [6]    : Dummy placeholder to keep the interface consistent
             across different mappers
Arg [7]    : Bio::EnsEMBL::Slice
             Target slice
Arg [8]    : (optional) boolean
             Whether to include the original coordinates or not
Example    : @coords =
              $asm_mapper->map( 'X', 1_000_000, 2_000_000, 1,
                                $chr_cs );
Description: Transforms coordinates from one coordinate system to
             another.
Return type: List of Bio::EnsEMBL::Mapper::Coordinate and/or
             Bio::EnsEMBL::Mapper::Gap objects.
Exceptions : Throws if the specified FROM coordinate system is not
             one of the coordinate systems associated with this
             assembly mapper.
Caller     : General
Status     : Stable
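
Because map() can return a mixture of Coordinate and Gap objects, callers usually need to tell the two apart. The following is a minimal sketch of one way to do that. describe_mapping() is a hypothetical helper, not part of the Ensembl API, and it assumes the returned objects provide the usual start(), end(), id() and strand() accessors.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): turn the list
# returned by map() into one human-readable string per piece.
# Gap objects mark stretches that could not be mapped; everything
# else is treated as a mapped Coordinate.
sub describe_mapping {
  my (@coords) = @_;
  my @lines;

  foreach my $c (@coords) {
    if ( $c->isa('Bio::EnsEMBL::Mapper::Gap') ) {
      push @lines, sprintf( 'gap %d-%d', $c->start(), $c->end() );
    }
    else {
      push @lines,
        sprintf( 'mapped %s:%d-%d:%d',
                 $c->id(), $c->start(), $c->end(), $c->strand() );
    }
  }

  return @lines;
}

# Usage, assuming $asm_mapper and $chr_cs were obtained as in the
# SYNOPSIS:
#   print "$_\n"
#     for describe_mapping(
#       $asm_mapper->map( 'X', 1_000_000, 2_000_000, 1, $chr_cs ) );
```

What id() holds (a name or an internal identifier) may depend on the API version, so the sketch prints it unchanged.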

flush

Args       : None
Example    : None
Description: Remove all cached items from this AssemblyMapper.
Return type: None
Exceptions : None
Caller     : AssemblyMapperAdaptor
Status     : Stable

size

Args       : None
Example    : $num_of_pairs = $mapper->size();
Description: Returns the number of pairs currently stored.
Return type: int
Exceptions : None
Caller     : General
Status     : Stable
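
The relationship between max_pair_count(), size() and flush() can be illustrated with a short sketch. flush_if_full() is a hypothetical helper written for this illustration only. flush() is documented above as being called by the AssemblyMapperAdaptor, so most programs should not need to do this themselves.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): empty the
# mapper's cache once it holds more pairs than max_pair_count()
# allows.  Returns 1 if the cache was flushed, 0 otherwise.
sub flush_if_full {
  my ($mapper) = @_;

  if ( $mapper->size() > $mapper->max_pair_count() ) {
    $mapper->flush();
    return 1;
  }

  return 0;
}

# Usage, assuming $asm_mapper was obtained as in the SYNOPSIS:
#   $asm_mapper->max_pair_count(100_000);
#   flush_if_full($asm_mapper);
```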

fastmap

Arg [1]    : string $frm_seq_region
             The name of the sequence region to transform FROM.
Arg [2]    : int $frm_start
             The start of the region to transform FROM.
Arg [3]    : int $frm_end
             The end of the region to transform FROM.
Arg [4]    : int $strand
             The strand of the region to transform FROM.
Arg [5]    : Bio::EnsEMBL::CoordSystem
             The coordinate system to transform FROM.
Example    : @coords =
              $asm_mapper->fastmap( 'X', 1_000_000, 2_000_000, 1,
                                    $chr_cs );
Description: Transforms coordinates from one coordinate system to
             another.
Return type: List of Bio::EnsEMBL::Mapper::Coordinate and/or
             Bio::EnsEMBL::Mapper::Gap objects.
Exceptions : Throws if the specified FROM coordinate system is not
             one of the coordinate systems associated with this
             assembly mapper.
Caller     : General
Status     : Stable

list_ids

Arg [1]    : string $frm_seq_region
             The name of the sequence region of interest.
Arg [2]    : int $frm_start
             The start of the region of interest.
Arg [3]    : int $frm_end
             The end of the region of interest.
Arg [4]    : Bio::EnsEMBL::CoordSystem $frm_cs
             The coordinate system to obtain overlapping IDs of.
Example    : foreach my $id (
                      $asm_mapper->list_ids( 'X', 1, 1000, $ctg_cs ) )
              { ... }
Description: Retrieves a list of overlapping seq_region names of
             another coordinate system.  This is the same as the
             list_seq_regions method but uses seq_region names
             rather than internal IDs.
Return type: List of strings.
Exceptions : None
Caller     : General
Status     : Stable

list_seq_regions

Arg [1]    : string $frm_seq_region
             The name of the sequence region of interest.
Arg [2]    : int $frm_start
             The start of the region of interest.
Arg [3]    : int $frm_end
             The end of the region of interest.
Arg [4]    : Bio::EnsEMBL::CoordSystem $frm_cs
             The coordinate system to obtain overlapping IDs of.
Example    : foreach my $id (
                               $asm_mapper->list_seq_regions(
                                                 'X', 1, 1000, $chr_cs
                               ) ) { ... }
Description: Retrieves a list of overlapping seq_region internal
             identifiers of another coordinate system.  This is
             the same as the list_ids method but uses internal
             identifiers rather than seq_region strings.
Return type: List of ints.
Exceptions : None
Caller     : General
Status     : Stable

have_registered_component

Arg [1]    : string $cmp_seq_region
             The name of the sequence region to check for
             registration.
Example    : if ( $asm_mapper->have_registered_component('AL240214.1') ) {}
Description: Returns true if a given component region has
             been registered with this assembly mapper.  This
             should only be called by this class or the
             AssemblyMapperAdaptor.  In other words, do not use
             this method unless you really know what you are
             doing.
Return type: Boolean (0 or 1)
Exceptions : Throws on incorrect arguments.
Caller     : Internal, AssemblyMapperAdaptor
Status     : Stable

have_registered_assembled

Arg [1]    : string $asm_seq_region
             The name of the sequence region to check for
             registration.
Arg [2]    : int $chunk_id
             The chunk number of the provided seq_region to check
             for registration.
Example    : if ( $asm_mapper->have_registered_assembled( 'X', 9 ) ) { }
Description: Returns true if a given assembled region chunk
             has been registered with this assembly mapper.
             This should only be called by this class or the
             AssemblyMapperAdaptor.  In other words, do not use
             this method unless you really know what you are
             doing.
Return type: Boolean (0 or 1)
Exceptions : Throws on incorrect arguments
Caller     : Internal, AssemblyMapperAdaptor
Status     : Stable
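
The chunk number used above identifies a fixed-size window of an assembled sequence region. The sketch below shows how chunk numbers could be derived from base-pair positions. It assumes chunks are power-of-two windows obtained by bit-shifting the position. The shift of 20 bits (roughly 1 Mb per chunk) and the helper chunk_ids_for_region() are assumptions made for illustration and are not stated in this documentation.

```perl
use strict;
use warnings;

# ASSUMPTION: each chunk covers 2**20 bases.  The real chunk size is
# not given in this documentation.
my $CHUNK_FACTOR = 20;

# Hypothetical helper: list the chunk numbers overlapped by the
# region $start..$end on an assembled sequence region.
sub chunk_ids_for_region {
  my ( $start, $end ) = @_;

  my $first_chunk = $start >> $CHUNK_FACTOR;
  my $last_chunk  = $end >> $CHUNK_FACTOR;

  return ( $first_chunk .. $last_chunk );
}

# With a 20-bit shift, chunk_ids_for_region( 1_000_000, 2_000_000 )
# returns ( 0, 1 ).
```

Registering by chunk rather than by individual base range lets the mapper record which parts of a large assembled region have already been loaded without tracking every interval separately.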

register_component

Arg [1]    : integer $cmp_seq_region
             The dbID of the component sequence region to
             register.
Example    : $asm_mapper->register_component('AL312341.1');
Description: Flags a given component sequence region as registered
             in this assembly mapper.  This should only be called
             by this class or the AssemblyMapperAdaptor.
Return type: None
Exceptions : Throws on incorrect arguments
Caller     : Internal, AssemblyMapperAdaptor
Status     : Stable

register_assembled

Arg [1]    : integer $asm_seq_region
             The dbID of the sequence region to register.
Arg [2]    : int $chunk_id
             The chunk number of the provided seq_region to register.
Example    : $asm_mapper->register_assembled( 'X', 4 );
Description: Flags a given assembled region as registered in this
             assembly mapper.  This should only be called by this
             class or the AssemblyMapperAdaptor.  Do not call this
             method unless you really know what you are doing.
Return type: None
Exceptions : Throws on incorrect arguments
Caller     : Internal, AssemblyMapperAdaptor
Status     : Stable

mapper

Arg [1]    : None
Example    : $mapper = $asm_mapper->mapper();
Description: Retrieves the internal mapper used by this Assembly
             Mapper.  This is unlikely to be useful unless you
             _really_ know what you are doing.
Return type: Bio::EnsEMBL::Mapper
Exceptions : None
Caller     : Internal, AssemblyMapperAdaptor
Status     : Stable

assembled_CoordSystem

Arg [1]    : None
Example    : $cs = $asm_mapper->assembled_CoordSystem();
Description: Retrieves the assembled CoordSystem from this
             assembly mapper.
Return type: Bio::EnsEMBL::CoordSystem
Exceptions : None
Caller     : Internal, AssemblyMapperAdaptor
Status     : Stable

component_CoordSystem

Arg [1]    : None
Example    : $cs = $asm_mapper->component_CoordSystem();
Description: Retrieves the component CoordSystem from this
             assembly mapper.
Return type: Bio::EnsEMBL::CoordSystem
Exceptions : None
Caller     : Internal, AssemblyMapperAdaptor
Status     : Stable

adaptor

Arg [1]    : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor $adaptor
Description: Getter/setter for this object's database adaptor.
Return type: Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
Exceptions : None
Caller     : General
Status     : Stable