LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

  Please email comments or questions to the public Ensembl
  developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.

  Questions may also be sent to the Ensembl help desk at
  <http://www.ensembl.org/Help/Contact>.

NAME

Bio::EnsEMBL::AssemblyMapper - Handles mapping between two coordinate systems using the information stored in the assembly table.

SYNOPSIS

    my $db          = Bio::EnsEMBL::DBSQL::DBAdaptor->new(...);
    my $map_adaptor = $db->get_AssemblyMapperAdaptor();
    my $cs_adaptor  = $db->get_CoordSystemAdaptor();

    my $chr_cs = $cs_adaptor->fetch_by_name( 'chromosome', 'NCBI33' );
    my $ctg_cs = $cs_adaptor->fetch_by_name('contig');

    my $asm_mapper =
      $map_adaptor->fetch_by_CoordSystems( $chr_cs, $ctg_cs );

    # Map to contig coordinate system from chromosomal.
    @ctg_coords =
      $asm_mapper->map( 'X', 1_000_000, 2_000_000, 1, $chr_cs );

    # Map to chromosome coordinate system from contig.
    @chr_coords =
      $asm_mapper->map( 'AL30421.1.200.92341', 100, 10000, -1,
      $ctg_cs );

    # List the contigs overlapping a region of a chromosome.
    @ctg_ids = $asm_mapper->list_ids( '13', 1, 1_000_000, $chr_cs );

    # List the chromosomes overlapping a contig region.
    @chr_ids =
      $asm_mapper->list_ids( 'AL30421.1.200.92341', 1, 1000, $ctg_cs );

DESCRIPTION

The AssemblyMapper is a database-aware mapper which facilitates conversion of coordinates between any two coordinate systems that have a relationship explicitly defined in the assembly table. In the future it may be possible to perform multiple-step (implicit) mapping between coordinate systems.

It is implemented using the Bio::EnsEMBL::Mapper object, which is a generic mapper object between disjoint coordinate systems.
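
To make the idea concrete, the following is a minimal pure-Perl sketch of what mapping through assembly rows involves. It does not use the Ensembl API and is not how this class is implemented; the row layout (asm_start, asm_end, cmp_name, cmp_start, cmp_end, ori), the contig names and the function map_asm_to_cmp are assumptions made up for illustration, loosely modelled on the idea of an assembly table.

```perl
use strict;
use warnings;

# Hypothetical assembly rows for one chromosome, sorted by asm_start.
# ori is 1 when the contig lies forward on the chromosome, -1 when reversed.
my @assembly = (
    { asm_start => 1,   asm_end => 100, cmp_name => 'ctgA',
      cmp_start => 1,   cmp_end => 100, ori      => 1 },
    { asm_start => 151, asm_end => 250, cmp_name => 'ctgB',
      cmp_start => 11,  cmp_end => 110, ori      => -1 },
);

# Map a chromosomal range onto contig coordinates.  Stretches with no
# underlying contig are reported as gaps (in chromosomal coordinates).
sub map_asm_to_cmp {
    my ( $start, $end, $strand, $rows ) = @_;
    my @result;
    my $pos = $start;    # first position not yet accounted for

    for my $row (@$rows) {
        next if $row->{asm_end} < $pos;
        last if $row->{asm_start} > $end;

        # An uncovered stretch before this row becomes a gap.
        if ( $row->{asm_start} > $pos ) {
            push @result,
              { type => 'gap', start => $pos, end => $row->{asm_start} - 1 };
            $pos = $row->{asm_start};
        }

        my $seg_end = $row->{asm_end} < $end ? $row->{asm_end} : $end;
        my ( $c_start, $c_end );
        if ( $row->{ori} == 1 ) {
            $c_start = $row->{cmp_start} + ( $pos - $row->{asm_start} );
            $c_end   = $row->{cmp_start} + ( $seg_end - $row->{asm_start} );
        }
        else {
            # Reversed contig: chromosome start maps to contig end.
            $c_start = $row->{cmp_end} - ( $seg_end - $row->{asm_start} );
            $c_end   = $row->{cmp_end} - ( $pos - $row->{asm_start} );
        }
        push @result,
          { type   => 'coord',
            name   => $row->{cmp_name},
            start  => $c_start,
            end    => $c_end,
            strand => $strand * $row->{ori} };
        $pos = $seg_end + 1;
    }

    # Anything left after the last overlapping row is also a gap.
    push @result, { type => 'gap', start => $pos, end => $end }
      if $pos <= $end;
    return @result;
}

my @parts = map_asm_to_cmp( 91, 160, 1, \@assembly );
for my $part (@parts) {
    print join( ' ', map { "$_=$part->{$_}" } sort keys %$part ), "\n";
}
```

In this toy data the requested range spans the end of one contig, an uncovered stretch, and the start of a reversed contig, which is why a single request can yield several coordinate pieces interleaved with gaps.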

METHODS

new

  Arg [1]    : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
  Arg [2]    : Bio::EnsEMBL::CoordSystem $asm_cs
  Arg [3]    : Bio::EnsEMBL::CoordSystem $cmp_cs
  Example    : Should use AssemblyMapperAdaptor->fetch_by_CoordSystems()
  Description: Creates a new AssemblyMapper
  Returntype : Bio::EnsEMBL::AssemblyMapper
  Exceptions : Throws if multiple coord_systems are provided
  Caller     : AssemblyMapperAdaptor
  Status     : Stable

max_pair_count

  Arg [1]    : (optional) int $max_pair_count
  Example    : $mapper->max_pair_count(100000)
  Description: Getter/Setter for the number of mapping pairs allowed
               in the internal cache.  This can be used to override
               the default value (1000) to tune the performance and
               memory usage for certain scenarios.  A higher value
               means a bigger cache and more memory used.
  Return type: int
  Exceptions : None
  Caller     : General
  Status     : Stable
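
As a rough illustration of how a pair limit, size() and flush() can interact, here is a small self-contained sketch. The TinyPairCache class and its add_pairs method are hypothetical and exist only for this example; the "flush when the limit would be exceeded" policy is an assumption, not a statement of what AssemblyMapper or its adaptor actually does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a mapper cache; not the Ensembl class.
package TinyPairCache;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        max_pair_count => $args{max_pair_count} // 1000,
        pairs          => [],
    };
    return bless $self, $class;
}

# Combined getter/setter, in the style of max_pair_count().
sub max_pair_count {
    my ( $self, $value ) = @_;
    $self->{max_pair_count} = $value if defined $value;
    return $self->{max_pair_count};
}

# Number of pairs currently held.
sub size { return scalar @{ $_[0]{pairs} } }

# Drop everything that has been cached.
sub flush { $_[0]{pairs} = []; return }

# Assumed policy: if adding would exceed the limit, flush first.
sub add_pairs {
    my ( $self, @new ) = @_;
    $self->flush() if $self->size() + @new > $self->max_pair_count();
    push @{ $self->{pairs} }, @new;
    return;
}

package main;

my $cache = TinyPairCache->new( max_pair_count => 3 );
$cache->add_pairs( 'p1', 'p2' );
my $size_before = $cache->size();

# 2 cached + 2 new would exceed 3, so the cache is flushed first.
$cache->add_pairs( 'p3', 'p4' );
my $size_after = $cache->size();

print "before: $size_before, after: $size_after\n";
```

The trade-off is the one the description states: a larger limit means fewer flushes and less repeated work, at the cost of memory.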

register_all

  Arg [1]    : None
  Example    : $mapper->max_pair_count(10e6);
               $mapper->register_all();
  Description: Pre-registers all assembly information in this
               mapper.  The cache size should be set to a
               sufficiently large value so that all of the
               information can be stored.  This method is useful
               when *a lot* of mapping will be done in regions
               which are distributed around the genome.  After
               registration the mapper will consume a lot of memory
               but will not have to perform any SQL and will be
               faster.
  Return type: None
  Exceptions : None
  Caller     : Specialised programs doing a lot of mapping.
  Status     : Stable

map

  Arg [1]    : string $frm_seq_region
               The name of the sequence region to transform FROM.
  Arg [2]    : int $frm_start
               The start of the region to transform FROM.
  Arg [3]    : int $frm_end
               The end of the region to transform FROM.
  Arg [4]    : int $strand
               The strand of the region to transform FROM.
  Arg [5]    : Bio::EnsEMBL::CoordSystem
               The coordinate system to transform FROM
  Arg [6]    : Dummy placeholder to keep the interface consistent
               across different mappers
  Arg [7]    : Bio::EnsEMBL::Slice
               Target slice
  Arg [8]    : (optional) boolean
               Whether to include the original coordinates or not
  Example    : @coords =
                $asm_mapper->map( 'X', 1_000_000, 2_000_000, 1,
                                  $chr_cs );
  Description: Transforms coordinates from one coordinate system to
               another.
  Return type: List of Bio::EnsEMBL::Mapper::Coordinate and/or
               Bio::EnsEMBL::Mapper::Gap objects.
  Exceptions : Throws if the specified coordinate system is not
               one of the coordinate systems associated with this
               assembly mapper.
  Caller     : General
  Status     : Stable
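
Because the returned list can mix coordinate and gap objects, callers usually need to tell them apart. The sketch below shows one way to do that with an isa() check. The summarise_mapping function and the My::FakePiece and My::FakeGap classes are hypothetical: the fake classes are stand-ins so that the example runs without a database, and in real use the list would come from $asm_mapper->map(...). It assumes only the id(), start(), end() and strand() accessors on coordinates and start() and end() on gaps.

```perl
use strict;
use warnings;

# Turn a list of mapping results into printable lines.
sub summarise_mapping {
    my (@pieces) = @_;
    my @lines;
    for my $piece (@pieces) {
        if ( $piece->isa('Bio::EnsEMBL::Mapper::Gap') ) {
            # Gaps carry only the unmapped range.
            push @lines,
              sprintf( 'gap   %d-%d', $piece->start(), $piece->end() );
        }
        else {
            push @lines,
              sprintf( 'coord %s:%d-%d:%d',
                $piece->id(),  $piece->start(),
                $piece->end(), $piece->strand() );
        }
    }
    return @lines;
}

# Stand-in objects (hypothetical) so the sketch runs on its own.
{
    package My::FakePiece;
    sub new    { my ( $class, %f ) = @_; return bless {%f}, $class }
    sub id     { return $_[0]{id} }
    sub start  { return $_[0]{start} }
    sub end    { return $_[0]{end} }
    sub strand { return $_[0]{strand} }

    package My::FakeGap;
    # Inherit accessors, and claim to be a Gap for the isa() check.
    our @ISA = ( 'My::FakePiece', 'Bio::EnsEMBL::Mapper::Gap' );
}

my @summary = summarise_mapping(
    My::FakePiece->new( id => 'ctgA', start => 91, end => 100, strand => 1 ),
    My::FakeGap->new( start => 101, end => 150 ),
);
print "$_\n" for @summary;
```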

flush

  Args       : None
  Example    : None
  Description: Remove all cached items from this AssemblyMapper.
  Return type: None
  Exceptions : None
  Caller     : AssemblyMapperAdaptor
  Status     : Stable

size

  Args       : None
  Example    : $num_of_pairs = $mapper->size();
  Description: Returns the number of pairs currently stored.
  Return type: int
  Exceptions : None
  Caller     : General
  Status     : Stable

fastmap

  Arg [1]    : string $frm_seq_region
               The name of the sequence region to transform FROM.
  Arg [2]    : int $frm_start
               The start of the region to transform FROM.
  Arg [3]    : int $frm_end
               The end of the region to transform FROM.
  Arg [4]    : int $strand
               The strand of the region to transform FROM.
  Arg [5]    : Bio::EnsEMBL::CoordSystem
               The coordinate system to transform FROM.
  Example    : @coords =
                $asm_mapper->fastmap( 'X', 1_000_000, 2_000_000, 1,
                                      $chr_cs );
  Description: Transforms coordinates from one coordinate system to
               another.
  Return type: List of Bio::EnsEMBL::Mapper::Coordinate and/or
               Bio::EnsEMBL::Mapper::Gap objects.
  Exceptions : Throws if the specified coordinate system is not
               one of the coordinate systems associated with this
               assembly mapper.
  Caller     : General
  Status     : Stable

list_ids

  Arg [1]    : string $frm_seq_region
               The name of the sequence region of interest.
  Arg [2]    : int $frm_start
               The start of the region of interest.
  Arg [3]    : int $frm_end
               The end of the region of interest.
  Arg [4]    : Bio::EnsEMBL::CoordSystem $frm_cs
               The coordinate system of the region of interest.
  Example    : foreach my $id (
                        $asm_mapper->list_ids( 'X', 1, 1000, $chr_cs ) )
                { ... }
  Description: Retrieves a list of overlapping seq_region internal
               identifiers of another coordinate system.  This is
               the same as the list_seq_regions method but uses
               internal identifiers rather than seq_region names.
  Return type: List of ints.
  Exceptions : None
  Caller     : General
  Status     : Stable

list_seq_regions

  Arg [1]    : string $frm_seq_region
               The name of the sequence region of interest.
  Arg [2]    : int $frm_start
               The start of the region of interest.
  Arg [3]    : int $frm_end
               The end of the region of interest.
  Arg [4]    : Bio::EnsEMBL::CoordSystem $frm_cs
               The coordinate system of the region of interest.
  Example    : foreach my $name (
                                 $asm_mapper->list_seq_regions(
                                                   'X', 1, 1000, $chr_cs
                                 ) ) { ... }
  Description: Retrieves a list of overlapping seq_region names of
               another coordinate system.  This is the same as the
               list_ids method but uses seq_region names rather
               than internal identifiers.
  Return type: List of strings.
  Exceptions : None
  Caller     : General
  Status     : Stable

have_registered_component

  Arg [1]    : string $cmp_seq_region
               The name of the sequence region to check for
               registration.
  Example    : if ( $asm_mapper->have_registered_component('AL240214.1') ) {}
  Description: Returns true if a given component region has
               been registered with this assembly mapper.  This
               should only be called by this class or the
               AssemblyMapperAdaptor.  In other words, do not use
               this method unless you really know what you are
               doing.
  Return type: Boolean (0 or 1)
  Exceptions : Throws on incorrect arguments.
  Caller     : Internal, AssemblyMapperAdaptor
  Status     : Stable

have_registered_assembled

  Arg [1]    : string $asm_seq_region
               The name of the sequence region to check for
               registration.
  Arg [2]    : int $chunk_id
               The chunk number of the provided seq_region to check
               for registration.
  Example    : if ( $asm_mapper->have_registered_assembled( 'X', 9 ) ) { }
  Description: Returns true if a given assembled region chunk
               has been registered with this assembly mapper.
               This should only be called by this class or the
               AssemblyMapperAdaptor.  In other words, do not use
               this method unless you really know what you are
               doing.
  Return type: Boolean (0 or 1)
  Exceptions : Throws on incorrect arguments
  Caller     : Internal, AssemblyMapperAdaptor
  Status     : Stable
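
The notion of a "chunk number" can be pictured as follows. This is only a sketch under stated assumptions: it assumes assembled regions are divided into fixed-size chunks of 2**20 bases and that a chunk number is a position shifted right by that factor. The constant $CHUNK_FACTOR and the helper names chunks_for_range, register_chunk and have_chunk are hypothetical and are not part of this class.

```perl
use strict;
use warnings;

# Assumed chunk size: 2**20 bases (roughly one megabase).
my $CHUNK_FACTOR = 20;

# Registry of chunks seen so far: { seq_region => { chunk_id => 1 } }
my %registered;

# Return the chunk numbers touched by an inclusive range.
sub chunks_for_range {
    my ( $start, $end ) = @_;
    my $first = $start >> $CHUNK_FACTOR;
    my $last  = $end >> $CHUNK_FACTOR;
    return $first .. $last;
}

# Flag one chunk of a region as registered.
sub register_chunk {
    my ( $seq_region, $chunk_id ) = @_;
    $registered{$seq_region}{$chunk_id} = 1;
    return;
}

# True (1) if the chunk has been registered, otherwise 0.
sub have_chunk {
    my ( $seq_region, $chunk_id ) = @_;
    return exists $registered{$seq_region}{$chunk_id} ? 1 : 0;
}

# Register every chunk overlapping X:1,000,000-2,200,000.
my @chunks = chunks_for_range( 1_000_000, 2_200_000 );
register_chunk( 'X', $_ ) for @chunks;

print "chunks: @chunks\n";
print "chunk 1 registered: ", have_chunk( 'X', 1 ), "\n";
print "chunk 5 registered: ", have_chunk( 'X', 5 ), "\n";
```

Tracking registration per chunk rather than per base lets a mapper answer "has this area been loaded already?" with a hash lookup, so that repeated requests in the same neighbourhood need not go back to the database.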

register_component

  Arg [1]    : integer $cmp_seq_region
               The dbID of the component sequence region to
               register.
  Example    : $asm_mapper->register_component('AL312341.1');
  Description: Flags a given component sequence region as registered
               in this assembly mapper.  This should only be called
               by this class or the AssemblyMapperAdaptor.
  Return type: None
  Exceptions : Throws on incorrect arguments
  Caller     : Internal, AssemblyMapperAdaptor
  Status     : Stable

register_assembled

  Arg [1]    : integer $asm_seq_region
               The dbID of the sequence region to register.
  Arg [2]    : int $chunk_id
               The chunk number of the provided seq_region to register.
  Example    : $asm_mapper->register_assembled( 'X', 4 );
  Description: Flags a given assembled region as registered in this
               assembly mapper.  This should only be called by this
               class or the AssemblyMapperAdaptor.  Do not call this
               method unless you really know what you are doing.
  Return type: None
  Exceptions : Throws on incorrect arguments
  Caller     : Internal, AssemblyMapperAdaptor
  Status     : Stable

mapper

  Arg [1]    : None
  Example    : $mapper = $asm_mapper->mapper();
  Description: Retrieves the internal mapper used by this Assembly
               Mapper.  This is unlikely to be useful unless you
               _really_ know what you are doing.
  Return type: Bio::EnsEMBL::Mapper
  Exceptions : None
  Caller     : Internal, AssemblyMapperAdaptor
  Status     : Stable

assembled_CoordSystem

  Arg [1]    : None
  Example    : $cs = $asm_mapper->assembled_CoordSystem();
  Description: Retrieves the assembled CoordSystem from this
               assembly mapper.
  Return type: Bio::EnsEMBL::CoordSystem
  Exceptions : None
  Caller     : Internal, AssemblyMapperAdaptor
  Status     : Stable

component_CoordSystem

  Arg [1]    : None
  Example    : $cs = $asm_mapper->component_CoordSystem();
  Description: Retrieves the component CoordSystem from this
               assembly mapper.
  Return type: Bio::EnsEMBL::CoordSystem
  Exceptions : None
  Caller     : Internal, AssemblyMapperAdaptor
  Status     : Stable

adaptor

  Arg [1]    : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor $adaptor
  Description: Getter/setter for this object's database adaptor.
  Returntype : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
  Exceptions : None
  Caller     : General
  Status     : Stable