LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

  Please email comments or questions to the public Ensembl
  developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.

  Questions may also be sent to the Ensembl help desk at
  <http://www.ensembl.org/Help/Contact>.

NAME

Bio::EnsEMBL::Mapper

SYNOPSIS

  my $mapper = Bio::EnsEMBL::Mapper->new( 'rawcontig', 'chromosome' );

  # add a coordinate mapping - supply two pairs of coordinates
  $mapper->add_map_coordinates(
    $contig_id, $contig_start, $contig_end, $contig_ori,
    $chr_name,  $chr_start,    $chr_end
  );

  # map from one coordinate system to another
  my @coordlist =
    $mapper->map_coordinates( 627012, 2, 5, -1, "rawcontig" );

DESCRIPTION

Generic mapper to provide coordinate transforms between two disjoint coordinate systems. This mapper is intended to be 'context neutral', in that it does not contain any code relating to any particular coordinate system. Such code is provided in, for example, Bio::EnsEMBL::AssemblyMapper.

Mappings consist of pairs of 'to-' and 'from-' contigs with coordinates on each. Orientation is abbreviated to 'ori'.

The contig pair hash is divided into mappings per seq_region. The code makes assumptions about how to filter these results, so comparisons for some properties are absent from the code and are instead implied by the data structure.

The assembly mapping hash '_pair_last' orders itself by the target seq region and looks like this:

   1 => ARRAY(0x1024c79c0)
      0  Bio::EnsEMBL::Mapper::Pair=HASH(0x1024d6198)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf98)
            'end' => 4
            'id' => 4
            'start' => 1
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf68)
            'end' => 4
            'id' => 1
            'start' => 1
      1  Bio::EnsEMBL::Mapper::Pair=HASH(0x1026c20f0)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee3a0)
            'end' => 12
            'id' => 4
            'start' => 9
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee370)
            'end' => 4
            'id' => 1
            'start' => 1
   2 => ARRAY(0x1025ee460)
      0  Bio::EnsEMBL::Mapper::Pair=HASH(0x1025ee400)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2c8)
            'end' => 8
            'id' => 4
            'start' => 5
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2b0)
            'end' => 4
            'id' => 2
            'start' => 1
      1  Bio::EnsEMBL::Mapper::Pair=HASH(0x1025ee658)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025eea48)
            'end' => 16
            'id' => 4
            'start' => 13
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025eea18)
            'end' => 4
            'id' => 2
            'start' => 1

The other mapping hash available is in the reverse sense, using the 'from' seq_region as the sorting key. Here is an excerpt.

0  HASH(0x102690bb8)
   4 => ARRAY(0x1025ee028)
      0  Bio::EnsEMBL::Mapper::Pair=HASH(0x1024d6198)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf98)
            'end' => 4
            'id' => 4
            'start' => 1
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf68)
            'end' => 4
            'id' => 1
            'start' => 1
      1  Bio::EnsEMBL::Mapper::Pair=HASH(0x1025ee400)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2c8)
            'end' => 8
            'id' => 4
            'start' => 5
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2b0)
            'end' => 4
            'id' => 2
            'start' => 1
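
The two dumps describe the same pairs indexed in two ways. The following is a minimal sketch in plain Perl of that idea, not the module's own code: plain hashes stand in for the Pair and Unit objects, and the names %by_to and %by_from are hypothetical.

```perl
use strict;
use warnings;

# Plain hashes stand in for Bio::EnsEMBL::Mapper::Pair and ::Unit objects.
# The values are copied from the dumps above.
my @pairs = (
    { from => { id => 4, start => 1,  end => 4 },  to => { id => 1, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 5,  end => 8 },  to => { id => 2, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 9,  end => 12 }, to => { id => 1, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 13, end => 16 }, to => { id => 2, start => 1, end => 4 }, ori => 1 },
);

# Two indexes over the same pair references (hypothetical names):
# one keyed by the 'to' seq_region id, one keyed by the 'from' seq_region id.
my ( %by_to, %by_from );
for my $pair (@pairs) {
    push @{ $by_to{ $pair->{to}{id} } },     $pair;
    push @{ $by_from{ $pair->{from}{id} } }, $pair;
}

# Report how many pairs each key holds.
printf "to %s: %d pairs\n",   $_, scalar @{ $by_to{$_} }   for sort keys %by_to;
printf "from %s: %d pairs\n", $_, scalar @{ $by_from{$_} } for sort keys %by_from;
```

Because both indexes hold references to the same pair, a lookup from either side reaches the same 'from', 'to' and 'ori' data, which matches the repeated HASH addresses in the dumps.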

METHODS

new

  Arg [1]    : string $from
               The name of the 'from' coordinate system
  Arg [2]    : string $to
               The name of the 'to' coordinate system
  Arg [3]    : (optional) Bio::EnsEMBL::CoordSystem $from_cs
               The 'from' coordinate system
  Arg [4]    : (optional) Bio::EnsEMBL::CoordSystem $to_cs
               The 'to' coordinate system
  Example    : my $mapper = Bio::EnsEMBL::Mapper->new('FROM', 'TO');
  Description: Constructor.  Creates a new Bio::EnsEMBL::Mapper object.
  Returntype : Bio::EnsEMBL::Mapper
  Exceptions : none
  Caller     : general

flush

  Args       : none
  Example    : none
  Description: removes all cached information out of this mapper
  Returntype : none
  Exceptions : none
  Caller     : AssemblyMapper, ChainedAssemblyMapper

map_coordinates

    Arg  1      string $id
                id of 'source' sequence
    Arg  2      int $start
                start coordinate of 'source' sequence
    Arg  3      int $end
                end coordinate of 'source' sequence
    Arg  4      int $strand
                raw contig orientation (+/- 1)
    Arg  5      string $type
                nature of transform - gives the type of
                coordinates to be transformed *from*
    Arg  6      boolean (0 or 1) $include_original_region
                option to include original input coordinate region mappings in the result
    Arg  7      int  $cdna_coding_start
                cdna coding start  
    Function    generic map method
    Returntype  if $include_original_region == 0
                  array of mapped Bio::EnsEMBL::Mapper::Coordinate
                  and/or   Bio::EnsEMBL::Mapper::Gap
                if $include_original_region == 1
                  hash of mapped and original Bio::EnsEMBL::Mapper::Coordinate
                  and/or   Bio::EnsEMBL::Mapper::Gap
    Exceptions  none
    Caller      Bio::EnsEMBL::Mapper
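
The return type above mixes mapped coordinates and gaps. The following plain-Perl sketch illustrates how a query range can be split that way. It is an illustration under stated assumptions, not the module's implementation: the pair records, the 'chr1' id and the map_range function are hypothetical, the pairs are assumed to be sorted by source start and non-overlapping, and the strand of the query is ignored.

```perl
use strict;
use warnings;

# Hypothetical pair records: source range, target id and range, relative orientation.
my @pairs = (
    { from_start => 1, from_end => 4,  to_id => 'chr1', to_start => 101, to_end => 104, ori => 1 },
    { from_start => 9, from_end => 12, to_id => 'chr1', to_start => 201, to_end => 204, ori => -1 },
);

# Map the source range [$start, $end] through the pairs.
# Pairs must be sorted by from_start and must not overlap.
sub map_range {
    my ( $pairs, $start, $end ) = @_;
    my @result;
    my $pos = $start;    # first source position not yet accounted for

    for my $p (@$pairs) {
        next if $p->{from_end} < $pos;      # pair lies entirely before the query
        last if $p->{from_start} > $end;    # pair lies entirely after the query

        # An unmapped stretch before this pair becomes a gap.
        if ( $p->{from_start} > $pos ) {
            push @result, { type => 'gap', start => $pos, end => $p->{from_start} - 1 };
            $pos = $p->{from_start};
        }

        # End of the overlap between query and pair, in source coordinates.
        my $ov_end = $p->{from_end} < $end ? $p->{from_end} : $end;

        my ( $t_start, $t_end );
        if ( $p->{ori} == 1 ) {
            # Same orientation: offsets are measured from the target start.
            $t_start = $p->{to_start} + ( $pos - $p->{from_start} );
            $t_end   = $p->{to_start} + ( $ov_end - $p->{from_start} );
        }
        else {
            # Opposite orientation: offsets are measured back from the target end.
            $t_start = $p->{to_end} - ( $ov_end - $p->{from_start} );
            $t_end   = $p->{to_end} - ( $pos - $p->{from_start} );
        }
        push @result,
          { type => 'coord', id => $p->{to_id}, start => $t_start, end => $t_end, ori => $p->{ori} };
        $pos = $ov_end + 1;
    }

    # Anything left after the last pair is also a gap.
    push @result, { type => 'gap', start => $pos, end => $end } if $pos <= $end;
    return @result;
}

my @mapped = map_range( \@pairs, 3, 10 );
for my $m (@mapped) {
    if ( $m->{type} eq 'gap' ) { print "gap $m->{start}-$m->{end}\n" }
    else                       { print "$m->{id}:$m->{start}-$m->{end} ($m->{ori})\n" }
}
# prints:
#   chr1:103-104 (1)
#   gap 5-8
#   chr1:203-204 (-1)
```

In this sketch gaps are reported in source coordinates, since an unmapped stretch has no position in the target system.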

map_insert

  Arg [1]    : string $id
  Arg [2]    : int $start - start coord. Since this is an insert should always
               be one greater than end.
  Arg [3]    : int $end - end coord. Since this is an insert should always
               be one less than start.
  Arg [4]    : int $strand (0, 1, -1)
  Arg [5]    : string $type - the coordinate system name the coords are from.
  Arg [6]    : boolean $fastmap - if specified, this is being called from
               the fastmap call. The mapping done is not any faster for
               inserts, but the return value is different.
  Example    : none
  Description: This is an internal function which handles the special mapping
               case for inserts (start = end + 1).  This function will be called
               automatically by the map function so there is no reason to
               call it directly.
  Returntype : list of Bio::EnsEMBL::Mapper::Coordinate and/or Gap objects
  Exceptions : none
  Caller     : map_coordinates()

fastmap

    Arg  1      string $id
                id of 'source' sequence
    Arg  2      int $start
                start coordinate of 'source' sequence
    Arg  3      int $end
                end coordinate of 'source' sequence
    Arg  4      int $strand
                raw contig orientation (+/- 1)
    Arg  5      string $type
                nature of transform - gives the type of
                coordinates to be transformed *from*
    Function    limited map method. Will only do ungapped, unsplit mapping.
                Will return id, start, end, strand in a list.
    Returntype  list of results
    Exceptions  none
    Caller      Bio::EnsEMBL::AssemblyMapper
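
As an illustration of "ungapped unsplit mapping", the plain-Perl sketch below only succeeds when a single pair contains the whole query range. It is a sketch under assumptions and not the module's code: the pair records, the 'chr1' id and the fast_map function are hypothetical, and returning an empty list on failure is a choice made for this example.

```perl
use strict;
use warnings;

# Hypothetical pair records: source range, target id and range, relative orientation.
my @pairs = (
    { from_start => 1, from_end => 4,  to_id => 'chr1', to_start => 101, to_end => 104, ori => 1 },
    { from_start => 9, from_end => 12, to_id => 'chr1', to_start => 201, to_end => 204, ori => -1 },
);

# Return (id, start, end, strand) if one pair fully contains [$start, $end],
# otherwise return an empty list.
sub fast_map {
    my ( $pairs, $start, $end, $strand ) = @_;
    for my $p (@$pairs) {
        next unless $p->{from_start} <= $start && $end <= $p->{from_end};
        if ( $p->{ori} == 1 ) {
            return (
                $p->{to_id},
                $p->{to_start} + ( $start - $p->{from_start} ),
                $p->{to_start} + ( $end - $p->{from_start} ),
                $strand,
            );
        }
        # Opposite orientation: coordinates are mirrored and the strand flips.
        return (
            $p->{to_id},
            $p->{to_end} - ( $end - $p->{from_start} ),
            $p->{to_end} - ( $start - $p->{from_start} ),
            -$strand,
        );
    }
    return;    # the range is split across pairs or falls in a gap
}

my @hit  = fast_map( \@pairs, 2, 3,  1 );    # inside the first pair
my @miss = fast_map( \@pairs, 3, 10, 1 );    # spans two pairs and a gap
print join( ', ', @hit ), "\n";              # prints: chr1, 102, 103, 1
print scalar(@miss), "\n";                   # prints: 0
```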

add_map_coordinates

    Arg  1      int $id
                id of 'source' sequence
    Arg  2      int $start
                start coordinate of 'source' sequence
    Arg  3      int $end
                end coordinate of 'source' sequence
    Arg  4      int $strand
                relative orientation of source and target (+/- 1)
    Arg  5      int $id
                id of 'target' sequence
    Arg  6      int $start
                start coordinate of 'target' sequence
    Arg  7      int $end
                end coordinate of 'target' sequence
    Function    Stores details of mapping between
                'source' and 'target' regions.
    Returntype  none
    Exceptions  none
    Caller      Bio::EnsEMBL::Mapper

add_indel_coordinates

    Arg  1      int $id
                id of 'source' sequence
    Arg  2      int $start
                start coordinate of 'source' sequence
    Arg  3      int $end
                end coordinate of 'source' sequence
    Arg  4      int $strand
                relative orientation of source and target (+/- 1)
    Arg  5      int $id
                id of 'target' sequence
    Arg  6      int $start
                start coordinate of 'target' sequence
    Arg  7      int $end
                end coordinate of 'target' sequence
    Function    stores details of mapping between two regions:
                'source' and 'target'. Returns 1 if the pair was added, 0 if it
                was already in. Used when adding an indel.
    Returntype  int 0,1
    Exceptions  none
    Caller      Bio::EnsEMBL::Mapper

map_indel

  Arg [1]    : string $id
  Arg [2]    : int $start - start coord. Since this is an indel should always
               be one greater than end.
  Arg [3]    : int $end - end coord. Since this is an indel should always
               be one less than start.
  Arg [4]    : int $strand (0, 1, -1)
  Arg [5]    : string $type - the coordinate system name the coords are from.
  Example    : @coords = $mapper->map_indel();
  Description: This is an internal function which handles the special mapping
               case for indels (start = end +1). It will be used to map from
               a coordinate system with a gap to another that contains an
               insertion. It will be mainly used by the Variation API.
  Returntype : Bio::EnsEMBL::Mapper::Unit objects
  Exceptions : none
  Caller     : general

add_Mapper

    Arg  1      Bio::EnsEMBL::Mapper $mapper2
    Example     $mapper->add_Mapper($mapper2)
    Function    add all the map coordinates from $mapper2 to this mapper.
                This object will contain mapping pairs from both the old
                object and $mapper2.
    Returntype  int 0,1
    Exceptions  throw if 'to' and 'from' from both Bio::EnsEMBL::Mappers
                are incompatible
    Caller      $mapper->methodname()

list_pairs

    Arg  1      int $id
                id of 'source' sequence
    Arg  2      int $start
                start coordinate of 'source' sequence
    Arg  3      int $end
                end coordinate of 'source' sequence
    Arg  4      string $type
                nature of transform - gives the type of
                coordinates to be transformed *from*
    Function    list all pairs of mappings in a region
    Returntype  list of Bio::EnsEMBL::Mapper::Pair
    Exceptions  none
    Caller      Bio::EnsEMBL::Mapper
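
Listing the pairs in a region amounts to an interval-overlap filter. The sketch below shows that test in plain Perl. It is an illustration only: the pair records and the pairs_in_region function are hypothetical, and treating a pair as "in" the region whenever it overlaps the region at all is an assumption of this example.

```perl
use strict;
use warnings;

# Hypothetical pair records, reduced to their source ranges.
my @pairs = (
    { from_start => 1, from_end => 4 },
    { from_start => 5, from_end => 8 },
    { from_start => 9, from_end => 12 },
);

# Keep every pair whose source range overlaps [$start, $end].
sub pairs_in_region {
    my ( $pairs, $start, $end ) = @_;
    return grep { $_->{from_start} <= $end && $_->{from_end} >= $start } @$pairs;
}

my @hits = pairs_in_region( \@pairs, 4, 6 );
print "$_->{from_start}-$_->{from_end}\n" for @hits;
# prints:
#   1-4
#   5-8
```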

to

    Arg  1      Bio::EnsEMBL::Mapper::Unit $id
                id of 'source' sequence
    Function    accessor method for the 'source'
                and 'target' in a Mapper::Pair
    Returntype  Bio::EnsEMBL::Mapper::Unit
    Exceptions  none
    Caller      Bio::EnsEMBL::Mapper

from

    Arg  1      Bio::EnsEMBL::Mapper::Unit $id
                id of 'source' sequence
    Function    accessor method for the 'source'
                and 'target' in a Mapper::Pair
    Returntype  Bio::EnsEMBL::Mapper::Unit
    Exceptions  none
    Caller      Bio::EnsEMBL::Mapper