LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

Please email comments or questions to the public Ensembl
developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.

Questions may also be sent to the Ensembl help desk at
<http://www.ensembl.org/Help/Contact>.

NAME

Bio::EnsEMBL::Mapper

SYNOPSIS

my $mapper = Bio::EnsEMBL::Mapper->new( 'rawcontig', 'chromosome' );

# add a coordinate mapping - supply two pairs of coordinates
$mapper->add_map_coordinates(
  $contig_id, $contig_start, $contig_end, $contig_ori,
  $chr_name,  $chr_start,    $chr_end
);

# map from one coordinate system to another
my @coordlist =
  $mapper->map_coordinates( 627012, 2, 5, -1, "rawcontig" );

DESCRIPTION

Generic mapper to provide coordinate transforms between two disjoint coordinate systems. This mapper is intended to be 'context neutral', in that it does not contain any code relating to any particular coordinate system. Such system-specific code is provided in, for example, Bio::EnsEMBL::AssemblyMapper.

Mappings consist of pairs of 'to-' and 'from-' contigs with coordinates on each. Orientation is abbreviated to 'ori'.
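As an illustration of how one pair relates the two coordinate systems, here is a minimal sketch in plain Perl. It does not use the Ensembl modules. The helper map_position and the ids 'contig_a' and 'chr_1' are hypothetical, and the arithmetic assumes a single ungapped pair whose 'from' and 'to' units have the same length.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::EnsEMBL::Mapper: map one position
# through a single ungapped pair of equal-length units.
sub map_position {
    my ( $pair, $pos ) = @_;
    my ( $from, $to ) = @{$pair}{qw(from to)};

    # Positions outside the 'from' unit are not covered by this pair.
    return undef if $pos < $from->{start} || $pos > $from->{end};

    my $offset = $pos - $from->{start};

    # Forward orientation counts the offset up from the start of 'to';
    # reverse orientation counts it back from the end of 'to'.
    return $pair->{ori} == 1
        ? $to->{start} + $offset
        : $to->{end} - $offset;
}

# Example pair in reverse orientation (illustrative values only).
my $pair = {
    from => { id => 'contig_a', start => 1,   end => 4 },
    to   => { id => 'chr_1',    start => 101, end => 104 },
    ori  => -1,
};

printf "%d -> %d\n", $_, map_position( $pair, $_ ) for 1 .. 4;
```

With 'ori' set to -1 the first 'from' position lands on the last 'to' position, which is why the sign of 'ori' has to be carried along with every pair.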

The contig pair hash is divided into mappings per seq_region. The code relies on this layout when filtering results, so comparisons for some properties are absent from the code and are implied by the data structure instead.

The assembly mapping hash '_pair_last' is keyed by the target seq_region and looks like this:

1 => ARRAY(0x1024c79c0)
   0  Bio::EnsEMBL::Mapper::Pair=HASH(0x1024d6198)
      'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf98)
         'end' => 4
         'id' => 4
         'start' => 1
      'ori' => 1
      'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf68)
         'end' => 4
         'id' => 1
         'start' => 1
   1  Bio::EnsEMBL::Mapper::Pair=HASH(0x1026c20f0)
      'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee3a0)
         'end' => 12
         'id' => 4
         'start' => 9
      'ori' => 1
      'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee370)
         'end' => 4
         'id' => 1
         'start' => 1
2 => ARRAY(0x1025ee460)
   0  Bio::EnsEMBL::Mapper::Pair=HASH(0x1025ee400)
      'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2c8)
         'end' => 8
         'id' => 4
         'start' => 5
      'ori' => 1
      'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2b0)
         'end' => 4
         'id' => 2
         'start' => 1
   1  Bio::EnsEMBL::Mapper::Pair=HASH(0x1025ee658)
      'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025eea48)
         'end' => 16
         'id' => 4
         'start' => 13
      'ori' => 1
      'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025eea18)
         'end' => 4
         'id' => 2
         'start' => 1

The other mapping hash holds the reverse sense, using the 'from' seq_region as the key. Here is an excerpt:

0  HASH(0x102690bb8)
   4 => ARRAY(0x1025ee028)
      0  Bio::EnsEMBL::Mapper::Pair=HASH(0x1024d6198)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf98)
            'end' => 4
            'id' => 4
            'start' => 1
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025edf68)
            'end' => 4
            'id' => 1
            'start' => 1
      1  Bio::EnsEMBL::Mapper::Pair=HASH(0x1025ee400)
         'from' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2c8)
            'end' => 8
            'id' => 4
            'start' => 5
         'ori' => 1
         'to' => Bio::EnsEMBL::Mapper::Unit=HASH(0x1025ee2b0)
            'end' => 4
            'id' => 2
            'start' => 1
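The two hashes described above can be pictured as two indexes over the same set of pairs. The following is a minimal sketch in plain Perl using the values from the dumps. The variable names %by_to and %by_from are hypothetical stand-ins for the internal hashes, and plain hash references stand in for the Pair and Unit objects.

```perl
use strict;
use warnings;

# Pairs copied from the dumps: 'from' region 4 is split across
# 'to' regions 1 and 2. Plain hashes stand in for Pair/Unit objects.
my @pairs = (
    { from => { id => 4, start => 1,  end => 4 },  to => { id => 1, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 9,  end => 12 }, to => { id => 1, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 5,  end => 8 },  to => { id => 2, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 13, end => 16 }, to => { id => 2, start => 1, end => 4 }, ori => 1 },
);

# Index the same pair references twice: once by 'to' id, once by 'from' id.
my ( %by_to, %by_from );
for my $pair (@pairs) {
    push @{ $by_to{ $pair->{to}{id} } },     $pair;
    push @{ $by_from{ $pair->{from}{id} } }, $pair;
}

# Keep each 'from' list ordered by its start coordinate.
@$_ = sort { $a->{from}{start} <=> $b->{from}{start} } @$_ for values %by_from;

for my $id ( sort keys %by_to ) {
    printf "to %s: %d pairs\n", $id, scalar @{ $by_to{$id} };
}
for my $id ( sort keys %by_from ) {
    printf "from %s: %d pairs\n", $id, scalar @{ $by_from{$id} };
}
```

Both indexes hold references to the same pairs, which matches the dumps: the Pair at address 0x1024d6198 appears under key 1 in the first hash and under key 4 in the second.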

METHODS

new

Arg [1]    : string $from
             The name of the 'from' coordinate system
Arg [2]    : string $to
             The name of the 'to' coordinate system
Arg [3]    : (optional) Bio::EnsEMBL::CoordSystem $from_cs
             The 'from' coordinate system
Arg [4]    : (optional) Bio::EnsEMBL::CoordSystem $to_cs
             The 'to' coordinate system
Example    : my $mapper = Bio::EnsEMBL::Mapper->new('FROM', 'TO');
Description: Constructor.  Creates a new Bio::EnsEMBL::Mapper object.
Returntype : Bio::EnsEMBL::Mapper
Exceptions : none
Caller     : general

flush

Args       : none
Example    : none
Description: Removes all cached information from this mapper.
Returntype : none
Exceptions : none
Caller     : AssemblyMapper, ChainedAssemblyMapper

map_coordinates

Arg  1      string $id
            id of 'source' sequence
Arg  2      int $start
            start coordinate of 'source' sequence
Arg  3      int $end
            end coordinate of 'source' sequence
Arg  4      int $strand
            raw contig orientation (+/- 1)
Arg  5      string $type
            nature of transform - gives the type of
            coordinates to be transformed *from*
Arg  6      boolean (0 or 1) $include_original_region
            option to include original input coordinate region mappings in the result
Arg  7      int  $cdna_coding_start
            cdna coding start  
Function    generic map method
Returntype  if $include_original_region == 0
              array of mapped Bio::EnsEMBL::Mapper::Coordinate
              and/or   Bio::EnsEMBL::Mapper::Gap
            if $include_original_region == 1
              hash of mapped and original Bio::EnsEMBL::Mapper::Coordinate
              and/or   Bio::EnsEMBL::Mapper::Gap
Exceptions  none
Caller      Bio::EnsEMBL::Mapper
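To make the mix of Coordinate and Gap results more concrete, here is a minimal sketch in plain Perl of how a query range can be split into mapped pieces and gaps. It is not the library's implementation. The helper split_range, the hash-based result records and the example ids are all hypothetical, and the sketch assumes the pairs share one 'from' id, are sorted by 'from' start and do not overlap.

```perl
use strict;
use warnings;

# Hypothetical sketch: split [$start, $end] into mapped pieces and gaps.
# Assumes @$pairs are sorted by from-start and do not overlap.
sub split_range {
    my ( $pairs, $start, $end, $strand ) = @_;
    my @result;
    my $cursor = $start;    # first query position not yet accounted for

    for my $pair (@$pairs) {
        my ( $from, $to ) = @{$pair}{qw(from to)};
        next if $from->{end} < $cursor;    # pair lies before the remaining query
        last if $from->{start} > $end;     # pair lies after the query

        # An uncovered stretch before this pair becomes a gap.
        if ( $from->{start} > $cursor ) {
            push @result, { type => 'gap', start => $cursor, end => $from->{start} - 1 };
            $cursor = $from->{start};
        }

        # Overlap of the query with this pair, as offsets into the 'from' unit.
        my $ov_end   = $from->{end} < $end ? $from->{end} : $end;
        my $offset_s = $cursor - $from->{start};
        my $offset_e = $ov_end - $from->{start};

        # Reverse orientation flips the offsets around the end of 'to'.
        my ( $t_start, $t_end ) = $pair->{ori} == 1
            ? ( $to->{start} + $offset_s, $to->{start} + $offset_e )
            : ( $to->{end} - $offset_e,   $to->{end} - $offset_s );

        push @result, {
            type   => 'coordinate',
            id     => $to->{id},
            start  => $t_start,
            end    => $t_end,
            strand => $strand * $pair->{ori},
        };
        $cursor = $ov_end + 1;
    }

    # Anything left after the last pair is a trailing gap.
    push @result, { type => 'gap', start => $cursor, end => $end } if $cursor <= $end;
    return @result;
}

# Illustrative pairs with an unmapped stretch at 'from' positions 5..8.
my @pairs = (
    { from => { id => 'contig_a', start => 1, end => 4 },  to => { id => 'chr_1', start => 101, end => 104 }, ori => 1 },
    { from => { id => 'contig_a', start => 9, end => 12 }, to => { id => 'chr_1', start => 201, end => 204 }, ori => -1 },
);

for my $piece ( split_range( \@pairs, 3, 10, 1 ) ) {
    printf "%-10s %d-%d\n", $piece->{type}, $piece->{start}, $piece->{end};
}
```

Gap records keep the original 'from' coordinates, while coordinate records carry 'to' coordinates and a strand multiplied by the pair's 'ori'.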

map_insert

Arg [1]    : string $id
Arg [2]    : int $start - start coord. Since this is an insert, it should
             always be one greater than end.
Arg [3]    : int $end - end coord. Since this is an insert, it should
             always be one less than start.
Arg [4]    : int $strand (0, 1, -1)
Arg [5]    : string $type - the coordinate system name the coords are from.
Arg [6]    : boolean $fastmap - if specified, this is being called from
             the fastmap call. The mapping done is not any faster for
             inserts, but the return value is different.
Example    : 
Description: This is an internal function which handles the special mapping
             case for inserts (start = end + 1).  This function will be called
             automatically by the map function so there is no reason to
             call it directly.
Returntype : list of Bio::EnsEMBL::Mapper::Coordinate and/or Gap objects
Exceptions : none
Caller     : map_coordinates()
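The insert convention (start = end + 1) can be illustrated with a small sketch in plain Perl. The helpers is_insert and insert_flanks are hypothetical and not part of the Mapper API. They only show how a zero-length position between two bases is encoded and which bases flank it.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the range encodes an insert,
# i.e. a zero-length position with start exactly one past end.
sub is_insert {
    my ( $start, $end ) = @_;
    return $start == $end + 1 ? 1 : 0;
}

# Hypothetical helper: the two bases on either side of the insert.
# For start = 11, end = 10 the insert sits between bases 10 and 11.
sub insert_flanks {
    my ( $start, $end ) = @_;
    die "not an insert: $start-$end\n" unless is_insert( $start, $end );
    return ( $end, $start );
}

my ( $left, $right ) = insert_flanks( 11, 10 );
print "insert lies between $left and $right\n";
```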

fastmap

Arg  1      string $id
            id of 'source' sequence
Arg  2      int $start
            start coordinate of 'source' sequence
Arg  3      int $end
            end coordinate of 'source' sequence
Arg  4      int $strand
            raw contig orientation (+/- 1)
Arg  5      string $type
            nature of transform - gives the type of
            coordinates to be transformed *from*
Function    inferior map method. Will only do ungapped unsplit mapping.
            Will return id, start, end and strand in a list.
Returntype  list of results
Exceptions  none
Caller      Bio::EnsEMBL::AssemblyMapper
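The restriction to ungapped, unsplit mapping can be sketched in plain Perl as follows. This is an assumption-laden illustration, not the library's code: the helper fast_lookup and the example ids are hypothetical, and the sketch returns an empty list whenever the query is not fully contained in a single pair.

```perl
use strict;
use warnings;

# Hypothetical sketch: map a range only if one pair contains it entirely.
# Returns ( id, start, end, strand ) on success and an empty list otherwise.
sub fast_lookup {
    my ( $pairs, $start, $end, $strand ) = @_;

    for my $pair (@$pairs) {
        my ( $from, $to ) = @{$pair}{qw(from to)};

        # Skip pairs that do not contain the whole query range.
        next unless $from->{start} <= $start && $end <= $from->{end};

        my $offset_s = $start - $from->{start};
        my $offset_e = $end - $from->{start};

        # Reverse orientation flips the offsets around the end of 'to'.
        my ( $t_start, $t_end ) = $pair->{ori} == 1
            ? ( $to->{start} + $offset_s, $to->{start} + $offset_e )
            : ( $to->{end} - $offset_e,   $to->{end} - $offset_s );

        return ( $to->{id}, $t_start, $t_end, $strand * $pair->{ori} );
    }

    return ();    # split or gapped queries are not handled here
}

# Illustrative pairs only.
my @pairs = (
    { from => { id => 'contig_a', start => 1, end => 4 },  to => { id => 'chr_1', start => 101, end => 104 }, ori => 1 },
    { from => { id => 'contig_a', start => 9, end => 12 }, to => { id => 'chr_1', start => 201, end => 204 }, ori => -1 },
);

my @hit  = fast_lookup( \@pairs, 10, 11, 1 );
my @miss = fast_lookup( \@pairs, 3,  10, 1 );
print @hit  ? "hit: @hit\n" : "no simple mapping\n";
print @miss ? "hit: @miss\n" : "no simple mapping\n";
```

A caller that receives the empty list would need to fall back to the general map method to obtain the split result.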

add_map_coordinates

Arg  1      int $id
            id of 'source' sequence
Arg  2      int $start
            start coordinate of 'source' sequence
Arg  3      int $end
            end coordinate of 'source' sequence
Arg  4      int $strand
            relative orientation of source and target (+/- 1)
Arg  5      int $id
            id of 'target' sequence
Arg  6      int $start
            start coordinate of 'target' sequence
Arg  7      int $end
            end coordinate of 'target' sequence
Function    Stores details of mapping between
            'source' and 'target' regions.
Returntype  none
Exceptions  none
Caller      Bio::EnsEMBL::Mapper

add_indel_coordinates

Arg  1      int $id
            id of 'source' sequence
Arg  2      int $start
            start coordinate of 'source' sequence
Arg  3      int $end
            end coordinate of 'source' sequence
Arg  4      int $strand
            relative orientation of source and target (+/- 1)
Arg  5      int $id
            id of 'target' sequence
Arg  6      int $start
            start coordinate of 'target' sequence
Arg  7      int $end
            end coordinate of 'target' sequence
Function    stores details of mapping between two regions:
            'source' and 'target'. Returns 1 if the pair was added, 0 if it
            was already in. Used when adding an indel.
Returntype  int 0,1
Exceptions  none
Caller      Bio::EnsEMBL::Mapper
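The 1/0 return convention can be illustrated with a small sketch in plain Perl. How the real method detects an existing pair is not described here, so the string key below is purely an assumption, and the helper add_pair_once is hypothetical.

```perl
use strict;
use warnings;

my %seen;     # keys of pairs already stored
my @pairs;    # stored pairs, in insertion order

# Hypothetical helper: store a pair unless an identical one is present.
# Returns 1 if the pair was added, 0 if it was already in.
sub add_pair_once {
    my ( $from_id, $from_start, $from_end, $ori, $to_id, $to_start, $to_end ) = @_;

    # Assumption: two pairs are "the same" when all seven values match.
    my $key = join ':', $from_id, $from_start, $from_end, $ori,
        $to_id, $to_start, $to_end;
    return 0 if $seen{$key}++;

    push @pairs, {
        from => { id => $from_id, start => $from_start, end => $from_end },
        to   => { id => $to_id,   start => $to_start,   end => $to_end },
        ori  => $ori,
    };
    return 1;
}

my $first  = add_pair_once( 4, 11, 10, 1, 1, 21, 23 );
my $second = add_pair_once( 4, 11, 10, 1, 1, 21, 23 );
print "first call: $first, second call: $second\n";
```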

map_indel

Arg [1]    : string $id
Arg [2]    : int $start - start coord. Since this is an indel, it should
             always be one greater than end.
Arg [3]    : int $end - end coord. Since this is an indel, it should
             always be one less than start.
Arg [4]    : int $strand (0, 1, -1)
Arg [5]    : string $type - the coordinate system name the coords are from.
Example    : @coords = $mapper->map_indel();
Description: This is an internal function which handles the special mapping
             case for indels (start = end + 1). It will be used to map from
             a coordinate system with a gap to another that contains an
             insertion. It will be mainly used by the Variation API.
Returntype : Bio::EnsEMBL::Mapper::Unit objects
Exceptions : none
Caller     : general

add_Mapper

Arg  1      Bio::EnsEMBL::Mapper $mapper2
Example     $mapper->add_Mapper($mapper2)
Function    add all the map coordinates from $mapper2 to this mapper.
            This object will then contain mapping pairs from both the old
            object and $mapper2.
Returntype  int 0,1
Exceptions  throw if 'to' and 'from' from both Bio::EnsEMBL::Mappers
            are incompatible
Caller      $mapper->methodname()
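The compatibility check and merge can be sketched in plain Perl as follows. Plain hashes stand in for mapper objects, the helper merge_mappers is hypothetical, and a plain die stands in for the library's throw. Treating "incompatible" as a mismatch of the 'from' or 'to' names is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: append the pairs of $other to $mapper, refusing to
# merge when the coordinate system names differ. Returns the new pair count.
sub merge_mappers {
    my ( $mapper, $other ) = @_;

    die "incompatible mappers: $mapper->{from}/$mapper->{to} "
        . "vs $other->{from}/$other->{to}\n"
        if $mapper->{from} ne $other->{from} || $mapper->{to} ne $other->{to};

    push @{ $mapper->{pairs} }, @{ $other->{pairs} };
    return scalar @{ $mapper->{pairs} };
}

# Plain hashes standing in for mapper objects (illustrative values only).
my $mapper  = { from => 'rawcontig', to => 'chromosome', pairs => [ 'pair_a' ] };
my $mapper2 = { from => 'rawcontig', to => 'chromosome', pairs => [ 'pair_b', 'pair_c' ] };
my $other   = { from => 'clone',     to => 'chromosome', pairs => [ 'pair_d' ] };

my $count = merge_mappers( $mapper, $mapper2 );
print "mapper now holds $count pairs\n";

my $merged = eval { merge_mappers( $mapper, $other ); 1 };
print "second merge refused: $@" unless $merged;
```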

list_pairs

Arg  1      int $id
            id of 'source' sequence
Arg  2      int $start
            start coordinate of 'source' sequence
Arg  3      int $end
            end coordinate of 'source' sequence
Arg  4      string $type
            nature of transform - gives the type of
            coordinates to be transformed *from*
Function    list all pairs of mappings in a region
Returntype  list of Bio::EnsEMBL::Mapper::Pair
Exceptions  none
Caller      Bio::EnsEMBL::Mapper
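Listing the pairs in a region amounts to an overlap test on the 'from' side. Here is a minimal sketch in plain Perl using the 'from' coordinates from the dumps in the DESCRIPTION. The helper pairs_in_region is hypothetical, and plain hashes stand in for Pair objects.

```perl
use strict;
use warnings;

# Hypothetical sketch: return every pair whose 'from' unit overlaps
# the closed interval [$start, $end].
sub pairs_in_region {
    my ( $pairs, $start, $end ) = @_;
    return grep {
        $_->{from}{start} <= $end && $_->{from}{end} >= $start
    } @$pairs;
}

# 'from' region 4 split across 'to' regions 1 and 2, as in the dumps.
my @pairs = (
    { from => { id => 4, start => 1,  end => 4 },  to => { id => 1, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 5,  end => 8 },  to => { id => 2, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 9,  end => 12 }, to => { id => 1, start => 1, end => 4 }, ori => 1 },
    { from => { id => 4, start => 13, end => 16 }, to => { id => 2, start => 1, end => 4 }, ori => 1 },
);

my @found = pairs_in_region( \@pairs, 4, 9 );
printf "%d-%d\n", $_->{from}{start}, $_->{from}{end} for @found;
```

A pair is included even if it only touches the query at a single base, since both interval ends are inclusive.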

to

Arg  1      Bio::EnsEMBL::Mapper::Unit $id
            id of 'source' sequence
Function    accessor method for the 'source'
            and 'target' in a Mapper::Pair
Returntype  Bio::EnsEMBL::Mapper::Unit
Exceptions  none
Caller      Bio::EnsEMBL::Mapper

from

Arg  1      Bio::EnsEMBL::Mapper::Unit $id
            id of 'source' sequence
Function    accessor method for the 'source'
            and 'target' in a Mapper::Pair
Returntype  Bio::EnsEMBL::Mapper::Unit
Exceptions  none
Caller      Bio::EnsEMBL::Mapper