LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

Please email comments or questions to the public Ensembl
developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.

Questions may also be sent to the Ensembl help desk at
<http://www.ensembl.org/Help/Contact>.

NAME

Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor

SYNOPSIS

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

my $asma = Bio::EnsEMBL::Registry->get_adaptor( "human", "core",
  "assemblymapper" );

my $csa = Bio::EnsEMBL::Registry->get_adaptor( "human", "core",
  "coordsystem" );

my $chr33_cs = $csa->fetch_by_name( 'chromosome', 'NCBI33' );
my $chr34_cs = $csa->fetch_by_name( 'chromosome', 'NCBI34' );
my $ctg_cs   = $csa->fetch_by_name('contig');
my $clone_cs = $csa->fetch_by_name('clone');

my $chr_ctg_mapper =
  $asma->fetch_by_CoordSystems( $chr33_cs, $ctg_cs );

my $ncbi33_ncbi34_mapper =
  $asma->fetch_by_CoordSystems( $chr33_cs, $chr34_cs );

my $ctg_clone_mapper =
  $asma->fetch_by_CoordSystems( $ctg_cs, $clone_cs );

DESCRIPTION

Adaptor for handling assembly mappers. This is a singleton class, i.e. there is only one instance per database (DBAdaptor).

This adaptor is used to retrieve mappers between any two coordinate systems whose makeup is described by the assembly table. Currently, one-step (explicit) and two-step (implicit) pairwise mapping are supported. In one-step mapping, an explicit relationship between the two coordinate systems is defined in the assembly table. In two-step ('chained') mapping, no explicit relationship is present, but both coordinate systems must have a mapping to a common intermediate coordinate system.
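The difference between one-step and two-step mapping comes down to path selection. The following is a minimal, self-contained sketch of that decision, not the adaptor's own code: the hash %explicit and the function mapping_path are hypothetical stand-ins (in Ensembl the permitted mapping paths are recorded in the meta table), and coordinate systems are reduced to plain name strings.

```perl
use strict;
use warnings;

# Hypothetical adjacency table standing in for the explicit pairwise
# relationships described by the assembly table. Keys and values are
# coordinate system names, not real CoordSystem objects.
my %explicit = (
  'chromosome:NCBI33' => ['contig'],
  'chromosome:NCBI34' => ['contig'],
  'clone'             => ['contig'],
  'contig'            => [ 'chromosome:NCBI33', 'chromosome:NCBI34', 'clone' ],
);

# Return the coordinate systems to traverse from $cs1 to $cs2:
# two names for a one-step (explicit) mapping, three for a two-step
# (chained) mapping through a shared intermediate, or an empty list.
sub mapping_path {
  my ( $cs1, $cs2, $links ) = @_;
  my @neighbours = @{ $links->{$cs1} || [] };
  return ( $cs1, $cs2 ) if grep { $_ eq $cs2 } @neighbours;
  for my $mid (@neighbours) {
    return ( $cs1, $mid, $cs2 ) if grep { $_ eq $cs2 } @{ $links->{$mid} || [] };
  }
  return;
}

print join( ' -> ',
  mapping_path( 'chromosome:NCBI33', 'chromosome:NCBI34', \%explicit ) ), "\n";
```

With this toy table, chromosome:NCBI33 to contig is a one-step path, while NCBI33 to NCBI34 is found only through the shared contig level.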

METHODS

new

Arg [1]    : Bio::EnsEMBL::DBAdaptor $dbadaptor the adaptor for
             the database this assembly mapper is using.
Example    : my $asma = Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor->new($dbadaptor);
Description: Creates a new AssemblyMapperAdaptor object.
Returntype : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
Exceptions : none
Caller     : Bio::EnsEMBL::DBSQL::DBAdaptor
Status     : Stable

cache_seq_ids_with_mult_assemblys

Example    : $self->adaptor->cache_seq_ids_with_mult_assemblys();
Description: Creates a hash of the component seq_region ids in the
             assembly table that map to more than one assembly.
Returntype : none
Exceptions : none
Caller     : AssemblyMapper, ChainedAssemblyMapper
Status     : At Risk
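
As a conceptual illustration only, the sketch below derives such a set from raw (assembled id, component id) pairs. The rows and the function multi_assembly_components are hypothetical; the actual method may obtain this set differently (for example from precomputed seq_region attributes) and stores it inside the adaptor rather than returning it.

```perl
use strict;
use warnings;

# Hypothetical [asm_seq_region_id, cmp_seq_region_id] rows standing in for
# the assembly table; components 11 and 12 each appear in two assembled regions.
my @assembly_rows = ( [ 1, 10 ], [ 1, 11 ], [ 2, 11 ], [ 2, 12 ], [ 3, 12 ] );

# Return a hash ref whose keys are the component ids that map to more than
# one distinct assembled seq_region.
sub multi_assembly_components {
  my ($rows) = @_;
  my %asm_for;    # cmp id -> { asm id => 1 }
  for my $row (@$rows) {
    my ( $asm_id, $cmp_id ) = @$row;
    $asm_for{$cmp_id}{$asm_id} = 1;
  }
  my %multi;
  for my $cmp_id ( keys %asm_for ) {
    $multi{$cmp_id} = 1 if scalar( keys %{ $asm_for{$cmp_id} } ) > 1;
  }
  return \%multi;
}

my $multi = multi_assembly_components( \@assembly_rows );
print join( ',', sort keys %$multi ), "\n";
```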

fetch_by_CoordSystems

Arg [1]    : Bio::EnsEMBL::CoordSystem $cs1
             One of the coordinate systems to retrieve the mapper
             between
Arg [2]    : Bio::EnsEMBL::CoordSystem $cs2
             The other coordinate system to map between
Description: Retrieves an assembly mapper for two coordinate
             systems whose relationship is described in the
             assembly table.

             The ordering of the coordinate systems is arbitrary.
             The following two statements are equivalent:
             $mapper = $asma->fetch_by_CoordSystems($cs1,$cs2);
             $mapper = $asma->fetch_by_CoordSystems($cs2,$cs1);
Returntype : Bio::EnsEMBL::AssemblyMapper
Exceptions : wrong argument types
Caller     : general
Status     : Stable
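
One way to honour the "ordering is arbitrary" guarantee is to cache mappers under a key that is identical for both argument orders. This is a hypothetical sketch: %mapper_cache, pair_key and fetch_mapper are illustrative names, coordinate systems are plain strings, and the "mapper" is a placeholder hash rather than a Bio::EnsEMBL::AssemblyMapper.

```perl
use strict;
use warnings;

my %mapper_cache;    # hypothetical cache: unordered pair key -> mapper

# Same key for ($x, $y) and ($y, $x), so argument order does not matter.
sub pair_key {
  my ( $cs1, $cs2 ) = @_;
  return join '|', sort { $a cmp $b } ( $cs1, $cs2 );
}

# Build a placeholder "mapper" the first time a pair is requested, then
# return the cached object for either argument order.
sub fetch_mapper {
  my ( $cs1, $cs2 ) = @_;
  my $key = pair_key( $cs1, $cs2 );
  $mapper_cache{$key} //= { between => $key };
  return $mapper_cache{$key};
}

my $m1 = fetch_mapper( 'chromosome:NCBI33', 'contig' );
my $m2 = fetch_mapper( 'contig', 'chromosome:NCBI33' );
print $m1 == $m2 ? "same mapper\n" : "different mappers\n";
```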

register_assembled

Arg [1]    : Bio::EnsEMBL::AssemblyMapper $asm_mapper
             A valid AssemblyMapper object
Arg [2]    : integer $asm_seq_region
             The dbID of the seq_region to be registered
Arg [3]    : int $asm_start
             The start of the region to be registered
Arg [4]    : int $asm_end
             The end of the region to be registered
Description: Declares an assembled region to the AssemblyMapper.
             This extracts the relevant data from the assembly
             table and stores it in the Mapper internal to $asm_mapper.
             It must therefore be called before any mapping is
             attempted on that region; otherwise only gaps will
             be returned.  Note that the AssemblyMapper automatically
             calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
             or if it is associated with multiple assembled pieces (bad
             data in assembly table)
Caller     : Bio::EnsEMBL::AssemblyMapper
Status     : Stable
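
Why an unregistered region maps only to gaps can be shown with a toy, in-memory model. Everything here is hypothetical (the rows, @registered, register_assembled and map_position are illustrative, and orientation is ignored); it only mirrors the idea of copying overlapping assembly rows into an internal mapper before mapping.

```perl
use strict;
use warnings;

# Hypothetical assembly rows: assembled region 1 is built from components
# 10 and 11 (forward strand only; orientation is ignored in this sketch).
my @rows = (
  { asm_id => 1, asm_start => 1,   asm_end => 100, cmp_id => 10, cmp_start => 1 },
  { asm_id => 1, asm_start => 101, asm_end => 250, cmp_id => 11, cmp_start => 1 },
);

my @registered;    # pairs loaded into the toy internal mapper
my %seen;          # avoids loading the same row twice

# Load every row that overlaps [$start, $end] of assembled region $asm_id.
sub register_assembled {
  my ( $asm_id, $start, $end ) = @_;
  push @registered, grep {
         $_->{asm_id} == $asm_id
      && $_->{asm_end} >= $start
      && $_->{asm_start} <= $end
      && !$seen{$_}++
  } @rows;
  return;
}

# Map one assembled position to [cmp_id, cmp_pos]; unregistered => 'gap'.
sub map_position {
  my ( $asm_id, $pos ) = @_;
  for my $p (@registered) {
    next unless $p->{asm_id} == $asm_id
      && $pos >= $p->{asm_start}
      && $pos <= $p->{asm_end};
    return [ $p->{cmp_id}, $p->{cmp_start} + $pos - $p->{asm_start} ];
  }
  return 'gap';
}

print map_position( 1, 150 ), "\n";    # nothing registered yet
register_assembled( 1, 120, 200 );
print join( ':', @{ map_position( 1, 150 ) } ), "\n";
```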

register_component

Arg [1]    : Bio::EnsEMBL::AssemblyMapper $asm_mapper
             A valid AssemblyMapper object
Arg [2]    : integer $cmp_seq_region
             The dbID of the seq_region to be registered
Description: Declares a component region to the AssemblyMapper.
             This extracts the relevant data from the assembly
             table and stores it in the Mapper internal to $asm_mapper.
             It must therefore be called before any mapping is
             attempted on that region; otherwise only gaps will
             be returned.  Note that the AssemblyMapper automatically
             calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
             or if it is associated with multiple assembled pieces (bad
             data in assembly table)
Caller     : Bio::EnsEMBL::AssemblyMapper
Status     : Stable
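
The two failure cases listed under Exceptions can be sketched as checks on the rows found for a component. This is a hypothetical model, not the adaptor's code: %seq_region_exists, @rows and component_row are illustrative, and returning undef for an existing component that belongs to no assembly is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical data: seq_regions known to exist, and assembly rows in which
# component 12 (bad data) appears in two assembled regions.
my %seq_region_exists = map { $_ => 1 } 10 .. 13;
my @rows = (
  { asm_id => 1, cmp_id => 10 },
  { asm_id => 1, cmp_id => 11 },
  { asm_id => 2, cmp_id => 12 },
  { asm_id => 3, cmp_id => 12 },
);

# Return the one assembly row for a component. Dies if the seq_region does
# not exist or is tied to several assembled pieces. Returning undef for a
# component that is in no assembly is an assumption of this sketch.
sub component_row {
  my ($cmp_id) = @_;
  die "seq_region $cmp_id does not exist\n" unless $seq_region_exists{$cmp_id};
  my @hits = grep { $_->{cmp_id} == $cmp_id } @rows;
  die "seq_region $cmp_id maps to multiple assembled pieces\n" if @hits > 1;
  return $hits[0];
}

my $row = component_row(10);
print "component 10 is in assembled region $row->{asm_id}\n";
my $ok = eval { component_row(12); 1 };
print $ok ? "ok\n" : "error: $@";
```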

register_chained

Arg [1]    : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
             The chained assembly mapper to register regions on
Arg [2]    : string $from ('first' or 'last')
             The direction we are registering from, and the name of the
             internal mapper.
Arg [3]    : string $seq_region_name
             The name of the seq_region we are registering on
Arg [4]    : listref $ranges
             A list of ranges to register (in [$start,$end] tuples).
Arg [5]    : (optional) $to_slice
             Only register those on this Slice.
Description: Registers a set of ranges on a chained assembly mapper.
             This function is at the heart of the chained mapping process.
             It retrieves information from the assembly table and
             dynamically constructs the mappings between two coordinate
             systems which are two mapping steps apart. It does this by using
             two internal mappers to load up a third mapper which is
             actually used by the ChainedAssemblyMapper to perform the
             mapping.

             This method must be called before any mapping is
             attempted on regions of interest, otherwise only gaps will
             be returned.  Note that the ChainedAssemblyMapper automatically
             calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
             or if it is associated with multiple assembled pieces (bad
             data in assembly table)

             throw if the mapping between the coordinate systems cannot
             be performed in two steps, which means there is an internal
             error in the data in the meta table or in the code that creates
             the mapping paths.
Caller     : Bio::EnsEMBL::AssemblyMapper
Status     : Stable
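
The "two internal mappers load up a third" idea can be illustrated by composing interval pairs that meet on the intermediate coordinate system. This is a simplified sketch under stated assumptions: the pair hashes, @first, @last and compose_pairs are hypothetical, all pairs are forward strand and equal length on both sides, and the real code works with Bio::EnsEMBL::Mapper objects rather than plain hashes.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Each pair links an interval in an outer system to an equal-length interval
# in the intermediate ('mid') system. Forward strand only, for brevity.
my @first = (    # e.g. chromosome:NCBI33 -> contig
  { from => 'chr1', from_start => 1,   mid => 'ctgA', mid_start => 1, len => 100 },
  { from => 'chr1', from_start => 101, mid => 'ctgB', mid_start => 1, len => 50 },
);
my @last = (     # e.g. chromosome:NCBI34 -> contig
  { from => 'chr1', from_start => 11, mid => 'ctgA', mid_start => 21, len => 80 },
  { from => 'chr1', from_start => 91, mid => 'ctgB', mid_start => 1,  len => 50 },
);

# Build the third mapper: wherever a 'first' pair and a 'last' pair cover the
# same stretch of the same intermediate region, emit a direct pair.
sub compose_pairs {
  my ( $first, $last ) = @_;
  my @direct;
  for my $f (@$first) {
    for my $l (@$last) {
      next unless $f->{mid} eq $l->{mid};
      my $lo = max( $f->{mid_start}, $l->{mid_start} );
      my $hi = min( $f->{mid_start} + $f->{len} - 1, $l->{mid_start} + $l->{len} - 1 );
      next if $lo > $hi;    # no overlap on the intermediate region
      push @direct, {
        first_region => $f->{from},
        first_start  => $f->{from_start} + $lo - $f->{mid_start},
        last_region  => $l->{from},
        last_start   => $l->{from_start} + $lo - $l->{mid_start},
        len          => $hi - $lo + 1,
      };
    }
  }
  return \@direct;
}

for my $d ( @{ compose_pairs( \@first, \@last ) } ) {
  printf "first %s:%d-%d = last %s:%d-%d\n",
    $d->{first_region}, $d->{first_start}, $d->{first_start} + $d->{len} - 1,
    $d->{last_region},  $d->{last_start},  $d->{last_start} + $d->{len} - 1;
}
```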

_register_chained_special

Arg [1]    : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
             The chained assembly mapper to register regions on
Arg [2]    : string $from ('first' or 'last')
             The direction we are registering from, and the name of the
             internal mapper.
Arg [3]    : string $seq_region_name
             The name of the seq_region we are registering on
Arg [4]    : listref $ranges
             A list of ranges to register (in [$start,$end] tuples).
Arg [5]    : (optional) $to_slice
             Only register those on this Slice.
Description: Registers a set of ranges on a chained assembly mapper.
             This function is at the heart of the chained mapping process.
             It retrieves information from the assembly table and
             dynamically constructs the mappings between two coordinate
             systems which are two mapping steps apart. It does this by using
             two internal mappers to load up a third mapper which is
             actually used by the ChainedAssemblyMapper to perform the
             mapping.

             This method must be called before any mapping is
             attempted on regions of interest, otherwise only gaps will
             be returned.  Note that the ChainedAssemblyMapper automatically
             calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
             or if it is associated with multiple assembled pieces (bad
             data in assembly table)

             throw if the mapping between the coordinate systems cannot
             be performed in two steps, which means there is an internal
             error in the data in the meta table or in the code that creates
             the mapping paths.
Caller     : Bio::EnsEMBL::AssemblyMapper
Status     : Stable

register_all

Arg [1]    : Bio::EnsEMBL::AssemblyMapper $mapper
Example    : $mapper = $asm_mapper_adaptor->fetch_by_CoordSystems($cs1,$cs2);

             # make cache large enough to hold all of the mappings
             $mapper->max_pair_count(10e6);
             $asm_mapper_adaptor->register_all($mapper);

             # perform mappings as normal
             $mapper->map($slice->seq_region_name(), $sr_start, $sr_end,
                          $sr_strand, $cs1);
             ...
Description: This function registers the entire set of mappings between
             two coordinate systems in an assembly mapper.
             This uses a lot of memory, but is much more efficient
             when performing many mappings spread over the entire
             genome.
Returntype : none
Exceptions : none
Caller     : specialised programs
Status     : Stable
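
The memory-for-speed trade-off can be made concrete by counting simulated round trips to the assembly table. In this hypothetical sketch, %assembly, $queries, lazy_mapper and eager_mapper are illustrative only; the lazy closure stands in for on-demand registration and the eager one for a register_all style bulk load.

```perl
use strict;
use warnings;

# Hypothetical per-region mapping data and a counter standing in for
# round trips to the assembly table.
my %assembly = (
  chr1 => 'pairs for chr1',
  chr2 => 'pairs for chr2',
  chr3 => 'pairs for chr3',
);
my $queries = 0;

# Lazy registration: one query per region, the first time it is mapped.
sub lazy_mapper {
  my %cache;
  return sub {
    my ($region) = @_;
    unless ( exists $cache{$region} ) {
      $queries++;
      $cache{$region} = $assembly{$region};
    }
    return $cache{$region};
  };
}

# register_all-style: one bulk load up front, then only in-memory lookups.
sub eager_mapper {
  $queries++;
  my %cache = %assembly;
  return sub { return $cache{ $_[0] } };
}

my @requests = qw(chr1 chr2 chr3 chr1 chr2);
for my $make ( \&lazy_mapper, \&eager_mapper ) {
  $queries = 0;
  my $mapper = $make->();
  $mapper->($_) for @requests;
  print "queries: $queries\n";
}
```

The eager version holds every region's data in memory at once, which is why the example above raises the mapper's pair cache size before calling register_all.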

register_all_chained

Arg [1]    : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
Example    : $mapper = $asm_mapper_adaptor->fetch_by_CoordSystems($cs1,$cs2);

             # make the cache large enough to hold all of the mappings
             $mapper->max_pair_count(10e6);
             # load all of the mapping data
             $asm_mapper_adaptor->register_all_chained($mapper);

             # perform mappings as normal
             $mapper->map($slice->seq_region_name(), $sr_start, $sr_end,
                          $sr_strand, $cs1);
             ...
Description: This function registers the entire set of mappings between
             two coordinate systems in a chained mapper. This uses a lot
             of memory, but is much more efficient when performing many
             mappings spread over the entire genome.
Returntype : none
Exceptions : throw if mapper is between coord systems with unexpected
             mapping paths
Caller     : specialised programs doing a lot of genome-wide mapping
Status     : Stable

seq_regions_to_ids

Arg [1]    : Bio::EnsEMBL::CoordSystem $coord_system
Arg [2]    : listref of strings $seq_regions
Example    : my @ids = @{$asma->seq_regions_to_ids($coord_sys, \@seq_regs)};
Description: Converts a list of seq_region names to internal identifiers
             using the internal cache that has accumulated while registering
             regions for AssemblyMappers. If any requested regions are
             not found in the cache, an attempt is made to retrieve them
             from the database.
Returntype : listref of ints
Exceptions : throw if a non-existent seq_region is provided
Caller     : general
Status     : Stable
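
The cache-then-database lookup described above can be sketched as follows. The hashes %id_cache and %database, the counter $db_lookups and the ids 101 to 103 are made up for illustration, and this seq_regions_to_ids takes a coordinate system name string rather than a Bio::EnsEMBL::CoordSystem. Keys combine coordinate system and name because seq_region names are only unique within a coordinate system.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: a cache filled while registering regions, and a
# "database" consulted only for names the cache lacks. Ids are made up.
my %id_cache   = ( 'chromosome:1' => 101, 'chromosome:X' => 102 );
my %database   = ( %id_cache, 'chromosome:Y' => 103 );
my $db_lookups = 0;

# Convert seq_region names in one coordinate system to internal ids,
# falling back to the database on a cache miss and dying on unknown names.
sub seq_regions_to_ids {
  my ( $cs_name, $names ) = @_;
  my @ids;
  for my $name (@$names) {
    my $key = "$cs_name:$name";
    unless ( exists $id_cache{$key} ) {
      $db_lookups++;
      die "Unknown seq_region '$name' in $cs_name\n" unless exists $database{$key};
      $id_cache{$key} = $database{$key};    # remember for next time
    }
    push @ids, $id_cache{$key};
  }
  return \@ids;
}

my $ids = seq_regions_to_ids( 'chromosome', [ '1', 'Y', 'Y' ] );
print "@$ids (database lookups: $db_lookups)\n";
```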

seq_ids_to_regions

Arg [1]    : listref of seq_region ids
Example    : my @names = @{$asma->seq_ids_to_regions(\@seq_ids)};
Description: Converts a list of seq_region ids to seq_region names
             using the internal cache that has accumulated while registering
             regions for AssemblyMappers. If any requested regions are
             not found in the cache, an attempt is made to retrieve them
             from the database.
Returntype : listref of strings
Exceptions : throw if a non-existent seq_region_id is provided
Caller     : general
Status     : Stable

delete_cache

Description: Deletes all the caches for the mappings and seq_regions.
Returntype : none
Exceptions : none
Caller     : General
Status     : At Risk