LICENSE
Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Copyright [2016-2024] EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
CONTACT
Please email comments or questions to the public Ensembl
developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.
Questions may also be sent to the Ensembl help desk at
<http://www.ensembl.org/Help/Contact>.
NAME
Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
SYNOPSIS
use Bio::EnsEMBL::Registry;
Bio::EnsEMBL::Registry->load_registry_from_db(
-host => 'ensembldb.ensembl.org',
-user => 'anonymous'
);
my $asma = Bio::EnsEMBL::Registry->get_adaptor( "human", "core",
"assemblymapper" );
my $csa = Bio::EnsEMBL::Registry->get_adaptor( "human", "core",
"coordsystem" );
my $chr33_cs = $csa->fetch_by_name( 'chromosome', 'NCBI33' );
my $chr34_cs = $csa->fetch_by_name( 'chromosome', 'NCBI34' );
my $ctg_cs = $csa->fetch_by_name('contig');
my $clone_cs = $csa->fetch_by_name('clone');
my $chr_ctg_mapper =
$asma->fetch_by_CoordSystems( $chr33_cs, $ctg_cs );
my $ncbi33_ncbi34_mapper =
$asma->fetch_by_CoordSystems( $chr33_cs, $chr34_cs );
my $ctg_clone_mapper =
$asma->fetch_by_CoordSystems( $ctg_cs, $clone_cs );
DESCRIPTION
Adaptor for handling assembly mappers. This is a singleton class, i.e. there is only one per database (DBAdaptor).
This is used to retrieve mappers between any two coordinate systems whose makeup is described by the assembly table. Currently, one-step (explicit) and two-step (implicit) pairwise mapping are supported. In one-step mapping, an explicit relationship between the coordinate systems is defined in the assembly table. In two-step ('chained') mapping, no explicit mapping is present, but the coordinate systems must share a common mapping to an intermediate coordinate system.
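The one-step versus two-step distinction can be pictured as a path search over linked coordinate systems. The sketch below is a hypothetical pure-Perl illustration, not the adaptor's code: it assumes the explicit relationships are available as simple name pairs (in Ensembl they come from the assembly table together with the mapping paths recorded in the meta table), and `find_mapping_path` is an invented helper name.

```perl
use strict;
use warnings;

# Hypothetical explicit relationships (assembled, component), standing in for
# the coordinate-system pairs that the assembly table actually links.
my @explicit_pairs = (
    [ 'chromosome:NCBI33', 'contig' ],
    [ 'chromosome:NCBI34', 'contig' ],
    [ 'clone',             'contig' ],
);

# Return an array ref of coordinate systems to traverse from $from to $to:
# a one-step path if an explicit pair exists, a two-step path through a shared
# intermediate otherwise, or undef if neither exists.
sub find_mapping_path {
    my ( $from, $to, $pairs ) = @_;

    # Mapping works in both directions, so record each link both ways.
    my %linked;
    for my $pair (@$pairs) {
        my ( $x, $y ) = @$pair;
        $linked{$x}{$y} = 1;
        $linked{$y}{$x} = 1;
    }

    return [ $from, $to ] if $linked{$from}{$to};    # explicit (one-step)

    for my $mid ( sort keys %{ $linked{$from} } ) {
        return [ $from, $mid, $to ] if $linked{$mid}{$to};    # chained (two-step)
    }
    return undef;                                     # no usable path
}
```

Here `chromosome:NCBI33` to `clone` has no explicit pair, so the search falls back to the shared `contig` intermediate.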
METHODS
new
Arg [1] : Bio::EnsEMBL::DBSQL::DBAdaptor $dbadaptor the adaptor for
the database this assembly mapper is using.
Example : my $asma = Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor->new($dbadaptor);
Description: Creates a new AssemblyMapperAdaptor object
Returntype : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
Exceptions : none
Caller : Bio::EnsEMBL::DBSQL::DBAdaptor
Status : Stable
cache_seq_ids_with_mult_assemblys
Example : $self->adaptor->cache_seq_ids_with_mult_assemblys();
Description: Creates a hash of the component seq_region ids in the
assembly table that map to more than one assembly.
Returntype : none
Exceptions : none
Caller : AssemblyMapper, ChainedAssemblyMapper
Status : At Risk
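As a rough illustration of what such a cache holds, the following sketch builds the hash from hypothetical in-memory assembly rows of `[asm_seq_region_id, cmp_seq_region_id]`. The row layout and the helper name `seq_ids_with_multiple_assemblies` are assumptions; the real method reads the assembly table directly.

```perl
use strict;
use warnings;

# Hypothetical rows: [asm_seq_region_id, cmp_seq_region_id].
# Returns a hash ref whose keys are the component ids that appear under
# more than one distinct assembled seq_region.
sub seq_ids_with_multiple_assemblies {
    my ($rows) = @_;

    my %assemblies_for;    # cmp id => { asm id => 1 }
    for my $row (@$rows) {
        my ( $asm_id, $cmp_id ) = @$row;
        $assemblies_for{$cmp_id}{$asm_id} = 1;
    }

    my %multi;
    for my $cmp_id ( keys %assemblies_for ) {
        # keys() in numeric context gives the number of distinct assemblies
        $multi{$cmp_id} = 1 if keys %{ $assemblies_for{$cmp_id} } > 1;
    }
    return \%multi;
}
```

Repeated rows for the same pair are counted once, so only genuinely distinct assemblies mark a component.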
fetch_by_CoordSystems
Arg [1] : Bio::EnsEMBL::CoordSystem $cs1
One of the coordinate systems to retrieve the mapper
between
Arg [2] : Bio::EnsEMBL::CoordSystem $cs2
The other coordinate system to map between
Description: Retrieves an Assembly mapper for two coordinate
systems whose relationship is described in the
assembly table.
The ordering of the coordinate systems is arbitrary.
The following two statements are equivalent:
$mapper = $asma->fetch_by_CoordSystems($cs1,$cs2);
$mapper = $asma->fetch_by_CoordSystems($cs2,$cs1);
Returntype : Bio::EnsEMBL::AssemblyMapper
Exceptions : wrong argument types
Caller : general
Status : Stable
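One way to honour the order-independence guarantee is to key a mapper cache on the two coordinate-system ids in sorted order. This is a hypothetical sketch (`mapper_for_pair`, the numeric ids and the hash standing in for a mapper are all invented), not a description of how `fetch_by_CoordSystems` caches internally.

```perl
use strict;
use warnings;

my %mapper_cache;    # sorted pair key => mapper stand-in
my $builds = 0;      # counts how often a "mapper" is actually constructed

# Return the cached mapper for two (hypothetical) numeric coordinate-system
# ids, building it on first use. Sorting the ids makes the key independent
# of argument order, so (3, 7) and (7, 3) share one entry.
sub mapper_for_pair {
    my ( $cs1_id, $cs2_id ) = @_;
    my $key = join ':', sort { $a <=> $b } $cs1_id, $cs2_id;
    unless ( exists $mapper_cache{$key} ) {
        $builds++;
        $mapper_cache{$key} = { key => $key };    # stand-in for a real mapper object
    }
    return $mapper_cache{$key};
}
```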
register_assembled
Arg [1] : Bio::EnsEMBL::AssemblyMapper $asm_mapper
A valid AssemblyMapper object
Arg [2] : integer $asm_seq_region
The dbID of the seq_region to be registered
Arg [3] : int $asm_start
The start of the region to be registered
Arg [4] : int $asm_end
The end of the region to be registered
Description: Declares an assembled region to the AssemblyMapper.
This extracts the relevant data from the assembly
table and stores it in the Mapper internal to $asm_mapper.
It therefore must be called before any mapping is
attempted on that region. Otherwise only gaps will
be returned. Note that the AssemblyMapper automatically
calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
or if it is associated with multiple assembled pieces (bad data
in assembly table)
Caller : Bio::EnsEMBL::AssemblyMapper
Status : Stable
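The "register before mapping, otherwise only gaps" behaviour can be sketched with a toy mapper. Everything here is hypothetical: the row layout (asm_start, asm_end, component name, cmp_start, cmp_end, ori) only loosely mirrors the assembly table, and `register_assembled_toy` and `map_position` are invented names, not Bio::EnsEMBL::Mapper methods.

```perl
use strict;
use warnings;

# Hypothetical rows for one assembled seq_region:
# [asm_start, asm_end, cmp_name, cmp_start, cmp_end, ori]
my @assembly_rows = (
    [ 1,    1000, 'ctgA', 1,   1000, 1 ],
    [ 1001, 1500, 'ctgB', 501, 1000, -1 ],
);

my @registered;    # rows loaded into the toy mapper (no de-duplication here)

# Load every row overlapping [$start, $end] into the toy mapper.
sub register_assembled_toy {
    my ( $start, $end ) = @_;
    push @registered, grep { $_->[0] <= $end && $_->[1] >= $start } @assembly_rows;
}

# Map one assembled position; anything not covered by a registered row is a gap.
sub map_position {
    my ($pos) = @_;
    for my $r (@registered) {
        my ( $as, $ae, $name, $cs, $ce, $ori ) = @$r;
        next if $pos < $as || $pos > $ae;
        # forward strand counts up from cmp_start, reverse counts down from cmp_end
        my $cmp = $ori == 1 ? $cs + ( $pos - $as ) : $ce - ( $pos - $as );
        return "$name:$cmp";
    }
    return 'gap';
}
```

Before registration every position reports `gap`; after `register_assembled_toy(1, 1200)` both rows are loaded and positions resolve, with `ctgB` read on the reverse strand.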
register_component
Arg [1] : Bio::EnsEMBL::AssemblyMapper $asm_mapper
A valid AssemblyMapper object
Arg [2] : integer $cmp_seq_region
The dbID of the seq_region to be registered
Description: Declares a component region to the AssemblyMapper.
This extracts the relevant data from the assembly
table and stores it in the Mapper internal to $asm_mapper.
It therefore must be called before any mapping is
attempted on that region. Otherwise only gaps will
be returned. Note that the AssemblyMapper automatically
calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
or if it is associated with multiple assembled pieces (bad data
in assembly table)
Caller : Bio::EnsEMBL::AssemblyMapper
Status : Stable
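The component direction, including the documented "multiple assembled pieces" error, might look like this in simplified form. The rows, ids and the helper `component_to_assembled` are hypothetical, and for brevity the sketch resolves a single position instead of loading rows into a Mapper.

```perl
use strict;
use warnings;

# Hypothetical rows: [asm_name, asm_start, asm_end, cmp_name, cmp_start, cmp_end, ori]
my @rows = (
    [ 'chr1', 1,    1000, 'ctgA', 1,   1000, 1 ],
    [ 'chr1', 1001, 1500, 'ctgB', 501, 1000, -1 ],
    [ 'chr2', 1,    200,  'ctgC', 1,   200,  1 ],
    [ 'chr3', 1,    200,  'ctgC', 1,   200,  1 ],    # bad data: ctgC placed twice
);

# Find the single assembled piece containing a component and map a component
# position onto it. Dies for unknown components or ambiguous placements.
sub component_to_assembled {
    my ( $cmp_name, $pos ) = @_;
    my @hits = grep { $_->[3] eq $cmp_name } @rows;
    die "unknown component $cmp_name\n" unless @hits;
    die "$cmp_name is in multiple assembled pieces\n" if @hits > 1;

    my ( $asm_name, $as, undef, undef, $cs, $ce, $ori ) = @{ $hits[0] };
    return 'gap' if $pos < $cs || $pos > $ce;
    # reverse-strand components map their end onto the assembled start
    my $asm_pos = $ori == 1 ? $as + ( $pos - $cs ) : $as + ( $ce - $pos );
    return "$asm_name:$asm_pos";
}
```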
register_chained
Arg [1] : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
The chained assembly mapper to register regions on
Arg [2] : string $from ('first' or 'last')
The direction we are registering from, and the name of the
internal mapper.
Arg [3] : string $seq_region_name
The name of the seq_region we are registering on
Arg [4] : listref $ranges
A list of ranges to register (in [$start,$end] tuples).
Arg [5] : (optional) Bio::EnsEMBL::Slice $to_slice
Only register ranges that lie on this Slice.
Description: Registers a set of ranges on a chained assembly mapper.
This function is at the heart of the chained mapping process.
It retrieves information from the assembly table and
dynamically constructs the mappings between two coordinate
systems that are two mapping steps apart. It does this by using
two internal mappers to load up a third mapper which is
actually used by the ChainedAssemblyMapper to perform the
mapping.
This method must be called before any mapping is
attempted on regions of interest, otherwise only gaps will
be returned. Note that the ChainedAssemblyMapper automatically
calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
or if it is associated with multiple assembled pieces (bad data
in assembly table)
throw if the mapping between the coordinate systems cannot
be performed in two steps, which means there is an internal
error in the data in the meta table or in the code that creates
the mapping paths.
Caller : Bio::EnsEMBL::AssemblyMapper
Status : Stable
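The heart of chained registration is composing two mappings through the intermediate coordinate system. The sketch below shows only that composition, for forward-strand blocks; the `[from_start, from_end, to_start]` block format and `compose_blocks` are assumptions for illustration, and a real chained mapper would also need to deal with strand, gaps and the 'first'/'last' direction.

```perl
use strict;
use warnings;

# Each block is [from_start, from_end, to_start] (forward strand only):
# from_start..from_end maps onto to_start..to_start + (from_end - from_start).
# Compose first->mid and mid->last blocks into first->last blocks by
# intersecting them in the shared intermediate (mid) coordinates.
sub compose_blocks {
    my ( $first_to_mid, $mid_to_last ) = @_;
    my @out;
    for my $f (@$first_to_mid) {
        my ( $fs, $fe, $ms ) = @$f;
        my $me = $ms + ( $fe - $fs );          # block end in mid coordinates
        for my $g (@$mid_to_last) {
            my ( $gs, $ge, $ls ) = @$g;
            my $lo = $ms > $gs ? $ms : $gs;    # overlap start in mid coords
            my $hi = $me < $ge ? $me : $ge;    # overlap end in mid coords
            next if $lo > $hi;                 # no overlap: gap in the chain
            push @out, [
                $fs + ( $lo - $ms ),           # back into first coordinates
                $fs + ( $hi - $ms ),
                $ls + ( $lo - $gs ),           # forward into last coordinates
            ];
        }
    }
    return [ sort { $a->[0] <=> $b->[0] } @out ];
}
```

For example, a first-to-mid block `[1, 100, 1001]` combined with mid-to-last blocks `[1051, 1200, 1]` and `[900, 1010, 5000]` yields two composed blocks, `[1, 10, 5101]` and `[51, 100, 1]`; first-system positions 11 to 50 have no counterpart and would map to a gap.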
_register_chained_special
Arg [1] : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
The chained assembly mapper to register regions on
Arg [2] : string $from ('first' or 'last')
The direction we are registering from, and the name of the
internal mapper.
Arg [3] : string $seq_region_name
The name of the seq_region we are registering on
Arg [4] : listref $ranges
A list of ranges to register (in [$start,$end] tuples).
Arg [5] : (optional) Bio::EnsEMBL::Slice $to_slice
Only register ranges that lie on this Slice.
Description: Registers a set of ranges on a chained assembly mapper.
This function is at the heart of the chained mapping process.
It retrieves information from the assembly table and
dynamically constructs the mappings between two coordinate
systems that are two mapping steps apart. It does this by using
two internal mappers to load up a third mapper which is
actually used by the ChainedAssemblyMapper to perform the
mapping.
This method must be called before any mapping is
attempted on regions of interest, otherwise only gaps will
be returned. Note that the ChainedAssemblyMapper automatically
calls this method when the need arises.
Returntype : none
Exceptions : throw if the seq_region to be registered does not exist
or if it is associated with multiple assembled pieces (bad data
in assembly table)
throw if the mapping between the coordinate systems cannot
be performed in two steps, which means there is an internal
error in the data in the meta table or in the code that creates
the mapping paths.
Caller : Bio::EnsEMBL::AssemblyMapper
Status : Stable
register_all
Arg [1] : Bio::EnsEMBL::AssemblyMapper $mapper
Example : $mapper = $asm_mapper_adaptor->fetch_by_CoordSystems($cs1,$cs2);
# make cache large enough to hold all of the mappings
$mapper->max_pair_count(10e6);
$asm_mapper_adaptor->register_all($mapper);
# perform mappings as normal
$mapper->map($slice->seq_region_name(), $sr_start, $sr_end,
$sr_strand, $cs1);
...
Description: This function registers the entire set of mappings between
two coordinate systems in an assembly mapper.
This will use a lot of memory but will be much more efficient
when doing a lot of mapping which is spread over the entire
genome.
Returntype : none
Exceptions : none
Caller : specialised programs
Status : Stable
register_all_chained
Arg [1] : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
Example : $mapper = $asm_mapper_adaptor->fetch_by_CoordSystems($cs1,$cs2);
# make the cache large enough to hold all of the mappings
$mapper->max_pair_count(10e6);
# load all of the mapping data
$asm_mapper_adaptor->register_all_chained($mapper);
# perform mappings as normal
$mapper->map($slice->seq_region_name(), $sr_start, $sr_end,
$sr_strand, $cs1);
...
Description: This function registers the entire set of mappings between
two coordinate systems in a chained mapper. This will use a lot
of memory but will be much more efficient when doing a lot of
mapping which is spread over the entire genome.
Returntype : none
Exceptions : throw if mapper is between coord systems with unexpected
mapping paths
Caller : specialised programs doing a lot of genome-wide mapping
Status : Stable
seq_regions_to_ids
Arg [1] : Bio::EnsEMBL::CoordSystem $coord_system
Arg [2] : listref of strings $seq_regions
Example : my @ids = @{$asma->seq_regions_to_ids($coord_sys, \@seq_regs)};
Description: Converts a list of seq_region names to internal identifiers
using the internal cache that has accumulated while registering
regions for AssemblyMappers. If any requested regions are
not found in the cache an attempt is made to retrieve them
from the database.
Returntype : listref of ints
Exceptions : throw if a non-existent seq_region is provided
Caller : general
Status : Stable
seq_ids_to_regions
Arg [1] : listref of seq_region ids
Example : my @names = @{$asma->seq_ids_to_regions(\@seq_ids)};
Description: Converts a list of seq_region ids to seq_region names
using the internal cache that has accumulated while registering
regions for AssemblyMappers. If any requested regions are
not found in the cache an attempt is made to retrieve them
from the database.
Returntype : listref of strings
Exceptions : throw if a non-existent seq_region_id is provided
Caller : general
Status : Stable
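Both conversions follow the same cache-then-database pattern. Below is a hypothetical sketch of that pattern: `ToySeqRegionCache`, its methods and the `fetch_id`/`fetch_name` callbacks (standing in for database queries) are invented for illustration. Names are keyed together with their coordinate system because `seq_regions_to_ids` resolves names within a given coordinate system.

```perl
use strict;
use warnings;

package ToySeqRegionCache;

# Hypothetical callbacks stand in for database queries:
#   fetch_id->($cs, $name) returns an id or undef
#   fetch_name->($id)      returns [$cs, $name] or undef
sub new {
    my ( $class, %args ) = @_;
    my $self = { %args, id_of => {}, name_of => {} };
    return bless $self, $class;
}

# Record a lookup result in both directions.
sub _remember {
    my ( $self, $cs, $name, $id ) = @_;
    $self->{id_of}{"$cs:$name"} = $id;
    $self->{name_of}{$id} = $name;
}

sub names_to_ids {
    my ( $self, $cs, $names ) = @_;
    my @ids;
    for my $name (@$names) {
        my $id = $self->{id_of}{"$cs:$name"};
        if ( !defined $id ) {    # cache miss: ask the "database"
            $id = $self->{fetch_id}->( $cs, $name );
            die "seq_region $name does not exist\n" unless defined $id;
            $self->_remember( $cs, $name, $id );
        }
        push @ids, $id;
    }
    return \@ids;
}

sub ids_to_names {
    my ( $self, $ids ) = @_;
    my @names;
    for my $id (@$ids) {
        my $name = $self->{name_of}{$id};
        if ( !defined $name ) {    # cache miss: ask the "database"
            my $hit = $self->{fetch_name}->($id);
            die "seq_region_id $id does not exist\n" unless $hit;
            $self->_remember( @$hit, $id );
            $name = $hit->[1];
        }
        push @names, $name;
    }
    return \@names;
}

# Empty both directions of the cache (similar in spirit to delete_cache).
sub clear {
    my ($self) = @_;
    %{ $self->{id_of} }   = ();
    %{ $self->{name_of} } = ();
}

package main;
```

Each lookup that goes to the callback also fills the reverse direction, so converting names to ids and then back again costs no extra queries.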
delete_cache
Description: Deletes all the caches for the mappings and seq_regions.
Returntype : none
Exceptions : none
Caller : General
Status : At Risk