LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

Please email comments or questions to the public Ensembl
developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.

Questions may also be sent to the Ensembl help desk at
<http://www.ensembl.org/Help/Contact>.

NAME

Bio::EnsEMBL::Utils::AssemblyProjector - utility class to post-process projections from one assembly to another

SYNOPSIS

use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::Utils::AssemblyProjector;

# connect to an old database
my $dba_old = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
  -host   => 'ensembldb.ensembl.org',
  -port   => 3306,
  -user   => 'ensro',
  -dbname => 'mus_musculus_core_46_36g',
  -group  => 'core_old',
);

# connect to the new database containing the mapping between old and
# new assembly
my $dba_new = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
  -host   => 'ensembldb.ensembl.org',
  -port   => 3306,
  -user   => 'ensro',
  -dbname => 'mus_musculus_core_47_37',
  -group  => 'core_new',
);

my $assembly_projector = Bio::EnsEMBL::Utils::AssemblyProjector->new(
  -OLD_ASSEMBLY    => 'NCBIM36',
  -NEW_ASSEMBLY    => 'NCBIM37',
  -ADAPTOR         => $dba_new,
  -EXTERNAL_SOURCE => 1,
  -MERGE_FRAGMENTS => 1,
  -CHECK_LENGTH    => 0,
);

# fetch a slice on the old assembly
my $slice_adaptor = $dba_old->get_SliceAdaptor;
my $slice =
  $slice_adaptor->fetch_by_region( 'chromosome', 1, undef, undef,
  undef, 'NCBIM36' );

my $new_slice = $assembly_projector->old_to_new($slice);

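# old_to_new() returns undef if the projection fails any of the rules
if ($new_slice) {
  print $new_slice->name, " (", $assembly_projector->last_status, ")\n";
} else {
  print "projection failed (", $assembly_projector->last_status, ")\n";
}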

DESCRIPTION

This class implements some utility functions for converting coordinates between assemblies. A mapping between the two assemblies has to be present in the database for this to work; see the 'Related Modules' section below on how to generate the mapping.

In addition to the "raw" projecting of features and slices, the methods in this module also apply some sensible rules to the results of the projection (like discarding unwanted results or merging fragmented projections). These are the rules (depending on configuration):

Discard the projected feature/slice if:

1. it doesn't project at all (no segments returned)
2. [unless MERGE_FRAGMENTS is set] the projection is fragmented (more
   than one segment)
3. [if CHECK_LENGTH is set] the projection doesn't have the same
   length as the original feature/slice
4. the segments are not all on the same chromosome and strand
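The rules above can be illustrated with a short, self-contained Perl sketch. It is not the module's implementation: the helper apply_projection_rules is hypothetical, each projection segment is reduced to a plain hash with seq_region, strand, start and end keys, and the status strings are invented for the example, so they will not necessarily match what last_status() reports. Merging is modelled as one span from the lowest segment start to the highest segment end.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper (not part of the Ensembl API): applies the discard
# and merge rules to a list of projection segments. Each segment is
# assumed to be a hashref with seq_region, strand, start and end keys.
# Returns ($result, $status); $result is undef when a rule fails.
sub apply_projection_rules {
    my ($segments, $orig_length, %opt) = @_;
    my $merge_fragments = exists $opt{merge_fragments} ? $opt{merge_fragments} : 1;
    my $check_length    = $opt{check_length} ? 1 : 0;

    # Rule 1: nothing projected at all
    return (undef, 'no projection') unless @$segments;

    # Rule 2: fragmented projection, unless fragments may be merged
    return (undef, 'fragmented') if @$segments > 1 && !$merge_fragments;

    # Rule 4: all segments must share the first segment's region and strand
    my $first = $segments->[0];
    for my $seg (@$segments) {
        return (undef, 'region/strand mismatch')
            if $seg->{seq_region} ne $first->{seq_region}
            || $seg->{strand} != $first->{strand};
    }

    # Merge: bridge any gaps with one span covering all segments
    my $start = min(map { $_->{start} } @$segments);
    my $end   = max(map { $_->{end} } @$segments);

    # Rule 3: optionally require the same length as the original object
    return (undef, 'length mismatch')
        if $check_length && ($end - $start + 1) != $orig_length;

    my %merged = (%$first, start => $start, end => $end);
    return (\%merged, @$segments > 1 ? 'merged' : 'ok');
}

# Example: two fragments on chromosome 1, forward strand, no length check
my ($projected, $status) = apply_projection_rules(
    [ { seq_region => '1', strand => 1, start => 100, end => 199 },
      { seq_region => '1', strand => 1, start => 250, end => 300 } ],
    200,
);
print "$projected->{start}-$projected->{end} ($status)\n";   # 100-300 (merged)
```

With CHECK_LENGTH enabled the same input would be discarded, because the merged span (201 bp) no longer matches the original 200 bp.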

If a projection fails any of these rules, undef is returned instead of a projected feature/slice. You can use the last_status() method to find out about the results of the rule tests.

Also note that when projecting features, only a shallow projection is performed, i.e. other features attached to your features (e.g. the transcripts of a gene) are not projected automatically, so it is the responsibility of the user code to project all levels of features involved.
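The following sketch shows the pattern of projecting every level yourself. To keep it runnable without an Ensembl installation, genes, transcripts and exons are plain hashes and StubProjector is a hypothetical stand-in that shifts coordinates by a fixed offset; with the real API you would pass Ensembl objects (for example the result of a gene's get_all_Transcripts) to an AssemblyProjector instead. Giving up as soon as any level fails to project is just one possible policy.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::EnsEMBL::Utils::AssemblyProjector:
# "projects" an object by shifting its coordinates by a fixed offset.
package StubProjector;

sub new {
    my ($class, %args) = @_;
    return bless { offset => $args{offset} }, $class;
}

sub old_to_new {
    my ($self, $obj) = @_;
    return undef if $obj->{unmappable};    # mimic a failed projection
    # shallow copy: nested transcripts/exons are NOT projected here
    my %copy = (%$obj,
                start => $obj->{start} + $self->{offset},
                end   => $obj->{end} + $self->{offset});
    return \%copy;
}

package main;

# Project a gene, then each of its transcripts and their exons, because
# a single projection call only moves the top-level object.
sub project_gene_deep {
    my ($projector, $gene) = @_;
    my $new_gene = $projector->old_to_new($gene) or return undef;
    my @new_transcripts;
    for my $tr (@{ $gene->{transcripts} || [] }) {
        my $new_tr = $projector->old_to_new($tr) or return undef;
        my @new_exons;
        for my $ex (@{ $tr->{exons} || [] }) {
            my $new_ex = $projector->old_to_new($ex) or return undef;
            push @new_exons, $new_ex;
        }
        $new_tr->{exons} = \@new_exons;
        push @new_transcripts, $new_tr;
    }
    $new_gene->{transcripts} = \@new_transcripts;
    return $new_gene;
}

my $gene = {
    start => 100, end => 900,
    transcripts => [ {
        start => 100, end => 900,
        exons => [ { start => 100, end => 200 }, { start => 800, end => 900 } ],
    } ],
};
my $moved = project_gene_deep(StubProjector->new(offset => 1000), $gene);
print $moved->{transcripts}[0]{exons}[1]{start}, "\n";   # 1800
```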

METHODS

new
project
old_to_new
new_to_old
adaptor
external_source
old_assembly
new_assembly
merge_fragments
check_length

RELATED MODULES

The process of creating a whole genome alignment between two assemblies (which is the basis for the use of the methods in this class) is done by a series of scripts. Please see

ensembl/misc-scripts/assembly/README

for a high-level description of this process, and POD in the individual scripts for the details.

new

Arg [ADAPTOR]         : Bio::EnsEMBL::DBSQL::DBAdaptor $adaptor - a db adaptor
                        for a database containing the assembly mapping
Arg [EXTERNAL_SOURCE] : (optional) Boolean $external_source - indicates if
                        source is from a different database
Arg [OLD_ASSEMBLY]    : name of the old assembly
Arg [NEW_ASSEMBLY]    : name of the new assembly
Arg [OBJECT_TYPE]     : (optional) object type ('slice' or 'feature')
Arg [MERGE_FRAGMENTS] : (optional) Boolean - determines if segments are merged
                        to return a single object spanning all segments
                        (default: true)
Arg [CHECK_LENGTH]    : (optional) Boolean - determines if projected objects
                        have to have same length as original (default: false)
Example     : my $ap = Bio::EnsEMBL::Utils::AssemblyProjector->new(
                -ADAPTOR      => $dba,
                -OLD_ASSEMBLY => 'NCBIM36',
                -NEW_ASSEMBLY => 'NCBIM37',
              );
Description : Constructor.
Return type : a Bio::EnsEMBL::Utils::AssemblyProjector object
Exceptions  : thrown on missing arguments
              thrown on invalid OBJECT_TYPE
Caller      : general
Status      : At Risk
            : under development
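As a rough illustration of the documented arguments and defaults, here is a hypothetical constructor (ProjectorSketch is not an Ensembl class). It upper-cases the keys and strips the leading dash, treats ADAPTOR, OLD_ASSEMBLY and NEW_ASSEMBLY as required (an assumption based on "thrown on missing arguments"), rejects an OBJECT_TYPE other than 'slice' or 'feature', and defaults MERGE_FRAGMENTS to true and CHECK_LENGTH to false. The real module's argument parsing and error messages may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented constructor arguments; not the
# Ensembl implementation.
package ProjectorSketch;

sub new {
    my ($class, %args) = @_;

    # normalise '-ADAPTOR' (or '-adaptor') to 'ADAPTOR'
    my %arg;
    for my $key (keys %args) {
        (my $name = uc $key) =~ s/^-//;
        $arg{$name} = $args{$key};
    }

    # assumed required arguments ("thrown on missing arguments")
    for my $required (qw(ADAPTOR OLD_ASSEMBLY NEW_ASSEMBLY)) {
        die "You must supply $required\n" unless defined $arg{$required};
    }

    # OBJECT_TYPE is optional, but must be 'slice' or 'feature' if given
    if (defined $arg{OBJECT_TYPE}
        && $arg{OBJECT_TYPE} ne 'slice' && $arg{OBJECT_TYPE} ne 'feature') {
        die "Invalid OBJECT_TYPE '$arg{OBJECT_TYPE}'\n";
    }

    return bless {
        adaptor         => $arg{ADAPTOR},
        external_source => $arg{EXTERNAL_SOURCE} ? 1 : 0,
        old_assembly    => $arg{OLD_ASSEMBLY},
        new_assembly    => $arg{NEW_ASSEMBLY},
        object_type     => $arg{OBJECT_TYPE},
        # documented defaults: merge fragments, do not check length
        merge_fragments => defined $arg{MERGE_FRAGMENTS} ? $arg{MERGE_FRAGMENTS} : 1,
        check_length    => defined $arg{CHECK_LENGTH}    ? $arg{CHECK_LENGTH}    : 0,
    }, $class;
}

sub merge_fragments { return $_[0]{merge_fragments} }
sub check_length    { return $_[0]{check_length} }

package main;

my $ap = ProjectorSketch->new(
    -ADAPTOR      => 'placeholder',    # a real DBAdaptor in practice
    -OLD_ASSEMBLY => 'NCBIM36',
    -NEW_ASSEMBLY => 'NCBIM37',
);
print $ap->merge_fragments, ' ', $ap->check_length, "\n";   # 1 0
```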

project

Arg[1]      : Bio::EnsEMBL::Slice or Bio::EnsEMBL::Feature $object -
              the object to project
Arg[2]      : String $to_assembly - assembly to project to
Example     : my $new_slice = $assembly_projector->project($old_slice, 
                'NCBIM37');
Description : Projects a Slice or Feature to the specified assembly.

              Several tests are performed on the result to discard unwanted
              results. All projection segments have to be on the same
              seq_region and strand. If -MERGE_FRAGMENTS is set, gaps will be
              bridged by creating a single object from first_segment_start to
              last_segment_end. If -CHECK_LENGTH is set, the projected object
              will have to have the same length as the original. You can use
              the last_status() method to find out what the results of some
              of these rule tests were. Please see the comments in the code
              for more details about these rules.

              The return value of this method will always be a single object,
              or undef if the projection fails any of the rules.
              
              Note that when projecting features, only a "shallow" projection
              is performed, i.e. attached features aren't projected
              automatically! (e.g. if you project a gene, its transcripts will
              have to be projected manually before storing the new gene)
Return type : same as Arg 1, or undef if projection fails any of the rules
Exceptions  : thrown on invalid arguments
Caller      : general, $self->old_to_new, $self->new_to_old
Status      : At Risk
            : under development

old_to_new

Arg[1]      : Bio::EnsEMBL::Slice or Bio::EnsEMBL::Feature $object -
              the object to project
Example     : my $new_slice = $assembly_projector->old_to_new($old_slice);
Description : Projects a Slice or Feature from old to new assembly.
              This method is just a convenience wrapper for $self->project.
Return type : same as Arg 1, or undef
Exceptions  : none
Caller      : general
Status      : At Risk
            : under development

new_to_old

Arg[1]      : Bio::EnsEMBL::Slice or Bio::EnsEMBL::Feature $object -
              the object to project
Example     : my $old_slice = $assembly_projector->new_to_old($new_slice);
Description : Projects a Slice or Feature from new to old assembly.
              This method is just a convenience wrapper for $self->project.
Return type : same as Arg 1, or undef
Exceptions  : none
Caller      : general
Status      : At Risk
            : under development
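The two wrappers differ only in the target assembly they pass to project(). A minimal sketch, assuming they simply forward to project() with new_assembly or old_assembly respectively; WrapperSketch and its stub project() are hypothetical and only report the target assembly instead of performing a projection.

```perl
use strict;
use warnings;

# Hypothetical sketch of the two convenience wrappers; project() is a
# stub that only reports the target assembly, not a real projection.
package WrapperSketch;

sub new {
    my ($class, %args) = @_;
    return bless { old_assembly => $args{old_assembly},
                   new_assembly => $args{new_assembly} }, $class;
}
sub old_assembly { return $_[0]{old_assembly} }
sub new_assembly { return $_[0]{new_assembly} }

# Stand-in for the real project($object, $to_assembly)
sub project {
    my ($self, $object, $to_assembly) = @_;
    return "$object on $to_assembly";
}

# old -> new: project onto the new assembly
sub old_to_new {
    my ($self, $object) = @_;
    return $self->project($object, $self->new_assembly);
}

# new -> old: project onto the old assembly
sub new_to_old {
    my ($self, $object) = @_;
    return $self->project($object, $self->old_assembly);
}

package main;

my $w = WrapperSketch->new(old_assembly => 'NCBIM36', new_assembly => 'NCBIM37');
print $w->old_to_new('slice_1'), "\n";   # slice_1 on NCBIM37
print $w->new_to_old('slice_1'), "\n";   # slice_1 on NCBIM36
```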