LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

Please email comments or questions to the public Ensembl
developers list at <dev@ensembl.org>.

Questions may also be sent to the Ensembl help desk at
<helpdesk@ensembl.org>.

NAME

Bio::EnsEMBL::DBSQL::BaseSequenceAdaptor

DESCRIPTION

The BaseSequenceAdaptor is responsible for converting calls to fetch_by_Slice_start_end_strand() for sequence data into requests to a backing data store. In Ensembl, that backing store is the set of seqlevel sequence region records held in the MySQL database.

The base adaptor also provides sequence caching based on a normalisation technique similar to the UCSC and BAM binning indexes. The code works by right-shifting the requested start and end by a seq chunk power (by default 18, i.e. 2^18 = 262,144bp, approx. 250kb) and then left-shifting by the same value. This means any position within a given window always normalises to the same offset. Please see the worked examples below:

# Equation
p=position
o=seq chunk power
offset=( (p-1)>>o ) << o

# Using real values
p=1340001
o=18
right_shifted = (1340001-1) >> 18 == 5
offset = 5 << 18 == 1310720
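The normalisation can be checked with a few lines of Perl. This is a sketch only: chunk_offset() is a hypothetical helper, not a method of this adaptor. It applies the equation above to the first, a middle and the last position of one 2^18bp window, plus the first position of the next window.

```perl
use strict;
use warnings;

# Normalise a 1-based position to the 0-based start offset of its cache chunk,
# following offset = ((p - 1) >> o) << o from the equation above.
sub chunk_offset {
  my ($position, $chunk_power) = @_;
  return (($position - 1) >> $chunk_power) << $chunk_power;
}

my $chunk_power = 18;   # default seq chunk power: 2^18 = 262,144bp per chunk
for my $p (1_310_721, 1_340_001, 1_572_864, 1_572_865) {
  printf "%d -> %d\n", $p, chunk_offset($p, $chunk_power);
}
```

The first three positions all normalise to 1310720, while 1572865 falls into the next window and normalises to 1572864.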

To control the size of the cache and of each cached sequence chunk, you can provide the seq chunk power and the number of chunks cached when constructing the adaptor (see new).

fetch_by_Slice_start_end_strand

Arg  [1]   : Bio::EnsEMBL::Slice slice
             The slice from which you want the sequence
Arg  [2]   : Integer; $start (optional)
             The start base pair relative to the start of the slice. Negative
             values or values greater than the length of the slice are fine.
             default = 1
Arg  [3]   : Integer; $end (optional)
             The end base pair relative to the start of the slice. Negative
             values or values greater than the length of the slice are fine,
             but the end must be greater than or equal to the start.
             Counting starts from 1.
             default = the length of the slice
Arg  [4]   : Integer; $strand (optional)
             Strand of DNA to fetch
Returntype : StringRef (DNA requested)
Description: Performs the fetching of DNA based upon a Slice. All fetches
             should use this method and no other.

             Implementing classes are responsible for converting the
             given Slice and values into something which can be processed by
             the underlying storage engine. Implementing classes are also
             responsible for reverse complementing the sequence.
Exceptions : Thrown if not redefined
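Implementing classes must reverse complement the sequence for negative-strand requests. Below is a minimal sketch of one common Perl idiom, reverse plus tr///. reverse_complement_ref() is a hypothetical helper. It complements only A, C, G, T and N, so other IUPAC ambiguity codes pass through unchanged.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string. Implementing classes return a StringRef,
# so this sketch takes and returns scalar references as well.
sub reverse_complement_ref {
  my ($seq_ref) = @_;
  my $rc = reverse ${$seq_ref};          # scalar context: reverse the string
  # Complement both cases; other IUPAC ambiguity codes are left unchanged.
  $rc =~ tr/ACGTNacgtn/TGCANtgcan/;
  return \$rc;
}

my $forward = 'AACGTTTG';
my $reverse = reverse_complement_ref(\$forward);
print ${$reverse}, "\n";   # prints CAAACGTT
```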

can_access_Slice

Description : Returns a boolean indicating if the adaptor understands
              the given Slice.
Returntype  : Boolean; if true you can get sequence for the given Slice
Exceptions  : Thrown if not redefined
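Both fetch_by_Slice_start_end_strand() and can_access_Slice() are abstract: the base class throws unless a subclass redefines them. The sketch below illustrates that pattern with a plain die and hypothetical package names (My::BaseAdaptor, My::InMemoryAdaptor). It is not the Ensembl class hierarchy, and its can_access_Slice() rule is a placeholder.

```perl
use strict;
use warnings;

package My::BaseAdaptor;    # hypothetical stand-in for the abstract base class

sub new { my ($class) = @_; return bless {}, $class; }

# Abstract methods: throw unless a subclass redefines them.
sub can_access_Slice {
  my ($self) = @_;
  die ref($self) . ": can_access_Slice() must be redefined\n";
}
sub fetch_by_Slice_start_end_strand {
  my ($self) = @_;
  die ref($self) . ": fetch_by_Slice_start_end_strand() must be redefined\n";
}

package My::InMemoryAdaptor;   # hypothetical concrete subclass
our @ISA = ('My::BaseAdaptor');

# Placeholder rule: accept any defined slice.
sub can_access_Slice {
  my ($self, $slice) = @_;
  return defined $slice ? 1 : 0;
}

package main;

my $ok  = eval { My::BaseAdaptor->new->can_access_Slice('chr1'); 1 };
my $msg = $ok ? "no exception\n" : "base class threw: $@";
print $msg;
print 'subclass says: ', My::InMemoryAdaptor->new->can_access_Slice('chr1'), "\n";
```

The subclass inherits new() and the still-abstract fetch_by_Slice_start_end_strand(), so calling the latter on it would also throw.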

expand_Slice

Arg  [1]    : Bio::EnsEMBL::Slice slice
              The slice from which you want the sequence
Arg  [2]    : Integer; $start (optional)
              The start base pair relative to the start of the slice. Negative
              values or values greater than the length of the slice are fine.
              default = 1
Arg  [3]    : Integer; $end (optional)
              The end base pair relative to the start of the slice. Negative
              values or values greater than the length of the slice are fine,
              but the end must be greater than or equal to the start.
              Counting starts from 1.
              default = the length of the slice
Arg  [4]    : Integer; $strand (optional)
              Strand of DNA to fetch
Returntype  : Bio::EnsEMBL::Slice
Description : Creates a new Slice which represents the requested region. Provides
              logic applicable to all SliceAdaptor instances
Exceptions  : Thrown if the Slice is circular (circular Slices are not currently supported by this generic logic)
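The relative coordinates accepted by expand_Slice() can be pictured as below. This is an illustration under stated assumptions, not the method itself: the slice is a plain hash rather than a Bio::EnsEMBL::Slice, relative_to_absolute() is hypothetical, and it assumes the usual Ensembl convention that on a reverse-strand slice relative position 1 corresponds to the slice's end.

```perl
use strict;
use warnings;

# Map relative start/end (1-based, inclusive, relative to the slice) onto
# absolute seq_region coordinates. The slice is modelled as a plain hash
# (start, end, strand), not a Bio::EnsEMBL::Slice object.
sub relative_to_absolute {
  my ($slice, $rel_start, $rel_end) = @_;
  my $length = $slice->{end} - $slice->{start} + 1;
  $rel_start = 1       unless defined $rel_start;   # default = 1
  $rel_end   = $length unless defined $rel_end;     # default = slice length
  die "end ($rel_end) must be >= start ($rel_start)\n" if $rel_end < $rel_start;
  die "circular slices are not supported\n" if $slice->{circular};

  if ($slice->{strand} >= 0) {
    return ($slice->{start} + $rel_start - 1, $slice->{start} + $rel_end - 1);
  }
  # Assumed convention: on a reverse-strand slice, relative position 1 is the slice end.
  return ($slice->{end} - $rel_end + 1, $slice->{end} - $rel_start + 1);
}

my %fwd = (start => 1000, end => 1999, strand => 1);
my %rev = (start => 1000, end => 1999, strand => -1);
printf "forward: %d-%d\n", relative_to_absolute(\%fwd, -9, 10);
printf "reverse: %d-%d\n", relative_to_absolute(\%rev, -9, 10);
```

With a slice covering 1000-1999, the request -9..10 reaches 10bp past the slice start on the forward strand (990-1009) and 10bp past the slice end on the reverse strand (1990-2009).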

new

Arg [1]    : Int  $chunk_power; sets the size of each element of
                  the sequence cache. Defaults to 18, which gives
                  block sizes of ~250Kb (exactly 2^18 = 262,144bp)
Arg [2]    : Int  $cache_size; number of chunks held in the cache.
                  Defaults to 5, giving a cache of ~1.3Mb
                  (5 x 262,144 = 1,310,720bp) with default values
Example    : my $sa = $db_adaptor->get_SequenceAdaptor();
Description: Constructor.  Calls superclass constructor and initialises
             internal cache structure.
Returntype : Bio::EnsEMBL::DBSQL::SequenceAdaptor
Exceptions : none
Caller     : DBAdaptor::get_SequenceAdaptor
Status     : Stable
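The constructor defaults translate into concrete sizes as follows. The sketch assumes, as the text above implies, that $cache_size counts chunks. cache_geometry() is a hypothetical helper.

```perl
use strict;
use warnings;

# Chunk size and total cache capacity (both in bp) for a given chunk power
# and cache size (number of cached chunks), using the documented defaults.
sub cache_geometry {
  my ($chunk_power, $cache_size) = @_;
  $chunk_power = 18 unless defined $chunk_power;   # documented default
  $cache_size  = 5  unless defined $cache_size;    # documented default
  my $chunk_bp = 1 << $chunk_power;
  return ($chunk_bp, $chunk_bp * $cache_size);
}

my ($chunk_bp, $capacity_bp) = cache_geometry();
print "chunk: $chunk_bp bp, cache: $capacity_bp bp\n";
# prints "chunk: 262144 bp, cache: 1310720 bp"
```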

clear_cache

Example     : $sa->clear_cache();
Description : Removes all entries from the associated sequence cache
Returntype  : None
Exceptions  : None

_fetch_raw_seq

Arg [1]     : String $id
              The identifier of the sequence to fetch.
Arg [2]     : Integer $start
              Where to start fetching sequence from
Arg [3]     : Integer $length
              Total length of sequence to fetch
Description : Performs the fetch of DNA from the backing storage 
              engine and provides it to the _fetch_seq() method
              for optional caching.
Returntype  : ScalarRef of DNA fetched. All bases should be uppercased
Exceptions  : Thrown if the method is not reimplemented
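_fetch_raw_seq() is left to subclasses. As a hypothetical illustration, a subclass backed by an in-memory hash might implement it along the lines below. The store, the function name and the choice of a 1-based $start are all assumptions, since the text does not say which base the start uses.

```perl
use strict;
use warnings;

# Hypothetical in-memory backing store: sequence id => DNA string.
my %store = (contig_1 => 'acgtACGTnnACGT');

# Hypothetical stand-in for a subclass's _fetch_raw_seq(): $start is assumed
# to be 1-based; returns a ScalarRef to the uppercased DNA.
sub fetch_raw_seq {
  my ($id, $start, $length) = @_;
  die "unknown sequence '$id'\n" unless exists $store{$id};
  die "start must be >= 1\n" if $start < 1;
  my $full   = $store{$id};
  my $offset = $start - 1;
  # substr() starting past the end of a string returns undef (and warns), so guard it.
  my $seq = $offset < length($full) ? substr($full, $offset, $length) : '';
  $seq = uc $seq;
  return \$seq;
}

print ${ fetch_raw_seq('contig_1', 3, 6) }, "\n";   # prints GTACGT
```

A real subclass would query MySQL or another storage engine here instead of a hash.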

_fetch_seq

Arg [1]     : String $id
              The identifier of the sequence to fetch.
Arg [2]     : Integer $start
              Where to start fetching sequence from
Arg [3]     : Integer $length
              Total length of sequence to fetch
Description : If the requested region is smaller than the maximum
              cacheable region length, we check whether the cache already
              contains the relevant chunk. If not, we request the region from
              C<_fetch_raw_seq()> and cache it. If the region requested is
              larger than the maximum cacheable sequence length, we pass the
              request on to C<_fetch_raw_seq()> with no caching layer.

              This method is also responsible for converting requested
              regions into normalised region requests based on
              C<chunk_power>.
Returntype  : ScalarRef of DNA fetched. All bases should be uppercased
Exceptions  : Thrown when C<_fetch_raw_seq()> is not re-implemented
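The caching behaviour described above can be sketched as follows. This is a simplified illustration, not Ensembl's implementation. A plain hash stands in for the cache, and raw_fetch() and fetch_seq() are hypothetical stand-ins for _fetch_raw_seq() and _fetch_seq(). The bypass threshold (chunk size times cache size) is an assumption. The sketch never evicts entries, although a bounded cache would need an eviction policy.

```perl
use strict;
use warnings;

my $CHUNK_POWER      = 18;                       # default seq chunk power
my $CHUNK_BP         = 1 << $CHUNK_POWER;        # 262,144bp per chunk
my $CACHE_SIZE       = 5;                        # default number of cached chunks
my $MAX_CACHEABLE_BP = $CHUNK_BP * $CACHE_SIZE;  # hypothetical bypass threshold
my %cache;                                       # "id:chunk_index" => chunk DNA
my $raw_fetches = 0;                             # counts backing-store calls

# Stand-in for _fetch_raw_seq(): returns a ScalarRef to uppercased DNA.
# Fake data: the base at 1-based position p is (A,C,G,T)[(p-1) % 4].
sub raw_fetch {
  my ($id, $start, $length) = @_;
  $raw_fetches++;
  my $repeat = 'ACGT' x (int($length / 4) + 2);
  my $seq = substr($repeat, ($start - 1) % 4, $length);
  return \$seq;
}

# Stand-in for _fetch_seq(): $start is 1-based, returns a ScalarRef.
sub fetch_seq {
  my ($id, $start, $length) = @_;

  # Large requests skip the caching layer entirely.
  return raw_fetch($id, $start, $length) if $length > $MAX_CACHEABLE_BP;

  # Normalise the request onto whole chunks: (p - 1) >> o is the chunk index.
  my $chunk_min = ($start - 1) >> $CHUNK_POWER;
  my $chunk_max = ($start + $length - 2) >> $CHUNK_POWER;

  my $joined = '';
  for my $chunk ($chunk_min .. $chunk_max) {
    my $key = "$id:$chunk";
    unless (exists $cache{$key}) {
      my $chunk_start = ($chunk << $CHUNK_POWER) + 1;   # 1-based chunk start
      $cache{$key} = ${ raw_fetch($id, $chunk_start, $CHUNK_BP) };
    }
    $joined .= $cache{$key};
  }

  # Cut the requested region out of the joined chunks.
  my $offset = ($start - 1) - ($chunk_min << $CHUNK_POWER);
  my $seq = substr($joined, $offset, $length);
  return \$seq;
}

# Counterpart of clear_cache(): drop every cached chunk.
sub clear_cache { %cache = (); return; }

print ${ fetch_seq('chr_1', 262_140, 10) }, "\n";   # prints TACGTACGTA
print "raw fetches: $raw_fetches\n";                 # prints raw fetches: 2
```

The example request straddles the boundary between chunks 0 and 1, so two chunk-sized fetches fill the cache. Any later request inside those chunks is served without touching the backing store.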