NAME
Crypt::DNASequence - Encrypt and decrypt strings to DNA Sequences
SYNOPSIS
use Crypt::DNASequence;
my $text = "hello world!";
my $encrypted = Crypt::DNASequence->encrypt($text);
print $encrypted."\n";
my $decrypted = Crypt::DNASequence->decrypt($encrypted);
print $decrypted."\n";
DESCRIPTION
The module is naive and just for fun. It transforms text strings into DNA sequences. A DNA sequence is composed of four nucleotides, which are represented as A, T, C and G. If we transform "abcdefghijklmnopqistuvwxyzABCDEFGHIJKLMNOPQISTUVWXYZ", the corresponding sequence would be:
GTCGACCGAGCGATCGCACGCCCGCGCGCTCGGACGGCCGGGCGGTCGTACGTCCGTGCGTTCTAACTAC
CGGCCTATCTCACTCCCTCGCTCTCTGACTGCCTGGCAACCAAGCAATCACACACCCACGCACTCAGACA
GCCAGGCAGTCATACATCCATGCATTCCAACCACCAGCCCATCCCACCCCCCCGCCCTCCGACCGCCCGG
AC
or
CAGCTGGCTCGCTAGCGTGCGGGCGCGCGAGCCTGCCGGCCCGCCAGCATGCAGGCACGCAAGATTGATG
GCCGGATAGAGTGAGGGAGCGAGAGACTGACGGACCGTTGGTTCGTTAGTGTGTGGGTGCGTGAGTCTGT
CGGTCCGTCAGTATGTAGGTACGTAAGGTTGGTGGTCGGGTAGGGTGGGGGGGCGGGAGGCTGGCGGGCC
TG
The transformation is not unique because the mapping is chosen at random, but every transformed sequence decrypts correctly to the original string.
ALGORITHM
The algorithm behind the module is simple. Two binary bits are used to represent a nucleotide, such as '00' for A and '01' for C. If you have some knowledge of molecular biology, you will know that A pairs only with T and C pairs only with G. So if '00' is chosen to be A, then '11' should be used to represent T. In the module, the correspondence between bit pairs and nucleotides is assigned randomly. The correspondence dictionary is also stored in the final sequence.
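The pairing rule above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: `random_dictionary` is a hypothetical helper, not part of the module's documented interface, and the module's own way of choosing the dictionary may differ.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (an assumption, not the module's API).
# Builds a random bit-pair => nucleotide dictionary in which complementary
# bit pairs ('00'/'11' and '01'/'10') map to complementary bases (A/T, C/G).
sub random_dictionary {
    my %complement = (A => 'T', T => 'A', C => 'G', G => 'C');
    my %dict;

    # Pick any base for '00'; its complement is forced onto '11'.
    my ($base00) = shuffle(qw(A T C G));
    $dict{'00'} = $base00;
    $dict{'11'} = $complement{$base00};

    # Pick one of the two remaining bases for '01'; '10' gets its complement.
    my %used = map { $_ => 1 } values %dict;
    my ($base01) = shuffle(grep { !$used{$_} } qw(A T C G));
    $dict{'01'} = $base01;
    $dict{'10'} = $complement{$base01};

    return \%dict;
}

my $dict = random_dictionary();
print "$_ => $dict->{$_}\n" for sort keys %$dict;
```

Each call may print a different mapping, which is why the same text can produce different sequences.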
Here is the procedure for encryption.
1. Split the string into individual characters.
2. Convert each character to its binary form and translate every two bits into A, T, C or G using a randomly generated dictionary. The dictionary may look like:
$dict = { '00' => 'A',
'11' => 'T',
'01' => 'C',
'10' => 'G' };
3. Join the resulting A, T, G and C into a single sequence.
4. Find the first nucleotide of the sequence.
5. Count how many times that nucleotide occurs in the sequence.
6. There is a database storing all arrangements of '00', '11', '01', '10'.
7. Calculate an index value from the count by taking it modulo the number of arrangements.
8. Retrieve the arrangement at that index and map its items through the dictionary to get four nucleotides. E.g. the first nucleotide of the sequence is G. The number of G in the sequence is 40. The number of arrangements in the database is 24. Then the index value is 40 % 24 = 16. The 16th arrangement is retrieved and may look like ['01', '11', '10', '00']. The four items in the array are mapped through the dictionary to four nucleotides, such as CTGA. Note that this information is used in the decryption procedure.
9. Put the first two of these nucleotides at the beginning of the sequence and the last two at the end of the sequence.
10. That is the final sequence.
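A minimal sketch of steps 1 to 7, assuming the example dictionary shown above instead of a random one. The helpers `text_to_dna` and `arrangement_index` are hypothetical names introduced for illustration; they are not part of the module's documented interface, and whether the module counts arrangements from 0 or 1 is not stated here.

```perl
use strict;
use warnings;

# Example dictionary from the text (a real run would pick one at random).
my %dict = ('00' => 'A', '11' => 'T', '01' => 'C', '10' => 'G');

# Hypothetical helper for steps 1-3: text -> nucleotide sequence.
# Assumes the input is a byte string.
sub text_to_dna {
    my ($text, $dict) = @_;
    my $bits = unpack('B*', $text);    # 8 bits per byte, most significant bit first
    return join '', map { $dict->{$_} } $bits =~ /(..)/g;
}

# Hypothetical helper for steps 4-7: count the first nucleotide and
# reduce the count modulo the number of arrangements.
sub arrangement_index {
    my ($sequence, $n_arrangements) = @_;
    my $first = substr($sequence, 0, 1);
    my $count = () = $sequence =~ /\Q$first\E/g;
    return $count % $n_arrangements;
}

my $dna = text_to_dna('ab', \%dict);
print "$dna\n";                              # CGACCGAG
print arrangement_index($dna, 24), "\n";     # C occurs 3 times, so 3
```

With this dictionary, "a" becomes CGAC and "b" becomes CGAG, which agrees with the nucleotides that follow the leading GT in the first example sequence of the DESCRIPTION.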
Here is the procedure for decryption.
1. Extract the first two and the last two nucleotides from the sequence, e.g. CT and GA.
2. Count the occurrences of the first nucleotide in the remaining sequence, e.g. 40 for G.
3. Use this count to calculate the index in the arrangement database, e.g. 16.
4. Rebuild the dictionary from the arrangement at that index and the four extracted nucleotides, e.g. from the 16th arrangement ['01', '11', '10', '00'] and CTGA.
5. Translate the DNA sequence according to the dictionary into binary form and finally into the original text.
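The last two decryption steps can be sketched as follows, assuming the arrangement has already been looked up. The helper `dna_to_text` and the short input sequence are made up for illustration and are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper for step 5: nucleotide sequence -> text.
sub dna_to_text {
    my ($dna, $dict) = @_;
    my %reverse = reverse %$dict;      # nucleotide => bit pair
    my $bits = join '', map { $reverse{$_} } split //, $dna;
    return pack('B*', $bits);          # every 8 bits become one byte
}

# Step 1: split off the two leading and two trailing nucleotides.
my $full    = 'CTCGACCGAGGA';          # illustrative input: CT + payload + GA
my $edges   = substr($full, 0, 2) . substr($full, -2);    # 'CTGA'
my $payload = substr($full, 2, length($full) - 4);        # 'CGACCGAG'

# Step 4: pair the example arrangement from the text with the edge nucleotides.
my @arrangement = ('01', '11', '10', '00');
my %dict;
@dict{@arrangement} = split //, $edges;    # 01=>C, 11=>T, 10=>G, 00=>A

print dna_to_text($payload, \%dict), "\n";    # ab
```

Reversing the hash works here because each of the four bit pairs maps to a distinct nucleotide.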
Subroutines
Crypt::DNASequence->encrypt($string)
Encrypt the string to a DNA sequence.
Crypt::DNASequence->decrypt($encrypted)
Decrypt the DNA sequence to the original string.
AUTHOR
Zuguang Gu <jokergoo@gmail.com>
COPYRIGHT AND LICENSE
Copyright 2012 by Zuguang Gu
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.