I need to get subsequences from a long stretch of DNA sequence. The idea is to print out all possible subsequences beginning with (TTA|CTA|CTG|TTG|CTC) and ending with (TAG|TGA|TAA).
E.g., in a DNA sequence like the one below:
$dna = "GGGCTACCCCGCCTCAAAGGGGGGTTACCCGGCCCGTTGAAACCCGGTCCGGGCTTAAAAGGGTAA";
only these subsequences can be obtained:
CTACCCCGCCTCAAAGGGGGGTTACCCGGCCCGTTGAAACCCGGTCCGGGCTTAAAAGGGTAA
CTCAAAGGGGGGTTACCCGGCCCGTTGAAACCCGGTCCGGGCTTAAAAGGGTAA
TTACCCGGCCCGTTGAAACCCGGTCCGGGCTTAAAAGGGTAA
TTAAAAGGGTAA
So the point is that the subsequences are cut at the positions in the DNA where there is a start codon, and each must end at the next stop codon (in this case there is only one stop codon, "TAA").
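One way to read the description above is: scan every position for a start codon, then read triplets from there until the first in-frame stop codon. The following is only a sketch under that assumption; the name orfs_from_starts is hypothetical, and the in-frame reading is inferred from the example, not stated outright. Since TTG is in the start list, this sketch also reports the subsequence beginning at TTG, in addition to the four listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every subsequence that begins at a start codon
# and runs, codon by codon, to the first stop codon in the same frame.
sub orfs_from_starts {
    my ($dna) = @_;
    my @found;
    # The lookahead makes each match zero-width, so overlapping candidates
    # (a start codon inside an earlier subsequence) are all reported.
    # The lazy (?:[ACGT]{3})*? stops at the first in-frame stop codon.
    while ($dna =~ /(?=((?:TTA|CTA|CTG|TTG|CTC)(?:[ACGT]{3})*?(?:TAG|TGA|TAA)))/g) {
        push @found, $1;
    }
    return @found;
}

my $example = "GGGCTACCCCGCCTCAAAGGGGGGTTACCCGGCCCGTTGAAACCCGGTCCGGGCTTAAAAGGGTAA";
print "$_\n" for orfs_from_starts($example);
```

A start codon with no in-frame stop codon after it produces no output in this sketch.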
Please post your best effort at that code so we can discuss the specific problem you are having with it.
Please check this for me.
my $seq = "AAAAATGAAAATAAGGGAAATGAAAAAAAAAAGGGGGGGACGGG";
my $gene = "AAATGAAAAAAA";
If I match the gene against my sequence like so:
if ($seq =~ /$gene/g) {
    # pos($seq) gives the first position after the match
    # $` holds the upstream sequence, in this case: AAAAATGAAAATAAGGG
}
I am trying to find the position of the last stop codon in $`, assuming that any of the 3 possible stop codons may be present in the sequence.
Notice the first two adenines in the sequence. The sequence must be read in the same frame as the match (in this case that is the 3rd frame, but assume we don't know the frame for other sequences, because this is just an example).
The correct result should be the position of TAA in the upstream sequence above.
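A possible approach, sketched here under some assumptions: locate the gene, then step backwards from its start three bases at a time, so that every triplet examined is automatically in the same frame as the match. The name last_upstream_stop is hypothetical, positions are 0-based, and index() is used in place of the regex match because $gene is a literal string.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based position in $seq of the last stop codon that
# lies upstream of the first occurrence of $gene and in the same reading
# frame as it. Returns undef if the gene or an in-frame stop is not found.
sub last_upstream_stop {
    my ($seq, $gene) = @_;
    my $start = index($seq, $gene);    # literal search for the gene
    return if $start < 0;
    # Walk backwards from the match start one codon at a time; stepping by 3
    # keeps every triplet in the frame of the match.
    for (my $i = $start - 3; $i >= 0; $i -= 3) {
        return $i if substr($seq, $i, 3) =~ /\A(?:TAG|TGA|TAA)\z/;
    }
    return;
}

my $example_seq  = "AAAAATGAAAATAAGGGAAATGAAAAAAAAAAGGGGGGGACGGG";
my $example_gene = "AAATGAAAAAAA";
my $stop_pos = last_upstream_stop($example_seq, $example_gene);
print defined $stop_pos ? "last in-frame stop at $stop_pos\n" : "no in-frame stop\n";
```

For the example sequence the match starts at offset 17, so the walk checks offsets 14 (GGG) and then 11 (TAA) and stops there. The leading two adenines are never examined because they fall outside the frame.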