
Forum Moderators: coopster & jatar k & phranque


subsequences!

     
11:45 am on Dec 4, 2008 (gmt 0)

New User

5+ Year Member

joined:Nov 13, 2008
posts: 4
votes: 0


I have a long string of letters, in this case DNA. My intention is to find particular start triplets that begin, and stop triplets that end, the subsequences. The substrings between these start and stop triplets (with the start and stop triplets inclusive) are then kept in an array.

For example, my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA"
should produce the substrings below, stored in an array:

@whatever =("ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA",
"GTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTGGTTGGAAATAA",
"ATTGGTTGGAAATAA");

I have this as part of my entire code as my best effort:

# Collect in-frame start codon positions (offset of the codon's first letter)
while ($seq =~ m/ATG|TTG|CTG|ATT|CTA|GTG|ATT/gi) {
    my $matchPosition = pos($seq) - 3;
    if (($matchPosition % 3) == 0) {
        push (@startsRF1, $matchPosition);
    }
}

# Collect in-frame stop codon positions (offset just past the codon)
while ($seq =~ m/TAG|TAA|TGA/gi) {
    my $matchPosition = pos($seq);
    if (($matchPosition % 3) == 0) {
        push (@stopsRF1, $matchPosition);
    }
}

my $codonRange = "";
my $startPosition = 0;
my $stopPosition = 0;

@startsRF1 = reverse(@startsRF1);
@stopsRF1 = reverse(@stopsRF1);
while (scalar(@startsRF1) > 0) {
    $codonRange = "";
    $startPosition = pop(@startsRF1);
    if ($startPosition < $stopPosition) {
        next;
    }

    my $ORFseq = "";

    while (scalar(@stopsRF1) > 0) {
        $stopPosition = pop(@stopsRF1);
        if ($stopPosition > $startPosition) {
            # length of the range from the start codon through the stop codon
            my $difF = $stopPosition - $startPosition;
            $ORFseq = substr($seq, $startPosition, $difF);
            push (@arrayOfORFs, $ORFseq);
        }
    }
}

5:02 pm on Dec 4, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Sept 8, 2006
posts:1230
votes: 0


my question is what does this have to do with web analytics or tracking/logging?
2:33 pm on Dec 11, 2008 (gmt 0)

Senior Member

joined:Mar 8, 2002
posts:2897
votes: 0


So I moved this thread... but if it isn't perl, I apologize! I figured the perl guys in here would read this like I read the morning paper.
7:53 pm on Dec 11, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 31, 2008
posts:661
votes: 0


if it looks hard to read, it's usually perl ;)

I'm not sure if I got your idea completely, because my try reaches different results than you give in your post.

use strict;
my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA";
my @array = ();
my @starts = qw(ATG TTG CTG ATT CTA GTG);
my @stops = qw(TAG TAA TGA);
for my $start (@starts)
{
    for my $stop (@stops)
    {
        while($string =~ m/$start(.*)$stop/g)
        {
            push @array, $start . $1 . $stop;
        }
    }
}

print join("\n", @array);

results in

ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA
ATGAAAGTGAAAGGGAAAGGGGTGA
TTGGGTATTGGTTGGAAATAA
ATTGGTTGGAAATAA
GTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA
GTGAAAGGGAAAGGGGTGA

but maybe I got something wrong, I've never been into the Bio-Stuff.

If that's not what you needed, please elaborate for a guy who knows he should have DNA somewhere in his body but not much more than that.

Also, did you check the modules available at cpan? I hear there are quite a few for dealing with DNA. Maybe one of these can do the job much cleaner: [search.cpan.org...]
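
If what you want is to respect the reading frame (your own code checks % 3), a plain regex like the one above won't do that, because it matches start and stop triplets at any offset. Here is a minimal sketch that walks the string codon by codon instead. It assumes every in-frame start triplet should be paired with the first in-frame stop triplet that follows it, which may not be exactly the rule you have in mind. The sub name orfs_in_frame and its $frame parameter are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns every substring that
# begins at an in-frame start codon and ends at the first in-frame stop
# codon after it, with both codons included.
sub orfs_in_frame {
    my ($seq, $frame) = @_;    # $frame is the offset 0, 1 or 2
    my %is_start = map { $_ => 1 } qw(ATG TTG CTG ATT CTA GTG);
    my %is_stop  = map { $_ => 1 } qw(TAG TAA TGA);

    $seq = uc $seq;
    my (@open, @orfs);         # @open holds offsets of starts awaiting a stop
    for (my $i = $frame; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr($seq, $i, 3);
        if ($is_stop{$codon}) {
            # Close every pending start at this stop codon.
            push @orfs, substr($seq, $_, $i + 3 - $_) for @open;
            @open = ();
        }
        elsif ($is_start{$codon}) {
            push @open, $i;
        }
    }
    return @orfs;
}

my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA";
my @orfs = orfs_in_frame($string, 0);
print "$_\n" for @orfs;
```

For the sample string this gives five substrings in frame 0, all ending at the final TAA, so it is still not the same list as in your post. If the rule is different (for example, only the outermost start counts), the elsif branch is the place to change it.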

9:15 am on Dec 18, 2008 (gmt 0)

New User

5+ Year Member

joined:Nov 13, 2008
posts:4
votes: 0


ok Thanks. Can you offer any hints to this line of code:

I have a string of letters and would like to use a regex to check whether these letters occur in a text.

The pattern is bbbb, then either cg or gc or cc or gg, followed by a t. So the regex should match any of these 4 possibilities:

bbbbcgt or bbbbgct or bbbbcct or bbbbggt.

Regards,
Emmanuel

11:42 am on Dec 18, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 31, 2008
posts:661
votes: 0


you can use parentheses in regexps for two reasons: to catch parts of the match and work with it and to group things.

in your case, /bbbb(cg|gc|cc|gg)t/ would work and, if the string is bbbbcgt, $1 would contain cg. the | in the parentheses tells the regexp-machine that any one of those strings can match at this position.
if you don't need to know which of the four possibilities matched, you could also say /bbbb(?:cg|gc|cc|gg)t/ to indicate that you just want to group them, not save them.
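
a small sketch of both forms, in case it helps. the sample text in $text is made up for the example, and the alternation character is the plain vertical bar |.

```perl
use strict;
use warnings;

# Hypothetical sample text, not from the original question.
my $text = "xxbbbbcgtxx bbbbggt bbbbcat";

# Capturing group: $1 holds whichever alternative matched.
my @pairs;
while ($text =~ /bbbb(cg|gc|cc|gg)t/g) {
    push @pairs, $1;
}
print "captured: @pairs\n";

# Non-capturing group: the alternatives are only grouped, nothing goes into $1.
# The "= () =" idiom counts the matches in list context.
my $count = () = $text =~ /bbbb(?:cg|gc|cc|gg)t/g;
print "matches: $count\n";
```

the last word, bbbbcat, is skipped by both patterns because ca is not one of the four alternatives.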

5:30 am on Dec 19, 2008 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10550
votes: 10


welcome to WebmasterWorld [webmasterworld.com], Emmanuel !

you should learn the basics of regular expressions:
[perldoc.perl.org...]

the knowledge is essential to perl and can be transferred to many other disciplines.

 
