Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

    
subsequences!
ojefua




msg:3799884
 11:45 am on Dec 4, 2008 (gmt 0)

I have a long string of letters, in this case DNA. My intention is to find particular start triplets that begin, and stop triplets that end, each subsequence. The substrings between these start and stop triplets (start and stop triplets inclusive) are then kept in an array.

For example my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA"
should produce the substrings below, stored in an array:

@whatever =("ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA",
"GTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTGGTTGGAAATAA",
"ATTGGTTGGAAATAA");

I have this as part of my entire code as my best effort:

# collect in-frame start codons (reading frame 1)
while ($seq =~ m/ATG|TTG|CTG|ATT|CTA|GTG/gi) {
    # pos() is the end of the match, so step back 3 to get the codon start
    my $matchPosition = pos($seq) - 3;
    if (($matchPosition % 3) == 0) {
        push(@startsRF1, $matchPosition);
    }
}

# collect in-frame stop codons
while ($seq =~ m/TAG|TAA|TGA/gi) {
    # keep the end of the stop codon so the stop is included in the substring
    my $matchPosition = pos($seq);
    if (($matchPosition % 3) == 0) {
        push(@stopsRF1, $matchPosition);
    }
}

my $codonRange = "";
my $startPosition = 0;
my $stopPosition = 0;

# reverse so that pop() returns the positions in ascending order
@startsRF1 = reverse(@startsRF1);
@stopsRF1 = reverse(@stopsRF1);
while (scalar(@startsRF1) > 0) {
    $codonRange = "";
    $startPosition = pop(@startsRF1);
    if ($startPosition < $stopPosition) {
        next;
    }

    my $ORFseq = "";

    while (scalar(@stopsRF1) > 0) {
        $stopPosition = pop(@stopsRF1);
        if ($stopPosition > $startPosition) {
            my $difF = $stopPosition - $startPosition;
            $ORFseq = substr($seq, $startPosition, $difF);
            push(@arrayOfORFs, $ORFseq);
        }
    }
}

 

tonynoriega




msg:3800077
 5:02 pm on Dec 4, 2008 (gmt 0)

my question is what does this have to do with web analytics or tracking/logging?

Receptional




msg:3805140
 2:33 pm on Dec 11, 2008 (gmt 0)

So I moved this thread... but if it isn't Perl, I apologize! I figured the Perl guys in here would read this like I read the morning paper.

janharders




msg:3805430
 7:53 pm on Dec 11, 2008 (gmt 0)

if it looks hard to read, it's usually perl ;)

I'm not sure if I got your idea completely, because my attempt gives different results than the ones in your post.
use strict;
my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA";
my @array = ();
my @starts = qw(ATG TTG CTG ATT CTA GTG);
my @stops = qw(TAG TAA TGA);

# try every start/stop pair
for my $start (@starts)
{
    for my $stop (@stops)
    {
        # .* is greedy: each match runs from the first $start to the last $stop
        while($string =~ m/$start(.*)$stop/g)
        {
            push @array, $start . $1 . $stop;
        }
    }
}

print join("\n", @array);

results in
ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA
ATGAAAGTGAAAGGGAAAGGGGTGA
TTGGGTATTGGTTGGAAATAA
ATTGGTTGGAAATAA
GTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA
GTGAAAGGGAAAGGGGTGA

but maybe I got something wrong, I've never been into the Bio-Stuff.

If that's not what you needed, please elaborate for a guy who knows he should have DNA somewhere in his body but not much more than that.

Also, did you check the modules available on CPAN? I hear there are quite a few for dealing with DNA. Maybe one of these can do the job much more cleanly: [search.cpan.org...]
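
In case the goal is in-frame ORFs, which the `% 3` checks in the original post seem to suggest, here is a minimal sketch that walks the string one codon at a time instead of using a greedy regex. It assumes only reading frame 1 matters and that every start codon seen since the previous stop should be paired with the next stop. The names %is_start, %is_stop, @start_pos and @orfs are made up for illustration.

```perl
use strict;
use warnings;

my $seq = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA";

# Lookup tables built from the start/stop triplets used earlier in the thread.
my %is_start = map { $_ => 1 } qw(ATG TTG CTG ATT CTA GTG);
my %is_stop  = map { $_ => 1 } qw(TAG TAA TGA);

my @start_pos;   # in-frame start positions seen since the last stop
my @orfs;        # collected substrings, start and stop codons inclusive

# Step through reading frame 1 in whole codons (offsets 0, 3, 6, ...).
for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
    my $codon = uc substr($seq, $i, 3);
    if ($is_start{$codon}) {
        push @start_pos, $i;
    }
    elsif ($is_stop{$codon}) {
        # Assumption: every pending start is closed by this stop.
        push @orfs, map { substr($seq, $_, $i + 3 - $_) } @start_pos;
        @start_pos = ();
    }
}

print join("\n", @orfs), "\n";
```

On the sample string this gives five substrings, all ending at the final TAA, rather than the three listed in the original post, so the exact rule for which starts to keep would still need to be spelled out.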

ojefua




msg:3810264
 9:15 am on Dec 18, 2008 (gmt 0)

OK, thanks. Can you offer any hints on this one:

I have a string of letters and would like to use a regex to check whether it occurs in a text.

The pattern is bbbb, followed by either cg or gc or cc or gg, then followed by a t. So the regex should match any of the 4 possibilities:

bbbbcgt or bbbbgct or bbbbcct or bbbbggt.

Regards,
Emmanuel

janharders




msg:3810328
 11:42 am on Dec 18, 2008 (gmt 0)

you can use parentheses in regexps for two reasons: to catch parts of the match and work with it and to group things.

in your case, /bbbb(cg|gc|cc|gg)t/ would work and, if the string is bbbbcgt, $1 would contain cg. the | in the parentheses tells the regexp-machine that any one of those strings can match at this position.
if you don't need to know which of the four possibilities matched, you could also say /bbbb(?:cg|gc|cc|gg)t/ to indicate that you just want to group them, not save them.
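
a small sketch of both forms side by side, in case it helps. the sample strings in @samples are made up for illustration (the last one is there on purpose as a non-match), and %captured and @matched are just names picked for this example.

```perl
use strict;
use warnings;

# Made-up test inputs; "bbbbatt" should not match either pattern.
my @samples = qw(bbbbcgt bbbbgct bbbbcct bbbbggt bbbbatt);

# Capturing group: $1 holds whichever alternative matched.
my %captured;
for my $text (@samples) {
    if ($text =~ /bbbb(cg|gc|cc|gg)t/) {
        $captured{$text} = $1;
    }
}

# Non-capturing group: same strings match, but nothing is saved in $1.
my @matched = grep { /bbbb(?:cg|gc|cc|gg)t/ } @samples;

print "$_ => $captured{$_}\n" for sort keys %captured;
print scalar(@matched), " of ", scalar(@samples), " samples matched\n";
```

both patterns accept the same strings; the only difference is whether the matched alternative is kept for later use.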

phranque




msg:3810999
 5:30 am on Dec 19, 2008 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], Emmanuel!

you should learn the basics of regular expressions:
[perldoc.perl.org...]

the knowledge is essential to perl and can be transferred to many other disciplines.
