Perl substrings

         

ktsirig

10:21 am on Dec 13, 2005 (gmt 0)


Hello everybody,
I have been struggling with this problem in Perl for a few days now, and it is getting on my nerves, so if anyone has a hint on how to proceed, it would be more than welcome.
Say you have one file that contains a sequence of letters, e.g.:
SGFEFHGYARSGVIMNDSGASTKSGAYITPAGETGGAIGRLGNQADTYVEMNLEHKQTLDN [file 1]

The same sequence is also in file 2, but with "." and "-" inserted into it, like:
...---SGFEF....HG-.--YARSGVI---MNDSGAS..--TKSGAY--....--ITPAG--ETGGAI..GRLGN--Q..AD---TY--V..EMNL--EHKQTLDN [file 2]

Let's say I want to check what two substrings (namely SGASTK and GNQADT) have become in file 2 compared to what they were in file 1.

Counting positions from 1, I see that SGASTK, which was substring 18-23 in file 1, is now SGAS..--TK and substring 35-44 in file 2.
GNQADT, which was substring 42-47 in file 1, is now GN--Q..AD---T and substring 75-87 in file 2.

My question is: how can I find the old substrings from file 1 in file 2, and how can I store the new beginnings and endings of those substrings in file 2?

simon2263

4:49 pm on Dec 14, 2005 (gmt 0)


I guess the answer is to turn the substrings you are looking for into regular expressions that accommodate the presence of '-' and '.' between the letters. For example,

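#!/usr/local/bin/perl
use strict;
use warnings;

# Allow any run of '-' or '.' between consecutive letters of "abc".
my $re  = qr/a[-.]*b[-.]*c/;
my $str = "zzzza---..bcddddd";
# Print the matched text only if the pattern was actually found.
print "$1\n" if $str =~ /($re)/;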

The parentheses around the pattern inside /.../ capture the substring that matched, which is then available in $1.
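
To also record where the match starts and ends, something along these lines might work. This is only a sketch: find_gapped is a made-up helper name, the file 2 sequence is hard-coded instead of being read from a file, and positions are assumed to be counted from 1. It builds the pattern from the letters of the substring and reads the match offsets from Perl's built-in @- and @+ arrays.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): find $needle in $haystack,
# allowing any run of '-' or '.' between consecutive letters.
# Returns (start, end, matched_text) with 1-based inclusive positions,
# or an empty list when there is no match.
sub find_gapped {
    my ($needle, $haystack) = @_;
    # quotemeta protects any regex metacharacters in the letters.
    my $pattern = join '[-.]*', map { quotemeta($_) } split //, $needle;
    if ($haystack =~ /$pattern/) {
        # $-[0] is the 0-based offset of the match start; $+[0] is the
        # offset just past its end, which equals the 1-based end position.
        return ($-[0] + 1, $+[0], substr($haystack, $-[0], $+[0] - $-[0]));
    }
    return;
}

# Sequence from file 2, hard-coded here for illustration.
my $file2 = '...---SGFEF....HG-.--YARSGVI---MNDSGAS..--TKSGAY--....--ITPAG--ETGGAI..GRLGN--Q..AD---TY--V..EMNL--EHKQTLDN';

for my $sub (qw(SGASTK GNQADT)) {
    if (my ($start, $end, $text) = find_gapped($sub, $file2)) {
        print "$sub -> $text ($start-$end)\n";
    }
    else {
        print "$sub not found\n";
    }
}
```

With the sequence above this should report SGAS..--TK at 35-44 and GN--Q..AD---T at 75-87. The start and end values can be pushed into an array or hash keyed by the original substring if they need to be kept for later.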