regex to match everything up to a newline followed by a special sequen

Forum Moderators: coopster

regular expression help

MountainX

9:27 pm on Nov 15, 2009 (gmt 0)

I hope I'm posting in the right place. I saw most of the regex questions were in this forum, so I put my question here even though I'm using python.

I have a python script with the following regex:


regExpression = r'(?P<comment>[^<]*)'

It is used to parse text like this:


testing blah blah blah can include anything such as < or , or 123, etc.
<CR>

It has 2 problems:
1. If the comment contains "<" it doesn't work correctly.
2. The comment includes the trailing newline, which is incorrect.
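Both problems can be reproduced in a few lines. This is only a minimal sketch: the two sample strings are made-up inputs modelled on the text above, and the names plain_comment and bracket_comment are hypothetical.

```python
import re

# Pattern from the post: the comment is everything up to the first "<".
reg_expression = r'(?P<comment>[^<]*)'

# Hypothetical sample inputs, modelled on the text shown above.
plain = "testing blah blah blah\n<CR>\n"
with_bracket = "testing a < b blah\n<CR>\n"

# Problem 2: the newline before <CR> ends up inside the group.
plain_comment = re.match(reg_expression, plain).group('comment')
print(repr(plain_comment))    # 'testing blah blah blah\n'

# Problem 1: the match stops at the first "<" inside the comment itself.
bracket_comment = re.match(reg_expression, with_bracket).group('comment')
print(repr(bracket_comment))  # 'testing a '
```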

So what I need is a regex that matches anything and everything up to, but not including, a newline (Windows or Linux) that is followed by <CR>, or up to the end of the file.

To further clarify, the regex should consume any text but stop when it encounters the sequence newline<CR>, or the end of the file. The newline<CR> sequence is **not** consumed.

Can anyone suggest a regex for doing this? Thanks.

MountainX

3:19 am on Nov 16, 2009 (gmt 0)

Solved:

before parsing:
input += "\n<CR>"

readingExpression = r'(?P<comment>.*?)(?=\s*^<CR>)'

and used re.MULTILINE | re.DOTALL
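
Put together, the pieces above might look roughly like the sketch below. The helper name read_comment, the snake_case variable name, and the sample input are assumptions added for illustration; only the pattern, the appended "\n<CR>", and the two flags come from the post.

```python
import re

# Pattern and flags as given in the post.
reading_expression = r'(?P<comment>.*?)(?=\s*^<CR>)'
pattern = re.compile(reading_expression, re.MULTILINE | re.DOTALL)

def read_comment(text):
    """Hypothetical helper: return the comment that precedes the <CR> marker."""
    # Append the marker so that end-of-file is handled like any other <CR>.
    text += "\n<CR>"
    return pattern.match(text).group('comment')

# Hypothetical input with a Windows line ending and a "<" inside the comment.
sample = "testing < or , or 123\r\n<CR>\r\n"
print(repr(read_comment(sample)))  # 'testing < or , or 123'
```

The lookahead (?=...) checks for the line break and the marker without consuming them, and the lazy .*? stops at the first place where that check succeeds, so neither the trailing newline nor <CR> ends up in the comment group.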

[edited by: eelixduppy at 3:18 pm (utc) on Nov. 16, 2009]
[edit reason] disabled smileys [/edit]

TheMadScientist

3:29 am on Nov 16, 2009 (gmt 0)

I can't tell you the regex, because this isn't a language I work with, but I can tell you what I would consider doing, in theory...

1.) Write a regex to replace < or > following <comment> before the end of the line with &lt; and &gt; respectively. (Basically, change what you have now, which breaks at the end of the line, so that it converts < and > to their HTML equivalents.)

2.) In PHP's preg regexes you can define a multiline match with the m modifier, which is placed after the end delimiter (most people use /), so you would have /regex_here/m. You can detect a newline with \n and a carriage return with \r. I would look for the equivalents in Python. (In PHP you would use [\r\n]{1,2}, which would match \r, \n, or \r\n.)

3.) Replace the incorrect line breaks.

4.) Replace the &lt; and &gt; with < or > up to the <CR> using the same expression you are using.

5.) If you need to, replace <CR> with whatever you like after you have the rest done; until then it gives you a 'stopping point' for the rest of your expressions.
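
For the Python side of steps 2 and 3, a rough sketch follows. As far as I know, the counterpart to PHP's m modifier is the re.MULTILINE flag, and \r and \n are written the same way. The sample string and the variable names are hypothetical.

```python
import re

# The character class quoted in step 2, compiled as a Python pattern.
line_break = re.compile(r'[\r\n]{1,2}')

# Hypothetical sample mixing Windows and Unix line endings.
sample = "first line\r\n<CR>\nsecond line\n<CR>"

# Step 3: normalise every line break to a single "\n".
normalised = line_break.sub("\n", sample)

# With re.MULTILINE, ^ and $ match at the start and end of each line,
# which is what PHP's /m modifier does.
markers = re.findall(r'^<CR>$', normalised, re.MULTILINE)
print(markers)  # ['<CR>', '<CR>']
```

One caveat: [\r\n]{1,2} also matches two consecutive breaks such as \n\n, so blank lines get collapsed. If blank lines matter, the alternation \r\n|\r|\n keeps them.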

BTW: Welcome to WebmasterWorld!
And we were posting at the same time...
Glad you got it working.

MountainX

4:33 am on Nov 16, 2009 (gmt 0)

Thank you!