I have a python script with the following regex:
regExpression = r'(?P<comment>[^<]*)'
It is used to parse text like this:
testing blah blah blah can include anything such as < or , or 123, etc.
<CR>
It has 2 problems:
1. If the comment contains "<" it doesn't work correctly.
2. The comment includes the trailing newline, which is incorrect.
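Both problems can be reproduced with a short script. This is only a sketch: the two sample strings are made-up stand-ins for the real input, and the variable names other than the pattern itself are assumptions.

```python
import re

# Pattern from the post: match everything up to the first "<".
reg_expression = r'(?P<comment>[^<]*)'

# Hypothetical sample inputs modelled on the example text above.
plain = "testing blah blah blah\n<CR>"
with_bracket = "a < b, or 123\n<CR>"

# Problem 2: [^<] also matches the line break, so the newline is captured.
comment_plain = re.match(reg_expression, plain).group('comment')
print(repr(comment_plain))

# Problem 1: the match stops at the first "<", so the comment is cut short.
comment_bracket = re.match(reg_expression, with_bracket).group('comment')
print(repr(comment_bracket))
```

The first result ends with the newline, and the second stops right before the "<" inside the comment.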
What I need is a regex that matches anything and everything up to, but not including, a newline (Windows or Linux) followed by <CR>, or up to the end of the file.
To further clarify, the regex should match any characters but stop when it encounters the sequence newline<CR>, or the end of the file. The newline<CR> sequence is **not** consumed.
Can anyone suggest a regex for doing this? Thanks.
1.) Write a regex to replace < or > following <comment> before the end of the line with &lt; and &gt; respectively. (Basically, change what you have now, which breaks at the end of the line, so that it converts < and > to their HTML equivalents.)
2.) In PHP's preg regexes you can turn on multiline matching with the m modifier, which is placed after the end delimiter (most people use /), so you would have /regex_here/m. You can detect a newline with \n and a carriage return with \r. I would look for the equivalents in Python. (In PHP you would use [\r\n]{1,2}, which matches \r, \n, or \r\n, though it also matches two consecutive line breaks such as \n\n.)
3.) Replace the incorrect line breaks.
4.) Replace the &lt; and &gt; with < and > up to the <CR> using the same expression you are using.
5.) If you need to, replace <CR> with whatever you like only after you have the rest done, because it gives you a 'stopping point' for the rest of your expressions.
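For comparison, here is a minimal Python sketch of a different, single-pass approach: a lazy match with a lookahead instead of the replace-and-restore steps above. Python's re module does have re.MULTILINE as the counterpart to PHP's m modifier, but this sketch uses re.DOTALL so that "." can cross line breaks. The names COMMENT_RE and extract_comment are made up for illustration, and it assumes the comment starts at the beginning of the text being parsed.

```python
import re

# Lazily match any characters (DOTALL lets "." match line breaks too),
# stopping just before a newline followed by <CR>, or at the end of the
# input. The lookahead (?=...) tests for the terminator without consuming it.
COMMENT_RE = re.compile(r'(?P<comment>.*?)(?=\r?\n<CR>|\Z)', re.DOTALL)

def extract_comment(text):
    """Return the leading comment, excluding the newline before <CR>."""
    return COMMENT_RE.match(text).group('comment')

# Hypothetical inputs: Windows and Linux line endings, and no <CR> at all.
windows = extract_comment("a < b, or 123\r\n<CR>more")
linux = extract_comment("line one\nline two\n<CR>")
no_marker = extract_comment("runs to end of file")
```

Because the terminator sits inside a lookahead, the newline<CR> sequence is left in place for whatever parses the text next. Note that when there is no <CR> at all, a trailing newline at the very end of the file stays in the comment, since \Z only matches at the end of the string.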
BTW: Welcome to WebmasterWorld!
And we were posting at the same time...
Glad you got it working.