
Forum Moderators: incrediBILL

Message Too Old, No Replies

regular expression find across line breaks

text between two strings

     

santapaws

7:47 pm on Sep 26, 2012 (gmt 0)

5+ Year Member



I am trying to load a string into a variable. The string I'm trying to pick out is between a div and its closing tag. The text after the div may span line breaks. I'm trying to come up with an expression to capture the text between <div class="tag"> and its closing </div>.

I want to include the div in the variable. I have come up with
(?=<div\sclass="tag")(.|\n)*.(?<=</div>)

This is to cover instances where the text between the tags may be on a different line.
But when I run it, the variable includes text after the suffix (</div>), so I realise it's matching to the end of the page and then back to the last closing div on that page, instead of the first one after the prefix (<div class="tag">).
So my question is how to match between that prefix and the next closing div instead of the last closing div.
Thanks for any help.

g1smd

8:33 pm on Sep 26, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



One thing you need to make sure you avoid is excessive backtracking:
[regular-expressions.info...]

lucy24

12:13 am on Sep 27, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You left out one rather crucial bit of information: a variable in what? Obviously not in HTML itself; it doesn't speak RegEx and isn't technically a language at all. No two languages speak exactly the same RegEx dialect.

Many programs distinguish between single-line and multi-line mode: the ^ and $ anchors can refer either to the very beginning and end of the whole document, or to every single line break.
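To make the anchor difference concrete, here is a minimal sketch in Python's re module, picked only as one example dialect (the thread doesn't depend on it). The sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line\nthird line"

# Default mode: ^ anchors only at the very start of the whole string.
default_hits = re.findall(r"^\w+", text)

# re.MULTILINE: ^ (and $) also match at every line break.
multiline_hits = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(default_hits)    # ['first']
print(multiline_hits)  # ['first', 'second', 'third']
```

Other engines name and toggle this mode differently, so check the documentation for whichever one you are using.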

"The string I'm trying to pick out is between a div and its closing tag."

Er, you mean the string is within the div? i.e. between its opening and closing tags?

Do you know anything about what's inside the div, like what other tags might occur? I'm thinking something like

<div class = "blahblah">(([^<]*</?(?:p|i|span)(?: class = "\w+(?: \w+)*")?>)*[^<]*)</div>


but don't take my word for it without counting parentheses on your own fingers. If there are subsidiary divs, nest deeper. It can still be done.
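For anyone who wants to count the parentheses by machine instead, this is a rough check of the pattern above in Python (an assumption; any engine with a similar syntax should behave much the same). The sample page is hypothetical. Note that [^<] matches line breaks without any special mode, which is why the div can span lines.

```python
import re

# The pattern from the post, unchanged: only p, i and span tags are allowed inside the div.
pattern = re.compile(
    r'<div class = "blahblah">'
    r'(([^<]*</?(?:p|i|span)(?: class = "\w+(?: \w+)*")?>)*[^<]*)'
    r'</div>'
)

# Hypothetical sample: a line break inside the div, and an unrelated div afterwards.
page = (
    '<div class = "blahblah">Hello <i>world</i>\n'
    '<p class = "x y">para</p></div>\n'
    '<div>other</div>'
)

match = pattern.search(page)
inner = match.group(1) if match else None
print(repr(inner))  # 'Hello <i>world</i>\n<p class = "x y">para</p>'
```

Group 1 holds the contents without the surrounding div; match.group(0) would include the opening and closing tags as well.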


Tip: "Disable graphic smile faces for this post" will look as if it isn't working in Preview, but in the real post it's just what you need. "Code" tags achieve the same purpose.

santapaws

10:14 pm on Sep 27, 2012 (gmt 0)

5+ Year Member



Thanks for the replies. I am trying to use regex to store variables in WinAutomation. I managed to sort the original problem by adding a '?' to the end of (.|\n)*, making it (.|\n)*?

Lucy thanks for the code. I will try and understand it and use that.
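A small sketch of what that '?' changes, run in Python rather than WinAutomation (so the engine is an assumption, and the sample page is made up). The third pattern is a hypothetical simpler alternative that relies on a "dot matches newline" flag instead of (.|\n).

```python
import re

# Hypothetical page: the target div spans two lines and a later </div> follows it.
page = (
    'intro\n'
    '<div class="tag">line one\n'
    'line two</div>\n'
    '<p>footer</p></div>'
)

# Original expression: greedy, so it runs on to the LAST </div> on the page.
greedy = re.search(r'(?=<div\sclass="tag")(.|\n)*.(?<=</div>)', page).group(0)

# With the added '?': lazy, so it stops at the FIRST </div> after the opening tag.
lazy = re.search(r'(?=<div\sclass="tag")(.|\n)*?.(?<=</div>)', page).group(0)

# Simpler spelling of the same idea, letting "." match newlines via re.DOTALL.
simpler = re.search(r'<div\sclass="tag">.*?</div>', page, flags=re.DOTALL).group(0)

print(repr(greedy))  # '<div class="tag">line one\nline two</div>\n<p>footer</p></div>'
print(repr(lazy))    # '<div class="tag">line one\nline two</div>'
```

The lazy version stops at the first closing div it meets, so it will cut the match short if the target div ever contains another div.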

swa66

10:57 pm on Sep 27, 2012 (gmt 0)

WebmasterWorld Senior Member swa66 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I've no clue what type of regexps "winautomation" uses (whatever it is).

In Perl regular expressions:

*? is typically an ungreedy match instead of the default greedy match.

e.g.:
data = "mississippi";
regexp = "/(i.*s)/"
would give "ississ"

data = "mississippi";
regexp = "/(i.*?s)/"
would give "is"
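The same two cases can be checked in Python, whose re module follows the Perl convention for *? (the choice of Python here is just for illustration):

```python
import re

data = "mississippi"

# Greedy: .* grabs as much as it can, so the match runs to the last "s".
greedy = re.search(r"(i.*s)", data).group(1)

# Lazy (ungreedy): .*? grabs as little as it can, so the match stops at the first "s".
lazy = re.search(r"(i.*?s)", data).group(1)

print(greedy)  # ississ
print(lazy)    # is
```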

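There's a modifier to treat the data as a whole as one line, so that "." also matches a line break. It's "s" (the "m" modifier does something else: it makes ^ and $ match at every line break):

data = "a\nb\nc\n"
regexp = "/a.b/s"
would match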
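In Python's re module, to pick one concrete dialect, the flag that lets "." cross line breaks is re.DOTALL (the counterpart of Perl's /s), while a literal \n in the pattern matches a newline with no flag at all. A minimal sketch:

```python
import re

data = "a\nb\nc\n"

# Without any flag, "." refuses to match the newline between a and b.
no_flag = re.search(r"a.b", data)

# With re.DOTALL, "." matches any character, newlines included.
dotall = re.search(r"a.b", data, flags=re.DOTALL)

# A literal \n in the pattern needs no flag.
literal = re.search(r"a\nb", data)

print(no_flag)                 # None
print(repr(dotall.group(0)))   # 'a\nb'
print(repr(literal.group(0)))  # 'a\nb'
```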

The easier way to deal with complex data is to parse the html as xml ... but that requires good xhtml which is far too rare. That's how I do things - it's also much more efficient than very complex regexps will be.
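As a rough illustration of the parser route, here is a sketch using Python's standard html.parser, which is more forgiving than a strict XML parser and so does not need clean xhtml. The class name TagDivExtractor and the sample page are hypothetical, and this is only one possible approach, not a description of any particular tool.

```python
from html.parser import HTMLParser


class TagDivExtractor(HTMLParser):
    """Collects the text of every <div class="tag">...</div>, nested divs included."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0    # div nesting depth inside the div being collected
        self.parts = []   # fragments of the div being collected
        self.found = []   # completed divs, opening and closing tags included

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1
            self.parts.append(self.get_starttag_text())
        elif tag == "div" and "tag" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.parts = [self.get_starttag_text()]

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through as written.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.found.append("".join(self.parts))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


# Hypothetical page: the target div contains a nested div and is followed by a stray </div>.
page = (
    'intro\n'
    '<div class="tag">line one\n'
    '<div>inner</div> &amp; more</div>\n'
    '<p>footer</p></div>'
)

parser = TagDivExtractor()
parser.feed(page)
parser.close()
print(parser.found)
```

Because the parser counts nesting depth, the inner div does not end the capture early, which is the case a lazy regex gets wrong. HTML comments inside the div are dropped by this sketch.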

santapaws

6:57 pm on Sep 29, 2012 (gmt 0)

5+ Year Member



swa66, thanks for that. WinAutomation is a macro builder for guys like me who can't program.
 
