I am trying to load a string into a variable. The string I'm trying to pick out is between a div and its closing tag, and the text inside the div may span line breaks. I'm trying to come up with an expression to capture the text between <div class="tag"> and its closing </div>.
I want to include the div itself in the variable. I have come up with (?=<div\sclass="tag")(.|\n)*.(?<=</div>)
The (.|\n) is there to cover cases where the text between the tags is on a different line. But when I run it, the variable also includes text after the closing </div>, so I realise it's matching to the end of the page and then back to the last closing div on that page, instead of the one straight after the opening <div class="tag">. So my question is: how do I match from that opening tag to the next closing div instead of the last one? Thanks for any help.
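To see the problem concretely, here is a minimal Python sketch. Python is only an assumption here, since the thread never settles which regex engine is involved; the sample HTML is made up, and the pattern is the one from the question, unchanged.

```python
import re

# Made-up sample page: the wanted div is followed by more markup
# that also ends with a closing </div>.
html = '<div class="tag">first\nline</div>\n<p>other</p>\n<div>second</div>'

# The pattern from the question. (.|\n)* is greedy, so the engine runs
# to the end of the string and backtracks only as far as the LAST
# position where the lookbehind (?<=</div>) succeeds.
greedy = re.compile(r'(?=<div\sclass="tag")(.|\n)*.(?<=</div>)')

match = greedy.search(html)
print(repr(match.group(0)))  # the whole string, including <p>other</p> and the second div
```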
Msg#: 4500354 posted 12:13 am on Sep 27, 2012 (gmt 0)
You left out one rather crucial bit of information: a variable in what? Obviously not in HTML itself; it doesn't speak RegEx and isn't a programming language at all. No two languages speak exactly the same RegEx dialect.
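Many programs distinguish between single-line and multi-line mode. In multi-line mode the ^ and $ anchors match at every line break; otherwise they match only at the very beginning and end of the whole string. (Confusingly, "single-line" mode, where it exists, is usually a separate switch that lets . match newline characters.)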
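As an illustration, in Python (assumed here only as an example dialect) the anchor behaviour is switched by the re.MULTILINE flag. The three-line string is made up:

```python
import re

text = "first line\nsecond line\nthird line"

# Default mode: ^ anchors only at the very start of the string.
default_hits = re.findall(r"^\w+", text)
print(default_hits)    # ['first']

# re.MULTILINE: ^ also anchors right after every newline.
multiline_hits = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_hits)  # ['first', 'second', 'third']
```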
The string im trying to pick out is between a div and its closing tag.
Er, you mean the string is within the div, i.e. between its opening and closing tags?
Do you know anything about what's inside the div, like what other tags might occur? I'm thinking something like
<div class="blahblah">(([^<]*</?(?:p|i|span)(?: class="\w+(?: \w+)*")?>)*[^<]*)</div>
but don't take my word for it without counting the parentheses on your own fingers. If there are nested divs, nest the pattern deeper. It can still be done, as long as you know the maximum nesting depth in advance.
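Here is a quick Python check of that idea, assuming class="tag" (no spaces around =) and made-up sample markup. The inner group is made non-capturing (?:...) so that group 1 holds just the div's content. Because [^<]* cannot cross a < unless it starts one of the allowed tags, the match stops at the first </div>, and a div containing any other tag does not match at all.

```python
import re

# Pattern adapted from the post: text mixed with <p>, <i> and <span>
# tags (optionally with a class attribute), and no other tags.
pattern = re.compile(
    r'<div class="tag">'
    r'((?:[^<]*</?(?:p|i|span)(?: class="\w+(?: \w+)*")?>)*[^<]*)'
    r'</div>'
)

html = ('<div class="tag">Some <i>text</i>\nover '
        '<span class="note">two</span> lines</div><div>unrelated</div>')

m = pattern.search(html)
print(repr(m.group(1)))  # 'Some <i>text</i>\nover <span class="note">two</span> lines'

# A tag outside the allowed list (<b> here) makes the whole match fail.
blocked = pattern.search('<div class="tag">a <b>bold</b> word</div>')
print(blocked)  # None
```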
Tip: "Disable graphic smile faces for this post" will look as if it isn't working in Preview, but in the real post it's just what you need. "Code" tags achieve the same purpose.
I've no clue what type of regexps "winautomation" uses (whatever it is).
In Perl regular expressions:
*? is an ungreedy (lazy) match instead of the default greedy match: it consumes as few characters as possible rather than as many as possible.
e.g. data = "mississippi"; regexp = "/(i.*s)/" would give "ississ"
data = "mississippi"; regexp = "/(i.*?s)/" would give "is"
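Applied to the original question, that lazy quantifier is the usual fix. A minimal Python sketch (Python's re uses the same *? syntax as Perl); the sample HTML is made up, and re.DOTALL stands in for the (.|\n) trick so that . can cross line breaks:

```python
import re

html = '<div class="tag">first\nline</div>\n<p>other</p>\n<div>second</div>'

# .*? stops at the FIRST </div> it can reach; re.DOTALL lets . match
# newlines, so (.|\n) is no longer needed.
lazy = re.compile(r'<div\sclass="tag">.*?</div>', re.DOTALL)

match = lazy.search(html)
print(repr(match.group(0)))  # '<div class="tag">first\nline</div>'
```

One caveat: if the wanted div itself contains a nested <div>, .*? stops at the inner </div>, which is where the deeper patterns or a real parser come in.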
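There's a modifier that treats the data as a whole as one line, so that . also matches newlines; it's "s" (the "m" modifier only changes what ^ and $ match):
data = "a\nb\nc\n"; regexp = "/a.b/s" would match, but without the s modifier "/a.b/" would not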
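The same distinction in Python, as a sketch: re.DOTALL corresponds to Perl's /s and re.MULTILINE to Perl's /m. Only DOTALL lets . match a newline, while a literal \n in the pattern matches with no flag at all.

```python
import re

data = "a\nb\nc\n"

# Without a flag, . does not match a newline.
print(re.search(r"a.b", data))                           # None

# re.MULTILINE (Perl's /m) only affects ^ and $, so this still fails.
print(re.search(r"a.b", data, re.MULTILINE))             # None

# re.DOTALL (Perl's /s) lets . match the newline between a and b.
print(repr(re.search(r"a.b", data, re.DOTALL).group(0)))  # 'a\nb'

# An explicit \n in the pattern needs no flag.
print(re.search(r"a\nb", data) is not None)              # True
```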
The easier way to deal with complex data is to parse the HTML as XML ... but that requires well-formed XHTML, which is far too rare. That's how I do things; it's also much more efficient than very complex regexps will be.
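A rough sketch of the parse-as-XML route, using Python's standard xml.etree.ElementTree and a made-up, well-formed fragment. Real-world HTML is often not well-formed (unclosed tags, entities like &nbsp;), in which case ET.fromstring raises ET.ParseError.

```python
import xml.etree.ElementTree as ET

page = """<html><body>
<div class="tag">first <i>line</i>
second line</div>
<div>unrelated</div>
</body></html>"""

root = ET.fromstring(page)

matches = []
# ElementPath supports [@attr='value'] predicates: every div with class="tag".
for div in root.iterfind(".//div[@class='tag']"):
    # tostring() also serialises the element's tail (text after </div>),
    # so drop it to keep just the div and its contents.
    div.tail = None
    matches.append(ET.tostring(div, encoding="unicode"))

print(matches)
```

Nesting is handled for free here: an inner <div> simply becomes a child element instead of ending the match early.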