HTML Forum

regular expression find across line breaks
text between two strings

 7:47 pm on Sep 26, 2012 (gmt 0)

I am trying to load a string into a variable. The string I'm trying to pick out is between a div and its closing tag, and the text after the opening div may span line breaks. I'm trying to come up with an expression to capture the text between <div class="tag"> and its closing </div>.

I want to include the div in the variable. I have come up with

This is to cover instances where the text between the tags may be on a different line.
But when I run it, the variable includes text after the closing tag (</div>), so I realise it's matching to the end of the page and then back to the last closing div on that page, instead of the one straight after the opening tag (<div class="tag">).
So my question is how to match between that opening tag and the next closing div instead of the last closing div.
Thanks for any help.



 8:33 pm on Sep 26, 2012 (gmt 0)

One thing you need to make sure you avoid is excessive backtracking:


 12:13 am on Sep 27, 2012 (gmt 0)

You left out one rather crucial bit of information: a variable in what? Obviously not in HTML itself; it doesn't speak RegEx and isn't technically a language at all. No two languages speak exactly the same RegEx dialect.

Many programs distinguish between single-line and multi-line mode: the ^ and $ anchors can refer either to the very beginning and end of the whole document, or to every single line break.
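
To make the distinction concrete, here is a minimal sketch in Python's re module. Python is an assumption here, since the question doesn't name an engine, and other dialects may spell the flag differently. The sample text is made up. It shows ^ matching only at the very start of the text by default, and after every line break once multi-line mode is switched on.

```python
import re

# Hypothetical sample text with two line breaks.
text = "first line\nsecond line\nthird line"

# Default mode: ^ matches only at the very beginning of the whole text.
default_hits = re.findall(r'^\w+', text)

# Multi-line mode: ^ also matches immediately after every line break.
multiline_hits = re.findall(r'^\w+', text, re.MULTILINE)

print(default_hits)
print(multiline_hits)
```

Note that this mode only changes the anchors; whether "." may cross a line break is usually controlled by a separate option.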

The string I'm trying to pick out is between a div and its closing tag.

Er, you mean the string is within the div? i.e. between its opening and closing tags?

Do you know anything about what's inside the div, like what other tags might occur? I'm thinking something like

<div class = "blahblah">(([^<]*</?(?:p|i|span)(?: class = "\w+(?: \w+)*")?>)*[^<]*)</div>

but don't take my word for it without counting parentheses on your own fingers. If there are subsidiary divs, nest deeper. It can still be done.
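
A quick way to check a pattern like this is to run it against a small sample. The sketch below uses Python's re module, which is an assumption, since the engine in question isn't named. The class name "tag" comes from the original question, the sample page is made up, and the spaces around = are dropped to match that markup. The list of allowed inner tags (p, i, span) is likewise only a guess at what the div contains.

```python
import re

# Adapted from the pattern above. Because [^<] is a negated class, it
# matches line breaks too, so no special "dot matches newline" flag is needed.
DIV_RE = re.compile(
    r'<div class="tag">'
    r'((?:[^<]*</?(?:p|i|span)(?: class="\w+(?: \w+)*")?>)*[^<]*)'
    r'</div>'
)

# Hypothetical page: the wanted div spans several lines and is followed
# by other markup, including another closing div.
page = (
    '<div class="tag">first line\n'
    '<p class="note big">second <i>line</i></p>\n'
    '</div>\n'
    '<p>outside</p>\n'
    '<div>other</div>\n'
)

match = DIV_RE.search(page)
whole_div = match.group(0)  # the div including its own tags
inner = match.group(1)      # only what sits between the tags

print(whole_div)
```

Since the pattern can never step over a tag it doesn't recognise, it stops at the first </div> rather than running on to the last one on the page.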

Tip: "Disable graphic smile faces for this post" will look as if it isn't working in Preview, but in the real post it's just what you need. "Code" tags achieve the same purpose.


 10:14 pm on Sep 27, 2012 (gmt 0)

Thanks for the replies. I am trying to use regex to store variables in WinAutomation. I managed to sort out the original problem by adding a '?' to the end of (.|\n)*.
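
The effect of that '?' can be reproduced in Python's re module. Python is an assumption here: WinAutomation's regex dialect isn't stated in the thread, though lazy quantifiers behave this way in most engines. The sample page is hypothetical; only the (.|\n)* fragment and the <div class="tag"> ... </div> delimiters come from the posts above.

```python
import re

# Hypothetical page with the wanted div followed by another div.
page = (
    '<p>intro</p>\n'
    '<div class="tag">wanted\n'
    'text</div>\n'
    '<div>unwanted</div>\n'
)

# Greedy: (.|\n)* runs to the end of the page, then backs up to the LAST </div>.
greedy = re.search(r'<div class="tag">(.|\n)*</div>', page).group(0)

# Lazy: (.|\n)*? grows one character at a time and stops at the FIRST </div>.
lazy = re.search(r'<div class="tag">(.|\n)*?</div>', page).group(0)

# Same result with a plain dot, using the flag that lets "." match line breaks.
dotall = re.search(r'<div class="tag">.*?</div>', page, re.DOTALL).group(0)

print(lazy)
```

Where the engine offers a "dot matches newline" option, .*? is often suggested in place of (.|\n)*? because it avoids an alternation inside the repeated group. Either form still stops at the first </div>, so a div nested inside the wanted one would cut the match short.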

Lucy, thanks for the code. I will try to understand it and use it.


 10:57 pm on Sep 27, 2012 (gmt 0)

I've no clue what type of regexps "winautomation" uses (whatever it is).

In Perl regular expressions:

*? is typically an ungreedy match instead of the default greedy match.

data = "mississippi";
regexp = "/(i.*s)/"
would give "ississ"

data = "mississippi";
regexp = "/(i.*?s)/"
would give "is"

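There's a modifier to treat the data as a whole as one line, so that "." also matches line breaks; it's "s" ("m" is the one that makes ^ and $ match at every line break):

data = "a\nb\nc\n"
regexp = "/a.b/s"
would match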
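
The same data can be tried in Python's re module, which is an assumption here, since the engine behind WinAutomation is unknown. In Python, re.DOTALL plays the role of Perl's "s" modifier and re.MULTILINE that of "m". The sketch suggests that a literal \n in the pattern matches with no modifier at all, and that only the "dot matches newline" flag lets "." cross a line break.

```python
import re

data = "a\nb\nc\n"

# A literal \n in the pattern matches a line break without any flag.
literal = re.search(r'a\nb', data) is not None

# By default "." refuses to match a line break.
plain_dot = re.search(r'a.b', data) is not None

# re.MULTILINE (Perl /m) only changes ^ and $, so "." still fails here.
multiline_dot = re.search(r'a.b', data, re.MULTILINE) is not None

# re.DOTALL (Perl /s) lets "." match the line break.
dotall_dot = re.search(r'a.b', data, re.DOTALL) is not None

print(literal, plain_dot, multiline_dot, dotall_dot)
```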

The easier way to deal with complex data is to parse the html as xml ... but that requires good xhtml which is far too rare. That's how I do things - it's also much more efficient than very complex regexps will be.
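
As a rough illustration of the parsing approach, here is a minimal sketch using Python's standard xml.etree.ElementTree module. Python is an assumption, and so is the sample markup, which is deliberately well-formed; real pages often are not and would raise a parse error. The class name "tag" is taken from the original question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment (no namespace declared).
xhtml = (
    '<html><body>'
    '<div class="tag">wanted\n<p>text</p></div>'
    '<div class="other">unwanted</div>'
    '</body></html>'
)

root = ET.fromstring(xhtml)

# Find the first div anywhere in the tree whose class attribute is "tag".
div = root.find(".//div[@class='tag']")

# Serialise the element back to markup, its own tags included.
whole_div = ET.tostring(div, encoding="unicode")

# Or collect only the text inside it, with the inner markup stripped.
inner_text = "".join(div.itertext())

print(whole_div)
```

Because the parser tracks nesting itself, line breaks and nested tags inside the div need no special handling. A document that declares the XHTML namespace would need that namespace in the lookup path.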


 6:57 pm on Sep 29, 2012 (gmt 0)

swa66, thanks for that. WinAutomation is a macro builder for guys like me who can't program.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved