HTML Forum

    
regular expression find across line breaks
text between two strings
santapaws
5+ Year Member
Msg#: 4500354 posted 7:47 pm on Sep 26, 2012 (gmt 0)

I am trying to load a string into a variable. The string I'm trying to pick out is between a div and its closing tag, and the text after the div may span line breaks. I'm trying to come up with an expression to capture the text between <div class="tag"> and its closing </div>.

I want to include the div tags themselves in the variable. I have come up with:
(?=<div\sclass="tag")(.|\n)*.(?<=</div>)

This is to cover instances where the text between the tags may be on a different line.
But when I run it, the variable includes text after the suffix (</div>), so I realise it's matching to the end of the page and then back to the last closing div on that page, instead of the one straight after the prefix (<div class="tag">).
So my question is how to match between that prefix and the next closing div instead of the last closing div.
Thanks for any help.

g1smd
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4500354 posted 8:33 pm on Sep 26, 2012 (gmt 0)

One thing you need to avoid is excessive backtracking:
[regular-expressions.info...]
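
As an illustration of the kind of rewrite that cuts down backtracking work, here is a minimal Python sketch (Python's re is assumed only for demonstration, and the sample page is made up). The pattern in the question uses the alternation (.|\n) to mean "any character, including a newline". Two common alternatives express the same thing without an alternation: the character class [\s\S], or DOTALL mode so that . matches newlines. All three lazy patterns below find the same text.

```python
import re

# Hypothetical sample page: the target div spans a line break.
page = '<div class="tag">first\nline</div>\n<div>footer</div>'

# Repeated alternation: one of two branches is tried at every character.
alternation = re.compile(r'<div\sclass="tag">(?:.|\n)*?</div>')

# "Any character" written as a single character class instead.
char_class = re.compile(r'<div\sclass="tag">[\s\S]*?</div>')

# DOTALL mode: '.' also matches newlines, so no alternation is needed.
dotall = re.compile(r'<div\sclass="tag">.*?</div>', re.DOTALL)

matches = [p.search(page).group(0) for p in (alternation, char_class, dotall)]
for m in matches:
    print(repr(m))  # each prints '<div class="tag">first\nline</div>'
```

Dropping the alternation typically gives a backtracking engine fewer branches to try at each character; how much that matters depends on the engine and the size of the input.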

lucy24
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, Top Contributors Of The Month
Msg#: 4500354 posted 12:13 am on Sep 27, 2012 (gmt 0)

You left out one rather crucial bit of information: a variable in what? Obviously not in HTML itself; it doesn't speak RegEx and isn't a programming language at all. No two languages speak exactly the same RegEx dialect.

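Many programs distinguish between single-line and multi-line mode. By default the ^ and $ anchors refer to the very beginning and end of the whole string; in multi-line mode they also match at every line break. Separately, "single-line" (dotall) mode makes . match a newline, which it normally doesn't.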
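
Python's re module, for instance, keeps these as separate flags: re.MULTILINE (Perl's /m) changes what ^ and $ match, and re.DOTALL (Perl's /s, often called "single-line" mode) lets . match a newline. A minimal sketch on a made-up two-line string:

```python
import re

text = "first line\nsecond line"

# Default: ^ matches only at the very start of the string.
default_anchor = re.findall(r"^\w+", text)
print(default_anchor)  # ['first']

# MULTILINE (Perl's /m): ^ also matches right after every newline.
multiline_anchor = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_anchor)  # ['first', 'second']

# Default: '.' does not match a newline, so this finds nothing.
no_dotall = re.search(r"line.second", text)
print(no_dotall)  # None

# DOTALL (Perl's /s): '.' matches a newline too.
with_dotall = re.search(r"line.second", text, re.DOTALL)
print(repr(with_dotall.group(0)))  # 'line\nsecond'
```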

The string I'm trying to pick out is between a div and its closing tag.

Er, do you mean the string is within the div, i.e. between its opening and closing tags?

Do you know anything about what's inside the div, like what other tags might occur? I'm thinking of something like:

<div class = "blahblah">(([^<]*</?(?:p|i|span)(?: class = "\w+(?: \w+)*")?>)*[^<]*)</div>

but don't take my word for it without counting parentheses on your own fingers. If there are nested divs, nest the pattern deeper, one level for each level of div nesting. It can still be done, as long as you know how deep the nesting goes.
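
A minimal Python sketch of that pattern in action, assuming the div carries class="tag" as in the question and that attributes are written without spaces around '='. The sample markup is made up. Because [^<] is a negated character class, it matches newlines too, so no multi-line or DOTALL flag is needed to cross line breaks.

```python
import re

# The pattern above, adapted (assumption) to class="tag" with no spaces
# around '='. Group 1 captures everything between the div tags, provided
# the only tags inside are <p>, <i> or <span> (optionally with a class).
pattern = re.compile(
    r'<div class="tag">'
    r'(([^<]*</?(?:p|i|span)(?: class="\w+(?: \w+)*")?>)*[^<]*)'
    r'</div>'
)

# Hypothetical sample: the content spans a line break and is followed
# by another, unrelated div.
page = ('<div class="tag">Hello <span class="note">big</span>\n'
        '<p>world</p></div>\n<div>footer</div>')

m = pattern.search(page)
print(repr(m.group(1)))
# 'Hello <span class="note">big</span>\n<p>world</p>'

# A tag outside the allowed list (here <b>) means no match at all.
print(pattern.search('<div class="tag"><b>x</b></div>'))  # None
```

The trade-off is that every tag you expect inside the div has to be listed; anything else makes the whole match fail rather than run on to a later </div>.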


Tip: "Disable graphic smile faces for this post" will look as if it isn't working in Preview, but in the real post it's just what you need. "Code" tags achieve the same purpose.

santapaws
5+ Year Member
Msg#: 4500354 posted 10:14 pm on Sep 27, 2012 (gmt 0)

Thanks for the replies. I am trying to use regex to store variables in WinAutomation. I managed to solve the original problem by adding a '?' after (.|\n)* to make it lazy:
(?=<div\sclass="tag")(.|\n)*?.(?<=</div>)
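
To see why the '?' helps, here is a minimal Python sketch using the pattern from the question. Python's re module is assumed only because it accepts the same lookahead and fixed-width lookbehind syntax; the regex dialect WinAutomation uses isn't confirmed in this thread, and the sample page is made up.

```python
import re

# Hypothetical sample page: the target div spans a line break and is
# followed by other markup ending in a second </div>.
page = '<div class="tag">first\nline</div>\n<p>other</p>\n<div>footer</div>'

# Pattern from the question: greedy (.|\n)* runs to the end of the page,
# then backtracks only as far as the LAST </div>.
greedy = r'(?=<div\sclass="tag")(.|\n)*.(?<=</div>)'

# Same pattern with '?' after (.|\n)*: lazy, so it stops at the FIRST
# </div> after the opening tag.
lazy = r'(?=<div\sclass="tag")(.|\n)*?.(?<=</div>)'

greedy_match = re.search(greedy, page).group(0)
lazy_match = re.search(lazy, page).group(0)

print(greedy_match == page)  # True: the whole page was captured
print(repr(lazy_match))      # '<div class="tag">first\nline</div>'
```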

Lucy, thanks for the code. I will try to understand it and use that.

swa66
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4500354 posted 10:57 pm on Sep 27, 2012 (gmt 0)

I've no clue what type of regexps "winautomation" uses (whatever it is).

In Perl regular expressions:

*? is an ungreedy (lazy) match instead of the default greedy match: it matches as few characters as possible rather than as many.

e.g.:
my $data = "mississippi";
$data =~ /(i.*s)/;    # greedy: $1 is "ississ"
$data =~ /(i.*?s)/;   # ungreedy: $1 is "is"

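There's a modifier to treat the data as a whole as one line; it's "s", not "m" ("m" instead makes ^ and $ match at every line break). With /s, "." also matches a newline:

my $text = "a\nb\nc\n";
$text =~ /a.b/s;      # matches "a\nb"; without /s there is no match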

An easier way to deal with complex data is to parse the HTML as XML, but that requires well-formed XHTML, which is far too rare. That's how I do things; it's also much more efficient than very complex regexps.
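
A minimal sketch of the parse-it-as-XML approach, using Python's standard xml.etree.ElementTree (an illustrative choice, not necessarily what swa66 uses). It assumes the page is well-formed and, to keep the example short, declares no XHTML namespace; a real XHTML document with xmlns="http://www.w3.org/1999/xhtml" would need namespace-qualified tag names in find().

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML fragment (no namespace declared);
# ElementTree raises ParseError on markup that isn't well-formed XML.
xhtml = """<html><body>
<div class="tag">first
line <span>inner</span></div>
<div>footer</div>
</body></html>"""

root = ET.fromstring(xhtml)

# ElementTree's limited XPath supports [@attr='value'] predicates.
div = root.find(".//div[@class='tag']")

# All text inside the div, across line breaks and nested tags.
text = "".join(div.itertext())
print(repr(text))  # 'first\nline inner'

# The div's own markup; clear its tail first, because tostring()
# would otherwise also emit the text that follows </div>.
div.tail = None
markup = ET.tostring(div, encoding="unicode")
print(markup)
```

No pattern has to know which tags may appear inside the div, and a nested div is just another child element.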

santapaws
5+ Year Member
Msg#: 4500354 posted 6:57 pm on Sep 29, 2012 (gmt 0)

swa66, thanks for that. WinAutomation is a macro builder for guys like me who can't program.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved