
HTML Forum

    
Looking to strip only certain tags
regex noob
LifeinAsia

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4208400 posted 9:12 pm on Sep 28, 2010 (gmt 0)

I have a large file that I need to parse, but first I want to get rid of some extraneous tags. For example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">
yada yada yada <INNERTAG attribute="more blah" />
</TAG>
<GOODTAG>

I want to be left with just:
<GOODTAG>
Good stuff.
<GOODTAG>

The following regex isn't working:
<TAG\b[^>]*>(.*?)</TAG>

What am I doing wrong?

 

Frank_Rizzo

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4208400 posted 9:16 pm on Sep 28, 2010 (gmt 0)

Is this a problem with regex not matching because of tabs and new lines?

Have you tried replacing \t and \n?
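
Whether newlines matter depends on the regex engine and its flags. As a minimal sketch in Python (not the tool used in this thread; the sample string is a hypothetical stand-in for the file described above), `.` does not match a newline unless the DOTALL flag is set, so a tag that spans several lines is only removed with that flag:

```python
import re

# Hypothetical sample modelled on the first post; the TAG element spans lines.
sample = (
    "<GOODTAG>\n"
    "Good stuff.\n"
    '<TAG attribute="blah blah">\n'
    'yada yada yada <INNERTAG attribute="more blah" />\n'
    "</TAG>\n"
    "<GOODTAG>\n"
)

pattern = r"<TAG\b[^>]*>(.*?)</TAG>"

# Without DOTALL, "." stops at the first newline, so nothing matches.
no_flag = re.sub(pattern, "", sample)

# With DOTALL, "." also matches "\n" and the whole element is removed.
with_flag = re.sub(pattern, "", sample, flags=re.DOTALL)

print(no_flag == sample)  # True: the text is unchanged
print(with_flag)
```

Other engines expose the same switch under different names (Perl's /s modifier, for instance), so this is worth ruling out before looking elsewhere.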

LifeinAsia

Msg#: 4208400 posted 9:24 pm on Sep 28, 2010 (gmt 0)

No, that's not the issue. Maybe I should have given a better example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">yada yada yada<INNERTAG attribute="more blah" /> </TAG>
<GOODTAG>

The "offending" tag and everything in between are on one line.

jcaron

5+ Year Member



 
Msg#: 4208400 posted 10:12 pm on Sep 28, 2010 (gmt 0)

Your regex works perfectly with your example in Perl. Maybe you could tell us which language/tool you are using, and the full syntax you use?

Jacques.

LifeinAsia

Msg#: 4208400 posted 12:33 am on Sep 29, 2010 (gmt 0)

I'm using ColdFusion with the REReplace function:
<cfset NewXML=REReplace(OldXML,"<ListingTag\b[^>]*>(.*?)</ListingTag>","","ALL")>

A sample line:
<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>

The format for REReplace is REReplace(string, regular expression, substring, scope).

I've even tried REReplaceNoCase, which ignores case, but still no replacements.
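
One way to cross-check the pattern itself is to run it against the sample line in a different engine. The sketch below uses Python, where `re.sub` loosely stands in for REReplace with scope "ALL" and `re.IGNORECASE` for REReplaceNoCase; that mapping is a rough analogy, not a statement about how ColdFusion behaves:

```python
import re

# The sample line quoted above.
old_xml = (
    "<ListingTag type='PROPERTY_AMENITY'>"
    "<tag> High Speed Internet Access</tag></ListingTag>"
)

pattern = r"<ListingTag\b[^>]*>(.*?)</ListingTag>"

# re.sub replaces every match by default.
new_xml = re.sub(pattern, "", old_xml)

# Case-insensitive variant, tried here on a lower-cased copy of the line.
new_xml_nocase = re.sub(pattern, "", old_xml.lower(), flags=re.IGNORECASE)

print(repr(new_xml))         # ''
print(repr(new_xml_nocase))  # ''
```

If the pattern matches here but not in the target tool, the difference most likely lies in which regex features that tool's engine supports.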

jcaron

Msg#: 4208400 posted 1:44 am on Sep 29, 2010 (gmt 0)

Does CF support the *? notation? I don't see it in the documentation. It shouldn't matter much though (a greedy match wouldn't necessarily give the intended result, but here it should work). Also, the () aren't needed, though they shouldn't change the result.
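
To illustrate the difference between `*?` and a plain `*`, here is a small Python sketch (the input line is hypothetical, and Python is used only because its behavior is easy to check). With a single element per line both forms remove the same text; with two elements on one line, the greedy form also swallows whatever sits between them:

```python
import re

# Hypothetical line: two TAG elements with wanted text in between.
line = "<TAG a='1'>junk</TAG>keep me<TAG a='2'>more junk</TAG>"

# Lazy: each match ends at the nearest closing tag.
lazy = re.sub(r"<TAG\b[^>]*>.*?</TAG>", "", line)

# Greedy: one match runs from the first opening tag to the last closing tag.
greedy = re.sub(r"<TAG\b[^>]*>.*</TAG>", "", line)

print(repr(lazy))    # 'keep me'
print(repr(greedy))  # ''
```

The capturing parentheses are left out of both patterns, since the captured group is never used in the replacement.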

Never used CF, but the \b might need to be double-escaped, i.e. \\b (first escape in string context, second escape in RE context)? Documentation does seem to imply it's not the case though.
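
The double-escaping concern is real in some languages, even if it turns out not to apply to CF. As a sketch of the general idea in Python (this shows Python string literals only and says nothing about ColdFusion strings), an unescaped `\b` inside a normal string literal becomes a backspace character before the regex engine ever sees it:

```python
import re

line = '<TAG attribute="blah">text</TAG>'

# Normal string literal: "\b" is the backspace character (0x08),
# so the pattern looks for a literal backspace after "<TAG".
unescaped = "<TAG\b[^>]*>.*?</TAG>"

# Doubling the backslash, or using a raw string, passes \b to the engine
# as a word-boundary assertion.
doubled = "<TAG\\b[^>]*>.*?</TAG>"
raw = r"<TAG\b[^>]*>.*?</TAG>"

print(re.sub(unescaped, "", line) == line)  # True: nothing was replaced
print(repr(re.sub(doubled, "", line)))      # ''
print(doubled == raw)                       # True
```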

Maybe someone else knows more about CF...

Jacques.

LifeinAsia

Msg#: 4208400 posted 12:10 pm on Sep 29, 2010 (gmt 0)

Ah, it looks like the version of CF we are running may not support the *? notation (or some other part of the regular expression). I tried it on a newer version that we have running on a Development server and it worked as advertised.

Thanks for the help.

Now if I can just find some glue to put back all the hair I pulled out...
