Home / Forums Index / Code, Content, and Presentation / HTML
Forum Library, Charter, Moderators: incrediBILL

HTML Forum

looking to strip only certain tag
regex noob

 9:12 pm on Sep 28, 2010 (gmt 0)

I have a large file that I need to parse, but first I want to get rid of some extraneous tags. For example:
Good stuff.
<TAG attribute="blah blah">
yadya yada yada <INNERTAG attribute="more blah" />

I want to be left with just:
Good stuff.

The following regex isn't working:

What am I doing wrong?



 9:16 pm on Sep 28, 2010 (gmt 0)

Is this a problem with regex not matching because of tabs and new lines?

Have you tried replacing \t and \n?


 9:24 pm on Sep 28, 2010 (gmt 0)

No, that's not the issue. Maybe I should have given a better example:
Good stuff.
<TAG attribute="blah blah">yadya yada yada<INNERTAG attribute="more blah" /> </TAG>

The "offending" tag and everything in between are on one line.


 10:12 pm on Sep 28, 2010 (gmt 0)

Your regex works perfectly with your example in Perl. Could you tell us which language/tool you are using, and the full syntax you use?



 12:33 am on Sep 29, 2010 (gmt 0)

I'm using ColdFusion with the REReplace function:
<cfset NewXML=REReplace(OldXML,"<ListingTag\b[^>]*>(.*?)</ListingTag>","","ALL")>

A sample line:
<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>

The format for REReplace is REReplace(string,regular expression,substring,scope)

I've even tried REReplaceNoCase, which ignores case, and I still get no replacements.
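
For comparison, here is a minimal sketch of the same pattern run outside ColdFusion. It is Python rather than CFML, and the function name strip_listing_tags is made up for illustration; the pattern and the sample line are the ones posted above. It only shows what the expression does in an engine that supports every construct in it, not how REReplace behaves.

```python
import re

# Pattern copied from the REReplace call above.
LISTING_TAG = re.compile(r"<ListingTag\b[^>]*>(.*?)</ListingTag>")

def strip_listing_tags(xml: str) -> str:
    """Remove every <ListingTag ...>...</ListingTag> element from xml (hypothetical helper)."""
    return LISTING_TAG.sub("", xml)

sample = (
    "Good stuff.\n"
    "<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>\n"
)
cleaned = strip_listing_tags(sample)
print(repr(cleaned))
```

If this strips the line but the CF call does not, the pattern itself is probably fine and the difference lies in the regex engine or in how the string reaches it.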


 1:44 am on Sep 29, 2010 (gmt 0)

Does CF support the non-greedy *? notation? I don't see it in the documentation. It shouldn't matter much, though: a plain greedy * wouldn't necessarily give the intended result in general, but on this sample it should still work. Also, the () aren't doing anything useful, though removing them shouldn't change the result.
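
To make the greedy versus non-greedy point concrete, here is a rough sketch in Python (not CF; the two-element input line is invented for the example). With one ListingTag per line both forms remove the same text, but with two on a line the greedy form also swallows whatever sits between them.

```python
import re

# Invented test line: two ListingTag elements with text in between.
line = "<ListingTag a='1'>one</ListingTag> keep me <ListingTag a='2'>two</ListingTag>"

# Non-greedy: each match stops at the nearest closing tag.
lazy = re.sub(r"<ListingTag\b[^>]*>.*?</ListingTag>", "", line)

# Greedy: one match runs from the first opening tag to the last closing tag.
greedy = re.sub(r"<ListingTag\b[^>]*>.*</ListingTag>", "", line)

print(repr(lazy))    # text between the two elements survives
print(repr(greedy))  # text between the two elements is removed too
```

The capturing parentheses are left out here because the replacement never refers to the group.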

I've never used CF, but the \b might need to be double-escaped, i.e. \\b (first escape in string context, second escape in RE context). The documentation does seem to imply that's not the case, though.
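
As an illustration of what double-escaping means, this is how it plays out in Python string literals. It says nothing about CF's own string rules, which may well pass \b through untouched; the variable names are just for the example.

```python
import re

text = "<ListingTag type='x'>"

plain = "<ListingTag\b"     # in a normal Python literal, "\b" becomes a backspace character
raw = r"<ListingTag\b"      # raw literal: backslash + b reach the regex engine as a word boundary
doubled = "<ListingTag\\b"  # doubled backslash: the same string as the raw literal

has_backspace = "\x08" in plain
matches_plain = re.match(plain, text) is not None
matches_raw = re.match(raw, text) is not None

print(has_backspace, matches_plain, matches_raw)
```

In a language where the string layer consumes the backslash first, the pattern silently stops matching, which would look exactly like "no replacements".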

Maybe someone else knows more about CF...



 12:10 pm on Sep 29, 2010 (gmt 0)

Ah, it looks like the version of CF we are running may not support the *? notation (or some other part of the regular expression). I tried it on a newer version that we have running on a Development server and it worked as advertised.

Thanks for the help.

Now if I can just find some glue to put back all the hair I pulled out...

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved