Home / Forums Index / Code, Content, and Presentation / HTML

HTML Forum

    
Looking to strip only certain tags
regex noob
LifeinAsia




msg:4208402
 9:12 pm on Sep 28, 2010 (gmt 0)

I have a large file that I need to parse, but first I want to get rid of some extraneous tags. For example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">
yadya yada yada <INNERTAG attribute="more blah" />
</TAG>
<GOODTAG>

I want to be left with just:
<GOODTAG>
Good stuff.
<GOODTAG>

The following regex isn't working:
<TAG\b[^>]*>(.*?)</TAG>

What am I doing wrong?

 

Frank_Rizzo




msg:4208403
 9:16 pm on Sep 28, 2010 (gmt 0)

Could the regex be failing to match because of tabs and newlines in the input?

Have you tried stripping out \t and \n first?
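
Whether newlines matter depends on the engine: in many regex flavors `.` does not match a line break unless a "dot matches newline" mode is switched on. A minimal sketch of that effect, assuming Python's `re` module as a stand-in (no tool has been named at this point in the thread), applied to the multi-line sample from the first post:

```python
import re

# Multi-line sample from the first post (closing quote on the attribute added).
sample = (
    "<GOODTAG>\n"
    "Good stuff.\n"
    '<TAG attribute="blah blah">\n'
    'yadya yada yada <INNERTAG attribute="more blah" />\n'
    "</TAG>\n"
    "<GOODTAG>\n"
)

pattern = r"<TAG\b[^>]*>(.*?)</TAG>"

# Default mode: "." stops at "\n", so <TAG ...> and </TAG> never pair up
# and the text comes back unchanged.
unchanged = re.sub(pattern, "", sample)

# DOTALL mode: "." also matches "\n", so the whole block is removed.
stripped = re.sub(pattern, "", sample, flags=re.DOTALL)

print(unchanged == sample)
print(stripped)
```

If the engine in use has no such mode, removing the line breaks before matching, as suggested above, is the usual alternative.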

LifeinAsia




msg:4208410
 9:24 pm on Sep 28, 2010 (gmt 0)

No, that's not the issue. Maybe I should have given a better example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">yadya yada yada<INNERTAG attribute="more blah" /> </TAG>
<GOODTAG>

The "offending" tag and everything in between are on one line.

jcaron




msg:4208439
 10:12 pm on Sep 28, 2010 (gmt 0)

Your regex works perfectly with your example in Perl. Could you tell us which language/tool you are using, and the full syntax you use?

Jacques.
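
The check is easy to reproduce in other engines that support `*?`. A minimal sketch, assuming Python's `re` module rather than Perl (the exact Perl command used is not given in the thread), run against the one-line version of the sample:

```python
import re

# One-line version of the sample: the offending element sits on a single line.
sample = (
    "<GOODTAG>\n"
    "Good stuff.\n"
    '<TAG attribute="blah blah">yadya yada yada'
    '<INNERTAG attribute="more blah" /> </TAG>\n'
    "<GOODTAG>\n"
)

# Same pattern as in the original post; every match is replaced with "".
cleaned = re.sub(r"<TAG\b[^>]*>(.*?)</TAG>", "", sample)

print(cleaned)
```

The `<TAG>` element and everything inside it, including the self-closing `<INNERTAG ... />`, is removed, which suggests the pattern itself is sound and the problem lies in the tool.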

LifeinAsia




msg:4208482
 12:33 am on Sep 29, 2010 (gmt 0)

I'm using ColdFusion with the REReplace function:
<cfset NewXML=REReplace(OldXML,"<ListingTag\b[^>]*>(.*?)</ListingTag>","","ALL")>

A sample line:
<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>

The format for REReplace is REReplace(string,regular expression,substring,scope)

I've even tried REReplaceNoCase, which ignores case, but still no replacements.
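
For comparison, the same pattern can be checked against the sample line in an engine known to support `*?`. This is a sketch in Python, not ColdFusion, so it only shows that the pattern is sound, not how REReplace behaves; `old_xml` and `new_xml` are illustrative names mirroring `OldXML` and `NewXML`:

```python
import re

# Sample line from the post.
old_xml = (
    "<ListingTag type='PROPERTY_AMENITY'>"
    "<tag> High Speed Internet Access</tag></ListingTag>"
)
pattern = r"<ListingTag\b[^>]*>(.*?)</ListingTag>"

# re.sub replaces every match by default, similar in spirit to scope "ALL".
new_xml = re.sub(pattern, "", old_xml)

# Case matters by default: an upper-cased copy of the input is left alone...
shouted = old_xml.upper()
case_sensitive = re.sub(pattern, "", shouted)

# ...unless re.IGNORECASE is set, the rough analogue of REReplaceNoCase.
case_insensitive = re.sub(pattern, "", shouted, flags=re.IGNORECASE)

print(repr(new_xml), case_sensitive == shouted, repr(case_insensitive))
```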

jcaron




msg:4208493
 1:44 am on Sep 29, 2010 (gmt 0)

Does CF support the *? (non-greedy) notation? I don't see it in the documentation. It shouldn't matter much here, though: a plain greedy .* wouldn't necessarily give the intended result in general, but it should still match your sample. Also, the () aren't needed, though they shouldn't change the result.
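
To make the greedy versus non-greedy difference concrete, here is a small sketch in Python's `re` module (an assumption; the thread concerns ColdFusion, whose engine may differ). The `line` value is a hypothetical input with two `<TAG>` elements on one line, which is the case where the two forms diverge:

```python
import re

# Hypothetical line: two <TAG> elements with text worth keeping in between.
line = "<TAG a='1'>x</TAG>KEEP<TAG a='2'>y</TAG>"

# Non-greedy: each match stops at the nearest </TAG>.
lazy = re.sub(r"<TAG\b[^>]*>.*?</TAG>", "", line)

# Greedy: a single match runs to the last </TAG> and swallows "KEEP".
greedy = re.sub(r"<TAG\b[^>]*>.*</TAG>", "", line)

# With only one element on the line, both forms give the same result,
# and the capturing parentheses make no difference to the replacement.
single = "<TAG a='1'>x</TAG>"
same = re.sub(r"<TAG\b[^>]*>(.*?)</TAG>", "", single) == \
       re.sub(r"<TAG\b[^>]*>.*</TAG>", "", single)

print(repr(lazy), repr(greedy), same)
```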

I've never used CF, but the \b might need to be double-escaped, i.e. \\b (first escape in string context, second escape in RE context)? The documentation does seem to imply that's not the case, though.
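
As an illustration of how string-level escaping can interfere with a pattern, here is how it plays out in Python. This is only an example of the general effect in one language and says nothing about ColdFusion's own string rules:

```python
import re

line = '<TAG attribute="blah blah">yada</TAG>'

# In a normal Python string literal, "\b" is a backspace character, so the
# regex engine never sees a word-boundary assertion and nothing matches.
single = re.sub("<TAG\b[^>]*>.*?</TAG>", "", line)

# Doubling the backslash passes a literal \b through to the regex engine.
double = re.sub("<TAG\\b[^>]*>.*?</TAG>", "", line)

# A raw string does the same without the doubling.
raw = re.sub(r"<TAG\b[^>]*>.*?</TAG>", "", line)

print(single == line, repr(double), repr(raw))
```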

Maybe someone else knows more about CF...

Jacques.

LifeinAsia




msg:4208671
 12:10 pm on Sep 29, 2010 (gmt 0)

Ah, it looks like the version of CF we are running may not support the *? notation (or some other part of the regular expression). I tried it on a newer version that we have running on a Development server and it worked as advertised.

Thanks for the help.

Now if I can just find some glue to put back all the hair I pulled out...
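
For anyone stuck on an engine without `*?`, one possible fallback is to skip regular expressions and remove the element with plain string searches. The sketch below is in Python and is not what was done in this thread (the fix here was moving to a newer CF version); `strip_elements` is a hypothetical helper, and it assumes elements of the same name are not nested inside each other:

```python
def strip_elements(text: str, name: str) -> str:
    """Remove every <name ...>...</name> span using plain string searches."""
    open_tag = "<" + name
    close_tag = "</" + name + ">"
    kept = []
    pos = 0      # start of the text not yet copied to the output
    search = 0   # where to look for the next opening tag
    while True:
        start = text.find(open_tag, search)
        if start == -1:
            break
        # Rough stand-in for \b: the tag name must end right here.
        after = text[start + len(open_tag):start + len(open_tag) + 1]
        if after not in (">", "/", " ", "\t", "\n", "\r"):
            search = start + 1   # e.g. <TAGGED>: a different element
            continue
        end = text.find(close_tag, start)
        if end == -1:
            break                # unclosed element: leave the rest alone
        kept.append(text[pos:start])
        pos = search = end + len(close_tag)
    kept.append(text[pos:])
    return "".join(kept)


line = "<TAG a='1'>x</TAG>KEEP<TAG a='2'>y</TAG>"
print(strip_elements(line, "TAG"))
```

Stopping at the nearest closing tag gives the same result as the non-greedy pattern on well-formed, non-nested input.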


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved