looking to strip only certain tag

regex noob

     
9:12 pm on Sep 28, 2010 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5551
votes: 24


I have a large file that I need to parse, but first I want to get rid of some extraneous tags. For example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">
yada yada yada <INNERTAG attribute="more blah" />
</TAG>
<GOODTAG>

I want to be left with just:
<GOODTAG>
Good stuff.
<GOODTAG>

The following regex isn't working:
<TAG\b[^>]*>(.*?)</TAG>

What am I doing wrong?
9:16 pm on Sept 28, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1181
votes: 5


Could the regex be failing to match because of tabs and newlines?

Have you tried stripping the \t and \n characters first?
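The newline theory is easy to check in isolation. The sketch below uses Python's re module as a stand-in engine (an assumption; the original poster's tool may behave differently) and the multi-line sample from the first post. In many engines "." does not match a line break unless a "dot matches all" mode is enabled, so the pattern finds nothing in the default mode.

```python
import re

# Multi-line version of the sample from the first post.
text = (
    "<GOODTAG>\n"
    "Good stuff.\n"
    '<TAG attribute="blah blah">\n'
    'yada yada yada <INNERTAG attribute="more blah" />\n'
    "</TAG>\n"
    "<GOODTAG>\n"
)

pattern = r"<TAG\b[^>]*>(.*?)</TAG>"

# Default mode: "." stops at line breaks, so the multi-line block never matches.
without_dotall = re.sub(pattern, "", text)

# DOTALL lets "." cross line breaks; the trailing "\n?" also removes the
# newline that would otherwise be left behind after the closing tag.
with_dotall = re.sub(pattern + r"\n?", "", text, flags=re.DOTALL)

print(without_dotall == text)  # True: nothing was removed
print(with_dotall)
```

If the unwanted block really sits on a single line, this cannot be the cause, and the engine's regex dialect becomes the next suspect.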
9:24 pm on Sept 28, 2010 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5551
votes: 24


No, that's not the issue. Maybe I should have given a better example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">yada yada yada<INNERTAG attribute="more blah" /> </TAG>
<GOODTAG>

The "offending" tag and everything in between are on one line.
10:12 pm on Sept 28, 2010 (gmt 0)

New User

10+ Year Member

joined:July 15, 2005
posts: 23
votes: 0


Your regex works perfectly with your example in Perl. Could you tell us which language/tool you are using, and the full syntax of the call?

Jacques.
12:33 am on Sept 29, 2010 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5551
votes: 24


I'm using ColdFusion with the REReplace function:
<cfset NewXML=REReplace(OldXML,"<ListingTag\b[^>]*>(.*?)</ListingTag>","","ALL")>

A sample line:
<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>

The format for REReplace is REReplace(string, regular expression, substring, scope).

I've even tried REReplaceNoCase, which ignores case, but still no replacements.
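As a cross-check outside ColdFusion, here is roughly the same call sketched with Python's re.sub. This only shows that the pattern itself is sound in an engine that supports *? and \b; it says nothing about what a particular CF version accepts. The surrounding <Keep> elements are hypothetical and were added just to show that neighbouring content survives.

```python
import re

# Sample line from the post, wrapped in hypothetical <Keep> elements.
old_xml = (
    "<Keep>1</Keep>"
    "<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>"
    "<Keep>2</Keep>"
)

pattern = r"<ListingTag\b[^>]*>(.*?)</ListingTag>"

# re.sub replaces every match by default, comparable to scope "ALL".
new_xml = re.sub(pattern, "", old_xml)

# Case-insensitive variant, comparable in spirit to REReplaceNoCase.
new_xml_nocase = re.sub(pattern, "", old_xml, flags=re.IGNORECASE)

print(new_xml)
```

Both calls leave only the two <Keep> elements, so a failure with the same pattern points at the engine rather than at the expression.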
1:44 am on Sept 29, 2010 (gmt 0)

New User

10+ Year Member

joined:July 15, 2005
posts: 23
votes: 0


Does CF support the *? notation? I don't see it in the documentation. It shouldn't matter much here, though: a plain greedy * wouldn't necessarily give the intended result in general, but on this sample it should still match. Also, the () aren't needed, though they shouldn't change the result.

I've never used CF, but the \b might need to be double-escaped, i.e. \\b (first escape in string context, second escape in RE context)? The documentation seems to imply that's not the case, though.
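To illustrate the greedy versus lazy point, here is a small sketch in Python (used only as a convenient engine that supports both forms; the sample line is made up). With a single block per line the two patterns remove the same text, but with two blocks the greedy form runs from the first opening tag to the last closing tag and takes the wanted text in between with it.

```python
import re

# Hypothetical line with two unwanted blocks and wanted text between them.
line = "<TAG a='1'>drop me</TAG>keep me<TAG a='2'>drop me too</TAG>"

# Lazy: each match stops at the nearest </TAG>.
lazy = re.sub(r"<TAG\b[^>]*>.*?</TAG>", "", line)

# Greedy: a single match stretches to the last </TAG> on the line.
greedy = re.sub(r"<TAG\b[^>]*>.*</TAG>", "", line)

print(repr(lazy))    # 'keep me'
print(repr(greedy))  # ''
```

The capturing parentheses are left out here since nothing refers back to the group.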

Maybe someone else knows more about CF...

Jacques.
12:10 pm on Sept 29, 2010 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5551
votes: 24


Ah, it looks like the version of CF we are running may not support the *? notation (or some other part of the regular expression). I tried it on a newer version that we have running on a Development server and it worked as advertised.

Thanks for the help.

Now if I can just find some glue to put back all the hair I pulled out...
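For anyone stuck on an engine without *?, one fallback is to skip the regex and scan with plain string searches. The sketch below is in Python and the helper name strip_tag is made up for illustration; the same idea should translate to any language with a "find substring from position" function. It assumes the unwanted tag is never nested inside itself and that every opening tag has a matching close.

```python
def strip_tag(text: str, tag: str) -> str:
    """Remove every <tag ...>...</tag> block using plain string searches."""
    open_prefix = "<" + tag
    close = "</" + tag + ">"
    out = []
    pos = 0
    while True:
        start = text.find(open_prefix, pos)
        if start == -1:
            break
        name_end = start + len(open_prefix)
        # Require whitespace or ">" after the name so "<TAG" does not
        # accidentally match a longer name such as "<TAGS".
        if text[name_end:name_end + 1] not in (" ", "\t", "\n", ">"):
            out.append(text[pos:name_end])
            pos = name_end
            continue
        end = text.find(close, start)
        if end == -1:
            break  # unclosed tag: leave the remainder untouched
        out.append(text[pos:start])  # keep everything before the block
        pos = end + len(close)       # resume after the closing tag
    out.append(text[pos:])
    return "".join(out)


sample = (
    "<GOODTAG>\nGood stuff.\n"
    '<TAG attribute="blah blah">yada yada yada'
    '<INNERTAG attribute="more blah" /> </TAG>\n<GOODTAG>'
)
print(strip_tag(sample, "TAG"))
```

Because the search always stops at the nearest closing tag, the behaviour matches the lazy pattern rather than the greedy one.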