looking to strip only certain tag

regex noob

   
9:12 pm on Sep 28, 2010 (gmt 0)

lifeinasia (WebmasterWorld Administrator)



I have a large file that I need to parse, but first I want to get rid of some extraneous tags. For example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">
yadya yada yada <INNERTAG attribute="more blah" />
</TAG>
<GOODTAG>

I want to be left with just:
<GOODTAG>
Good stuff.
<GOODTAG>

The following regex isn't working:
<TAG\b[^>]*>(.*?)</TAG>

What am I doing wrong?
9:16 pm on Sep 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is this a problem with regex not matching because of tabs and new lines?

Have you tried replacing \t and \n?
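
For anyone reading along, here is a minimal sketch of the newline question. It uses Python's re module purely as a stand-in (the original poster's tool had not been named at this point), and the sample text is adapted from the first post. In Python, "." does not match a newline unless re.DOTALL is passed, so the same pattern fails on a multi-line block and succeeds once that flag is set. Other regex engines may behave differently, so treat this as an illustration only.

```python
import re

# Pattern from the original post, unchanged.
PATTERN = r"<TAG\b[^>]*>(.*?)</TAG>"

# Sample adapted from the first post: the unwanted element spans several lines.
multi_line = (
    "<GOODTAG>\n"
    "Good stuff.\n"
    '<TAG attribute="blah blah">\n'
    'yadya yada yada <INNERTAG attribute="more blah" />\n'
    "</TAG>\n"
    "<GOODTAG>\n"
)

# Without DOTALL, "." stops at each newline, so nothing matches.
without_dotall = re.sub(PATTERN, "", multi_line)

# With DOTALL, "." also matches newlines and the whole element is removed.
with_dotall = re.sub(PATTERN, "", multi_line, flags=re.DOTALL)

print(without_dotall == multi_line)  # True: the text is unchanged
print(repr(with_dotall))
```

If the element really does sit on a single line, as in the follow-up example, this flag makes no difference and the cause has to be elsewhere.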
9:24 pm on Sep 28, 2010 (gmt 0)

lifeinasia (WebmasterWorld Administrator)



No, that's not the issue. Maybe I should have given a better example:
<GOODTAG>
Good stuff.
<TAG attribute="blah blah">yadya yada yada<INNERTAG attribute="more blah" /> </TAG>
<GOODTAG>

The "offending" tag and everything in between are on one line.
10:12 pm on Sep 28, 2010 (gmt 0)

5+ Year Member



Your regex works perfectly with your example in Perl. Maybe you could tell us which language/tool you are using, and the full syntax you use?

Jacques.
12:33 am on Sep 29, 2010 (gmt 0)

lifeinasia (WebmasterWorld Administrator)



I'm using ColdFusion with the REReplace function:
<cfset NewXML=REReplace(OldXML,"<ListingTag\b[^>]*>(.*?)</ListingTag>","","ALL")>

A sample line:
<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>

The format for REReplace is REReplace(string,regular expression,substring,scope)

I've even tried REReplaceNoCase, which ignores case, but there are still no replacements.
1:44 am on Sep 29, 2010 (gmt 0)

5+ Year Member



Does CF support the *? (non-greedy) notation? I don't see it in the documentation. It shouldn't matter much, though: a plain greedy * wouldn't necessarily give the intended result in general, but here it should work. Also, the () aren't needed, though they shouldn't change the result.
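
To make the greedy versus non-greedy point concrete, here is a small sketch in Python (a stand-in, not ColdFusion; CF's behaviour may differ by version). The input is hypothetical: two ListingTag elements on one line with a made-up <Keep> element between them. The non-greedy form removes each element separately, the greedy form swallows everything from the first opening tag to the last closing tag, and dropping the parentheses changes nothing.

```python
import re

# Hypothetical input: two unwanted elements with wanted content between them.
line = (
    "<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>"
    "<Keep>stay</Keep>"
    "<ListingTag type='PROPERTY_AMENITY'><tag> Pool</tag></ListingTag>"
)

# Non-greedy: each match stops at the nearest closing tag.
lazy = re.sub(r"<ListingTag\b[^>]*>(.*?)</ListingTag>", "", line)

# Greedy: one match runs from the first opening tag to the last closing tag.
greedy = re.sub(r"<ListingTag\b[^>]*>(.*)</ListingTag>", "", line)

# Same as the non-greedy pattern, minus the unused capture group.
no_group = re.sub(r"<ListingTag\b[^>]*>.*?</ListingTag>", "", line)

print(lazy)      # <Keep>stay</Keep>
print(greedy)    # (empty string: the wanted element was removed too)
print(no_group == lazy)  # True
```

With only one ListingTag per line, as in the sample line above, greedy and non-greedy give the same result.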

Never used CF, but the \b might need to be double-escaped, i.e. \\b (first escape in string context, second escape in RE context)? Documentation does seem to imply it's not the case though.

Maybe someone else knows more about CF...

Jacques.
12:10 pm on Sep 29, 2010 (gmt 0)

lifeinasia (WebmasterWorld Administrator)



Ah, it looks like the version of CF we are running may not support the *? notation (or some other part of the regular expression). I tried it on a newer version that we have running on a Development server and it worked as advertised.
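
For anyone stuck on an engine without *?, one possible fallback is to skip regex altogether and use plain substring searches. The sketch below is in Python only to show the idea; strip_elements is a hypothetical helper name, not a ColdFusion or library function, and it assumes the elements are not nested and are never self-closing.

```python
def strip_elements(text, tag):
    """Remove every <tag ...>...</tag> span using plain substring search."""
    open_marker = "<" + tag
    close_marker = "</" + tag + ">"
    pieces = []
    pos = 0      # start of the text not yet copied to the output
    search = 0   # where to look for the next opening tag
    while True:
        start = text.find(open_marker, search)
        if start == -1:
            break
        # The character after the tag name must end the name, otherwise this
        # is a different tag (e.g. <ListingTagExtra>) and is skipped.
        name_end = start + len(open_marker)
        if text[name_end:name_end + 1] not in (">", " ", "\t", "\r", "\n"):
            search = start + 1
            continue
        end = text.find(close_marker, start)
        if end == -1:
            break  # unmatched opening tag: leave the rest untouched
        pieces.append(text[pos:start])
        pos = search = end + len(close_marker)
    pieces.append(text[pos:])
    return "".join(pieces)


sample = (
    "<GOODTAG>\nGood stuff.\n"
    "<ListingTag type='PROPERTY_AMENITY'><tag> High Speed Internet Access</tag></ListingTag>\n"
    "<GOODTAG>"
)
cleaned = strip_elements(sample, "ListingTag")
print(cleaned)
```

Searching for the nearest closing tag after each opening tag gives the same "shortest match" behaviour that *? provides, without depending on the regex engine.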

Thanks for the help.

Now if I can just find some glue to put back all the hair I pulled out...
 
