
XML Development Forum

Replace special characters in xml but do not replace tags

 8:23 am on Apr 16, 2010 (gmt 0)

Hi friends,

My requirement: I receive an XML file whose text content may contain characters like "<" or ">".

For example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<RecordDivision> abc>efg</RecordDivision>

Now I want to replace the ">" character in the string "abc>efg".

This is just a sample XML. I may receive any type of XML.

The other methods I found by googling also replace the "<" and ">" characters that belong to the XML tags.

My requirement is that the tags are not changed, but the content is.

Any ideas?

I am using C#.




 3:40 am on Apr 18, 2010 (gmt 0)

Sorry, in general it's impossible.

Consider looping through the string one character at a time, keeping a flag that says whether you're "inside" or "outside" a tag. The "<" and ">" characters act as a switch from one state to the other. If you're outside and encounter a ">", you can safely replace it with "&gt;". If you're already inside and encounter another "<", that one can be replaced with "&lt;".
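A minimal sketch of that inside/outside switch, written in Python rather than C# (the function name escape_outside_tags is hypothetical). It fixes the original abc>efg sample. However, as the next paragraph points out, content such as "a < b and c > d" is silently mistaken for a tag and left unescaped:

```python
def escape_outside_tags(doc: str) -> str:
    """Naive inside/outside-tag state machine (illustrative sketch only).

    A '<' seen while outside starts a tag; a '>' seen while inside ends it.
    A '>' seen while outside is escaped as '&gt;', and a second '<' seen
    while already inside is escaped as '&lt;'.
    """
    out = []
    inside = False
    for ch in doc:
        if ch == "<":
            if inside:
                out.append("&lt;")   # stray '<' inside what we think is a tag
            else:
                inside = True        # assume this '<' opens a tag
                out.append(ch)
        elif ch == ">":
            if inside:
                inside = False       # this '>' closes the tag
                out.append(ch)
            else:
                out.append("&gt;")   # stray '>' in text content
        else:
            out.append(ch)
    return "".join(out)


sample = "<RecordDivision> abc>efg</RecordDivision>"
print(escape_outside_tags(sample))
# <RecordDivision> abc&gt;efg</RecordDivision>

equation = "<equation>Let a < b and c > d.</equation>"
print(escape_outside_tags(equation))
# unchanged: "< b and c >" is treated as if it were a tag
```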

But what if you're outside and encounter a "<" that belongs to the content? And what if your content contains both a "<" and a ">"?

<equation>Let a < b and c > d.</equation>

You might expect that an element name contains only the letters a-z and A-Z and can't begin with whitespace. That will help in some situations, but not all, and not reliably.
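A sketch of that refinement, again in Python with a hypothetical function name. Here a "<" only counts as markup when the next character could start a tag, assumed to be an ASCII letter, "_", ":", "/", "?" or "!". It handles "a < b" but still fails on "a <b", which is exactly the "not reliably" part:

```python
TAG_START_PUNCT = set("_:/?!")


def looks_like_tag_start(c: str) -> bool:
    # Assumption: tag names start with an ASCII letter, '_' or ':';
    # '/', '?' and '!' cover end tags, <?xml ...?> and <!-- ... -->.
    return (c.isascii() and c.isalpha()) or c in TAG_START_PUNCT


def escape_with_name_heuristic(doc: str) -> str:
    out = []
    inside = False
    for i, ch in enumerate(doc):
        nxt = doc[i + 1:i + 2]           # "" at the end of the string
        if ch == "<":
            if not inside and looks_like_tag_start(nxt):
                inside = True
                out.append(ch)
            else:
                out.append("&lt;")       # this '<' cannot start a tag
        elif ch == ">":
            if inside:
                inside = False
                out.append(ch)
            else:
                out.append("&gt;")       # stray '>' in text content
        else:
            out.append(ch)
    return "".join(out)


print(escape_with_name_heuristic("<equation>Let a < b and c > d.</equation>"))
# <equation>Let a &lt; b and c &gt; d.</equation>

print(escape_with_name_heuristic("<equation>Let a <b and c > d.</equation>"))
# unchanged: "<b and c >" still looks like a tag
```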

You can try all sorts of parsing gymnastics to figure out which parts are elements and which are not, but

>>> I may receive any type of XML

That's the problem. If you could expect bad characters to appear only within certain kinds of elements, then maybe you could solve this with some crafty regular expressions and such. But a generic solution that fixes any bad XML does not exist.
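For example, suppose you know that only one element, say <equation>, can contain stray characters, that it has no attributes and that it is never nested. In that case a non-greedy regular expression can escape just its text. This is an assumption-laden Python sketch: escape_element_text is a hypothetical helper, while xml.sax.saxutils.escape is the standard-library function that escapes &, < and >:

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_element_text(doc: str, tag: str) -> str:
    """Escape &, < and > inside every <tag>...</tag> pair.

    Assumes the element has no attributes, is never nested inside itself,
    its text never contains the literal string '</tag>', and its text is
    raw (not already escaped, or '&amp;' would be escaped twice).
    """
    pattern = re.compile(
        "(<{0}>)(.*?)(</{0}>)".format(re.escape(tag)), re.DOTALL
    )
    return pattern.sub(
        lambda m: m.group(1) + escape(m.group(2)) + m.group(3), doc
    )


broken = "<root><equation>Let a <b and c > d.</equation></root>"
fixed = escape_element_text(broken, "equation")
print(fixed)
# <root><equation>Let a &lt;b and c &gt; d.</equation></root>
print(ET.fromstring(fixed).find("equation").text)
# Let a <b and c > d.
```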

There is a good reason why special characters are escaped in XML, and why XML parsers reject invalid syntax.
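Here is a quick way to see what a parser does and doesn't accept. It uses Python's standard-library ElementTree, which is backed by expat, as a stand-in for any conforming parser. Strictly speaking, the original sample is already well-formed: XML 1.0 requires "<" and "&" to be escaped in text, while ">" only has to be escaped when it appears in the sequence "]]>". A bare "<" in content, on the other hand, is rejected:

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: bytes) -> bool:
    """Return True if the standard-library parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The sample from the question: a bare '>' in text is legal XML.
original = (b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
            b"<RecordDivision> abc>efg</RecordDivision>")
# A bare '<' in text is not.
with_lt = b"<RecordDivision> abc<efg</RecordDivision>"
equation = b"<equation>Let a < b and c > d.</equation>"

print(is_well_formed(original))               # True
print(repr(ET.fromstring(original).text))     # ' abc>efg'
print(is_well_formed(with_lt))                # False
print(is_well_formed(equation))               # False
```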

Perhaps this is a moot question, but why are you being given invalid XML?


 8:09 am on Apr 18, 2010 (gmt 0)

Have you looked into CDATA?
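Here is a minimal Python illustration, using the standard-library ElementTree, of what CDATA buys you. The parser returns the section's content as ordinary text, and re-serializing the element escapes the special characters properly:

```python
import xml.etree.ElementTree as ET

doc = "<equation><![CDATA[Let a < b and c > d.]]></equation>"
root = ET.fromstring(doc)

# The CDATA content comes back as ordinary element text.
print(root.text)
# Let a < b and c > d.

# Serializing again produces the escaped, well-formed form.
print(ET.tostring(root, encoding="unicode"))
# <equation>Let a &lt; b and c &gt; d.</equation>
```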



 1:43 pm on Apr 19, 2010 (gmt 0)

DC is right.

If you know that there's an element named "whatever",

replace "<whatever>" with "<whatever><![CDATA["
and "</whatever>" with "]]></whatever>", which gives:

<whatever><![CDATA[ a<b<c<d<e<f<g>h>i>j>k>l>m>N>o>P ]]></whatever>
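Here is the replacement above, sketched in Python with a hypothetical helper name. It assumes the element never has attributes, is not nested, and that its content never contains the literal text "]]>", since that would end the CDATA section early. Any child elements inside it would also be turned into plain text:

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(doc: str, tag: str) -> str:
    """Wrap the content of every <tag>...</tag> in a CDATA section
    using two plain string replacements.

    Assumes <tag> has no attributes, is not nested, and its content
    never contains ']]>'.
    """
    doc = doc.replace("<{}>".format(tag), "<{}><![CDATA[".format(tag))
    return doc.replace("</{}>".format(tag), "]]></{}>".format(tag))


broken = "<whatever> a<b<c<d<e<f<g>h>i>j>k>l>m>N>o>P </whatever>"
fixed = wrap_in_cdata(broken, "whatever")
print(fixed)
# <whatever><![CDATA[ a<b<c<d<e<f<g>h>i>j>k>l>m>N>o>P ]]></whatever>
print(ET.fromstring(fixed).text)
```

The second print shows the original content, now readable by any XML parser.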

It would help to have a DTD or schema for the XML, since it tells you which elements exist and what they may contain.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved