Replace special characters in xml but do not replace tags

     
8:23 am on Apr 16, 2010 (gmt 0)

5+ Year Member



Hi friends,

I receive an XML file whose content may contain characters like "<" or ">".

For example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Incident>
<RecordDivision> abc>efg</RecordDivision>
</Incident>

Now I want to replace the ">" character in the string abc>efg.

This is just a sample XML. I may receive any type of XML.

The methods I found by googling also replace the "<" and ">" characters of the XML tags themselves.

My requirement is that the tags should not be replaced, but the content should be.

Any ideas?

I am using C#.

Thanks
3:40 am on Apr 18, 2010 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Sorry, it's impossible.

Consider looping through the string one character at a time, keeping a flag for whether you're "inside" a tag or "outside" a tag. The "<" and ">" characters act like a switch from one state to the other. If you're outside and encounter a ">", you can safely replace it with "&gt;". If you're inside and encounter a "<", that one can be replaced with "&lt;".

But what if you're outside and encounter a "<" in the content? You can't tell whether it opens a tag or is just text. And what if your content contains both a "<" and a ">"?

example:
<equation>Let a < b and c > d.</equation>
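To make the ambiguity concrete, here is a minimal sketch of that inside/outside flag. It is written in Python rather than C# purely for brevity, and the function name `escape_gt_outside_tags` is made up for this illustration. It assumes every "<" opens a tag, which is exactly the assumption that breaks on the example above.

```python
def escape_gt_outside_tags(xml: str) -> str:
    """Escape '>' characters seen while outside a tag (naive state machine)."""
    out = []
    inside_tag = False
    for ch in xml:
        if ch == "<":
            # Assumption: every '<' starts a tag. This is the weak point.
            inside_tag = True
            out.append(ch)
        elif ch == ">":
            if inside_tag:
                inside_tag = False      # this '>' closes the current tag
                out.append(ch)
            else:
                out.append("&gt;")      # stray '>' in the content
        else:
            out.append(ch)
    return "".join(out)


# Works when the content only contains a stray '>':
fixed = escape_gt_outside_tags("<RecordDivision> abc>efg</RecordDivision>")

# Fails silently when the content contains '<' followed later by '>':
# " b and c " is mistaken for a tag, so nothing gets escaped.
broken = escape_gt_outside_tags("<equation>Let a < b and c > d.</equation>")
```

In the second call the text between the "<" and the ">" is treated as a tag, so the string comes back unchanged and is still not well-formed.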

Maybe you can assume that an element name only contains letters (a-z, A-Z) and can't begin with whitespace... that will help you in some situations. But not all, and not reliably.

You can try lots of parsing gymnastics to figure out which parts are elements and which are not... but

>>> I may receive any type of XML

That's the problem. If you could expect that bad characters appear only within certain kinds of elements, then maybe you could solve this with some crafty regular expressions and such. But a generic solution to fix any bad XML does not exist.
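As a rough sketch of the "known elements" case, assuming you can list the element names whose text may contain bad characters: the helper `escape_known_elements` and its parameters are hypothetical, and the example is in Python for brevity. It also assumes those elements have no attributes, no child elements, and no entities that are already escaped (an existing "&amp;" would be escaped a second time).

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape  # escapes '&', '<' and '>'


def escape_known_elements(xml: str, names: list[str]) -> str:
    """Escape special characters in the text of the named leaf elements."""
    for name in names:
        tag = re.escape(name)
        # Non-greedy match from <name> to the nearest </name>.
        pattern = re.compile(rf"(<{tag}>)(.*?)(</{tag}>)", re.DOTALL)
        xml = pattern.sub(
            lambda m: m.group(1) + escape(m.group(2)) + m.group(3), xml
        )
    return xml


raw = "<equation>Let a < b and c > d.</equation>"
fixed = escape_known_elements(raw, ["equation"])

# The repaired string can now be read by a normal XML parser.
text = ET.fromstring(fixed).text
```

This only works because the element names are known in advance. It does nothing for bad characters in elements that are not on the list.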

There is a good reason why special chars are escaped in XML, and why XML parsers do not accept invalid syntax.

Perhaps this is a moot question, but... why are you being given invalid XML?
8:09 am on Apr 18, 2010 (gmt 0)

WebmasterWorld Senior Member dreamcatcher is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Have you looked into CDATA?
[w3schools.com...]

dc
1:43 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



DC is right.

If you know that there's an element named "whatever",

replace "<whatever>" with "<whatever><![CDATA["
and "</whatever>" with "]]></whatever>":

<whatever><![CDATA[ a<b<c<d<e<f<g>h>i>j>k>l>m>N>o>P ]]></whatever>

It would help if you have a DTD or Schema for the XML, so you know which elements to expect.
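The two replacements above can be sketched like this, again in Python for brevity and with a made-up helper name, `wrap_in_cdata`. It assumes the element has no attributes (so the opening tag is exactly "<whatever>") and that the content never contains "]]>", which would end the CDATA section early.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(xml: str, name: str) -> str:
    """Wrap the content of every <name>...</name> element in a CDATA section."""
    open_tag = f"<{name}>"
    close_tag = f"</{name}>"
    return (
        xml.replace(open_tag, open_tag + "<![CDATA[")
           .replace(close_tag, "]]>" + close_tag)
    )


raw = "<whatever> a<b<c>d </whatever>"
fixed = wrap_in_cdata(raw, "whatever")

# A standard parser now accepts the document and returns the raw text.
text = ET.fromstring(fixed).text
```

The parser hands back the original characters untouched, so nothing in the content needs to be replaced at all.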
 
