Forum Moderators: httpwebwitch

Replace special characters in xml but do not replace tags

     
8:23 am on Apr 16, 2010 (gmt 0)

New User

5+ Year Member

joined:Apr 16, 2010
posts:1
votes: 0


Hi friends,

My requirement: I receive an XML file whose content may contain characters like "<" or ">".

For example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Incident>
<RecordDivision> abc>efg</RecordDivision>
</Incident>

Now I want to replace the ">" character in the string abc>efg.

This is just a sample XML. I may receive any type of XML.

The methods I found by googling also replace the "<" and ">" characters of the XML tags themselves.

My requirement is that the tags should not be replaced, but the content should be.

Any ideas?

I am using C#.

Thanks
3:40 am on Apr 18, 2010 (gmt 0)

Moderator This Forum from CA 

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 29, 2003
posts:4059
votes: 0


Sorry, it's impossible.

Consider looping through the string one character at a time, keeping a flag for whether you're "inside" a tag or "outside" a tag. The "<" and ">" characters act like a switch from one state to the other. If you're outside and encounter a ">", then you can safely replace it with "&gt;". If you're inside and encounter a "<", then that can be replaced with "&lt;".
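A minimal sketch of that flag-based scan, written in Python for brevity (the thread is about C#, but the loop ports directly). The function name escape_stray_gt is made up for illustration, and the sketch assumes every "<" seen outside a tag really does open a tag:

```python
def escape_stray_gt(xml_text):
    """One-flag scanner: escape '>' seen outside a tag and '<' seen inside one."""
    out = []
    inside_tag = False
    for ch in xml_text:
        if inside_tag:
            if ch == ">":
                inside_tag = False      # this '>' closes the current tag
                out.append(ch)
            elif ch == "<":
                out.append("&lt;")      # a second '<' before the tag closed
            else:
                out.append(ch)
        elif ch == "<":
            inside_tag = True           # assumption: '<' outside always opens a tag
            out.append(ch)
        elif ch == ">":
            out.append("&gt;")          # '>' with no open tag must be content
        else:
            out.append(ch)
    return "".join(out)


sample = "<RecordDivision> abc>efg</RecordDivision>"
fixed = escape_stray_gt(sample)
```

On the sample from the first post this yields abc&gt;efg inside the element. It is only safe when the content never contains a "<" of its own.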

But what if you're outside, and encounter a "<" in the content? And what if your content contains both a "<" and a ">"?

example:
<equation>Let a < b and c > d.</equation>
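To see how a parser treats that example, here is a quick check using Python's standard xml.etree.ElementTree (a C# parser such as XmlDocument should reject it in the same way, though that is not shown here). The helper name is_well_formed is just for illustration:

```python
import xml.etree.ElementTree as ET

raw = "<equation>Let a < b and c > d.</equation>"
escaped = "<equation>Let a &lt; b and c &gt; d.</equation>"


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


raw_ok = is_well_formed(raw)          # the bare '<' in the content is rejected
escaped_ok = is_well_formed(escaped)  # the escaped version parses
```

The raw string fails because the parser has no way to know that "< b and c >" was meant as text rather than markup.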

Maybe you can assume that element names contain only letters (a-z, A-Z) and can't begin with whitespace... that will help you in some situations. But not all, and not reliably.

You can try lots of parsing gymnastics to figure out which parts are elements and which are not... but

>>> I may receive any type of XML

That's the problem. If you could expect that bad characters appear only within certain kinds of elements, then maybe you could solve this with some crafty regular expressions and such. But a generic solution to fix any bad XML does not exist.
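For the restricted case, a rough sketch of the regular-expression approach might look like this (Python for brevity). It assumes you know the element name in advance, that the element holds only text (no child elements), and that it is not nested inside itself. The function name escape_element_text is hypothetical:

```python
import re
import xml.etree.ElementTree as ET


def escape_element_text(xml_text, tag):
    """Escape '<' and '>' between <tag ...> and </tag> for one known, text-only element."""
    name = re.escape(tag)
    pattern = re.compile(
        rf"(<{name}(?:\s[^<>]*)?>)(.*?)(</{name}\s*>)",
        re.DOTALL,
    )

    def fix(match):
        # Only '<' and '>' are touched, so existing entities like &amp; are left alone.
        body = match.group(2).replace("<", "&lt;").replace(">", "&gt;")
        return match.group(1) + body + match.group(3)

    return pattern.sub(fix, xml_text)


fixed = escape_element_text("<equation>Let a < b and c > d.</equation>", "equation")
text = ET.fromstring(fixed).text  # the repaired string now parses
```

If the content itself happens to contain the literal closing tag, or the element can have children, this breaks, which is why it only works for elements you know well.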

There is a good reason why special characters are escaped in XML, and why XML parsers do not accept invalid syntax.

Perhaps this is a moot question, but... why are you being given invalid XML?
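If whoever produces the file can be persuaded to fix it, escaping at the source is a one-liner. A minimal sketch in Python, assuming the producer builds the element from a plain string; build_record is a hypothetical name, while escape is the standard xml.sax.saxutils helper:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def build_record(value):
    """Build a RecordDivision element, escaping &, < and > in the text."""
    return "<RecordDivision>" + escape(value) + "</RecordDivision>"


record = build_record(" abc>efg")
round_trip = ET.fromstring(record).text  # the parser hands back the original text
```

The consumer then needs no repair step at all, because the parser unescapes the text on the way back in.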
8:09 am on Apr 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member dreamcatcher is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 30, 2003
posts:3719
votes: 0


Have you looked into CDATA?
[w3schools.com...]

dc
1:43 pm on Apr 19, 2010 (gmt 0)

Moderator This Forum from CA 

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 29, 2003
posts:4059
votes: 0


DC is right.

If you know that there's an element named "whatever":

replace "<whatever>" with "<whatever><![CDATA["
and "</whatever>" with "]]></whatever>"

<whatever><![CDATA[ a<b<c<d<e<f<g>h>i>j>k>l>m>N>o>P ]]></whatever>
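The two replacements can be sketched like this (Python for brevity, and the same string replaces work in C#). It assumes the opening tag has no attributes, the element is not already wrapped in CDATA, and the content never contains the sequence "]]>", which would end the CDATA section early. The name wrap_in_cdata is made up for illustration:

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(xml_text, tag):
    """Wrap the content of every <tag>...</tag> in a CDATA section via plain string replaces."""
    open_tag = f"<{tag}>"
    close_tag = f"</{tag}>"
    return (
        xml_text
        .replace(open_tag, open_tag + "<![CDATA[")
        .replace(close_tag, "]]>" + close_tag)
    )


wrapped = wrap_in_cdata("<whatever> a<b<c>d </whatever>", "whatever")
text = ET.fromstring(wrapped).text  # the parser returns the raw characters untouched
```

After wrapping, the parser treats everything between the CDATA markers as plain text, so the stray "<" and ">" no longer matter.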

It'd be helpful if you have a DTD or Schema for the XML, since that tells you which elements are supposed to hold plain text.