Msg#: 4116763 posted 3:40 am on Apr 18, 2010 (gmt 0)
Sorry, it's impossible.
Consider, if you loop a character at a time through the string, and flag if you're "inside" a tag or "outside" a tag. "<" and ">" characters can act like a switch going from one state to the other. If you're outside and encounter a ">", then you could safely replace it with ">". If you're inside and encounter a "<", then that can be replaced.
But what if you're outside, and encounter a "<" in the content? And what if your content contains both a "<" and a ">"?
example: <equation>Let a < b and c > d.</equation>
Maybe you'll expect that an element only contains a-Z, and can't begin with whitespace... that will help you in some situations. But not all, and not reliably.
You can try lots of parsing gymnastics figuring out which parts are elements and which are not... but
>>> i may recieve any type of xml
that's the problem. If you could expect that bad characters might appear only within certain kinds of elements, then maybe you could solve this with some crafty regular expressions and such. But a generic solution to fix any bad XML does not exist.
there is a good reason why special chars are escaped in XML, and why XML parsers do not accept invalid syntax.
perhaps this is a moot question, but... why are you being given invalid XML?