Linq / Reading XML results in 'little boxes' for UTF-8 characters

     
10:14 pm on Sep 16, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I am using Linq-to-SQL in Visual Studio 2008 in order to parse out an XML file so I can store it in a database. My problem is that while reading in the XML file, UTF-8 characters are not recognized and are interpreted as square boxes.

This is my first time using Linq-to-SQL so I'm probably missing an important step that tells .NET that this is a UTF-8 file. I am using Visual Basic, although I can do this in C# if that makes a difference.

SampleFile.xml:
<?xml version="1.0" encoding="UTF-8"?>
<houses>
<house>Bob’s House</house>
</houses>

The VB code that reads in SampleFile.xml is as follows:

Dim _sampleFile As XDocument = XDocument.Load("C:\XML\SampleFile.xml")

I set a breakpoint immediately after the above line. Here is the value in _sampleFile:
<houses>
<house>Bob[]s House</house>
</houses>
(This forum will not allow me to paste the actual box character, but I'm sure you've seen them in other UTF-8 encoding issues. So in my sample above, I represented the box character with two brackets, like this: [] ).

Does anyone know how to prevent this from happening? I have searched for a couple of days but I do not see others mention this as a problem, so I couldn't find a solution. This indicates to me that I'm missing a fundamental step in the process. Can anyone offer a suggestion?

Thanks!
8:17 pm on Sep 20, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I figured out the solution to this, so I'll post it here in case someone else has the same problem and finds this thread in the future.

The issue was not caused by the VB code at all. The problem was the text file that contained the XML. Although the XML declaration said encoding="UTF-8", the file itself had been saved as ASCII (more likely the Windows "ANSI" code page, since plain ASCII has no curly apostrophe), so its bytes did not match the declared encoding. The simple solution was to open the XML file in WordPad and save it as type "Unicode Text Document". After that, VB/Linq was able to recognize the Unicode characters.
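
For anyone who would rather fix this in code than in WordPad, here is a minimal sketch. It is not the method described above, and it rests on an assumption: that the file was really saved in Windows-1252 (the usual "ANSI" code page on Western European Windows). The module name and the output path are made up for illustration. It targets the .NET Framework, as used by Visual Studio 2008, and needs a reference to System.Xml.Linq.

```vb
Imports System.IO
Imports System.Text
Imports System.Xml.Linq

Module FixEncodingSketch
    Sub Main()
        ' Source path is from the question; the output path is hypothetical.
        Dim sourcePath As String = "C:\XML\SampleFile.xml"
        Dim fixedPath As String = "C:\XML\SampleFile.utf8.xml"

        ' Assumption: the bytes on disk are Windows-1252, not UTF-8.
        Dim actualEncoding As Encoding = Encoding.GetEncoding(1252)

        Dim doc As XDocument
        ' Decode the file with the encoding it was actually saved in.
        ' XDocument.Load(TextReader) receives already-decoded text, so the
        ' encoding="UTF-8" declaration cannot mislead the parser.
        Using reader As New StreamReader(sourcePath, actualEncoding)
            doc = XDocument.Load(reader)
        End Using

        ' Should print the house name with a proper apostrophe, no box.
        Console.WriteLine(doc.Root.Element("house").Value)

        ' Write a copy whose bytes are expected to match the UTF-8 declaration.
        doc.Save(fixedPath)
    End Sub
End Module
```

If the apostrophe still comes out wrong, the file is probably in a different code page than the one assumed here, and the argument to Encoding.GetEncoding would need to change.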
8:40 am on Sep 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for letting us know the solution to the problem!
 
