I am using LINQ to XML in Visual Studio 2008 to parse an XML file so that I can store its contents in a database with LINQ to SQL. My problem is that when the XML file is read in, non-ASCII characters are not recognized and are displayed as square boxes.
This is my first time using LINQ, so I’m probably missing an important step that tells .NET this is a UTF-8 file. I am using Visual Basic, although I can switch to C# if that makes a difference.
SampleFile.xml:
<?xml version="1.0" encoding="UTF-8"?>
<houses>
<house>BobÂ’s House</house>
</houses>
The VB code that reads in SampleFile.xml is as follows:
Dim _sampleFile As XDocument = XDocument.Load("C:\XML\SampleFile.xml")
I set a breakpoint immediately after the above line. Here is the value in _sampleFile:
<houses>
<house>Bob[]s House</house>
</houses>
(This forum will not let me paste the actual box character, which you have probably seen in other UTF-8 encoding problems, so in the sample above I have represented it with two brackets: [].)
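A small diagnostic may help narrow this down. The VB sketch below is only a way to inspect the data, not a fix; the module name EncodingCheck and the function CodePoints are hypothetical, and the path is the one from the question. It opens the file with a StreamReader that is explicitly told to decode UTF-8, then prints the UTF-16 code of every character in each house element. Because the XML declaration already says UTF-8, XDocument.Load should honour that on its own, so passing the encoding explicitly mainly rules it out. If the apostrophe comes back as the character you intended (U+2019 for a curly apostrophe, for example), the file was decoded correctly and the box is only a display issue. If it comes back as something else, such as a control character in the range U+0080 to U+009F, the file itself most likely contains different characters from the ones you expected, and the problem lies in how the file was produced rather than in how it is loaded.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Xml.Linq

' Hypothetical diagnostic module; not part of the original code.
Module EncodingCheck
    ' Returns "U+XXXX" for every UTF-16 code unit in the text.
    ' Characters outside the BMP would appear as two surrogate entries.
    Function CodePoints(text As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each ch As Char In text
            result.Add(String.Format("U+{0:X4}", Convert.ToInt32(ch)))
        Next
        Return result
    End Function

    Sub Main()
        ' Path taken from the question; adjust as needed.
        Dim path As String = "C:\XML\SampleFile.xml"

        ' Decode the file explicitly as UTF-8 instead of relying on
        ' XDocument.Load's own encoding detection.
        Using reader As New StreamReader(path, Encoding.UTF8)
            Dim doc As XDocument = XDocument.Load(reader)
            For Each house As XElement In doc.Root.Elements("house")
                Console.WriteLine(house.Value)
                Console.WriteLine(String.Join(" ", CodePoints(house.Value).ToArray()))
            Next
        End Using
    End Sub
End Module
```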
Does anyone know how to prevent this from happening? I have searched for a couple of days but have not found anyone else reporting this problem, so I couldn’t find a solution. That suggests to me that I’m missing a fundamental step in the process. Can anyone offer a suggestion?
Thanks!