I am using Linq-to-SQL in Visual Studio 2008 in order to parse out an XML file so I can store it in a database. My problem is that while reading in the XML file, UTF-8 characters are not recognized and are interpreted as square boxes.
This is my first time using Linq-to-SQL so I'm probably missing an important step that tells .NET that this is a UTF-8 file. I am using Visual Basic, although I can do this in C# if that makes a difference.
The VB code that reads in SampleFile.xml is as follows:
Dim _sampleFile As XDocument = XDocument.Load("C:\XML\SampleFile.xml")
I set a breakpoint immediately after the above line. Here is the value in _sampleFile:
<houses> <house>Bob[]s House</house> </houses>
(This forum will not allow me to paste the actual box character, but I'm sure you've seen them in other UTF-8 encoding issues. So in my sample above, I represented the box character with two brackets, like this: [] ).
Does anyone know how to prevent this from happening? I have searched for a couple of days but I do not see others mention this as a problem, so I couldn't find a solution. This indicates to me that I'm missing a fundamental step in the process. Can anyone offer a suggestion?
I figured out the solution to this, so I'll post it here in case someone else has the same problem and finds this thread in the future.
The issue was not caused by the VB code at all. The problem was the text file that contained the XML: although the encoding was declared in the XML as "UTF-8," the file itself was saved as ASCII. The simple solution was to open the XML file in WordPad and then save it as type "Unicode Text Document." After that, VB/Linq was able to recognize the Unicode characters.
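For anyone who cannot re-save the file by hand, the same mismatch can probably be worked around in code. This is only a rough sketch, not something I tested against the original file. It assumes the file was really written in the Windows-1252 ("ANSI") code page, which is a guess on my part, and the output path SampleFile.utf8.xml is a made-up name for illustration. The idea is that when XDocument.Load is given a TextReader, the reader has already decoded the bytes, so the incorrect "UTF-8" declaration in the file no longer matters.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Xml.Linq

Module EncodingSketch
    Sub Main()
        ' Assumption: the file on disk is really Windows-1252, not UTF-8.
        Dim actualEncoding As Encoding = Encoding.GetEncoding(1252)

        Dim sampleFile As XDocument
        ' The StreamReader decodes the bytes with the encoding we specify,
        ' so the declaration inside the XML is not used for decoding.
        Using reader As New StreamReader("C:\XML\SampleFile.xml", actualEncoding)
            sampleFile = XDocument.Load(reader)
        End Using

        ' Optional: write a copy back out so the bytes on disk match
        ' the declared encoding (hypothetical output path).
        sampleFile.Save("C:\XML\SampleFile.utf8.xml")
    End Sub
End Module
```

If the characters still come through wrong, the code page guess is probably off, and it is worth checking how the file was produced before trying other encodings.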