Microsoft IIS Web Server and ASP.NET Forum

Linq / Reading XML results in 'little boxes' for UTF-8 characters

 10:14 pm on Sep 16, 2010 (gmt 0)

I am using Linq-to-SQL in Visual Studio 2008 to parse an XML file so I can store it in a database. My problem is that while reading in the XML file, non-ASCII characters (the file is declared as UTF-8) are not recognized and show up as square boxes.

This is my first time using Linq-to-SQL so I'm probably missing an important step that tells .NET that this is a UTF-8 file. I am using Visual Basic, although I can do this in C# if that makes a difference.

<?xml version="1.0" encoding="UTF-8"?>
<house>Bob’s House</house>

The VB code that reads in SampleFile.xml is as follows:

Dim _sampleFile As XDocument = XDocument.Load("C:\XML\SampleFile.xml")

I set a breakpoint immediately after the above line. Here is the value in _sampleFile:
<house>Bob[]s House</house>
(This forum will not allow me to paste the actual box character, but I'm sure you've seen them in other UTF-8 encoding issues. So in my sample above, I represented the box character with two brackets, like this: [] ).

Does anyone know how to prevent this from happening? I have searched for a couple of days but I do not see others mention this as a problem, so I couldn't find a solution. This indicates to me that I'm missing a fundamental step in the process. Can anyone offer a suggestion?




 8:17 pm on Sep 20, 2010 (gmt 0)

I figured out the solution to this, so I'll post it here in case someone else has the same problem and finds this thread in the future.

The issue was not caused by the VB code at all. The problem was the text file that contained the XML. Although the XML declaration said encoding="UTF-8", the file itself had been saved as ASCII (most likely a Windows ANSI code page such as Windows-1252, since plain ASCII has no curly apostrophe), so the bytes on disk did not match the declared encoding. The simple solution was to open the XML file in WordPad and save it as type "Unicode Text Document." After that, VB/Linq was able to recognize the Unicode characters.
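
For anyone who would rather fix this in code than in WordPad, here is a minimal sketch of the same idea. It assumes the file was really saved in the Windows-1252 code page (a guess, check your own file), and the module and variable names are only for illustration. Loading through a StreamReader means the text is already decoded when XDocument sees it, so the encoding named in the XML declaration no longer matters for reading.

```vb
Imports System.IO
Imports System.Text
Imports System.Xml.Linq

Module EncodingSketch
    Sub Main()
        ' Path taken from the original post.
        Dim xmlPath As String = "C:\XML\SampleFile.xml"

        ' ASSUMPTION: the file was saved as Windows-1252 ("ANSI"), not UTF-8.
        ' This works as-is on the .NET Framework; newer .NET versions may need
        ' the code pages encoding provider registered first.
        Dim actualEncoding As Encoding = Encoding.GetEncoding(1252)

        ' Option 1: decode the bytes with the encoding the file really uses,
        ' then hand the decoded text to XDocument.
        Dim sampleFile As XDocument
        Using reader As New StreamReader(xmlPath, actualEncoding)
            sampleFile = XDocument.Load(reader)
        End Using
        Console.WriteLine(sampleFile.Root.Value)

        ' Option 2: rewrite the file as real UTF-8 (no byte order mark) so it
        ' matches its own declaration and the original XDocument.Load(path)
        ' call works unchanged.
        Dim xmlText As String = File.ReadAllText(xmlPath, actualEncoding)
        File.WriteAllText(xmlPath, xmlText, New UTF8Encoding(False))
    End Sub
End Module
```

Option 2 overwrites the file, so try it on a copy first. If the guess about the code page is wrong, the special characters will still come out garbled, just differently.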


 8:40 am on Sep 21, 2010 (gmt 0)

Thanks for letting us know the solution to the problem!
