
Microsoft IIS Web Server and ASP.NET Forum

    
Linq / Reading XML results in 'little boxes' for UTF-8 characters
tim222

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4203016 posted 10:14 pm on Sep 16, 2010 (gmt 0)

I am using Linq-to-SQL in Visual Studio 2008 to parse an XML file so I can store it in a database. My problem is that when the XML file is read in, UTF-8 characters are not recognized and are displayed as square boxes.

This is my first time using Linq-to-SQL so I'm probably missing an important step that tells .NET that this is a UTF-8 file. I am using Visual Basic, although I can do this in C# if that makes a difference.

SampleFile.xml:
<?xml version="1.0" encoding="UTF-8"?>
<houses>
<house>Bob’s House</house>
</houses>

The VB code that reads in SampleFile.xml is as follows:

Dim _sampleFile As XDocument = XDocument.Load("C:\XML\SampleFile.xml")

I set a breakpoint immediately after the above line. Here is the value in _sampleFile:
<houses>
<house>Bob[]s House</house>
</houses>
(This forum will not allow me to paste the actual box character, but I'm sure you've seen them in other UTF-8 encoding issues. So in my sample above, I represented the box character with two brackets, like this: [] ).

Does anyone know how to prevent this from happening? I have searched for a couple of days but I do not see others mention this as a problem, so I couldn't find a solution. This indicates to me that I'm missing a fundamental step in the process. Can anyone offer a suggestion?

Thanks!
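
One way to narrow down a problem like this is to look at the raw bytes of the file, because the encoding="UTF-8" declaration only states what the file claims to be, not how it was actually saved. The following is a minimal Visual Basic sketch and not part of the original post: the module name InspectXmlBytes is hypothetical, and the path is the one used in the example above.

```vb
Imports System
Imports System.IO

' Hypothetical helper module, for illustration only.
Module InspectXmlBytes
    Sub Main()
        ' Path taken from the example in the post.
        Dim xmlPath As String = "C:\XML\SampleFile.xml"
        Dim bytes As Byte() = File.ReadAllBytes(xmlPath)

        ' Report a byte order mark (BOM), if the file starts with one.
        If bytes.Length >= 3 AndAlso bytes(0) = &HEF AndAlso bytes(1) = &HBB AndAlso bytes(2) = &HBF Then
            Console.WriteLine("UTF-8 BOM found")
        ElseIf bytes.Length >= 2 AndAlso bytes(0) = &HFF AndAlso bytes(1) = &HFE Then
            Console.WriteLine("UTF-16 little-endian BOM found")
        ElseIf bytes.Length >= 2 AndAlso bytes(0) = &HFE AndAlso bytes(1) = &HFF Then
            Console.WriteLine("UTF-16 big-endian BOM found")
        Else
            Console.WriteLine("No BOM found")
        End If

        ' List every byte outside the 7-bit ASCII range with its offset.
        For i As Integer = 0 To bytes.Length - 1
            If bytes(i) > &H7F Then
                Console.WriteLine("offset {0}: 0x{1:X2}", i, bytes(i))
            End If
        Next
    End Sub
End Module
```

For the sample file, a right single quotation mark stored as UTF-8 would appear as the three bytes 0xE2 0x80 0x99, whereas a file saved in the Windows-1252 code page would contain the single byte 0x92 at that position.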

 

tim222

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4203016 posted 8:17 pm on Sep 20, 2010 (gmt 0)

I figured out the solution to this, so I'll post it here in case someone else has the same problem and finds this thread in the future.

The issue was not caused by the VB code at all. The problem was the text file that contained the XML: although the encoding was declared in the XML as "UTF-8," the text file itself had been saved as ASCII. The simple solution was to open the XML file in WordPad and save it with the type "Unicode Text Document." After that, VB/Linq was able to recognize the Unicode characters.
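
The same kind of fix can be done in code instead of by hand. What follows is a minimal sketch under stated assumptions, not the method used in this thread. It assumes the file's bytes are really in the Windows-1252 code page (the post only says "ASCII", so this is a guess). It writes UTF-8 so that the bytes match the XML declaration, rather than relying on WordPad's choice of Unicode format. The module name ResaveAsUtf8 and the output path are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml.Linq

' Hypothetical helper module, for illustration only.
Module ResaveAsUtf8
    Sub Main()
        Dim sourcePath As String = "C:\XML\SampleFile.xml"
        ' Hypothetical output path, so the original file is left untouched.
        Dim targetPath As String = "C:\XML\SampleFile.utf8.xml"

        ' ASSUMPTION: the source file was saved as Windows-1252 ("ANSI").
        ' Substitute the real encoding if the file was saved differently.
        Dim sourceEncoding As Encoding = Encoding.GetEncoding(1252)

        ' Decode the bytes with the encoding the file actually uses.
        Dim content As String = File.ReadAllText(sourcePath, sourceEncoding)

        ' Re-encode as UTF-8 (no byte order mark) to match the declaration.
        File.WriteAllText(targetPath, content, New UTF8Encoding(False))

        ' Load the converted copy the same way as in the original code.
        Dim sampleFile As XDocument = XDocument.Load(targetPath)
        Console.WriteLine(sampleFile.Root.Element("house").Value)
    End Sub
End Module
```

Writing to a separate file keeps the original available for comparison if the assumed source encoding turns out to be wrong.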

marcel

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4203016 posted 8:40 am on Sep 21, 2010 (gmt 0)

Thanks for letting us know the solution to the problem!
