LWP::UserAgent malformed text

   
12:09 pm on Nov 12, 2010 (gmt 0)

10+ Year Member



Hey guys, I am using LWP::UserAgent to import data from a website into a database, but the text comes out malformed. In the source of the original page the characters look fine, but after the import I get weird characters like the ones below. Any ideas?

Wir knnen -> Wir können
TV absolvieren -> TÜV absolvieren

Cheers,

Ton
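A common cause of this symptom is UTF-8 bytes being read as a single-byte encoding such as ISO-8859-1 somewhere between the web page and the database. The sketch below illustrates only that mechanism, using the core Encode module. It assumes the page is served as UTF-8 and does not reproduce the poster's exact setup.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "können" as a Perl character string (\x{f6} is "ö").
my $text = "k\x{f6}nnen";

# The web server sends it as UTF-8: "ö" becomes two bytes, 0xC3 0xB6.
my $bytes = encode('UTF-8', $text);

# Reading those bytes as ISO-8859-1 turns each byte into its own
# character, so "ö" shows up as the two characters "Ã¶".
my $misread = decode('ISO-8859-1', $bytes);

printf "%d chars, %d bytes, misread as %d chars\n",
    length $text, length $bytes, length $misread;
# prints "6 chars, 7 bytes, misread as 7 chars"
```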
3:52 pm on Nov 12, 2010 (gmt 0)

5+ Year Member



You are receiving the data in UTF-8. See [perldoc.perl.org...] [search.cpan.org...]

If you don't enable UTF-8 support on the DB connection, the database will not know that the data you are sending it is UTF-8.

One possible solution is to use the Encode module [search.cpan.org].
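Here is a minimal sketch of the Encode approach. It assumes the raw bytes from `$res->content` are UTF-8; the byte string is hard-coded so the example runs offline. The idea is to decode once when the data comes in and work with characters inside Perl. If the database connection has no UTF-8 support enabled, encode the text again before handing it over. The variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw bytes, standing in for $res->content on a UTF-8 page.
my $raw = "T\xc3\x9cV absolvieren";

# Decode UTF-8 bytes into a Perl character string ("Ü" is one character).
my $text = decode('UTF-8', $raw);

# String functions now count characters, not bytes.
print length($text), "\n";    # prints 15

# Encode back to UTF-8 bytes before writing to a byte-oriented sink,
# e.g. a DB connection without UTF-8 support enabled.
my $for_db = encode('UTF-8', $text);
```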
4:17 pm on Nov 12, 2010 (gmt 0)

10+ Year Member



Hey Chorny, thank you for your reply. I just found the solution.

I changed:

$Source2 = $res->content;

into:

$Source2 = $res->decoded_content;

Works great now ;-)

Cheerio!
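The sketch below shows what the change does without touching the network. It builds an HTTP::Response by hand, using a made-up status, header and body, and calls both methods on it. It assumes the HTTP::Message distribution, which LWP::UserAgent depends on, is installed. Here decoded_content takes the charset from the Content-Type header.

```perl
use strict;
use warnings;
use HTTP::Response;

# A hand-built response standing in for what $ua->get(...) returns.
# The body must be bytes: "Wir können" encoded as UTF-8.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    "Wir k\xc3\xb6nnen",
);

my $raw     = $res->content;          # raw bytes, "ö" is two bytes
my $decoded = $res->decoded_content;  # characters, decoded using the charset

print length($raw), " bytes vs ", length($decoded), " characters\n";
# prints "11 bytes vs 10 characters"
```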
5:25 pm on Nov 12, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Yupp, that's what I was going to suggest when I read your first post.
I ran into that a few times. Does anyone know if ->decoded_content has any major drawbacks? Most people use ->content, even in documentation, and don't worry about character encoding.
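On the drawbacks question, the HTTP::Message documentation describes a few behaviours of decoded_content that are worth knowing:

* It returns a Perl character string rather than bytes. Anything written to a file handle, or to a database connection without UTF-8 support, may need an explicit encode.
* Passing `charset => 'none'` skips the charset decoding. Any Content-Encoding (such as gzip) is still undone, and the bytes are otherwise left as they were.
* If decoding fails, it returns undef and puts the error in `$@`, unless `raise_error => 1` is passed.

The sketch below shows the first two points on a made-up response.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Made-up response: "TÜV" sent as UTF-8 bytes.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=UTF-8' ],
    "T\xc3\x9cV",
);

# Default: a character string; die instead of returning undef on failure.
my $chars = $res->decoded_content(raise_error => 1);

# charset => 'none': no charset decoding, so the raw bytes come back.
my $bytes = $res->decoded_content(charset => 'none');

# Re-encode before writing to a byte-oriented sink, such as a file
# opened without an :encoding layer or a non-UTF-8 DB connection.
my $for_sink = encode('UTF-8', $chars);

print length($chars), " chars, ", length($bytes), " bytes\n";
# prints "3 chars, 4 bytes"
```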