LWP::UserAgent malformed text

     
12:09 pm on Nov 12, 2010 (gmt 0)

New User

10+ Year Member

joined:Feb 21, 2005
posts: 38
votes: 0


Hey guys, I am using LWP::UserAgent to import data from a website into a database, but the text comes out malformed. In the source of the original page the characters look fine, but after the import the special characters are broken, as in the examples below (imported text on the left, expected text on the right). Any ideas?

Wir knnen -> Wir können
TV absolvieren -> TÜV absolvieren

Cheers,

Ton
3:52 pm on Nov 12, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:May 8, 2008
posts: 74
votes: 0


You are receiving the data as UTF-8. See [perldoc.perl.org...] [search.cpan.org...]

If you don't enable UTF-8 support on the DB connection, the DB will not know that the data it receives is UTF-8.

A possible solution is to use the Encode module [search.cpan.org].
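
For illustration, here is a minimal sketch of what decoding with Encode might look like. The byte string is a hand-written stand-in for what the page could have returned (an assumption, not the poster's actual data), and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: the raw UTF-8 bytes for "Wir können",
# written out by hand so the example needs no network access.
my $bytes = "Wir k\xC3\xB6nnen";

# Turn the bytes into a Perl character string.
my $text = decode('UTF-8', $bytes);

# length() counts bytes on the raw string and characters on the decoded one.
printf "bytes: %d, characters: %d\n", length($bytes), length($text);

# Encode again before handing the text to something that expects bytes.
my $out = encode('UTF-8', $text);
```

The raw string is 11 bytes long while the decoded text is 10 characters, because the "ö" takes two bytes in UTF-8. Whether the database layer then wants characters or bytes depends on the driver and its connection settings.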
4:17 pm on Nov 12, 2010 (gmt 0)

New User

10+ Year Member

joined:Feb 21, 2005
posts: 38
votes: 0


Hey Chorny, thank you for your reply. I just found the solution.

I changed:

$Source2 = $res->content;

into:

$Source2 = $res->decoded_content;

Works great now ;-)

Cheerio!
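
For anyone who wants to see the difference without fetching a page, here is a small sketch. It builds an HTTP::Response by hand, so the header and body are assumptions standing in for whatever the real site sends.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical response built locally; a real one would come from
# LWP::UserAgent. The body holds the raw UTF-8 bytes for "TÜV absolvieren".
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    "T\xC3\x9CV absolvieren",
);

my $raw  = $res->content;          # undecoded bytes, as sent over the wire
my $text = $res->decoded_content;  # character string, decoded via the charset

# The "Ü" is two bytes in $raw but a single character in $text.
printf "content: %d, decoded_content: %d\n", length($raw), length($text);
```

This prints "content: 16, decoded_content: 15". The undecoded bytes are what end up looking mangled when they are treated as characters further down the line.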
5:25 pm on Nov 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 31, 2008
posts: 661
votes: 0


Yup, that's what I was going to suggest when I read your first post.
I ran into that a few times. Does anyone know if ->decoded_content has any major drawbacks? Most people use ->content, even in documentation, and don't worry about character encoding.