Forum Moderators: open

Message Too Old, No Replies

cc0.inktomi.com

ua = libwww-perl/5.49

         

misosoph

3:17 pm on Jun 15, 2002 (gmt 0)

10+ Year Member



cc0.inktomi.com - - [15/Jun/2002:07:42:07 -0700] "GET /folder/filename.html HTTP/1.0" 200 80679 "-" "libwww-perl/5.49"

The IP address is 209.131.48.104, which is registered to Inktomi Corporation.

No comment beyond: I've never seen this before.
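For anyone who wants to pull these fields out of a raw log programmatically, here is a minimal sketch. The regex is an assumption based on Apache's standard "combined" log format, which the quoted line matches; it is not pulled from any tool mentioned in this thread:

```python
import re

# Sketch: extract the client host and user agent from an Apache
# "combined" format log line like the one quoted above.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

line = ('cc0.inktomi.com - - [15/Jun/2002:07:42:07 -0700] '
        '"GET /folder/filename.html HTTP/1.0" 200 80679 "-" "libwww-perl/5.49"')

m = LOG_RE.match(line)
print(m.group('host'))   # cc0.inktomi.com
print(m.group('agent'))  # libwww-perl/5.49
```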

volatilegx

3:35 pm on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The whole class C block 209.131.48.* belongs to Inktomi.

misosoph

4:45 pm on Jun 17, 2002 (gmt 0)

10+ Year Member



But why is Inktomi not using its Slurp robot in this case? Isn't that strange?

volatilegx

4:54 pm on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"libwww-perl/5.49" is the default agent string sent by the libwww (LWP) modules for Perl. I'll bet somebody over at Ink is running a custom Perl app that's doing some spidering; it's probably not their SE spider. I'd say that if they really wanted to hide a spider, they'd use a standard browser user agent and do it from a rented IP address.

littleman

5:05 am on Jun 18, 2002 (gmt 0)



It could be an investigatory bot: a bot that grabs your page so a human can view it later. Just speculation.

I used to see Google do this, and I couldn't decide whether it was better to feed that bot the 'human' page or the 'bot' page.

volatilegx

6:34 pm on Jun 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's a tough call, but I think you'd be safer showing the "bot" page :)

makemetop

7:24 pm on Jun 18, 2002 (gmt 0)



I've banned libwww-perl in robots.txt. It's used by a lot of strange people as well as known ones ;)
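For reference, a robots.txt rule along these lines is a sketch of what that ban might look like. Keep in mind it only helps against clients that actually fetch and honor robots.txt (e.g. Perl spiders built on LWP::RobotUA rather than plain LWP::UserAgent):

```
User-agent: libwww-perl
Disallow: /
```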

volatilegx

10:59 pm on Jun 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Banning in robots.txt does little if the bot doesn't obey robots.txt conventions. It's better to ban in an .htaccess file, with either mod_rewrite or allow,deny.
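To make the mod_rewrite / allow,deny suggestion concrete, an .htaccess sketch might look like this (assuming Apache with mod_rewrite and mod_setenvif enabled; the directives are standard, but treat this as a starting point rather than a drop-in rule set):

```apache
# Option 1: mod_rewrite -- return 403 Forbidden to libwww-perl clients.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [NC]
RewriteRule .* - [F]

# Option 2: mod_setenvif plus allow,deny.
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Either way, this only filters on the self-reported User-Agent header, which a client can trivially change.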

misosoph

6:51 am on Jun 24, 2002 (gmt 0)

10+ Year Member



Now I'm seeing this combination from FAST as well:

fpcr002.sac2.fastsearch.net - - [23/Jun/2002:19:45:10 -0700] "GET / HTTP/1.0" 206 7682 "-" "libwww-perl/5.52 FP/4.0"

fpcr002.sac2.fastsearch.net leads directly to an advanced-search FAST page, not to an AllTheWeb page.

wilderness

10:06 pm on Jun 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



misosoph,
I had a similar thing occur the other day with Google also. The Google reference was from those cell phone/PDA IPs which I had previously denied.

Google followed immediately with their standard bot access, so I removed the denies for those IPs.

I don't recall the thread where this was discussed; it was some time ago. There was no mention of these "libwww-perl/5.52 FP/4.0" agents there, though.

Personally, for the volume of text my pages have, I don't see the benefit of allowing access to something that can read less than 300 characters.