YahooSeeker Bot does not obey robots.txt

bad yahoo, no cookie.

         

farside847

5:33 pm on Aug 27, 2003 (gmt 0)

10+ Year Member



Just had to block their bot. They were hitting my site
10 times per second and accessing https and other pages
that my robots.txt disallows.

Anyone else see this?

I sent an email to abuse@yahoo.com and abuse@yahoo-inc.com -
does anyone know if this is the right address to contact them about it?

While I like the idea of all my products being added to their
beta site, I don't want them running wild all over my
site...

[edit]
Now that I've searched through the logs, the bot never even
attempted to read my robots.txt.
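
For anyone who wants to run the same check on their own logs, here is a minimal sketch in Python. It assumes the server writes Apache-style combined log lines; the function name summarize_bot and the "YahooSeeker" match token are my own choices for illustration, not anything Yahoo documents. It reports whether the bot ever requested /robots.txt and its peak number of hits in any one second.

```python
import re
from collections import Counter

# Assumed format (combined log):
# ip - - [timestamp] "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_bot(lines, agent_token="YahooSeeker"):
    """Return (fetched_robots, peak_hits_per_second, total_hits) for one bot."""
    fetched_robots = False
    per_second = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or agent_token not in m.group("agent"):
            continue  # unparseable line, or a different client
        # Ignore any query string when comparing the path.
        if m.group("path").split("?")[0] == "/robots.txt":
            fetched_robots = True
        # Log timestamps have one-second resolution, so the raw
        # timestamp string works as a per-second bucket key.
        per_second[m.group("ts")] += 1
    peak = max(per_second.values(), default=0)
    return fetched_robots, peak, sum(per_second.values())
```

Feed it the lines of an access log, e.g. summarize_bot(open("access.log")), where access.log is a placeholder path. A first value of False means no robots.txt request from that user agent appears in the file.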

dragonlady7

7:15 pm on Aug 27, 2003 (gmt 0)

10+ Year Member



I've never even seen the YahooSeeker bot. Is my ignorance showing or does it not crawl much?

farside847

8:40 pm on Aug 27, 2003 (gmt 0)

10+ Year Member



I saw it for the first time last week. It is part of the
beta product search page (which works very much like Froogle, IMHO).

farside847

4:08 pm on Aug 28, 2003 (gmt 0)

10+ Year Member



I got a very nice reply from abuse@yahoo-inc.com:


Shawn, we've identified a bug in our crawling
program - a fix should go out tonight w/r/t
the robots.txt problem.

Our crawler fetches 1 request/sec and sleeps
every 5 seconds for a few seconds. We've
slowed the fetching down a tad.
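
To make the behaviour described in that reply concrete, here is a rough Python sketch of a crawler loop that checks robots.txt rules first and then paces itself. It is not Yahoo's code. The helper names allowed_urls and paced are made up, and the 3-second rest is a guess at "a few seconds". Only urllib.robotparser from the standard library is used for the rule check.

```python
import time
from urllib.robotparser import RobotFileParser

def allowed_urls(robots_lines, agent, urls):
    """Keep only the URLs that the given robots.txt lines allow for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse already-downloaded robots.txt text
    return [u for u in urls if parser.can_fetch(agent, u)]

def paced(items, per_request=1.0, burst=5, rest=3.0, sleep=time.sleep):
    """Yield items one at a time, pausing between them.

    Pauses per_request seconds after each item, or rest seconds after
    every burst-th item. The rest value is an assumption.
    """
    for i, item in enumerate(items, start=1):
        yield item
        sleep(rest if i % burst == 0 else per_request)
```

A crawler would loop over paced(allowed_urls(...)) and fetch each URL inside the loop. The sleep parameter is there so the pacing can be tested without actually waiting.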

EliteWeb

4:13 pm on Aug 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What were the full user-agent string and identity of this spider, including IP address?

farside847

7:05 pm on Aug 28, 2003 (gmt 0)

10+ Year Member



YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; [search.yahoo.com...]

216.109.126.* (I saw 30 or so unique IPs)

fiestagirl

3:51 pm on Sep 17, 2003 (gmt 0)

10+ Year Member



I've seen it coming around with this former Inktomi address which now resolves to:

yj1001.search.sc5.yahoo.com

66.196.93.*

YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/yahooseeker.html)
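
If you want to flag these hits in a script, the sketch below combines the two address ranges reported in this thread with the user-agent prefix. It assumes the "*" in each address means the whole /24 block, which nobody here has confirmed, and the name is_reported_yahooseeker is just illustrative. The ranges are only what posters observed in 2003, not an official list.

```python
import ipaddress

# Ranges as reported in this thread, read as full /24 blocks (an assumption).
REPORTED_NETWORKS = [
    ipaddress.ip_network("216.109.126.0/24"),
    ipaddress.ip_network("66.196.93.0/24"),
]

def is_reported_yahooseeker(ip, agent):
    """True if the agent string starts with YahooSeeker/ and the IP is in a reported range."""
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in REPORTED_NETWORKS)
    return agent.startswith("YahooSeeker/") and in_range
```

Checking both the address and the agent string helps because a user-agent header alone is trivial to fake.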