Yahoo! Mindset

Didn't ask for robots.txt


Pfui

5:02 am on Apr 7, 2006 (gmt 0)

UA: Yahoo! Mindset
Host: rlx-2-2-10.labs.corp.yahoo.com

First visit, went straight to a sub-dir file:

rlx-2-2-10.labs.corp.yahoo.com - - [06/Apr/2006:19:43:32 -0700]
"GET /dir/file.html HTTP/1.1" [...] "-" "Yahoo! Mindset"

More info about this here [mindset.research.yahoo.com] (Yahoo) and here [askdavetaylor.com] (Ask Dave Taylor).

Alas, Yet Another Yahoo bot/crawler/spider/whatever that doesn't ask for robots.txt.

Ban-worthy in my book.
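
For anyone who wants to check their own logs for the same thing, here is a minimal sketch. It assumes Apache-style Combined Log Format, and every sample line (hosts, status codes, byte counts, and "ExampleBot") is invented for illustration. The function name agents_skipping_robots is hypothetical too. It lists each user agent that made requests but never fetched /robots.txt in the lines given.

```python
import re
from collections import defaultdict

# Assumption: Combined Log Format. Adjust the pattern to your own LogFormat.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \S+ \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def agents_skipping_robots(lines):
    """Return {user_agent: hit_count} for agents that never requested /robots.txt."""
    hits = defaultdict(int)
    asked = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        agent = m.group("agent")
        hits[agent] += 1
        # Ignore any query string when comparing the requested path.
        if m.group("path").split("?", 1)[0] == "/robots.txt":
            asked.add(agent)
    return {a: n for a, n in hits.items() if a not in asked}

# Made-up sample lines; only the "Yahoo! Mindset" UA string comes from this thread.
sample = [
    'bot-a.example.net - - [06/Apr/2006:19:43:32 -0700] "GET /dir/file.html HTTP/1.1" 200 1234 "-" "Yahoo! Mindset"',
    'bot-b.example.com - - [06/Apr/2006:19:44:00 -0700] "GET /robots.txt HTTP/1.1" 200 64 "-" "ExampleBot/1.0"',
    'bot-b.example.com - - [06/Apr/2006:19:44:01 -0700] "GET /index.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]
print(agents_skipping_robots(sample))  # prints {'Yahoo! Mindset': 1}
```

Note the result only covers the log window you feed it. A bot that fetched robots.txt before the window starts would still show up here.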

volatilegx

3:35 am on Apr 10, 2006 (gmt 0)

Seen coming from:

66.228.182.177
66.228.182.183
66.228.182.187
66.228.182.188
66.228.182.190
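
If it helps with filtering, the addresses above can be collapsed into one CIDR block. This is only arithmetic over the sightings listed here, not a statement about what Yahoo actually has allocated. The helper name smallest_covering_network is made up for this sketch.

```python
import ipaddress

def smallest_covering_network(addresses):
    """Return the smallest single IPv4 CIDR block containing every address."""
    ints = [int(ipaddress.IPv4Address(a)) for a in addresses]
    lo, hi = min(ints), max(ints)
    # Any bit that differs between the lowest and highest address is a host bit.
    host_bits = (lo ^ hi).bit_length()
    prefix = 32 - host_bits
    base = (lo >> host_bits) << host_bits  # clear the host bits
    return ipaddress.ip_network(f"{ipaddress.IPv4Address(base)}/{prefix}")

seen = [
    "66.228.182.177",
    "66.228.182.183",
    "66.228.182.187",
    "66.228.182.188",
    "66.228.182.190",
]
block = smallest_covering_network(seen)
print(block)  # prints 66.228.182.176/28
```

A membership test such as `ipaddress.ip_address(some_ip) in block` can then be used to check whether another sighting falls in the same block.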

volatilegx

3:48 am on Apr 10, 2006 (gmt 0)

AND...

Date: 04/2/2006, 19:52:46
IP: 66.228.182.185
Host: rlx-2-2-5.labs.corp.yahoo.com
UA: Mozilla/4.0

Suspicious. Wonder what's going on?

GaryK

3:33 pm on Apr 10, 2006 (gmt 0)

Same here.

Yahoo! Mindset
04/07/2006 05:33:10

No robots.txt. It went right to my tutorials section and stole everything.

Pfui

3:35 pm on Apr 10, 2006 (gmt 0)

More sightings -- still NO robots.txt:

q02.yrl.dcn.yahoo.com
Yahoo! Mindset
04/09 15:24:31 /
04/09 15:24:31 /

(Two hits in one second.)

Only hitting my largest, DMOZ'd site (fwiw).

fiestagirl

5:56 pm on Apr 14, 2006 (gmt 0)

Possibility, from ysearchblog.com (May 27, 2005):

"Often, we come across a web page that hasn't been classified yet. In those cases, Mindset tries to classify that web page in the background, so it'll be classified along with the rest of the results next time you do the same query."

volatilegx

10:32 pm on Apr 15, 2006 (gmt 0)

It may not be asking for robots.txt, but is it obeying it? Maybe it gets the robots.txt from Slurp requests.
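
One way to test the "is it obeying?" question is to compare the paths a UA actually requested against your robots.txt rules. Below is a rough sketch using Python's standard urllib.robotparser. The robots.txt content and the logged paths are hypothetical, as is the function name disallowed_hits. Substitute your own file and the paths pulled from your logs.

```python
from urllib.robotparser import RobotFileParser

def disallowed_hits(robots_txt, user_agent, paths):
    """Return the logged paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

# Hypothetical robots.txt and hypothetical logged paths, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
"""
logged = ["/", "/dir/file.html", "/private/notes.html"]
print(disallowed_hits(robots_txt, "Yahoo! Mindset", logged))  # prints ['/private/notes.html']
```

An empty list only means no disallowed path shows up in that sample. It doesn't prove the bot ever read the file.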

Pfui

11:07 pm on Apr 15, 2006 (gmt 0)

To me, if "Yahoo! Mindset" doesn't ask, "Yahoo! Mindset" doesn't obey.

And "Yahoo! Mindset" doesn't ask:

q02.yrl.dcn.yahoo.com - - [09/Apr/2006:15:24:31 -0700]
"GET / HTTP/1.1" [...] "-" "Yahoo! Mindset"

q02.yrl.dcn.yahoo.com - - [09/Apr/2006:15:24:31 -0700]
"GET / HTTP/1.1" [...] "-" "Yahoo! Mindset"

(Two separate, completely identical hits.)

Also using "Yahoo! Mindset" and also not asking:

rlx-2-2-10.labs.corp.yahoo.com (see my initial post, above)
rlx-2-2-2.labs.corp.yahoo.com

Besides 'regular' Slurp (as opposed to 'China' Slurp), Yahoo has waaaay too many UAs (plus inktomisearch, yahoo, and who-knows-what-all domains) for me to even begin tracking which Y! UAs that do retrieve robots.txt might be sharing it with this new one.

Imho, Yahoo knows how to 'do' robots.txt. They're just of a Mindset not to.

: )