Slurp IPs?

for a penalized site

         

bcc1234

12:08 am on Jul 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of my sites has been penalized since the beginning of the new Yahoo, and before that with Inktomi.

I always saw requests for robots.txt - at least 3 a day, and nothing else.

For the past few days, I started seeing requests for random pages, but from a different IP block.

The regular robots.txt requests come from 66.*, while the new requests came from 209.*. The block of that IP is owned by Yahoo, so it's them. I just can't figure out if it's a new spider location or if it's a human editor reviewing our site.
User agent is always the same "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
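
One way to separate the two sources in the logs is to tally Slurp requests by the first octet of the client IP. This is only a minimal sketch, assuming an Apache combined-format access log. The function name `slurp_hits_by_block`, the sample lines, their placeholder IPs, and the shortened user-agent string are all made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format: client IP first, user agent as the last quoted field.
LINE_RE = re.compile(r'^(\d+)\.\d+\.\d+\.\d+ .*"([^"]*)"\s*$')


def slurp_hits_by_block(lines):
    """Count requests whose user agent contains "Slurp", keyed by first octet."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "Slurp" in match.group(2):
            counts[match.group(1) + ".*"] += 1
    return counts


# Hypothetical sample lines; IPs, paths, sizes, and user agents are placeholders.
SAMPLE_LOG = [
    '66.0.0.1 - - [24/Jul/2004:00:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '66.0.0.2 - - [24/Jul/2004:08:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '209.0.0.1 - - [24/Jul/2004:09:30:00 +0000] "GET /some-page.html HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '209.0.0.2 - - [24/Jul/2004:09:31:00 +0000] "GET / HTTP/1.0" 200 4096 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)"',
]

for block, hits in sorted(slurp_hits_by_block(SAMPLE_LOG).items()):
    print(block, hits)
```

A tally like this only shows where the requests come from. It cannot tell a crawler from a person browsing through a proxy that sends the same user agent.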

enotalone

11:54 am on Jul 26, 2004 (gmt 0)

10+ Year Member



Hi bcc1234, I have been seeing exactly what you describe for the past few days. Like yours, my site had some kind of penalty with Yahoo too, I think; not sure about Ink. For the last 3-4 days I have seen it being spidered from the same range of IPs you described, and I don't think it is human because it crawls a lot of pages and is too fast to be a human editor.

In my case it is always 209.131.40.31.
While I don't think it is human, I believe that Yahoo editors use 209.131* blocks too.

bcc1234

3:33 pm on Jul 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I only got a few hits, maybe a total of 10.
So it might have been a human in my case.
Can you check how many hits you got on your first day?

enotalone

4:13 pm on Jul 26, 2004 (gmt 0)

10+ Year Member



Counting only hits from 209.131.40.31, it was about 100 hits. I left out the Slurp bots from other IP addresses because they only hit the robots file and the 2 pages I have in Site Match.

enotalone

12:56 pm on Aug 6, 2004 (gmt 0)

10+ Year Member



209.131.40.31 still spiders daily, multiple times a day.
Does anyone have a clue what 209.131.40.31 is about?

bcc1234

12:57 pm on Aug 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep, the same thing. I get hits from that range every day, but nothing seems to happen.

enotalone

1:09 pm on Aug 6, 2004 (gmt 0)

10+ Year Member



The only difference I see since it started to crawl is that now even a search for domain.com shows nothing.


We didn't find any Web pages matching the following criteria:

* Containing this query term: domain.com

Before, a search for domain.com would at least show the site.

enotalone

1:16 pm on Aug 6, 2004 (gmt 0)

10+ Year Member



The other thing I find unique about this bot is that I don't think it comes to my site by following links on the web. I use .htaccess to 301 redirect all requests for domain.com to www.domain.com.
If it came from a link it would have to get at least some 301s, but it only gets 200 codes, which to me means it comes to the site directly and starts crawling from within the site.
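
The 301-versus-200 observation can be checked by tallying response codes for a single client IP. This is a rough sketch, assuming Apache common or combined log format. The function name `status_counts_for_ip`, the sample lines, and the second IP (a documentation-range placeholder) are hypothetical.

```python
import re
from collections import Counter

# Assumes Apache common/combined format: IP ident user [time] "request" status size ...
STATUS_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def status_counts_for_ip(lines, ip):
    """Tally HTTP status codes for requests made by one client IP."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.match(line)
        if match and match.group(1) == ip:
            counts[match.group(2)] += 1
    return counts


# Hypothetical sample lines; paths, sizes, and the 192.0.2.10 client are placeholders.
SAMPLE_LOG = [
    '209.131.40.31 - - [06/Aug/2004:13:00:00 +0000] "GET /page-a.html HTTP/1.0" 200 5120 "-" "-"',
    '209.131.40.31 - - [06/Aug/2004:13:00:05 +0000] "GET /page-b.html HTTP/1.0" 200 4096 "-" "-"',
    '192.0.2.10 - - [06/Aug/2004:13:01:00 +0000] "GET / HTTP/1.0" 301 0 "-" "-"',
]

print(status_counts_for_ip(SAMPLE_LOG, "209.131.40.31"))
```

If the tally for that IP contains no 301 at all, the bot never requested the bare domain.com host. That fits the idea that it starts from already-known www URLs, though it does not prove it.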

johnlim

1:50 pm on Aug 6, 2004 (gmt 0)

10+ Year Member



I see the same Yahoo Slurp visits on my banned site also. It always crawls the same pages every day. It has been like this for more than 2 weeks already. And my site cannot be found in Yahoo.

kazonik

3:03 am on Aug 7, 2004 (gmt 0)

10+ Year Member



FWIW, here's some info I've collected:

NetBlock  ¦ IP Address      ¦ HTTP Accept ¦ User Agent
Yahoo     ¦ 209.131.40.67   ¦ */*         ¦ Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)
Yahoo     ¦ 209.131.40.82   ¦ */*         ¦ Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)
Yahoo     ¦ 209.131.40.83   ¦ */*         ¦ Mozilla/5.0 (compatible; Yahoo! Slurp/si-emb; [help.yahoo.com...]
Yahoo     ¦ 209.131.40.132  ¦ */*         ¦ Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Yahoo     ¦ 209.131.40.179  ¦ */*         ¦ Slurp/si-emb (slurp@inktomi.com; [inktomi.com...]
Yahoo     ¦ 209.131.40.155  ¦             ¦ Randy
LocalLink ¦ 209.131.227.250 ¦ */*         ¦ Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Who's RANDY?
Randy was the only user agent not to specify an HTTP Accept type.

It looks like the address block belonging to Yahoo is 209.131.40....
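
A quick way to test logged addresses against a suspected block is Python's standard `ipaddress` module. In the sketch below the /24 boundary is an assumption inferred only from the addresses listed above, not a confirmed allocation. `ASSUMED_BLOCK` and `in_assumed_block` are illustrative names.

```python
import ipaddress

# Assumption: the Yahoo block is treated as 209.131.40.0/24, inferred only from
# the addresses in the table above. The real registered range may be larger.
ASSUMED_BLOCK = ipaddress.ip_network("209.131.40.0/24")


def in_assumed_block(ip):
    """Return True if the dotted-quad string falls inside the assumed block."""
    return ipaddress.ip_address(ip) in ASSUMED_BLOCK


# Addresses taken from the table above.
OBSERVED = [
    "209.131.40.67",
    "209.131.40.82",
    "209.131.40.83",
    "209.131.40.132",
    "209.131.40.179",
    "209.131.40.155",
    "209.131.227.250",
]

for ip in OBSERVED:
    print(ip, in_assumed_block(ip))
```

A whois lookup on the address remains the authoritative way to see who holds a range. This check only filters log entries once a range has been chosen.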

Peace,
Kaz

bcc1234

2:15 pm on Aug 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just got hundreds of hits from Slurp on 66.*.

Looks like the regular crawl. The site is not in the index (yet). But if it does get back then I guess that Yahoo re-review requests do work, once you fix all the stuff you might have missed.

DaveAtIFG

2:24 pm on Aug 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could this IP be a "prefetch spider?" Perhaps to bring pages into a cache for a reviewer's convenience prior to review? I'm thinking there may be a prefetch mechanism for sites that are queued for human review...

bcc1234

2:30 pm on Aug 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dave, are you talking about 209 or 66?

enotalone

2:48 pm on Aug 8, 2004 (gmt 0)

10+ Year Member



bcc1234, the same is happening to me!
66.* robots started crawling yesterday.

209.131.40.31 still crawls too.

DaveAtIFG

2:53 pm on Aug 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry bcc, I was thinking 209.* may be a prefetcher...

Where's that darn red faced *embarrassed* smiley when you really need it! :)

bcc1234

2:57 pm on Aug 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, I still see hits from 209.