
Yahoo! Slurp China

     

MaxM

4:00 pm on Nov 15, 2005 (gmt 0)

10+ Year Member



Some heavy spidering from: Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]

Comes from various IPs beginning with 202.160.178. and 202.160.179. and resolving to *.inktomisearch.com.

wilderness

5:57 pm on Nov 15, 2005 (gmt 0)

GaryK

1:38 pm on Nov 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]
202.160.180.9

It's been visiting my sites every 2-3 days to retrieve the same file.

I really don't want to discriminate against China, and yet if Inktomi isn't going to at least read robots.txt, they really leave me no choice but to ban them.

Staffa

6:16 pm on Nov 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My sites have nothing in particular to offer to China.

Viewing the help page linked in the UA shows, obviously, a page in a language that my browser cannot read. Where it mentions robots.txt, though, the exclusion example is set for Slurp, so following that would ban all Y! bots.

I've banned them by full user agent.

the_nerd

10:23 am on Nov 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've banned them by full user agent.

What is the full user agent in this case?
Mozilla/5.0+(compatible;+Yahoo!+Slurp+China;+http://misc.yahoo.com.cn/help.html)?

nerd

Staffa

12:41 pm on Nov 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]

is what I have; the "+" signs in your string may have been added by your logfile.
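If you want to ban it the same way, a couple of lines like these in .htaccess should do it (just a sketch using mod_setenvif; the variable name ban_slurp_china is arbitrary, and you should adjust the pattern to match exactly what your own logs show):

# flag any request whose UA contains the Slurp China signature, then deny it
SetEnvIfNoCase User-Agent "Yahoo! Slurp China" ban_slurp_china
Order Allow,Deny
Allow from all
Deny from env=ban_slurp_china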

jdMorgan

8:35 pm on Nov 29, 2005 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Slurp treats

User-agent: slurp
and
User-agent: slurp china

as the same thing, so you can't use robots.txt to disallow Slurp China without also Disallowing the "U.S." slurp.
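In other words, a record like this won't keep only Slurp China out, because the regular Slurp treats it as addressed to itself as well:

User-agent: slurp china
Disallow: /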

So, for now, I've had to block Slurp China with a 403 in .htaccess. :(
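For anyone who wants to do the same, something along these lines should work (a sketch only, assuming mod_rewrite is available; match on whichever part of the UA string shows up in your own logs):

# return 403 Forbidden to anything identifying itself as Yahoo! Slurp China
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Yahoo! Slurp China" [NC]
RewriteRule .* - [F]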

Another approach that some Webmasters can use is to serve an alternate robots.txt to slurp china; you can internally rewrite slurp china's requests to a secondary robots.txt file that Disallows it. That's not an option for me, since my host -- for some reason -- grabs robots.txt requests and diverts them to a script which then serves the site's robots.txt file before my .htaccess processing can have any effect.
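If your host doesn't get in the way, the internal rewrite might look roughly like this (again just a sketch; the secondary filename robots_slurp_china.txt is made up for illustration, and that file would contain nothing but a record with a blanket Disallow):

# serve an alternate robots.txt to Slurp China only
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Yahoo! Slurp China" [NC]
RewriteRule ^robots\.txt$ /robots_slurp_china.txt [L]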

It strikes me as odd that the big search engine players put so little thought into making life easier for Webmasters who target only domestic markets... not to mention all the other problems we've seen -- like redirect handling -- with the fundamental function of search engines -- the robots themselves.

slurp china, tsai chien! (goodbye)

Jim

GaryK

6:35 pm on Nov 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It wouldn't surprise me a bit if the decision to use essentially one robot name was intentional. This way we're forced into an all-or-nothing dilemma: either accept everything from Slurp or ban everything. That is, unless you're willing to use more sophisticated means to stop the China bot, and I'd venture a guess that most webmasters aren't able or willing to do that.

Staffa

9:26 pm on Nov 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GaryK, guess again ;o)

GaryK

1:00 am on Dec 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I knew I should have been more specific. ;)

To me it was a given that the folks here would know how to do that. I was referring to the rest of the webmasters in the world.

kevinpate

2:50 am on Dec 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have only one heartburn with the continuous serving up of 403s to Slurp China, and it's not at all unique to that one bot.

What possible benefit is there for any bot to come back and hit a 403 multiple times a day/week/month/etc.? It's like every bot has a line of code that says never give up, never surrender.

jdMorgan

3:51 am on Dec 1, 2005 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



"Never give up, never surrender..."

Yes, that's the "Churchill" subroutine... ;)

I have a special place for those... I rewrite their requests to a subdirectory where all access is blocked except for a custom 403 page. And that custom 403 page is two bytes long... It contains only "no". So, this at least minimizes the bandwidth they waste. Of course, I'd rather block them at the router, but alas, I haven't purchased my own data center yet.
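For the curious, the .htaccess inside that subdirectory looks roughly like this (a sketch; the directory name /blocked/ and the filename no.txt are made up for illustration, and the bot's requests get rewritten into that directory by a rule much like the 403 one earlier in this thread):

# deny everything in this directory except the two-byte 403 page itself
Order Deny,Allow
Deny from all
<Files "no.txt">
Allow from all
</Files>
ErrorDocument 403 /blocked/no.txt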

Jim

 
