wilderness

msg:398588 | 5:57 pm on Nov 15, 2005 (gmt 0) |
[webmasterworld.com...]
|
GaryK

msg:398589 | 1:38 pm on Nov 20, 2005 (gmt 0) |
Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...] from 202.160.180.9. It's been visiting my sites every 2-3 days to retrieve the same file. I really don't want to discriminate against China, and yet if Inktomi isn't going to at least read robots.txt they really leave me no choice but to ban them.
|
Staffa

msg:398590 | 6:16 pm on Nov 20, 2005 (gmt 0) |
My sites have nothing in particular to offer China. The help page linked in the UA is, unsurprisingly, in a language my browser cannot display, but where it mentions robots.txt the exclusion shown is for Slurp, so following it would ban all Y! bots. I've banned them by full user agent.
|
the_nerd

msg:398591 | 10:23 am on Nov 29, 2005 (gmt 0) |
"I've banned them by full user agent."
What is the full user agent in this case? Mozilla/5.0+(compatible;+Yahoo!+Slurp+China;+http://misc.yahoo.com.cn/help.html)? nerd
|
Staffa

msg:398592 | 12:41 pm on Nov 29, 2005 (gmt 0) |
Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...] is what I have. The + + + in your string may have been added by your logfile.
|
jdMorgan

msg:398593 | 8:35 pm on Nov 29, 2005 (gmt 0) |
Slurp treats User-agent: slurp and User-agent: slurp china as the same thing, so you can't use robots.txt to disallow Slurp China without also disallowing the "U.S." Slurp. So, for now, I've had to block Slurp China with a 403 in .htaccess. :( Another approach that some Webmasters can use is to serve an alternate robots.txt to Slurp China: you can internally rewrite Slurp China's requests for robots.txt to a secondary robots.txt file that Disallows it. That's not an option for me, since my host -- for some reason -- grabs robots.txt requests and diverts them to a script which then serves the site's robots.txt file before my .htaccess processing can have any effect. It strikes me as odd that the big search engine players put so little thought into making life easier for Webmasters who target only domestic markets... not to mention all the other problems we've seen -- like redirect handling -- with the fundamental function of search engines -- the robots themselves. Slurp China, tsai chien! (goodbye) Jim
|
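A rough sketch of the two approaches Jim describes above, written as mod_rewrite rules for a site's root .htaccess. The thread does not show his actual rules, so the user-agent pattern and the file name robots-slurp-china.txt are assumptions for illustration only. Use one option or the other, not both as written.

```apache
RewriteEngine On

# Option 1: answer every request from Slurp China with 403 Forbidden.
# The pattern is an assumed substring of the user agent quoted in this thread.
RewriteCond %{HTTP_USER_AGENT} Slurp\ China [NC]
RewriteRule .* - [F]

# Option 2 (instead of option 1): internally rewrite Slurp China's
# robots.txt requests to a secondary file that disallows everything.
# robots-slurp-china.txt is a hypothetical name; it would contain:
#   User-agent: *
#   Disallow: /
#RewriteCond %{HTTP_USER_AGENT} Slurp\ China [NC]
#RewriteRule ^robots\.txt$ /robots-slurp-china.txt [L]
```

Option 2 only helps if the crawler actually honors robots.txt, and, as Jim points out, it cannot work on a host that intercepts robots.txt requests before .htaccess is processed.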
GaryK

msg:398594 | 6:35 pm on Nov 30, 2005 (gmt 0) |
It wouldn't surprise me a bit if the decision to use essentially one robot name was intentional. This way we're forced into an all or nothing dilemma. Either accept everything from Slurp or ban everything. Unless you're willing to use more sophisticated means to stop the China bot, and I'd venture a guess that most webmasters aren't able or willing to do that.
|
Staffa

msg:398595 | 9:26 pm on Nov 30, 2005 (gmt 0) |
GaryK, guess again ;o)
|
GaryK

msg:398596 | 1:00 am on Dec 1, 2005 (gmt 0) |
I knew I should have been more specific. ;) To me it was a given that the folks here would know how to do that. I was referring to the rest of the webmasters in the world.
|
kevinpate

msg:398597 | 2:50 am on Dec 1, 2005 (gmt 0) |
I have only one heartburn with the continuous serving up of 403s to Slurp China, and it's not at all unique to that one bot. What possible benefit is there for any bot to come back and hit a 403 multiple times a day/week/month/etc.? It's like every bot has a line of code that says never give up, never surrender.
|
jdMorgan

msg:398598 | 3:51 am on Dec 1, 2005 (gmt 0) |
"Never give up, never surrender..." Yes, that's the "Churchill" subroutine... ;) I have a special place for those... I rewrite their requests to a subdirectory where all access is blocked except for a custom 403 page. And that custom 403 page is two bytes long... It contains only "no". So, this at least minimizes the bandwidth they waste. Of course, I'd rather block them at the router, but alas, I haven't purchased my own data center yet. Jim
|
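One way the "special place" Jim describes could be set up, as a hedged sketch only. His real rules are not shown in the thread. The directory name /denied/, the file name no.txt, and the Apache 2.4 `Require` syntax are all assumptions (servers of that era would have used Order/Deny/Allow instead).

```apache
# --- root .htaccess ---
RewriteEngine On
# Send every Slurp China request to the hypothetical /denied/ directory.
# The second condition avoids rewriting requests already inside it,
# including the internal request for the custom 403 page.
RewriteCond %{HTTP_USER_AGENT} Slurp\ China [NC]
RewriteCond %{REQUEST_URI} !^/denied/
RewriteRule .* /denied/ [L]

# --- /denied/.htaccess ---
# Deny everything in this directory, so rewritten requests get a 403.
Require all denied
# Serve the tiny custom error page with that 403.
ErrorDocument 403 /denied/no.txt
# The error page itself has to stay readable.
<Files "no.txt">
    Require all granted
</Files>
```

For the body to be exactly two bytes, no.txt would have to contain only the characters "no" with no trailing newline. The response headers still cost bandwidth, so this trims the waste rather than removing it.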