Comes from various IPs beginning with 202.160.178. and 202.160.179. and resolving to *.inktomisearch.com.
It's been visiting my sites every 2-3 days to retrieve the same file.
I really don't want to discriminate against China, and yet if Inktomi isn't going to at least read robots.txt, they really leave me no choice but to ban them.
Viewing the help page given in the UA brings up, predictably, a page in a language I cannot read, but where it mentions robots.txt, the exclusion example is written for "Slurp" -- so following that advice would ban all of Y!'s bots.
I've banned them by full user agent.
Apparently the bot treats

User-agent: slurp china

and

User-agent: slurp

as the same thing, so you can't use robots.txt to disallow Slurp China without also Disallowing the "U.S." Slurp.
So, for now, I've had to block Slurp China with a 403 in .htaccess. :(
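For anyone else doing the same, here's a minimal mod_rewrite sketch -- the UA substring "slurp china" is my assumption, so check your raw logs for the exact string the bot sends:

RewriteEngine On
# Match the Chinese crawler by User-Agent (case-insensitive); verify the substring against your logs
RewriteCond %{HTTP_USER_AGENT} "slurp china" [NC]
# Answer every request from it with 403 Forbidden
RewriteRule .* - [F]

The [F] flag makes Apache return the 403 without serving the requested page, so each hit costs you almost nothing.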
Another approach that some Webmasters can use is to serve an alternate robots.txt to slurp china: you can internally rewrite slurp china's requests to a secondary robots.txt file that Disallows it (sketched below). That's not an option for me, since my host -- for some reason -- intercepts robots.txt requests and diverts them to a script that serves the site's robots.txt file before my .htaccess processing can have any effect.
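For those whose hosts don't intercept robots.txt, a sketch of that rewrite -- the file name robots-slurp-china.txt and the UA substring are placeholders of mine:

RewriteEngine On
# Hand the Chinese crawler its own robots.txt
RewriteCond %{HTTP_USER_AGENT} "slurp china" [NC]
RewriteRule ^robots\.txt$ /robots-slurp-china.txt [L]

where robots-slurp-china.txt contains nothing but:

User-agent: *
Disallow: /

Only that one bot ever sees the secondary file, so the blanket Disallow never touches the regular Slurp.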
It strikes me as odd that the big search engine players put so little thought into making life easier for Webmasters who target only domestic markets... not to mention all the other problems we've seen -- like redirect handling -- with the most fundamental component of search engines: the robots themselves.
slurp china, tsai chien! (goodbye)
What possible benefit is there for any bot in coming back to hit a 403 multiple times a day/week/month/etc.? It's like every bot has a line of code that says never give up, never surrender.
Yes, that's the "Churchill" subroutine... ;)
I have a special place for those... I rewrite their requests to a subdirectory where all access is blocked except for a custom 403 page. And that custom 403 page is two bytes long... It contains only "no". So, this at least minimizes the bandwidth they waste. Of course, I'd rather block them at the router, but alas, I haven't purchased my own data center yet.
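In case it helps anyone replicate that, a sketch using the classic Allow/Deny syntax -- the /blocked/ path and the UA substring are illustrative, not gospel. In the site root's .htaccess:

RewriteEngine On
# Don't re-rewrite requests already inside the blocked directory (avoids a redirect loop)
RewriteCond %{REQUEST_URI} !^/blocked/
# Match the offending bot by User-Agent
RewriteCond %{HTTP_USER_AGENT} "slurp china" [NC]
RewriteRule .* /blocked/ [L]

And in /blocked/.htaccess:

# Deny everything in here...
Order Allow,Deny
Deny from all
# ...except the two-byte 403 body itself, so Apache can serve it with the error
<Files "no.txt">
Order Deny,Allow
Allow from all
</Files>
ErrorDocument 403 /blocked/no.txt

no.txt holds just the two bytes "no", so every hit from the bot costs a 403 status line and a two-byte body.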