SlurpConfirm - Enough Already!

keyplyr

4:40 am on Jan 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Anyone else getting tired of these "tests to confirm how your server handles 404 file not found errors"?

Getting about a dozen each visit, once or more per week. Does this really need to continue indefinitely?

66.196.65.51 - - [29/Jan/2006:12:37:11 -0800] "GET /SlurpConfirm404/healtale/page.htm HTTP/1.0" 404 813 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
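If the main annoyance is the probes cluttering log review, one option is to filter them out before reading the log. The Python below is a minimal sketch, assuming Apache combined log format like the line above. It looks only at the `/SlurpConfirm404` path prefix and the 404 status, and it does not verify that the client really is Yahoo!.

```python
import re
from collections import Counter

# Request line + status in an Apache combined-format entry, e.g.
# "GET /SlurpConfirm404/healtale/page.htm HTTP/1.0" 404 813
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')

def is_slurp_probe(line: str) -> bool:
    """True if the entry is a 404 for a path under /SlurpConfirm404."""
    m = REQUEST_RE.search(line)
    return bool(m) and m.group("path").startswith("/SlurpConfirm404") and m.group("status") == "404"

def count_slurp_probes(lines) -> Counter:
    """Count probe requests per client IP (the first field of each entry)."""
    return Counter(line.split(" ", 1)[0] for line in lines if is_slurp_probe(line))

sample = ('66.196.65.51 - - [29/Jan/2006:12:37:11 -0800] '
          '"GET /SlurpConfirm404/healtale/page.htm HTTP/1.0" 404 813 "-" '
          '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"')

print(count_slurp_probes([sample]))  # Counter({'66.196.65.51': 1})
```

To skip the probes while reviewing, `[l for l in lines if not is_slurp_probe(l)]` keeps everything else.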

perfectlover

8:09 pm on Feb 4, 2006 (gmt 0)

10+ Year Member



Yes, I am getting the same. I am curious to know what it is.

keyplyr

12:02 am on Feb 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I know what it is. The bot requests nonexistent files to see how the server handles them. I'm just tired of my error logs filling up again and again. I would think the bot would get it by now.
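The kind of check the bot is making can be sketched roughly as follows. This is a minimal Python illustration, not Yahoo!'s code: it requests a randomly named page that should not exist and treats only a 404 or 410 as correct handling. The `probe404-` path format and all function names are hypothetical; Slurp's own probes use the `/SlurpConfirm404/` prefix seen in the log line earlier in the thread.

```python
import secrets
import urllib.error
import urllib.request
from typing import Callable

def random_missing_path() -> str:
    """Build a path that almost certainly does not exist (hypothetical format)."""
    return f"/probe404-{secrets.token_hex(8)}/page.htm"

def fetch_status(url: str) -> int:
    """Return the final HTTP status for url; urllib follows redirects itself."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError

def handles_missing_pages(base_url: str,
                          fetch: Callable[[str], int] = fetch_status) -> bool:
    """True if a request for a nonexistent page gets a real 404 or 410.

    A 200 here (including a redirect that ends on a normal page) is a
    "soft 404": the crawler cannot tell missing pages from real ones.
    """
    status = fetch(base_url.rstrip("/") + random_missing_path())
    return status in (404, 410)
```

Calling `handles_missing_pages("http://www.example.com")` makes one live request; passing a fake `fetch` lets the logic be checked offline.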

jdMorgan

12:24 am on Feb 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm far more concerned with Yahoo! Slurp China using the same robot agent name as Yahoo! Slurp 'international' for robots.txt. I haven't found a way to disallow Yahoo! Slurp China without also blocking Yahoo for the rest of the world, short of feeding it 403s in .htaccess. (Unfortunately, on this site I can't serve different robots.txt files based on user-agent due to some weird scripting by the host.)
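Feeding the China crawler 403s would normally be done with a few mod_rewrite lines in .htaccess. Purely as an illustration, the same decision logic is sketched here in Python as WSGI middleware. The key assumption, labeled in the code too, is that the China crawler's User-Agent contains "Slurp China" while the international crawler's does not; that should be checked against real log entries before relying on it. The names `block_crawler` and `hello` are hypothetical.

```python
import re

# ASSUMPTION: the China crawler's User-Agent contains "Slurp China", while the
# international crawler's contains "Slurp" without "China". Verify against your logs.
BLOCKED_UA = re.compile(r"Slurp China", re.IGNORECASE)

def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent matches the crawler we want to refuse."""
    return bool(BLOCKED_UA.search(user_agent or ""))

def block_crawler(app):
    """WSGI middleware: answer 403 Forbidden to the blocked crawler, pass others through."""
    def wrapper(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper

def hello(environ, start_response):
    """Stand-in for the real site application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

app = block_crawler(hello)
```

Unlike a robots.txt Disallow, this still costs a request per URL the crawler tries, but each answer is a tiny 403 instead of a full page.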

I have no interest in wasting bandwidth feeding Yahoo! Slurp China, since I can't practically provide any services to China, and any unfortunate Chinese person who visited my site would probably find himself in trouble with the authorities. Too many highly dangerous concepts on the site, like liberty, freedom, free speech, and democracy.

Just my little gripe to add to yours... :)

Jim

Key_Master

12:32 am on Feb 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim, have you tried using SSI to exclude Yahoo! China in robots.txt?

jdMorgan

3:19 am on Feb 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, neither that nor trying to mod_rewrite robots.txt works on this host, because they serve hosted sites' robots.txt using a script that injects a Disallow on their e-commerce URLs if one is not already present.

Basically, the hosted site's robots.txt is 'included' as a text-only file by their script, and checked. Apparently, some customers didn't disallow those e-commerce-package-related dynamic URLs, and the robots caused this host a lot of grief, overloading servers and consuming massive bandwidth, because the URL space on those URLs is essentially infinite. I don't want to hijack this thread, but thanks for asking... Even though the server has been down maybe five minutes in the last seven years, I suppose I'll end up moving this site.
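The host's include-and-check step, as described above, might look roughly like this. It is a minimal Python sketch under stated assumptions: the real host script is unknown, the e-commerce prefix `/cgi-bin/store/` is hypothetical, and this version adds the Disallow to the `User-agent: *` record, or appends such a record if the file has none.

```python
# HYPOTHETICAL stand-in for the host's e-commerce URL prefix.
STORE_PATH = "/cgi-bin/store/"

def _field(line: str):
    """Split 'Name: value # comment' into (lowercased name, value); (None, None) otherwise."""
    name, sep, value = line.partition(":")
    if not sep:
        return None, None
    return name.strip().lower(), value.split("#", 1)[0].strip()

def ensure_disallow(robots_txt: str, path: str = STORE_PATH) -> str:
    """Return robots_txt with 'Disallow: <path>' guaranteed in the 'User-agent: *' record."""
    lines = robots_txt.splitlines()
    # Simplification: a matching Disallow anywhere in the file counts as present.
    if any(_field(line) == ("disallow", path) for line in lines):
        return robots_txt
    for i, line in enumerate(lines):
        if _field(line) == ("user-agent", "*"):
            # Skip any further User-agent lines in the same record before inserting.
            j = i + 1
            while j < len(lines) and _field(lines[j])[0] == "user-agent":
                j += 1
            lines.insert(j, f"Disallow: {path}")
            return "\n".join(lines) + "\n"
    # No wildcard record: append one, separated from earlier records by a blank line.
    if lines and lines[-1].strip():
        lines.append("")
    lines += ["User-agent: *", f"Disallow: {path}"]
    return "\n".join(lines) + "\n"
```

One caveat with this simple approach: a crawler that finds a record naming it specifically (for example `User-agent: Slurp`) generally follows only that record and ignores `*`, so a fuller version would also add the line to those records.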

I just think that since 'the rules' for China are different from most of the rest of the world, Slurp China ought to recognize a different agent name in robots.txt, and save me the bother of sending them a 'special' Falun Gong page so they'll take me off their spider's URL list...

And fix SlurpConfirm for us, too. ;)

Jim

followgreg

11:36 pm on Feb 9, 2006 (gmt 0)

10+ Year Member



I think Yahoo is an amazing waste of bandwidth, lol. Not only do they keep hitting sites at an incredible rate, but apparently they do NOTHING with this data: their indexing is the slowest so far compared to Google and MSN. What a waste of our bandwidth and THEIR money :)