
Forum Moderators: goodroi

How to stop Yahoo Slurp?

Slurp ignores my robots.txt

   
12:46 pm on Mar 4, 2005 (gmt 0)

10+ Year Member



That's about it - how can I stop Yahoo from crawling my site? I have Slurp disallowed in my robots.txt, but it reads the file and keeps on crawling my site.

I have verified the robots.txt file, and it's fine. Does anybody know?

1:51 pm on Mar 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could always ban it in your .htaccess file (or in the Windows file that's .htaccess's cousin, if that's your environment).
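A minimal sketch of what that ban could look like, assuming Apache with mod_setenvif enabled (the 68.142 range is only an illustration - confirm any IPs against your own logs before denying them):

    # .htaccess - turn Slurp away by user-agent, plus an example IP range
    SetEnvIfNoCase User-Agent "Slurp" ban_bot
    Order Allow,Deny
    Allow from all
    Deny from env=ban_bot
    # partial IP form blocks the whole range - illustrative only
    Deny from 68.142

The same "Deny from" syntax handles any other bot's IPs as well: one line per address or range.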

I'm not a fan of Slurp myself, as it can't seem to grasp the meaning of 301 or 404 responses, but since I am happy with where I show up in Yahoo, I put up with what I don't like about Slurp.

2:24 pm on Mar 4, 2005 (gmt 0)

10+ Year Member



Would you guide me through the process of IP blocking? I am quite a newbie in the matter, plus there are a whole bunch of other unidentified bots I would like to block by IP. A PM would be fine.
My site is on Linux.
Thanks.
2:30 pm on Mar 4, 2005 (gmt 0)

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Slurp obeys robots.txt; however, it may take a while between your making a change and Slurp recognizing the fact and stopping its indexing.

If it has been a while, or if Slurp has fetched the new robots.txt, then you may have a syntax or other problem with the file. Have you tried validating [searchengineworld.com] it?

What exact syntax are you using to block Slurp?
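For reference, the standard form to shut Slurp out completely is:

    User-agent: Slurp
    Disallow: /

Slurp also recognizes the non-standard Crawl-delay directive (for example, Crawl-delay: 10) if you would rather slow it down than ban it outright.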

2:52 pm on Mar 4, 2005 (gmt 0)

10+ Year Member



Yes, I have validated the robots.txt and it's just fine. Slurp got it today with a "200" a few times, but it's still crawling like crazy - no indexing, though - so I see no reason to let it eat my bandwidth if it's just going to show my home page :) but it's not obeying the robots.txt.
3:10 pm on Mar 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Uh, Altrus?

Why would you (or anyone else) want to stop Yahoo from spidering your site?

Just curious. - Larry

2:16 am on Mar 6, 2005 (gmt 0)

10+ Year Member



I have a ban in place, but Slurp still eats my website at a rate of 50-60 pages/minute...
4:25 am on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK. I don't know how big your site is.
If it's in the 1000s of pages, you have bandwidth costs and concerns.

I welcome Slurp because Y gives me fairly decent SERP positioning. - Larry

10:35 am on Mar 12, 2005 (gmt 0)

10+ Year Member



I'll go along with Larry on this - why would you want to stop Slurp? Yahoo is one of the major search engines, and it's good to hear that Y is crawling the site frequently. Make some changes on the search engine optimization side instead, which will help your search results.
2:16 am on Mar 22, 2005 (gmt 0)

WebmasterWorld Senior Member whoisgregg is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Interestingly, Slurp caches robots.txt and also caches DNS for at least five days. It was my sole visitor to the IP a webserver was running on after that server had "misplaced" ::cough, cough:: its DNS record. :(
9:35 pm on Mar 29, 2005 (gmt 0)

10+ Year Member



Odd that it caches robots.txt, since it seems to grab that from my site twice a day or more at times! Maybe that's not Slurp. :O
9:57 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe someone is downloading your site using Slurp as the user-agent? I've never had problems with them not obeying robots.txt.
Are you sure it's not their shopping bot, YahooSeeker?
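One way to tell is a reverse DNS lookup on the visiting IP - genuine Slurp requests resolve to an inktomisearch.com hostname, at least as of this writing (the IP below is purely illustrative; substitute one from your own logs):

    host 68.142.249.55
    # a real Slurp visit should come back as something like
    # lj1234.inktomisearch.com - anything else is an impostor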
10:37 pm on Mar 29, 2005 (gmt 0)

10+ Year Member



Lemme correct my previous post - Slurp's grabbed my robots.txt 1012 times so far this month, an average of about 33 times a day. Compare this with msnbot (7 times a day) and Googlebot (twice a day) and I'm beginning to wonder what Slurp finds so fascinating about my robots.txt :)
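(For anyone wondering how to count these, a quick pipeline against the raw access log does it - the log path will vary by host:)

    grep "robots.txt" /var/log/apache/access_log | grep -c "Slurp"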
7:32 pm on Apr 26, 2005 (gmt 0)

10+ Year Member



OK, you all are going to laugh at me, I know. I am new to the game of webmastering and search engine optimizing, but where do I find my robots.txt in my directories? I can't seem to find it.

I did a validation test on my site and it came up with 48 warnings and 117 errors - a lot of "capitalize this and that" and some other weird "grammar" stuff. I am going to be reformatting the site and its text, so that isn't a problem. But what I have a question about is this: where is my robots.txt file, and how do I fix the warnings and errors when I get them? Also, since I don't write HTML code, what would you suggest as reading material to understand more of what is discussed here?
9:22 pm on Apr 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You HAVE no robots.txt file until you yourself create it and upload it
to your host. It goes in the same directory as your regular website pages.
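A minimal robots.txt that lets every well-behaved spider crawl everything looks like this - save it as plain text, named exactly "robots.txt", in your top-level web directory:

    User-agent: *
    Disallow: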
I would do a lot of spell checking to reduce the 'grammar' errors too. - Larry
9:47 pm on Apr 26, 2005 (gmt 0)

10+ Year Member



Thanks!
That would explain a lot.
By the way, I would have to look in my AWStats to see who/what has been looking at my sites, correct?
 
