
Forum Moderators: goodroi


How to stop Yahoo Slurp?

Slurp ignores my robots.txt

     

atlrus

12:46 pm on Mar 4, 2005 (gmt 0)

10+ Year Member



That's about it - how can I stop Yahoo from crawling my site? I have Slurp blocked in my robots.txt, but it reads the file and keeps on crawling my site.

I have verified the robots.txt file, and it's fine. Anybody know?
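For reference, the standard robots.txt syntax for shutting out Yahoo's crawler looks like the following sketch (the file must sit at the site root, e.g. /robots.txt, and the user-agent token for Yahoo's crawler of that era was "Slurp"):

```
# Block Yahoo's crawler from the whole site
User-agent: Slurp
Disallow: /
```

A blank line separates this record from any others in the file; a record with `User-agent: *` applies to all remaining bots.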

kevinpate

1:51 pm on Mar 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could always ban it in your .htaccess file (or the Windows equivalent, if that's your environment).

I'm not a fan of Slurp myself, as it can't seem to grasp the meaning of 301 or 404 responses, but as I am happy with where I show up in Yahoo, I put up with what I don't like about Slurp.
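A user-agent ban of the kind described above could be sketched in .htaccess like this (Apache with mod_rewrite assumed; adjust to your own setup):

```apache
# Return 403 Forbidden to any request whose User-Agent contains "Slurp"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Slurp [NC]
RewriteRule .* - [F,L]
```

Note that unlike a robots.txt rule, this refuses the requests outright rather than asking the bot to stay away, so it works even against crawlers that ignore robots.txt.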

atlrus

2:24 pm on Mar 4, 2005 (gmt 0)

10+ Year Member



Would you guide me through the process of IP blocking? I am quite a newbie in the matter, and there are a whole bunch of other unidentified bots I would like to block by IP. A PM would be fine.
My site is on a Linux server.
Thanks.
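Since the site is on Linux, presumably under Apache, IP blocking in .htaccess might look like the following sketch. The addresses here are placeholders from the documentation ranges, not Yahoo's real ones - you would substitute the IPs seen in your own logs:

```apache
# Deny specific crawler IPs; everyone else is allowed (example addresses only)
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 203.0.113.0/24
```

The `Deny from` lines accept single addresses, partial addresses (e.g. `203.0.113.`), or CIDR ranges.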

encyclo

2:30 pm on Mar 4, 2005 (gmt 0)

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Slurp obeys robots.txt; however, it may take a while between your making a change and Slurp recognizing the fact and stopping indexing.

If it has been a while, or if Slurp has fetched the new robots.txt, then you may have a syntax or other problem with the file. Have you tried validating [searchengineworld.com] it?

What exact syntax are you using to block Slurp?

atlrus

2:52 pm on Mar 4, 2005 (gmt 0)

10+ Year Member



Yes, I have validated the robots.txt and it's just fine. Slurp fetched it today with a "200" a few times, but it's still crawling like crazy - no indexing, though - so I see no reason to let it eat my bandwidth if it's just going to show my home page :) but it's not obeying the robots.txt

larryhatch

3:10 pm on Mar 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Uh, atlrus?

Why would you (or anyone else) want to stop Yahoo from spidering your site?

Just curious. - Larry

atlrus

2:16 am on Mar 6, 2005 (gmt 0)

10+ Year Member



I have a ban in place, but Slurp still eats my website at a rate of 50-60 pages/minute...
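If the aim is to slow Slurp down rather than block it outright, Slurp honored the non-standard Crawl-delay directive in robots.txt (value in seconds between successive fetches), along these lines:

```
# Ask Yahoo's crawler to wait 10 seconds between page fetches
User-agent: Slurp
Crawl-delay: 10
```

Like everything in robots.txt this is advisory, but it was one of the few bots of the period that respected the directive.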

larryhatch

4:25 am on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK. I don't know how big your site is.
If it's in the thousands of pages, you have bandwidth costs and concerns.

I welcome Slurp because Y gives me fairly decent SERP positioning. - Larry

Rajith

10:35 am on Mar 12, 2005 (gmt 0)

10+ Year Member



I'll go with Larry completely - why would you want to stop Slurp? Yahoo is one of the major search engines, and it's good to hear that Y is crawling the site frequently. Making some changes in the search engine preferences area will help with the search results.

whoisgregg

2:16 am on Mar 22, 2005 (gmt 0)

WebmasterWorld Senior Member whoisgregg is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Interestingly, Slurp caches robots.txt and also caches DNS for at least five days. It was my sole visitor to the IP a webserver was running on after that server had "misplaced" ::cough, cough:: its DNS record. :(

abates

9:35 pm on Mar 29, 2005 (gmt 0)

10+ Year Member



Odd that it caches robots.txt, since it seems to grab that file from my site twice a day or more at times! Maybe that's not Slurp. :O

The Contractor

9:57 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe someone is downloading your site using Slurp as the user agent? I've never had problems with them not obeying robots.txt.
Are you sure it's not their shopping bot, YahooSeeker?
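One way to settle who is actually fetching robots.txt is to count user agents in the access log. A sketch, assuming an Apache combined-format log (the sample lines and the /tmp path below are stand-ins; point the grep at your real log instead):

```shell
# Sample combined-log entries standing in for a real access log
cat > /tmp/access_sample.log <<'EOF'
66.196.90.1 - - [29/Mar/2005:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
66.196.90.2 - - [29/Mar/2005:11:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
207.46.98.1 - - [29/Mar/2005:12:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "msnbot/1.0"
EOF

# Count robots.txt fetches per user agent: in a combined-format line,
# splitting on double quotes puts the User-Agent string in field 6
grep '"GET /robots.txt' /tmp/access_sample.log \
  | awk -F'"' '{print $6}' \
  | sort | uniq -c | sort -rn
```

Fetches claiming to be Slurp but coming from addresses outside Yahoo's ranges would point to an impostor, as suggested above.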

abates

10:37 pm on Mar 29, 2005 (gmt 0)

10+ Year Member



Lemme correct my previous post - Slurp has grabbed my robots.txt 1012 times so far this month, an average of about 33 times a day. Compare this with msnbot (7 times a day) and Googlebot (twice a day), and I'm beginning to wonder what Slurp finds so fascinating about my robots.txt :)

runningwolfe

7:32 pm on Apr 26, 2005 (gmt 0)

10+ Year Member



Ok, you all are going to laugh at me, I know. I am new to the game of webmastering and search engine optimizing, but where do I find my robots.txt in my directories? I can't seem to find it. I did a validation test on my site and it came up with 48 warnings and 117 errors - a lot of "capitalize this and that" and some other weird "grammar" stuff. I am going to be reformatting the site and its text, so that isn't a problem. But what I have a question about is this: where is my robots.txt file, and how do I fix the warnings and errors when I get them? Also, since I don't write HTML code, what would you suggest as far as reading material to understand more of what is discussed here?

larryhatch

9:22 pm on Apr 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You HAVE no robots.txt file until you yourself create it and upload it
to your host/ISP. It goes in the same directory as your regular website pages.
I would do a lot of spell checking to reduce 'grammar' errors too. -Larry
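For completeness, a minimal starter robots.txt of the kind described above - one that allows all crawlers everywhere - is just two lines, saved as plain text and uploaded to the document root:

```
# Allow all robots to crawl everything (empty Disallow means "nothing is off-limits")
User-agent: *
Disallow:
```

From there you add `Disallow:` paths, or extra records for specific user agents, as needed.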

runningwolfe

9:47 pm on Apr 26, 2005 (gmt 0)

10+ Year Member



Thanks!
That would explain a lot.
By the way, I would have to look in my AWStats to see who/what has been looking at my site, correct?
 
