
Sitemaps, Meta Data, and robots.txt Forum

    
How to stop Yahoo Slurp?
Slurp ignores my robots.txt
atlrus




msg:1526115
 12:46 pm on Mar 4, 2005 (gmt 0)

That's about it - how can I stop Yahoo from crawling my site? I have Slurp in my robots.txt, but it reads the file and keeps on crawling my site.

I have verified the robots.txt file, and it's fine. Does anybody know?

 

kevinpate




msg:1526116
 1:51 pm on Mar 4, 2005 (gmt 0)

You could always ban it in your htaccess file (or in the Windows file that's htaccess's cousin, if that's your environment).

I'm not a fan of Slurp myself, as it can't seem to grasp the meaning of 301 or 404 messages, but as I am happy with where I show up in Yahoo, I put up with what I don't like about Slurp.

atlrus




msg:1526117
 2:24 pm on Mar 4, 2005 (gmt 0)

Would you guide me through the process of IP blocking? I am quite a newbie in the matter, plus there are a whole bunch of other unidentified bots I would like to block by IP. PM would be fine.
My site is on Linux OS.
Thanks.
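
For anyone following along, here is a rough sketch of what the htaccess ban suggested above could look like on a Linux/Apache host. It is a small Python helper that only generates the text of the file. The directives are the older Apache 2.2-style access controls (SetEnvIfNoCase, Order, Allow, Deny), and they only work if the host permits them in .htaccess. The helper name build_htaccess and the address range 192.0.2.0/24 are made up for illustration and are not real bot addresses.

```python
import ipaddress


def build_htaccess(bad_agents, bad_networks):
    """Return .htaccess text that denies the given user agents and networks.

    Assumes an Apache 2.2-style setup where mod_setenvif and the
    Order/Allow/Deny directives are allowed in .htaccess files.
    """
    lines = ["# Flag requests whose User-Agent contains one of these strings"]
    for agent in bad_agents:
        lines.append(f'SetEnvIfNoCase User-Agent "{agent}" bad_bot')

    lines += [
        "",
        "# Let everyone in, then turn away flagged bots and listed networks",
        "Order Allow,Deny",
        "Allow from all",
        "Deny from env=bad_bot",
    ]
    for network in bad_networks:
        # ip_network() raises ValueError on a malformed address or range
        lines.append(f"Deny from {ipaddress.ip_network(network)}")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 192.0.2.0/24 is a documentation-only range used as a placeholder
    print(build_htaccess(["Slurp"], ["192.0.2.0/24"]))
```

The printed text would be saved as .htaccess in the site's top-level directory. Blocking by user agent only stops bots that identify themselves honestly, which is why the IP ranges are listed as a second line of defence.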

encyclo




msg:1526118
 2:30 pm on Mar 4, 2005 (gmt 0)

Slurp obeys robots.txt; however, it may take a while between your making a change and Slurp recognizing it and stopping its crawl.

If it has been a while, or if Slurp has fetched the new robots.txt, then you may have a syntax or other problem with the file. Have you tried validating it [searchengineworld.com]?

What exact syntax are you using to block Slurp?
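
For comparison, a minimal sketch of the usual syntax for shutting out Slurp while leaving other crawlers alone, checked with Python's standard urllib.robotparser. The file contents and the example.com URL are illustrative, not the original poster's actual file. Note that robotparser matches on the short robot name, so the sketch passes "Slurp" rather than the full browser-style user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block Slurp everywhere, allow everyone else
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Slurp", "Googlebot"):
        verdict = is_allowed(ROBOTS_TXT, agent, "http://www.example.com/page.html")
        print(agent, "allowed" if verdict else "blocked")
```

An empty "Disallow:" line means nothing is disallowed for that group, so the second record leaves all other robots free to crawl.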

atlrus




msg:1526119
 2:52 pm on Mar 4, 2005 (gmt 0)

Yes, I have validated the robots.txt and it's just fine. Slurp fetched it today with a "200" a few times, but it's still crawling like crazy - no indexing though - so I see no reason to let it eat my bandwidth if it's just going to show my home page :) But it's not obeying the robots.txt.

larryhatch




msg:1526120
 3:10 pm on Mar 5, 2005 (gmt 0)

Uh, Atlrus?

Why would you (or anyone else) want to stop Yahoo from spidering your site?

Just curious. - Larry

atlrus




msg:1526121
 2:16 am on Mar 6, 2005 (gmt 0)

I have a ban in place, but Slurp still eats my website at a rate of 50-60 pages/minute...

larryhatch




msg:1526122
 4:25 am on Mar 7, 2005 (gmt 0)

OK. I don't know how big your site is.
If it's in the 1000s of pages, you have bandwidth costs and concerns.

I welcome Slurp because Y gives me fairly decent SERP positioning. - Larry

Rajith




msg:1526123
 10:35 am on Mar 12, 2005 (gmt 0)

I completely agree with Larry. Why would you want to stop Slurp? Y is one of the major search engines, and it is good to hear that it is crawling the site frequently. Make some changes in the SE preference area, which will help in the search results.

whoisgregg




msg:1526124
 2:16 am on Mar 22, 2005 (gmt 0)

Interestingly, Slurp caches robots.txt and also caches DNS for at least five days. It was my sole visitor to the IP a webserver was running on that had "misplaced" ::cough, cough:: its DNS record. :(

abates




msg:1526125
 9:35 pm on Mar 29, 2005 (gmt 0)

Odd that it caches robots.txt, since it seems to grab that from my site twice a day or more at times! Maybe that's not Slurp. :O

The Contractor




msg:1526126
 9:57 pm on Mar 29, 2005 (gmt 0)

Maybe someone is downloading your site using Slurp as the user agent? I've never had problems with them not obeying robots.txt.
Are you sure it's not their shopping bot, YahooSeeker?
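
One way to tell a real crawler from something borrowing its user agent is a reverse DNS lookup on the visiting IP followed by a forward lookup on the resulting hostname. A minimal sketch follows. The hostname suffix ".crawl.yahoo.net" is an assumption about how the crawler's hosts are named and should be checked against the search engine's own documentation. The function name is_genuine_crawler is made up, and the two resolver parameters exist only so the check can be tried without touching the network.

```python
import socket


def is_genuine_crawler(ip, suffix=".crawl.yahoo.net",
                       reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Check an IP with a reverse lookup followed by a confirming forward lookup.

    suffix is an assumed hostname ending for the crawler, not a confirmed value.
    """
    try:
        hostname = reverse(ip)[0]          # IP -> hostname
    except OSError:
        return False                        # no reverse record at all
    if not hostname.lower().endswith(suffix):
        return False                        # hostname belongs to someone else
    try:
        addresses = forward(hostname)[2]    # hostname -> list of IPs
    except OSError:
        return False
    return ip in addresses                  # must point back to the same IP


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only placeholder address
    print(is_genuine_crawler("192.0.2.10"))
```

The forward lookup matters because anyone who controls the reverse DNS for their own IP range can make it claim any hostname they like.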

abates




msg:1526127
 10:37 pm on Mar 29, 2005 (gmt 0)

Lemme correct my previous post - Slurp's grabbed my robots.txt 1012 times so far this month, an average of about 33 times a day. Compare this with msnbot (7 times a day) and Googlebot (twice a day) and I'm beginning to wonder what Slurp finds so fascinating about my robots.txt :)
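
Counts like these can be pulled straight from the raw access log. Below is a rough sketch that assumes the log is in Apache's "combined" format. The regular expression, the bot list, and the function name count_robots_fetches are illustrative, and the matching is a plain substring test on the user-agent field.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

BOTS = ("Slurp", "msnbot", "Googlebot")


def count_robots_fetches(lines, bots=BOTS):
    """Count requests for /robots.txt per bot name found in the user agent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if match.group("path").split("?")[0] != "/robots.txt":
            continue
        agent = match.group("agent").lower()
        for bot in bots:
            if bot.lower() in agent:
                counts[bot] += 1
                break
    return counts


if __name__ == "__main__":
    # "access.log" is a placeholder path
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for bot, hits in count_robots_fetches(log).most_common():
            print(bot, hits)
```

Dividing each total by the number of days the log covers gives the per-day averages.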

runningwolfe




msg:1526128
 7:32 pm on Apr 26, 2005 (gmt 0)

OK, you are all going to laugh at me, I know. I am new to the game of webmastering and search engine optimizing. But where do I find my robots.txt in my directories? I can't seem to find it. I did a validation test on my site and it came up with 48 warnings and 117 errors: a lot of "capitalize this and that" and some other weird "grammar" stuff. I am going to be reformatting the site and its text, so that isn't a problem. But my question is this: where is my robots.txt file, and how do I fix the warnings and errors when I get them? Also, since I don't write HTML code, what would you suggest as reading material to understand more along the lines of what is discussed here?

larryhatch




msg:1526129
 9:22 pm on Apr 26, 2005 (gmt 0)

You HAVE no robots.txt file until you yourself create it and upload it
to your web host. It goes in the top-level (root) directory of your site, alongside your regular website pages.
I would do a lot of spell checking to reduce 'grammar' errors too. -Larry
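
To make the location concrete: robots look for the file at /robots.txt at the root of the host, whatever page they start from. A tiny sketch using only the Python standard library, with example.com as a placeholder and robots_url as a made-up helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address that applies to the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.example.com/products/widgets.html?id=3"))
```

A robots.txt uploaded into a subdirectory such as /products/ would not be found by this rule.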

runningwolfe




msg:1526130
 9:47 pm on Apr 26, 2005 (gmt 0)

Thanks!
That would explain a lot.
By the way, I would have to look in my AWStats to see who/what has been looking at my sites, correct?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved