|Robots.txt - What's the Point?|
I've been devouring every piece of information I can on the site regarding robots.txt, and I still come to the same question: Why bother?
Isn't it better to have no robots.txt at all and let all the spiders in?
Some spiders harvest email addresses. Do you want them grabbing yours?
Others hammer your server with rapid-fire requests, one right after another, and bring it to its knees.
Sometimes you have pseudo-sensitive data you don't want crawled.
Other times you have a development server online that you don't want crawled and indexed; only your production server should be.
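To make those cases concrete, here is a rough sketch of how a well-behaved spider would read such rules, using Python's standard `urllib.robotparser`. The robots.txt content, the paths (`/private/`, `/dev/`) and the bot names (`ExampleHarvester`, `SomeCrawler`) are all made up for illustration. Note that `Crawl-delay` is a non-standard directive that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: paths and bot names are assumptions for this sketch.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
Disallow: /dev/

User-agent: ExampleHarvester
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text the way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# An ordinary crawler may fetch public pages but not the disallowed paths.
print(parser.can_fetch("SomeCrawler", "/index.html"))
print(parser.can_fetch("SomeCrawler", "/private/members.html"))

# The named bot is asked to stay out of the whole site.
print(parser.can_fetch("ExampleHarvester", "/index.html"))

# Requested pause (in seconds) between requests for crawlers that honor it.
print(parser.crawl_delay("SomeCrawler"))
```

All of this only works if the spider chooses to check the file before crawling; nothing here is enforced by the server.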
It depends on whether you want to let all the spiders in.
There are places that people just do not want indexed, such as personal data or subscriber data.
You may have pages that you do not want the spiders to bother to index. It's a waste of bandwidth and time.
I have a few. Another reason on our site is that we duplicate content (with permission) from another website. That would get us penalised by Google if we let the spiders in.
gibble : Some spiders harvest email addresses
And of course they would obey the robots.txt file ;)
Many rogue spiders don't obey robots.txt. You may need to ban them via .htaccess.
Well... yeah... you have a point: .htaccess is much more effective for actually STOPPING a spider.
hehe oops :p
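For anyone wondering what "actually stopping" a spider looks like compared with politely asking, here is a minimal sketch of the same idea in Python. This is not .htaccess; it is a small WSGI wrapper that refuses requests by User-Agent, which is roughly what a server-side ban does. The banned names (`exampleharvester`, `badbot`) and the helper names are hypothetical, and a rogue spider can still fake its User-Agent, so treat this as an illustration only.

```python
# Hypothetical list of User-Agent substrings to refuse (lowercase).
BANNED_AGENT_SUBSTRINGS = ("exampleharvester", "badbot")


def is_banned(user_agent):
    """Return True if the User-Agent contains a banned substring."""
    ua = (user_agent or "").lower()
    return any(name in ua for name in BANNED_AGENT_SUBSTRINGS)


def block_banned_agents(app):
    """Wrap a WSGI app so banned agents get 403 before the app runs."""
    def middleware(environ, start_response):
        if is_banned(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


guarded_app = block_banned_agents(hello_app)
```

Unlike robots.txt, the check runs on the server, so the spider gets refused whether or not it bothers to read any rules.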
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved