|Robots.txt - What's the Point?|
| 4:49 pm on May 30, 2003 (gmt 0)|
I've been devouring every piece of information I can find on the site regarding robots.txt, and I still come back to the same question: Why bother?
Isn't it better to have no robots.txt at all and let all the spiders in?
| 4:51 pm on May 30, 2003 (gmt 0)|
Some spiders harvest email addresses. Do you want them grabbing yours?
Others bombard your server with rapid back-to-back requests and bring it to its knees.
Sometimes you have pseudo-sensitive data you don't want crawled.
Other times you have a development server online that you don't want crawled and indexed; only your production server should be.
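As a rough illustration of the cases above, here is a minimal sketch using Python's standard `urllib.robotparser` to show how a well-behaved crawler would read two robots.txt files. The file contents, the `/private/` path, the `ExampleBot` name, and the `example.com` hosts are all made up for the example. Note that `Crawl-delay` is a non-standard directive that some crawlers ignore.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical production file: slow crawlers down and keep them out of /private/.
PRODUCTION_RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

# Hypothetical development-server file: ask every crawler to stay out entirely.
DEVELOPMENT_RULES = """\
User-agent: *
Disallow: /
"""

production = load_rules(PRODUCTION_RULES)
development = load_rules(DEVELOPMENT_RULES)

# Public pages on production are allowed; /private/ is not.
print(production.can_fetch("ExampleBot", "http://www.example.com/index.html"))
print(production.can_fetch("ExampleBot", "http://www.example.com/private/report.html"))

# Requested pause between hits, in seconds (for crawlers that honor it).
print(production.crawl_delay("ExampleBot"))

# Nothing on the development host should be crawled.
print(development.can_fetch("ExampleBot", "http://dev.example.com/index.html"))
```

This only models what a polite spider is asked to do; nothing here is enforced by the server.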
| 4:52 pm on May 30, 2003 (gmt 0)|
It depends on whether you want to let all the spiders in.
There are places that people just do not want indexed,
such as personal data or subscriber data.
| 4:53 pm on May 30, 2003 (gmt 0)|
You may have pages that you do not want the spiders to bother indexing; crawling them is a waste of bandwidth and time.
I have a few. Another reason on our site is that we duplicate content (with permission) from another website. That would get us penalised by Google if we let the spiders in.
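For the duplicated-content case, rules can also be scoped to a single crawler. The sketch below is an assumption about how such a file might look, not the poster's actual setup: the `/syndicated/` directory, the `OtherBot` name, and the `example.com` host are invented, and the rules are checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: keep Googlebot out of the republished articles,
# and leave every other crawler unrestricted (an empty Disallow allows all).
PER_CRAWLER_RULES = """\
User-agent: Googlebot
Disallow: /syndicated/

User-agent: *
Disallow:
"""

rules = RobotFileParser()
rules.parse(PER_CRAWLER_RULES.splitlines())

duplicate_page = "http://www.example.com/syndicated/article.html"
original_page = "http://www.example.com/articles/our-own.html"

# Googlebot is asked to skip the duplicated section but may crawl the rest.
print(rules.can_fetch("Googlebot", duplicate_page))
print(rules.can_fetch("Googlebot", original_page))

# A crawler with no section of its own falls through to the "*" rules.
print(rules.can_fetch("OtherBot", duplicate_page))
```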
| 4:54 pm on May 30, 2003 (gmt 0)|
gibble: Some spiders harvest email addresses
And of course they would obey the robots.txt file ;)
| 6:56 pm on May 30, 2003 (gmt 0)|
Many rogue spiders don't obey robots.txt. You may need to ban them via .htaccess.
| 6:59 pm on May 30, 2003 (gmt 0)|
Well... yeah... you have a point. .htaccess is much more effective for actually STOPPING a spider.
hehe oops :p
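The difference between asking and enforcing can be sketched in code. This is not .htaccess; it is the same idea (refuse requests by User-Agent on the server side) written as a small Python WSGI middleware so it can be run and tested. The blocklist substrings, the `site` app, and the `status_for` helper are hypothetical. A User-Agent header can be spoofed, so treat this as an illustration rather than a complete defence.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical blocklist: made-up substrings, not real crawler names.
BLOCKED_AGENT_SUBSTRINGS = ("emailharvester", "badbot")


def block_rogue_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app and answer 403 to any blocked User-Agent."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(fragment in agent for fragment in blocked):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the real application.
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real website."""
    body = b"Hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(user_agent):
    """Send one fake request through the middleware and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    statuses = []
    list(block_rogue_agents(site)(environ, lambda status, headers: statuses.append(status)))
    return statuses[0]


print(status_for("BadBot/1.0"))
print(status_for("Mozilla/5.0"))
```

Unlike robots.txt, the spider gets no say here: the server decides before any page content is sent.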
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved