Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Non-white list blocking
With a sting

 12:58 am on Sep 26, 2008 (gmt 0)

For non-whitelisted bots I'm currently generating (in PHP) a robots.txt file that looks like this:

User-agent: *
Disallow: /
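A minimal sketch of that whitelist switch, written in Python rather than the poster's PHP (which isn't shown). The `WHITELIST` tuple and the function names are hypothetical. Matching a substring of the User-Agent header is an assumption made for brevity, and User-Agent strings are easy to spoof.

```python
# Sketch only: the thread's version is PHP and is not shown.
# WHITELIST and the function names below are hypothetical.
WHITELIST = ("googlebot", "bingbot")  # lowercase substrings to look for


def is_whitelisted(user_agent: str) -> bool:
    """True if the User-Agent contains one of the whitelisted bot names."""
    ua = user_agent.lower()
    return any(name in ua for name in WHITELIST)


def build_robots_txt(user_agent: str) -> str:
    """Return the robots.txt body to serve to this User-Agent."""
    if is_whitelisted(user_agent):
        # An empty Disallow value means "nothing is disallowed".
        return "User-agent: *\nDisallow:\n"
    # Everyone else is asked to stay out of the whole site.
    return "User-agent: *\nDisallow: /\n"
```

In a web app this would back the `/robots.txt` route. Verifying major crawlers by reverse DNS lookup, instead of trusting the UA string, would be a stronger check.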

For reasons I'd prefer not to explain at the moment, is this allowed:

User-agent: *
Disallow: /
Disallow: /rpqtewz/

(Does the order make any difference?) Or:

User-agent: *
Disallow: /rpqtewz/
Disallow: /

Or would something like this be better (it might give me more options):

User-agent: *
Disallow: /rpqtewz.gif
Disallow: /




 2:09 pm on Sep 29, 2008 (gmt 0)

Hi Phred,

I am not sure I understand what you are trying to do.

When you use "Disallow: /" in your robots.txt, you are telling robots not to visit anything. So it does not matter if you also list specific folders to disallow, since you have already told the robots to stay out of every folder.
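That claim can be checked with Python's standard-library `urllib.robotparser`, which applies the first matching rule in a group. RFC 9309 (and Google) instead use the longest matching rule, but when a group contains only Disallow lines, any matching rule blocks the URL, so the order cannot change the result under either scheme. The agent name `AnyBot` is a placeholder, and the folder-only file is added purely for contrast.

```python
from urllib.robotparser import RobotFileParser

# The three orderings asked about, plus a folder-only file for contrast.
VARIANTS = {
    "slash_first": ["User-agent: *", "Disallow: /", "Disallow: /rpqtewz/"],
    "folder_first": ["User-agent: *", "Disallow: /rpqtewz/", "Disallow: /"],
    "gif_first": ["User-agent: *", "Disallow: /rpqtewz.gif", "Disallow: /"],
    "folder_only": ["User-agent: *", "Disallow: /rpqtewz/"],
}
PATHS = ("/", "/index.html", "/rpqtewz/", "/rpqtewz.gif")


def can_fetch(lines, path, agent="AnyBot"):
    """Parse robots.txt lines and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, path)


for name, lines in VARIANTS.items():
    allowed = [p for p in PATHS if can_fetch(lines, p)]
    print(f"{name}: allowed = {allowed}")
```

The three variants from the question allow none of the sample paths. Only the folder-only file lets `/`, `/index.html` and `/rpqtewz.gif` through.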


 8:55 pm on Sep 29, 2008 (gmt 0)

It's a bot trap: anyone requesting that file or directory could only have learned about it from robots.txt, so I can take appropriate action. The name is uniquely generated, which lets me trace it back to, among other things, the date, time, IP address and user agent of the robots.txt request.
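One way to produce such a traceable name, sketched in Python under assumptions the thread doesn't state: the name itself is random, so it reveals nothing to the bot, and the date/time, IP and user agent are stored server-side, keyed by the name. `TRAP_REGISTRY`, `issue_trap_name`, `lookup_trap_hit` and the 7-letter length are all hypothetical, and a real site would persist the registry instead of keeping it in memory.

```python
import secrets
import string
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class TrapIssue:
    """Details of the robots.txt request that received a given trap name."""
    issued_at: datetime
    ip: str
    user_agent: str


# Hypothetical in-memory registry: trap name -> who received it.
# A real site would persist this (database or log file).
TRAP_REGISTRY: Dict[str, TrapIssue] = {}


def issue_trap_name(ip: str, user_agent: str, length: int = 7) -> str:
    """Create a fresh random lowercase name and record who it was served to."""
    while True:
        name = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
        if name not in TRAP_REGISTRY:  # never hand out the same name twice
            break
    TRAP_REGISTRY[name] = TrapIssue(datetime.now(timezone.utc), ip, user_agent)
    return name


def robots_txt_with_trap(ip: str, user_agent: str) -> str:
    """Block the whole site and also list a per-request trap directory."""
    name = issue_trap_name(ip, user_agent)
    return f"User-agent: *\nDisallow: /{name}/\nDisallow: /\n"


def lookup_trap_hit(path: str) -> Optional[TrapIssue]:
    """Map a requested path such as '/abcdefg/...' back to its robots.txt request."""
    first_segment = path.strip("/").split("/")[0]
    return TRAP_REGISTRY.get(first_segment)
```

When a request for a trap directory arrives, `lookup_trap_hit(request_path)` returns who fetched robots.txt and when. That record can then be compared with the IP and UA of the client that took the bait.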



 12:51 pm on Sep 30, 2008 (gmt 0)

I would argue that you have made your entire site into a bot trap. I can understand why you would want to list one specific folder to make it easier to identify bad bots. Since you are looking for bad bots, the order does not matter: bad bots do not honor robots.txt, and they often look to exploit it.
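Spotting those bots can be as simple as scanning the access log for requests to the listed trap path, since a compliant crawler never requests it. A rough Python sketch, assuming the Apache/Nginx "combined" log format. The regex is an approximation of that format, not a full parser, and `trap_hits` is a hypothetical helper.

```python
import re
from typing import Iterable, Set, Tuple

# Approximate pattern for the Apache/Nginx "combined" log format:
# IP ident user [time] "METHOD PATH PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Trap paths taken from the robots.txt variants in this thread.
TRAP_PREFIXES = ("/rpqtewz/", "/rpqtewz.gif")


def trap_hits(log_lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return (ip, user_agent) pairs that requested a trap path."""
    hits = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group("path").startswith(TRAP_PREFIXES):
            hits.add((m.group("ip"), m.group("ua")))
    return hits
```

Feeding it the lines of an access log, for example `trap_hits(open("access.log"))`, yields the clients to block or investigate.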

In the past I have had fun creating bot traps. I create folders that do not really exist on my sites but that human spies and bad bots would love to get into. Here is a quick list of folder names I have used for bot traps:


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved