Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Non-white list blocking
With a sting
phred




msg:3752297
 12:58 am on Sep 26, 2008 (gmt 0)

For non-whitelisted bots I'm currently generating (PHP) a robots.txt file that looks like this:

User-agent: *
Disallow: /

For reasons I'd prefer not to explain at the moment, is this allowed:

User-agent: *
Disallow: /
Disallow: /rpqtewz/

(Does order make any difference?) - or:

User-agent: *
Disallow: /rpqtewz/
Disallow: /

Or would something like this be better (might give me more options):

User-agent: *
Disallow: /rpqtewz.gif
Disallow: /
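
A minimal sketch of the whitelist idea described at the top of this post, written in Python rather than the PHP mentioned above. The whitelist entries, the permissive file served to whitelisted bots, and the function name are all assumptions for illustration; the post does not say what whitelisted bots receive.

```python
# Hypothetical whitelist of user-agent substrings (not from the original post).
WHITELIST = ("googlebot", "msnbot")

# Assumed bodies: whitelisted bots may crawl everything, all others nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve to the requesting user agent."""
    ua = user_agent.lower()
    if any(name in ua for name in WHITELIST):
        return ALLOW_ALL
    return BLOCK_ALL
```

User-agent strings are easy to fake, so a check like this only sorts bots by what they claim to be.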

Thanks,
Phred

 

goodroi




msg:3754465
 2:09 pm on Sep 29, 2008 (gmt 0)

Hi Phred,

I am not sure I understand what you are trying to do.

When you use "Disallow: /" in your robots.txt, it tells robots not to visit anything. So it does not matter if you also list specific folders to disallow, since you have already told the robots to stay out of every folder.
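
One way to check this is with Python's standard-library robots.txt parser. This is only a sketch: it shows how one compliant parser reads the two orderings from the first post, and other crawlers may use different matching rules. With nothing but Disallow lines, though, there is no rule that could let a path back in. The helper name `blocked` and the agent name `ExampleBot` are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """True if Python's parser would refuse to fetch `path` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


slash_first = "User-agent: *\nDisallow: /\nDisallow: /rpqtewz/\n"
slash_last = "User-agent: *\nDisallow: /rpqtewz/\nDisallow: /\n"
trap_only = "User-agent: *\nDisallow: /rpqtewz/\n"

for path in ("/", "/index.html", "/rpqtewz/", "/rpqtewz/x.html"):
    # Both orderings block every path, so order makes no difference here.
    assert blocked(slash_first, path)
    assert blocked(slash_last, path)

# Control: without "Disallow: /", only the trap directory is blocked.
assert not blocked(trap_only, "/index.html")
assert blocked(trap_only, "/rpqtewz/x.html")
```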

phred




msg:3754996
 8:55 pm on Sep 29, 2008 (gmt 0)

Bot trap: anyone hitting the file or directory could only have learned about it from robots.txt, so I can take appropriate action. Each name is uniquely generated, which lets me trace a hit back to, among other things, the date, time, IP, and UA.
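
A rough sketch of how such a traceable trap name might work, in Python. Everything here is an assumption for illustration: the in-memory `issued` store, the function names, and the 7-letter random name (chosen to resemble `/rpqtewz/`). The post does not describe the actual implementation.

```python
import secrets
import string
from datetime import datetime, timezone

# In-memory store for the sketch; a real setup would need to persist this.
issued = {}


def new_trap_name(length: int = 7) -> str:
    """Generate a random lowercase name such as 'rpqtewz'."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def robots_txt_with_trap(ip: str, user_agent: str) -> str:
    """Serve a blocking robots.txt whose trap directory is unique to this request."""
    name = new_trap_name()
    issued[name] = {"time": datetime.now(timezone.utc), "ip": ip, "ua": user_agent}
    return f"User-agent: *\nDisallow: /{name}/\nDisallow: /\n"


def trace_hit(path: str):
    """Return the record of the robots.txt request that exposed this path, or None."""
    name = path.strip("/").split("/")[0]
    return issued.get(name)
```

When a request later arrives for `/<name>/...`, `trace_hit` returns the time, IP, and UA of the robots.txt fetch that revealed that name.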

Phred

goodroi




msg:3755503
 12:51 pm on Sep 30, 2008 (gmt 0)

I would argue that you have made your entire site into a bot trap. I can understand why you would want to list one specific folder to make it easier to identify bad bots. Since you are looking for bad bots the order does not matter. Bad bots do not honor robots.txt and often are looking to exploit it.

In the past I have had fun with creating bot traps. I create folders that human spies and bad bots would love to get into that do not really exist on my sites. Here is a quick list of folder names I have used for bot traps:
/creditcardnumbers/
/customerdatabase/
/salesreport/
/passwords/
/private/
/ssn-data/
/secret/
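
A small Python sketch of how decoy folders like these could be wired up. It assumes the decoys are advertised as Disallow lines in robots.txt and that any request under one of them counts as a trap hit; the post does not spell out either detail, and the function names are invented for the example.

```python
# Decoy folder names taken from the list above.
DECOYS = (
    "/creditcardnumbers/",
    "/customerdatabase/",
    "/salesreport/",
    "/passwords/",
    "/private/",
    "/ssn-data/",
    "/secret/",
)


def robots_txt_with_decoys() -> str:
    """Build a robots.txt that lists every decoy folder as disallowed."""
    lines = ["User-agent: *"] + [f"Disallow: {d}" for d in DECOYS]
    return "\n".join(lines) + "\n"


def is_trap_hit(request_path: str) -> bool:
    """True if the requested path falls under any decoy folder."""
    return request_path.startswith(DECOYS)
```

Since the folders do not exist and nothing links to them, a well-behaved visitor has no reason to request them, so a hit is a strong sign of a bot or person mining robots.txt.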

