Anyone know the name of Wayback Machine's robot?


incrediBILL - 8:46 pm on Dec 27, 2012 (gmt 0)


"Bad bots never even look at robots.txt"


Actually they do at times. There's one bot that uses a blank user agent when reading the robots.txt to see which spider names you've allowed. Then it switches to one of those spider names to make sure it gets access to your site.

That's another reason I serve a dynamic robots.txt: it tells everyone to go away except the bots on my whitelist, and it builds a custom file for each request, so the names of the allowed bots are never exposed.
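
For anyone who wants to try the idea, here is a rough sketch of what a per-request robots.txt could look like in Python. It is not my actual setup: the whitelist entries, the render_robots_txt name, and the substring match on the user agent are all assumptions made up for illustration.

```python
# Hypothetical whitelist; the post does not say which bots are allowed.
ALLOWED_BOTS = ("Googlebot", "bingbot")

# Blanket refusal served to everyone who is not on the whitelist.
DENY_ALL = "User-agent: *\nDisallow: /\n"


def render_robots_txt(user_agent, allowed_bots=ALLOWED_BOTS):
    """Return a robots.txt body tailored to the requesting user agent."""
    ua = (user_agent or "").lower()
    for bot in allowed_bots:
        if bot.lower() in ua:
            # The requester already claims this name, so echoing it back
            # reveals nothing about the rest of the whitelist.
            return f"User-agent: {bot}\nDisallow:\n\n{DENY_ALL}"
    # Blank or unknown user agents only ever see the blanket refusal.
    return DENY_ALL
```

A request with a blank user agent gets nothing but the deny-all record, so there are no spider names to copy. This only hides the list. A bot that guesses an allowed name still gets the allow record, so access control has to happen elsewhere.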

In case you're wondering, ia_archiver is NOT allowed ;)


Thread source: http://www.webmasterworld.com/search_engine_spiders/4530575.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com