Forum Moderators: goodroi
I was just wondering - is there a reference robots.txt file out there that a person can use that filters all the junk visitors?
Kind of like an anti-spam list that's kept up to date excluding search engines of questionable value.
Make sense? Already been done? or way off base? ;-)
Thanks & Merry Christmas everyone!
You could start with something simple that allows well-known robots to fetch all of your pages/resources and disallows all others. For example:
# Allow major search engine, Internet Archive, and ODP 'bots
User-agent: Googlebot
User-agent: ia_archiver
User-agent: msnbot
User-agent: Robozilla
User-agent: Slurp
User-agent: Teoma
Disallow:

# Disallow all others
User-agent: *
Disallow: /
ia_archiver (the Internet Archive's "Wayback Machine" crawler) and Robozilla (the DMOZ/ODP link checker) are optional; I included them because, depending on your site and your interests, leaving them out could be disastrous. Consider also adding some of the less-well-known 'bots used by meta-search providers such as Dogpile and IxQuick, plus any 'bots used by pay-per-click and local directory services you use or are interested in, for example Verizon SuperPages. If you have a mobile version of your site, you'll want to add the mobile specialty 'bots as well. Example: YahooSeeker/M1A1-R2D2
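If you want to sanity-check a whitelist like this before deploying it, here's a small sketch using Python's standard-library robots.txt parser. The bot names and the test path are just examples; the rules string mirrors the file above.

```python
# Sketch: check which user-agents the whitelist robots.txt permits,
# using Python's standard-library parser.
from urllib import robotparser

rules = """\
# Allow major search engine, Internet Archive, and ODP 'bots
User-agent: Googlebot
User-agent: ia_archiver
User-agent: msnbot
User-agent: Robozilla
User-agent: Slurp
User-agent: Teoma
Disallow:

# Disallow all others
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A whitelisted bot may fetch anything (empty Disallow = allow all):
print(rp.can_fetch("Googlebot", "/any/page.html"))      # True
# Anything else falls under the catch-all "*" group and is blocked:
print(rp.can_fetch("RandomScraper", "/any/page.html"))  # False
```

Note that well-behaved crawlers honor robots.txt voluntarily; truly junk visitors often ignore it, so this is a filter for polite bots, not a security measure.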
As stated above, your robots.txt may change dramatically from this example depending on your site and your needs.
Jim