Forum Moderators: goodroi


Reference robots.txt file?

Does anyone have one?

         

mmiller

6:48 pm on Dec 25, 2007 (gmt 0)

10+ Year Member



Hi Folks;

I was just wondering - is there a reference robots.txt file out there that a person can use that filters all the junk visitors?

Kind of like an anti-spam list that's kept up to date, excluding search engines of questionable value.

Make sense? Already been done? or way off base? ;-)

Thanks & Merry Christmas everyone!

jdMorgan

9:28 pm on Dec 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Because opinions, sites, and ROI on bandwidth differ, there's no such thing as a 'reference' robots.txt. Also, be aware that malicious or incompetent robots won't fetch and/or won't obey robots.txt.

You could start with something simple that allows well-known robots to fetch all of your pages/resources and disallows all others. For example:

# Allow major search engine, Internet Archive, and ODP 'bots
User-agent: Googlebot
User-agent: ia_archiver
User-agent: msnbot
User-agent: Robozilla
User-agent: Slurp
User-agent: Teoma
Disallow:

# Disallow all others
User-agent: *
Disallow: /


Use the exact format shown, including the blank line at the end. Do not add or remove blank lines, and do not consider the file to be 'free-form' in any way; some robots are VERY picky about exact compliance with the syntax given in the Standard.

ia_archiver (for the Internet Archive "Wayback Machine") and Robozilla (for DMOZ/ODP) are optional; I included them because --depending on your site and your interests-- leaving them out might be disastrous. Consider also adding some of the less-well-known 'bots used by meta-search providers such as Dogpile and IxQuick, plus any 'bots used by Pay-Per-Click and local directory services you may use or be interested in -- for example, Verizon SuperPages. If you have a mobile version of your site, then you'll want to add the mobile specialty 'bots as well. Example: YahooSeeker/M1A1-R2D2

As stated above, your robots.txt may change dramatically from this example depending on your site and your needs.
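Whatever you end up with, it may be worth sanity-checking the file before you upload it. Here is a rough sketch using Python's standard urllib.robotparser module, assuming the example file above; "ExampleScraper" and the /index.html path are made-up names for illustration only. Keep in mind this only shows how Python's parser reads the file. Real robots each have their own parsers, and the bad ones ignore the file entirely.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this thread, including the trailing blank line.
ROBOTS_TXT = """\
# Allow selected well-known 'bots
User-agent: Googlebot
User-agent: ia_archiver
User-agent: msnbot
User-agent: Robozilla
User-agent: Slurp
User-agent: Teoma
Disallow:

# Disallow all others
User-agent: *
Disallow: /

"""


def build_parser(text):
    """Parse robots.txt text in memory, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleScraper" is a hypothetical user-agent that is not in the allow list.
for agent in ("Googlebot", "Slurp", "ExampleScraper"):
    verdict = "allowed" if parser.can_fetch(agent, "/index.html") else "disallowed"
    print(f"{agent}: {verdict}")
```

With this file, the listed robots should come back as allowed and anything else as disallowed. If you add more 'bots to the first record, add them to the loop too and re-run it.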

Jim

mmiller

11:30 pm on Dec 25, 2007 (gmt 0)

10+ Year Member



Thank you Jim - I never thought about malicious sites not obeying the robots.txt - dOh!

Merry Christmas - and thanks for your reply and the example. :-)