User-agent: Googlebot
Disallow: /page.asp?*
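# (matches /page.asp?id=1, /page.asp?page=2, etc., but not /page.asp itself)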

User-agent: *
Disallow:
This will allow everything for all user agents except Googlebot, which is disallowed from any page.asp URL that carries a query string (a ?).
Make sure the Googlebot record comes first, or else Googlebot will follow the User-agent: * record.
User-agent: Badbot
Disallow: /file.html
Disallow: /directory
2.) A blank Disallow is the same as saying Allow everything. Adding a forward slash will turn away all crawlers inclined to heed even the most basic robots.txt:
User-agent: *
Disallow: /
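Combining the two, here is a minimal sketch that lets one named bot in while turning everyone else away (Googlebot is just a placeholder; substitute whichever crawler you want to admit):

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /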
3.) Nowadays bots' preferences can be (head-bangingly) unique and specific, so it's always a good idea to go to the source:
[search.msn.com...]
See also: EXAMPLES [search.msn.com] (including wildcards)
[robotstxt.org...]
[google.com...]
See also: EXAMPLES [google.com] (including wildcards)
When creating your robots.txt file, please keep the following in mind: When deciding which pages to crawl on a particular host, Googlebot will obey the first record in the robots.txt file with a User-agent starting with "Googlebot." If no such entry exists, it will obey the first entry with a User-agent of "*".
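To illustrate the quoted rule with a minimal sketch (the /private/ path is only a placeholder): given the file below, Googlebot obeys its own record, so it skips /private/ but ignores the blanket Disallow: /, while every other compliant bot falls through to the * record and stays out entirely.

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /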
My robots.txt starts with:
User-agent: *
Disallow: /
-- followed by all other entries in alphabetical order, some very detailed (msnbot, Googlebot, Slurp), some simple. All specifically identified major SE bots find 'their' instructions and follow them 99% of the time.
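A minimal sketch of that layout (the individual rules are placeholders): the catch-all record turns away any unlisted bot, and each named major bot finds its own record further down, in alphabetical order.

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: msnbot
Disallow: /cgi-bin/

User-agent: Slurp
Disallow: /cgi-bin/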
Now if only they'd all follow the same conventions! (See: AdsBot-Google's robots.txt specs [webmasterworld.com].)