tells bots not to fetch eight specific URL-paths, but then overrides that by telling them they may fetch "everything." The net result is that this policy record accomplishes nothing at all.
I would suggest leaving out the "Allow" line completely; if I understand your intent, it is not needed.
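For example, a catch-all record along these lines (the paths here are only placeholders for the eight paths you actually want blocked) already does everything you need without any "Allow" line, because anything not Disallowed is crawlable by default:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
Disallow: /search/

...and so on for the rest of your paths.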
Your file then spends many lines disallowing bad bots that will not pay any attention to robots.txt. I'd suggest that you monitor all the 'bots in your list and delete the Disallow records for the ones that don't obey them anyway. You can and should take care of those 'bots in other ways, such as serving them a 403-Forbidden response using code in .htaccess or in your scripts. (Be sure to allow all clients, including bad bots, to fetch robots.txt itself, and if you use a custom 403 error document, be sure to allow all clients, even bad bots, to fetch that page; otherwise you create an "infinite loop," which is NOT good for your server.)
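Something along these lines in .htaccess would do it. This is only a sketch in Apache 2.2 syntax; the bot names and the 403-page path are placeholders, so substitute your own:

# Tag known bad bots by their User-Agent string (names are placeholders)
SetEnvIfNoCase User-Agent "BadBotOne"   bad_bot
SetEnvIfNoCase User-Agent "EvilScraper" bad_bot

# Clear the tag for robots.txt and for the custom 403 page itself,
# so even blocked clients can fetch those and no 403 loop occurs
SetEnvIf Request_URI "^/robots\.txt$"      !bad_bot
SetEnvIf Request_URI "^/errors/403\.html$" !bad_bot

ErrorDocument 403 /errors/403.html

# Deny anything still tagged as a bad bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot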
You should put your policy records in order from most-specific to least, with specific 'bots listed first, and ending up with the "User-agent: *" record.
Be aware that not all 'bots understand "Allow," "Host," "Crawl-delay," and other semi-proprietary directives. Although robots are *supposed to* ignore directives they do not understand, if you want your site's robots.txt implementation to be robust, these semi-proprietary directives should appear only in policy records directed at the robots that do understand them.
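Putting both points together, the overall shape would be something like this (the Disallow paths are placeholders, and Crawl-delay appears only in the record for a 'bot known to honor it, such as Slurp):

User-agent: Slurp
Crawl-delay: 5
Disallow: /private/

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /private/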
Msg#: 4144378 posted 8:38 pm on Jun 6, 2010 (gmt 0)
Might I suggest a white-list approach? Managing a list of bad bots with Disallows they won't honor is a significant use of time. White-list the bots you allow and disallow all other bots. The list of bots I let in is pretty short! Then watching your logs for a few weeks will tell you which non-compliant bots need to be banned via .htaccess.
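In robots.txt terms, a white-list looks something like this. The bot names are only examples of which 'bots you might let in; an empty Disallow means "fetch anything," and the final catch-all record shuts out everyone else:

User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /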
I spent two years chasing bad bots and got ulcers. Three years ago I switched to white-listing, and I sleep so much better!