ergophobe - 7:29 pm on Dec 21, 2010 (gmt 0)
>>I'm seeing evidence of leeching from images ~ hotlinking, scraping
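Robots.txt in no way protects you from this. It is only a request that well-behaved crawlers choose to honor; hotlinkers and scrapers never read it.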
>>Pretty much the same on every site
Then it's boilerplate? Probably safe to exclude. That said, I have a "site" that is mostly just a FAQ: a New York Times article rewritten into FAQ form. It earns about $10/month in AdSense and has been sitting there with almost no change for a few years.
So it depends on what your FAQ is.
>>Disallow: /*?* -> Again, just a "nothing there for you to index from my POV" situation.
Sure, but if somehow, some user somewhere has you linked that way, and Google comes a-crawlin', why not let it in and 301 the request to your canonical page when it arrives? Actually, with WP, that should happen anyway.
Fire up Live HTTP Headers and request a page in the example.com/?p=123 form and see what happens. If it doesn't send a 301, fix it. If it does, let Google crawl it and give it a 301 or 404 as needed to force an index update.
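If you'd rather script that check than watch headers in the browser, something like this rough Python sketch should do it. It requests the URL without following redirects and tells you what came back. The function names (fetch_status, verdict) and the User-Agent string are just mine for illustration, and the wording of the verdicts is my own reading of "301 or 404 is fine, anything else needs a look".

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Request url WITHOUT following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the path plus query string, e.g. "/?p=123"
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": "redirect-check"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def verdict(status, location):
    """Turn a status code and Location header into a short recommendation."""
    if status == 301 and location:
        return "ok: permanent redirect to " + location
    if status == 404:
        return "ok: not found, the URL should drop out of the index"
    if status in (302, 303, 307):
        return "fix: temporary redirect, send a 301 instead"
    if status == 200:
        return "fix: served directly, no redirect to the canonical URL"
    return "check: unexpected status %d" % status


# Usage (hits the network, so swap in your own site):
# status, location = fetch_status("http://example.com/?p=123")
# print(status, verdict(status, location))
```

Run it against a handful of your ?p= URLs. If every one comes back as a 301 to the pretty permalink, there's nothing to gain by blocking them in robots.txt.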
I don't see the upside of stopping the Googlebot unless your 301s are faulty.