I just took a look - it's pretty rudimentary, although it does generate Allow rules, as well as Disallow rules. Wild card pattern matching is not there, and this is the spot where many get intimidated. Still, coupled with the already present robots.txt analysis tool, I can see that this generator might be a welcome thing for some people.
Wildcards are pivotal in correctly indexing my forum content, and they work well to prevent dupe issues. I'm guessing they'll add that feature at a later date... or maybe they didn't add it because it's difficult to comprehend and easy to accidentally block huge swathes of content you DO want indexing?
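For anyone intimidated by the wildcard syntax, here is a rough sketch of how Google-style patterns behave: `*` stands for any run of characters and a trailing `$` pins the rule to the end of the URL. The helper names and the example paths are made up for illustration, and this is only an approximation of the matching, not Googlebot's actual code.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any sequence of characters; a trailing '$' anchors
    the pattern to the end of the URL path. Everything else is literal.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule: str, path: str) -> bool:
    """Return True if the rule applies to the path (matched from the start)."""
    return rule_to_regex(rule).match(path) is not None


# Hypothetical forum URLs: block the sorted duplicates, keep the topic itself.
print(rule_matches("/*?sort=", "/forum/topic-12?sort=date"))
print(rule_matches("/*?sort=", "/forum/topic-12"))
# A bare '/*' matches every path, which is how whole sites get blocked by accident.
print(rule_matches("/*", "/forum/topic-12"))
```

Running a handful of real URLs from your own site through something like this before uploading the file is a cheap way to catch a rule that matches far more than intended.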
To my limited understanding there is no official "allow"-statement according to the rfc-specifications? Only disallow. But of course Google may define its own rules for its own robots.
I think webmasters who feel the need to exclude robots from certain parts of their own website (for bandwidth or whatever other reasons) are strongly advised to acquire a decent knowledge of robots.txt syntax on their own, and I assume most have already done so.
So the only reason this tool makes sense is probably Google's own bandwidth: if webmasters help lead adsbot to only the relevant parts, there is no need for these bots to crawl the whole site. But do their bots crawl it anyway? Does adsbot follow links and crawl pages other than those defined as target URLs in the campaigns? Have you ever seen the images bot crawl the whole WebmasterWorld site in search of thousands of copies of this beautiful world-map or Visa logo? ;)
".....thus eliminating the pesky use of notepad."
|there is no official "allow"-statement |
You are correct, Allow rules are an extension to the original robots.txt protocol, as are pattern matching wild cards [google.com] and indicating your Sitemap location [google.com]. The article mentions this:
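To make the extensions concrete, here is a small sketch that feeds a robots.txt with an Allow rule and a Sitemap line to Python's standard-library parser. The paths, the bot name "ExampleBot" and the example.com sitemap URL are all invented for illustration; `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: open one page inside an otherwise blocked directory.
ROBOTS_TXT = """\
User-agent: *
Allow: /forum/faq.html
Disallow: /forum/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Allow rule carves an exception out of the Disallow rule.
print(parser.can_fetch("ExampleBot", "/forum/faq.html"))
print(parser.can_fetch("ExampleBot", "/forum/topic-12.html"))
# Sitemap locations declared in the file, as a list of URLs.
print(parser.site_maps())
```

One caveat: Python's parser applies the first rule that matches, while Google documents that the most specific (longest) matching rule wins. Listing the Allow line before the broader Disallow line gives the same result under both readings, which is a reasonable habit given that not every robot understands the extensions the same way.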
|The Robots.txt Generator creates files that Googlebot will understand, and most other major robots will understand them too. But it's possible that some robots won't understand all of the robots.txt features that the generator uses. |
I'm thinking that the robots.txt generator in Webmaster Tools might also help people avoid situations like this one:
Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]
Does anyone know how to keep the google bot from clicking ads on a publisher site? Whenever Googlebot comes to the site it pumps up the number of page views and clicks for the advertiser - is there any way to restrict it from influencing those ad stats?
I knew my post from 20 months ago would come in handy one day. :-)
|Does anyone know how to keep the google bot from clicking ads on a publisher site? |
Maybe move all their URLs to one directory, and disallow Gbot from that directory via the robots.txt?
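A minimal sketch of that idea, assuming the ad click-through URLs have been moved under a directory called /adclick/ (a made-up name, as is the helper function). It uses Python's standard-library parser to confirm the rule does what you expect before you rely on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of the ad click-through directory.
AD_RULES = """\
User-agent: Googlebot
Disallow: /adclick/
"""


def googlebot_may_fetch(path: str, rules: str = AD_RULES) -> bool:
    """Return True if the given rules let Googlebot fetch the path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("Googlebot", path)


# Ad redirects are blocked; ordinary content pages stay crawlable.
print(googlebot_may_fetch("/adclick/1234"))
print(googlebot_may_fetch("/articles/today.html"))
```

Bear in mind this only keeps out robots that honor robots.txt. A bot that ignores the file will still follow the ad links, so it is worth checking the server logs to see which user agents are actually doing the clicking.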
Thanks, Chameleon. I will look into that now.
I have a client who publishes ads - not AdSense - they have proprietary ad serving technology and a number of local advertisers as customers. The issue is that bots are indexing the pages of the site and then clicking on the ads, which adds fake cost for the advertisers, who aren't happy about it.