Robots.txt generator added to Webmaster Tools

tedster

3:57 am on Mar 28, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Reaching out a helping hand to the roboto-phobic website owner, Webmaster Tools launches a tool to generate a robots.txt file.

...what if you're intimidated by the idea of communicating directly with Googlebot? After all, not all of us are fluent in the language of robots.txt. This is why we're pleased to introduce you to your personal robot translator: the Robots.txt Generator in Webmaster Tools. It's designed to give you an easy and interactive way to build a robots.txt file. It can be as simple as entering the files and directories you don't want crawled by any robots.

[googlewebmastercentral.blogspot.com...]

tedster

4:45 am on Mar 28, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I just took a look - it's pretty rudimentary, although it does generate Allow rules as well as Disallow rules. Wildcard pattern matching is not supported, and that is the part where many people get intimidated. Still, coupled with the existing robots.txt analysis tool, I can see that this generator might be welcome for some people.

Asia_Expat

5:20 am on Mar 28, 2008 (gmt 0)

5+ Year Member



Wildcards are pivotal to getting my forum content indexed correctly, and they work well to prevent duplicate-content issues. I'm guessing they'll add that feature at a later date... or maybe they didn't add it because it's difficult to comprehend, and it's easy to accidentally block huge swathes of content you DO want indexed?
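To make the wildcard risk concrete, here is a minimal Python sketch of how the pattern-matching extension is commonly described: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper names (`pattern_to_regex`, `is_blocked`) and the forum URLs are hypothetical, and this is an illustration of the idea, not Googlebot's actual matcher.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters and a
    trailing '$' anchors the end of the URL; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)

# Hypothetical forum URLs: the same thread with and without a highlight parameter.
topic = "/forum/viewtopic.php?t=12"
dupe = "/forum/viewtopic.php?t=12&highlight=foo"

narrow = ["/forum/*&highlight="]  # targets only the duplicate variant
broad = ["/*?"]                   # blocks every URL that has a query string

print(is_blocked(dupe, narrow), is_blocked(topic, narrow))
print(is_blocked(topic, broad))
```

The narrow pattern blocks only the duplicate variant, while the broad one also blocks the real thread URL, which is the kind of accidental over-blocking mentioned above.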

Oliver Henniges

12:08 pm on Mar 31, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To my limited understanding, there is no official "allow"-statement according to the RFC specifications? Only disallow. But of course Google may define its own rules for its own robots.

I think webmasters who feel the need to exclude robots from certain parts of their own website (for bandwidth or whatever other reasons) are strongly advised to acquire decent knowledge of robots.txt syntax on their own, and I assume most have already done so.

So the only reason this tool makes sense is probably Google's own bandwidth: if webmasters help lead adsbot to only the relevant parts, there is no need for these bots to crawl the whole site. But do their bots crawl the whole site anyway? Does adsbot follow links and crawl pages other than those defined as target URLs in the campaigns? Have you seen the images bot crawl the whole WebmasterWorld site in search of thousands of copies of this beautiful world-map or Visa logo? ;)

zuko105

2:14 pm on Mar 31, 2008 (gmt 0)

10+ Year Member



".....thus eliminating the pesky use of notepad."

tedster

3:02 pm on Mar 31, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



there is no official "allow"-statement

You are correct: Allow rules are an extension to the original robots.txt protocol, as are pattern-matching wildcards [google.com] and indicating your Sitemap location [google.com]. The article mentions this:

The Robots.txt Generator creates files that Googlebot will understand, and most other major robots will understand them too. But it's possible that some robots won't understand all of the robots.txt features that the generator uses.
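As a rough illustration of those extensions, the sketch below feeds a small robots.txt containing an Allow rule and a Sitemap line to Python's standard `urllib.robotparser`. The paths, the `ExampleBot` name, and the example.com sitemap URL are made up for the example. One assumption to keep in mind: this parser applies the first rule that matches, so the Allow line is placed before the broader Disallow line; other robots may resolve such conflicts differently, which is exactly the compatibility caveat quoted above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using two extensions: Allow and Sitemap.
EXTENDED_ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-report.html
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

extended_parser = RobotFileParser()
extended_parser.parse(EXTENDED_ROBOTS_TXT.splitlines())

# The Allow rule carves one file out of the disallowed directory.
report_ok = extended_parser.can_fetch("ExampleBot", "/private/public-report.html")
secret_ok = extended_parser.can_fetch("ExampleBot", "/private/secret.html")

# site_maps() is available in Python 3.8 and later.
sitemaps = extended_parser.site_maps()

print(report_ok, secret_ok, sitemaps)
```

A robot that only implements the original protocol would ignore the Allow and Sitemap lines and treat everything under /private/ as disallowed.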

I'm thinking that the robots.txt generator in Webmaster Tools might also help people avoid situations like this one:

Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]

Bern

4:49 pm on Apr 14, 2008 (gmt 0)

5+ Year Member



Does anyone know how to keep the google bot from clicking ads on a publisher site? Whenever Google bot comes to the site it pumps up the number of page views and clicks for the advertiser - is there any way to restrict it from influencing those ad stats?

g1smd

6:55 pm on Apr 14, 2008 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I knew my post from 20 months ago would come in handy one day. :-)

a_chameleon

7:30 pm on Apr 14, 2008 (gmt 0)

10+ Year Member



Does anyone know how to keep the google bot from clicking ads on a publisher site?

Maybe move all their URLs to one directory, and disallow Gbot from that directory
via robots.txt?
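A sketch of what that suggestion could look like, checked with Python's standard `urllib.robotparser`. The `/adclick/` directory name and the sample URLs are assumptions made for illustration; nothing in this thread says how the ad links are actually laid out. Note that robots.txt is only a request, so this helps with robots that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: every ad click-through URL lives under /adclick/.
AD_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /adclick/
"""

def may_fetch(agent, path):
    """Return True if the robots.txt above lets this agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(AD_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)

print(may_fetch("Googlebot", "/adclick/go?id=42"))    # ad link: blocked
print(may_fetch("Googlebot", "/articles/news.html"))  # normal page: allowed
```

Replacing `User-agent: Googlebot` with `User-agent: *` would ask every compliant robot to stay out of that directory, not just Google's.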

Bern

8:10 pm on Apr 14, 2008 (gmt 0)

5+ Year Member



Thanks, Chameleon. I will look into that now.

I have a client who publishes ads - not AdSense - they have proprietary ad serving technology and have a number of local advertisers as customers. The issue is that bots are indexing the pages of the site and then clicking on the ads, thereby adding fake costs for the advertisers, who aren't happy about it.