
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Robots.txt generator added to Webmaster Tools
tedster




msg:3612943
 3:57 am on Mar 28, 2008 (gmt 0)

Reaching out a helping hand to the roboto-phobic website owner, Webmaster Tools launches a tool to generate a robots.txt file.

...what if you're intimidated by the idea of communicating directly with Googlebot? After all, not all of us are fluent in the language of robots.txt. This is why we're pleased to introduce you to your personal robot translator: the Robots.txt Generator in Webmaster Tools. It's designed to give you an easy and interactive way to build a robots.txt file. It can be as simple as entering the files and directories you don't want crawled by any robots.

[googlewebmastercentral.blogspot.com...]


 

tedster




msg:3612964
 4:45 am on Mar 28, 2008 (gmt 0)

I just took a look - it's pretty rudimentary, although it does generate Allow rules, as well as Disallow rules. Wild card pattern matching is not there, and this is the spot where many get intimidated. Still, coupled with the already present robots.txt analysis tool, I can see that this generator might be a welcome thing for some people.
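For anyone who has not seen the two rule types side by side, here is a minimal sketch using Python's standard urllib.robotparser. The paths are made up for illustration and are not output from the generator. One caveat: this parser applies the first rule that matches, while Google documents choosing the most specific (longest) matching rule, so the Allow line is placed first here so that both readings give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block a directory but allow one page inside it.
RULES = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
"""

rule_parser = RobotFileParser()
rule_parser.parse(RULES.splitlines())

# The Allow rule carves an exception out of the Disallow rule.
exception_allowed = rule_parser.can_fetch("AnyBot", "/private/public-page.html")
rest_blocked = not rule_parser.can_fetch("AnyBot", "/private/secret.html")
outside_allowed = rule_parser.can_fetch("AnyBot", "/index.html")

print(exception_allowed, rest_blocked, outside_allowed)
```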

Asia_Expat




msg:3612973
 5:20 am on Mar 28, 2008 (gmt 0)

Wildcards are pivotal in correctly indexing my forum content, and they work well to prevent dupe issues. I'm guessing they'll add that feature at a later date... or maybe they didn't add it because it's difficult to comprehend and easy to accidentally block huge swathes of content you DO want indexing?
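To make the wildcard idea concrete, here is a rough Python sketch of how the two pattern characters Google documents can be read: * stands for any run of characters, and a trailing $ marks the end of the URL. The function names and the sample forum URLs are hypothetical, and this is only an approximation for illustration, not Googlebot's actual matcher.

```python
import re

def pattern_to_regex(pattern):
    """Translate a wildcard path pattern into a compiled regex (sketch)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern, path):
    """Return True if the pattern matches from the start of the path."""
    return pattern_to_regex(pattern).match(path) is not None

# Blocking sorted duplicates of forum threads (hypothetical URLs).
dupe_blocked = matches("/forum/*?sort=", "/forum/thread1?sort=asc")
clean_untouched = not matches("/forum/*?sort=", "/forum/thread1")

# "$" restricts the match to URLs that end in .pdf.
pdf_blocked = matches("/*.pdf$", "/docs/a.pdf")
pdf_with_query_untouched = not matches("/*.pdf$", "/docs/a.pdf?x=1")

# An overly broad pattern: this catches every URL with a query string.
overblocked = matches("/*?", "/forum/thread1?page=2")
```

The last line shows how easily a short pattern can sweep in far more than intended, which is worth checking against a list of real URLs before publishing the file.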

Oliver Henniges




msg:3615274
 12:08 pm on Mar 31, 2008 (gmt 0)

To my limited understanding there is no official "allow"-statement according to the rfc-specifications? Only disallow. But of course google may define its own rules for its own robots.

I think those webmasters who feel the need to exclude robots from certain parts of their own website (for bandwidth or whatever other reasons) are strongly recommended to acquire decent knowledge of robots.txt syntax on their own, and I assume most definitely have done so.

So the only reason why this tool makes sense is probably google's own bandwidth: if webmasters help to lead adsbot to only the relevant parts, there is no need for these bots to crawl the whole site. But do their bots do so anyway? Does adsbot follow links and crawl pages other than those defined as target URLs in the campaigns? Did you experience images-bot crawling the whole webmasterworld-site in search of thousands of copies of this beautiful world-map or visa-logo? ;)

zuko105




msg:3615347
 2:14 pm on Mar 31, 2008 (gmt 0)

".....thus eliminating the pesky use of notepad."

tedster




msg:3615387
 3:02 pm on Mar 31, 2008 (gmt 0)

there is no official "allow"-statement

You are correct. Allow rules are an extension to the original robots.txt protocol, as are pattern matching wild cards [google.com] and indicating your Sitemap location [google.com]. The article mentions this:

The Robots.txt Generator creates files that Googlebot will understand, and most other major robots will understand them too. But it's possible that some robots won't understand all of the robots.txt features that the generator uses.

I'm thinking that the robots.txt generator in Webmaster Tools might also help people avoid situations like this one:

Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]

Bern




msg:3626551
 4:49 pm on Apr 14, 2008 (gmt 0)

Does anyone know how to keep the google bot from clicking ads on a publisher site? Whenever Google bot comes to the site it pumps up the number of page views and clicks for the advertiser - is there any way to restrict them from influencing those ad stats?

g1smd




msg:3626644
 6:55 pm on Apr 14, 2008 (gmt 0)

I knew my post from 20 months ago would come in handy one day. :-)

a_chameleon




msg:3626659
 7:30 pm on Apr 14, 2008 (gmt 0)

Does anyone know how to keep the google bot from clicking ads on a publisher site?

Maybe move all their URLs to one directory, and disallow Googlebot from that directory via robots.txt?
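A rough sketch of what that could look like, checked with Python's standard urllib.robotparser. The /adclick/ directory is a made-up name for wherever the ad click-through URLs would live, and a rule like this only helps with crawlers that actually honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ad click URLs are assumed to live under /adclick/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /adclick/
"""

ad_parser = RobotFileParser()
ad_parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is kept out of the ad directory but can still crawl content.
googlebot_ad = ad_parser.can_fetch("Googlebot", "/adclick/banner42")
googlebot_page = ad_parser.can_fetch("Googlebot", "/articles/index.html")

# A crawler with a different name is not covered by this record.
otherbot_ad = ad_parser.can_fetch("OtherBot", "/adclick/banner42")

print(googlebot_ad, googlebot_page, otherbot_ad)
```

Since the record names only Googlebot, other crawlers would need their own record or a "User-agent: *" record to be kept out as well.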

Bern




msg:3626697
 8:10 pm on Apr 14, 2008 (gmt 0)

Thanks, Chameleon. I will look into that now.

I have a client who publishes ads - not AdSense - they have proprietary ad serving technology and have a number of local advertisers as customers. The issue is that bots are indexing the pages of the site and then clicking on the ads, thereby adding fake costs for the advertisers, who aren't happy about it.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved