Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Googlebot disobeys the conditions in robots.txt
googlebot disobeys conditions set under user-agent: *
alanedward16




msg:4202693
 11:28 am on Sep 16, 2010 (gmt 0)

The GWT robots.txt test shows that googlebot disobeys the conditions specified under user-agent: * and only takes into account the conditions specified under user-agent: googlebot.

For example, I disallow 3 folders, namely x, y & z, from all bots and the folder w from only googlebot. But the test robots.txt option available in the Google webmaster tool ignores the conditions specified for all robots (i.e. under *).

Is this a bug, or is this how the robots directives work? Would appreciate it if anyone could shed some light on this.

Thanks in advance.

 

goodroi




msg:4202695
 11:31 am on Sep 16, 2010 (gmt 0)

99% of google disobeying robots.txt turns out to be an error with the robots.txt file (typo, incorrectly formatted etc) and not google making a mistake.

i would suggest you double check your robots.txt to make sure it is ordered properly, has zero typos, has the necessary extra spaces & line breaks, uploaded to the right space and is 100% error free.

alanedward16




msg:4202699
 11:58 am on Sep 16, 2010 (gmt 0)

goodroi, thanks for the response.

I think you have misunderstood what I meant. To be clear, from the above example, the test robots.txt option in the webmaster tools shows that google only reads the conditions specified under googlebot and ignores those specified under *.

For example, check on this sample.

user-agent: *
Disallow: /x
Disallow: /y
Disallow: /z

user-agent: googlebot
Disallow: /w

So my question here is, will googlebot disallow only w or all 4 (x, y, z & w)?

The tool in GWT says only W.

Please clarify. Will be of huge help.

Thanks.

phranque




msg:4202713
 12:17 pm on Sep 16, 2010 (gmt 0)

Block or remove pages using a robots.txt file - Webmaster Tools Help -
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449&from=40360&rd=1 [google.com]:
Each section in the robots.txt file is separate and does not build upon previous sections.

For example:
User-agent: *
Disallow: /folder1/

User-Agent: Googlebot
Disallow: /folder2/

In this example only the URLs matching /folder2/ would be disallowed for Googlebot.
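
To try the quoted behaviour locally, here is a minimal sketch using Python's standard-library urllib.robotparser. It is only a stand-in: it is not Googlebot or the GWT tester, and the helper name allowed and the bot name SomeOtherBot are made up for illustration. It does apply the same "most specific section wins, sections do not add up" logic to the example above.

```python
from urllib.robotparser import RobotFileParser

# The example from the quoted Google help page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/

User-agent: Googlebot
Disallow: /folder2/
"""


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under `robots_txt`.

    Hypothetical helper for this illustration only; it parses the text
    directly instead of downloading a live robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "SomeOtherBot" is an invented name standing in for any unnamed bot.
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/folder1/page.html", "/folder2/page.html"):
            print(agent, path, allowed(ROBOTS_TXT, agent, path))
```

With this parser, Googlebot is held only to its own section (/folder2/), while a bot without a named section falls back to the * section (/folder1/).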

alanedward16




msg:4202718
 12:37 pm on Sep 16, 2010 (gmt 0)

Hi phranque,

Thanks for the clarification.

Highly appreciated :))

grandma genie




msg:4221457
 5:34 am on Oct 25, 2010 (gmt 0)

I thought Googlebot was doing something similar on my site today. I have my robots.txt set up like this for all bots:
User-Agent: *
Disallow: /osc/popup_image.php

But Googlebot showed up today doing this:

66.249.71.nn - - [24/Oct/2010:06:25:48 -0400] "GET /osc/popup_image.php?pID=3339 HTTP/1.1" 200 709 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:53 -0400] "GET /osc/popup_image.php?pID=1775 HTTP/1.1" 200 703 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:58 -0400] "GET /osc/popup_image.php?pID=3105 HTTP/1.1" 200 736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I also have this further down where I list all the specific bots:

User-agent: Googlebot-Image
Disallow: /

What am I doing wrong? Googlebot totally descended on my site today like gang busters.

Grandma_genie

phranque




msg:4221465
 5:59 am on Oct 25, 2010 (gmt 0)

do you have a section for regular "Googlebot"?
have you tested your robots.txt in the Crawler access section of GWT?

grandma genie




msg:4221670
 4:38 pm on Oct 25, 2010 (gmt 0)

I do not have a section for the regular Googlebot. I have tested my robots.txt in the Crawler access section of GWT and it is working. Googlebot has access to most of my site, but the specific file referenced is blocked by robots.txt. Do I need to put an asterisk behind the popup_image.php* ?

Staffa




msg:4221803
 7:55 pm on Oct 25, 2010 (gmt 0)

I used to have my robots.txt file with some bots totally disallowed and one single section for those parts of my site which were disallowed to everyone.

About a year ago now, Google started poking into the disallowed parts, even picking up images etc. (though the image bot is entirely disallowed), so I wrote a specific section for Gbot alone detailing everything it has to keep its hands off, and that works.
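
For anyone who wants to build such a bot-specific section without retyping the rules, here is a rough sketch in Python. The function names wildcard_rules and add_agent_section, and the sample file with BadBot, are invented for illustration. It only copies Allow/Disallow lines from the * section, so treat it as a starting point and not a full robots.txt parser.

```python
def wildcard_rules(robots_txt):
    """Collect the Allow/Disallow lines that belong to the 'User-agent: *' group."""
    rules = []
    agents = []             # user-agents named by the group currently being read
    reading_agents = False  # True while consecutive User-agent lines are seen
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                agents = []  # a new group starts here
                reading_agents = True
            agents.append(value)
        elif field in ("allow", "disallow"):
            reading_agents = False
            if "*" in agents:
                rules.append(f"{field.capitalize()}: {value}")
    return rules


def add_agent_section(robots_txt, agent):
    """Append a section for `agent` that repeats the wildcard group's rules."""
    section = "\n".join([f"User-agent: {agent}", *wildcard_rules(robots_txt)])
    return robots_txt.rstrip("\n") + "\n\n" + section + "\n"


if __name__ == "__main__":
    # Invented sample file: one bot banned outright, plus a section for everyone.
    sample = (
        "User-agent: BadBot\n"
        "Disallow: /\n"
        "\n"
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Disallow: /tmp/\n"
    )
    print(add_agent_section(sample, "Googlebot"))
```

Any rules meant for Googlebot alone would still need to be added to the new section by hand.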

g1smd




msg:4221845
 9:20 pm on Oct 25, 2010 (gmt 0)

When you have a section for Googlebot, Google reads ONLY that section.

Here's a detailed thread from 4 years ago... [webmasterworld.com...]

phranque




msg:4221941
 1:34 am on Oct 26, 2010 (gmt 0)

Do I need to put an asterisk behind the popup_image.php* ?

the robots exclusion protocol matches each Disallow path as a prefix, left-to-right, so the trailing wildcard should not be necessary.
perhaps the Googlebot-Image section is triggering something.
i would try repeating the wildcard User-Agent exclusion in a Googlebot-specific section.
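
A small sketch of both points (prefix matching, and repeating the rule in a Googlebot-specific section), again using Python's urllib.robotparser as a stand-in. This is an assumption-laden illustration and not a reproduction of how Googlebot actually parses the file; the layout of ROBOTS_TXT and the helper name allowed are made up, with only the paths taken from the posts above.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: the wildcard rule is repeated under a Googlebot section.
# Googlebot-Image is listed before Googlebot because Python's parser matches
# agent names by substring and uses the first section that matches.
ROBOTS_TXT = """\
User-agent: *
Disallow: /osc/popup_image.php

User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot
Disallow: /osc/popup_image.php
"""


def allowed(agent, path):
    """Hypothetical helper: check `path` for `agent` against ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The Disallow path is a prefix, so the query string is covered
    # without a trailing asterisk.
    print(allowed("Googlebot", "/osc/popup_image.php?pID=3339"))
    # Other pages under /osc/ stay open to Googlebot.
    print(allowed("Googlebot", "/osc/index.php"))
    # Googlebot-Image is held to its own section, which blocks everything.
    print(allowed("Googlebot-Image", "/osc/index.php"))
```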

grandma genie




msg:4221965
 2:05 am on Oct 26, 2010 (gmt 0)

I tried creating a section just for the googlebot and it looks like that did the trick. I'll see if they are still obeying the robots.txt tomorrow. Thanks all.

grandma genie




msg:4223098
 1:23 am on Oct 28, 2010 (gmt 0)

Just a follow up to my last post. Yes, the changes in robots.txt did the trick. Googlebot is now obeying all directives. Thanks for all the help.

phranque




msg:4223305
 11:26 am on Oct 28, 2010 (gmt 0)

it's good that you were able to fix that, but the GWT Help documentation i linked to above doesn't match the results you observed.

grandma genie




msg:4224128
 12:16 am on Oct 30, 2010 (gmt 0)

Perhaps Google should change their docs. I just did the same thing with MSN and that works, too.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved