
Forum Moderators: goodroi


Googlebot disobeys the conditions in robots.txt

googlebot disobeys conditions set under user-agent: *

     

alanedward16

11:28 am on Sep 16, 2010 (gmt 0)

5+ Year Member



The GWT robots.txt test shows that googlebot disobeys the conditions specified under user-agent: * and only takes into account the conditions specified under user-agent: googlebot.

For example, I disallow 3 folders, namely x, y & z, from all bots and the folder w from only googlebot. But the test robots.txt option available in the Google webmaster tool ignores the conditions specified for all robots (i.e. under *).

Is this a bug, or is this how the robots directives work? I would appreciate it if anyone could shed some light on this.

Thanks in advance.

goodroi

11:31 am on Sep 16, 2010 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



99% of google disobeying robots.txt turns out to be an error with the robots.txt file (typo, incorrectly formatted etc) and not google making a mistake.

i would suggest you double check your robots.txt to make sure it is ordered properly, has zero typos, has the necessary extra spaces & line breaks, uploaded to the right space and is 100% error free.

alanedward16

11:58 am on Sep 16, 2010 (gmt 0)

5+ Year Member



goodroi, thanks for the response.

I think you have misunderstood what I meant. To be clear, from the above example, the test robots.txt option in the webmaster tools shows that google only reads the conditions specified under googlebot and ignores those specified under *.

For example, check on this sample.

user-agent: *

Disallow: /x
Disallow: /y
Disallow: /z

user-agent: googlebot

Disallow: /w

So my question here is, will googlebot be blocked from only w, or from all 4 (x, y, z & w)?

The tool in GWT says only W.

Please clarify. Will be of huge help.

Thanks.

phranque

12:17 pm on Sep 16, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Block or remove pages using a robots.txt file - Webmaster Tools Help -
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449&from=40360&rd=1 [google.com]:
Each section in the robots.txt file is separate and does not build upon previous sections.

For example:
User-agent: *
Disallow: /folder1/

User-Agent: Googlebot
Disallow: /folder2/

In this example only the URLs matching /folder2/ would be disallowed for Googlebot.
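
To try the quoted rule locally, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is only an illustration: this parser is not Google's, and the bot name `SomeOtherBot` and the page paths are made up for the example. It does follow the same convention as the Help page, where a crawler with its own section uses that section and ignores the `*` section.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt quoted from the Webmaster Tools Help page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/

User-Agent: Googlebot
Disallow: /folder2/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own section, so only that section applies to it.
googlebot_folder1 = parser.can_fetch("Googlebot", "/folder1/page.html")
googlebot_folder2 = parser.can_fetch("Googlebot", "/folder2/page.html")

# A crawler without its own section (hypothetical name) falls back to "*".
otherbot_folder1 = parser.can_fetch("SomeOtherBot", "/folder1/page.html")

print("Googlebot    /folder1/ allowed:", googlebot_folder1)
print("Googlebot    /folder2/ allowed:", googlebot_folder2)
print("SomeOtherBot /folder1/ allowed:", otherbot_folder1)
```

If /folder1/ should also be off limits to Googlebot, its Disallow line has to be repeated inside the Googlebot section.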

alanedward16

12:37 pm on Sep 16, 2010 (gmt 0)

5+ Year Member



Hi phranque,

Thanks for the clarification.

Highly appreciated :))

grandma genie

5:34 am on Oct 25, 2010 (gmt 0)

5+ Year Member



I thought Googlebot was doing something similar on my site today. I have my robots.txt set up like this for all bots:
User-Agent: *
Disallow: /osc/popup_image.php

But Googlebot showed up today doing this:

66.249.71.nn - - [24/Oct/2010:06:25:48 -0400] "GET /osc/popup_image.php?pID=3339 HTTP/1.1" 200 709 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:53 -0400] "GET /osc/popup_image.php?pID=1775 HTTP/1.1" 200 703 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:58 -0400] "GET /osc/popup_image.php?pID=3105 HTTP/1.1" 200 736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I also have this further down where I list all the specific bots:

User-agent: Googlebot-Image
Disallow: /

What am I doing wrong? Googlebot totally descended on my site today like gang busters.

Grandma_genie

phranque

5:59 am on Oct 25, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



do you have a section for regular "Googlebot"?
have you tested your robots.txt in the Crawler access section of GWT?

grandma genie

4:38 pm on Oct 25, 2010 (gmt 0)

5+ Year Member



I do not have a section for the regular Googlebot. I have tested my robots.txt in the Crawler access section of GWT and it is working. Googlebot has access to most of my site, but the specific file referenced is blocked by robots.txt. Do I need to put an asterisk behind the popup_image.php* ?

Staffa

7:55 pm on Oct 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I used to have my robots.txt file with some bots totally disallowed and one single section for those parts of my site which were disallowed to everyone.

About a year ago, Google started poking into the disallowed parts, even picking up images etc. (though the Image bot is entirely disallowed), so I wrote a specific section for Gbot alone detailing everything it has to keep its hands off, and that works.

g1smd

9:20 pm on Oct 25, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



When you have a section for Googlebot, Google reads ONLY that section.

Here's a detailed thread from 4 years ago... [webmasterworld.com...]

phranque

1:34 am on Oct 26, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Do I need to put an asterisk behind the popup_image.php* ?

the robots exclusion protocol for robots.txt matches paths as prefixes, left-to-right, so the trailing wildcard should not be necessary.
perhaps the Googlebot-Image section is triggering something.
i would try repeating the wildcard User-Agent exclusion in a Googlebot-specific section.
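
The left-to-right matching mentioned above can be sketched in a few lines of Python. The helper `is_blocked` is hypothetical and written only for this illustration. It assumes plain prefix matching as in the original robots exclusion protocol and ignores Allow lines and wildcard extensions. The paths come from the log lines earlier in the thread, except `/osc/product_info.php`, which is a made-up counterexample.

```python
def is_blocked(url_path: str, disallow_rules: list[str]) -> bool:
    """Return True if any non-empty Disallow value is a prefix of url_path.

    Simplified sketch: plain prefix matching only, no Allow lines,
    no '*' or '$' pattern extensions.
    """
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


rules = ["/osc/popup_image.php"]

# The query string comes after the matched prefix, so no trailing "*" is needed.
with_query = is_blocked("/osc/popup_image.php?pID=3339", rules)

# A different script under the same folder is not matched by this rule.
other_script = is_blocked("/osc/product_info.php", rules)

print(with_query, other_script)
```

Under this assumption, the existing Disallow line already covers every popup_image.php URL regardless of its pID value, so the open question is which section Googlebot reads, not the pattern.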

grandma genie

2:05 am on Oct 26, 2010 (gmt 0)

5+ Year Member



I tried creating a section just for the googlebot and it looks like that did the trick. I'll see if they are still obeying the robots.txt tomorrow. Thanks all.

grandma genie

1:23 am on Oct 28, 2010 (gmt 0)

5+ Year Member



Just a follow up to my last post. Yes, the changes in robots.txt did the trick. Googlebot is now obeying all directives. Thanks for all the help.

phranque

11:26 am on Oct 28, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



it's good that you were able to fix that, but the GWT Help documentation i linked to above doesn't match the results you observed.

grandma genie

12:16 am on Oct 30, 2010 (gmt 0)

5+ Year Member



Perhaps Google should change their docs. I just did the same thing with MSN and that works, too.
 
