Forum Moderators: goodroi

Disagreement between Google and a validator

Robots.txt Checker says it's so, Google's robots.txt analyzer says the other way


Gusgsm

11:23 am on Dec 10, 2006 (gmt 0)

10+ Year Member



I have a robots text that goes:


User-agent: *
# Directories
Disallow: /cgi-bin/
Disallow: /database/
Disallow: /db/
Disallow: /dumper/
Disallow: /estilos/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
Disallow: /tmp/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (Clean URLs)
Disallow: /admin/
Disallow: /node/add/
Disallow: /search/
Disallow: /comment/reply/
Disallow: /contact
Disallow: /user/register
Disallow: /user/password
Disallow: /logout

These are all files and directories to be excluded from crawling. Robots.txt Checker (http://tool.motoricerca.info/robots-checker.phtml) says it's all right. But when I use the robots.txt analyzer from Google (https://www.google.com/webmasters/tools/robots), it says they are allowed.

I am completely lost. Could anybody possibly tell me the right way to disallow those folders and files?

Thank you so much :)

PS. I have returned to a slim robots.txt meanwhile.
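As an aside, rules like these can also be cross-checked locally with Python's standard-library robots.txt parser, independent of either web tool. This is just a minimal sketch: the inlined rules are a subset of the file above, and the test paths are made-up examples for illustration.

```python
# Cross-check robots.txt rules locally with Python's standard-library
# parser, instead of relying on a web-based validator.
# The rules below are a subset of the robots.txt in question; the test
# paths are illustrative examples only.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /xmlrpc.php
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/cgi-bin/test.cgi", "/admin/", "/xmlrpc.php", "/node/1"]:
    verdict = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(f"{path}: {verdict}")
```

If two validators disagree, a local check like this gives a third, reproducible opinion on what a spec-following crawler should do.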

leadegroot

12:52 pm on Dec 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can't reproduce this - when I drop your robots file into the sitemaps robots.txt test thingy (that's a technical term ;)) and test all the URLs listed, they are all blocked.

Are you sure you aren't getting weird characters in your test URLs, so they look the same but aren't really?

Lea

Gusgsm

7:58 pm on Dec 10, 2006 (gmt 0)

10+ Year Member



Lea,

No. I get the same blocking in the robots validator (er... well, the 'thingy' ;) ) and that's theoretically what I want.

But the Google Sitemaps robots.txt validator tells me they have free access.

That's the odd thing.

(The original robots.txt is a proposal from Drupal.org that I have modified a bit to suit my needs.)

leadegroot

10:59 pm on Dec 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, when I test your robots.txt in Google's sitemap validator, they are all correctly blocked.
I don't know of another external robots validator to check against :(

Which of the files does Google claim it will be able to get to? Only some, or all?

Gusgsm

8:38 am on Dec 11, 2006 (gmt 0)

10+ Year Member



Lea,

Now I am completely lost :| After reading your post, I retried the Google sitemap robots validator today, and now it tells me they are blocked, as I wished.

I am flabbergasted and a bit sorry for the inconvenience.

Thanks a lot for your attention and patience.

Best wishes :)

leadegroot

9:49 am on Dec 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No problem :)