User-agent: *
# Directories
Disallow: /cgi-bin/
Disallow: /database/
Disallow: /db/
Disallow: /dumper/
Disallow: /estilos/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
Disallow: /tmp/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (Clean URLs)
Disallow: /admin/
Disallow: /node/add/
Disallow: /search/
Disallow: /comment/reply/
Disallow: /contact
Disallow: /user/register
Disallow: /user/password
Disallow: /logout
These are all the files and directories to be excluded from crawling. The Robots.txt Checker (http://tool.motoricerca.info/robots-checker.phtml) says it's all right, but when I use the robots.txt analyzer from Google (https://www.google.com/webmasters/tools/robots), it says they are allowed.
I am completely lost. Could anybody tell me the right way to disallow those folders and files?
Thank you so much :)
PS. I have returned to a slim robots.txt meanwhile.
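For what it's worth, here's a minimal sketch that checks rules like the ones above with Python's standard urllib.robotparser. The https://example.com domain and the sample paths are only placeholders, not my actual site, and I've trimmed the rule list down to a few entries:

# A minimal sketch, assuming Python 3 and the standard-library urllib.robotparser.
# https://example.com and the test paths below are placeholders, not a real site.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /xmlrpc.php
Disallow: /user/register
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() returns False for URLs matched by a Disallow rule.
for path in ("/cgi-bin/formmail.pl", "/admin/settings", "/xmlrpc.php",
             "/user/register", "/node/123"):
    verdict = "allowed" if parser.can_fetch("*", "https://example.com" + path) else "blocked"
    print(path, "->", verdict)

If the syntax is fine, the disallowed paths should come back as "blocked" and /node/123 as "allowed", which is what the first validator reports.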
No. I get the same blocking in the robots validator (er... well, the 'thingy' ;) ) and that's theoretically what I want.
But the Google Sitemap robots.txt validator tells me they have free access.
That's the odd thing.
(The original robots.txt is a proposal made on Drupal.org that I have modified a bit to suit my needs.)