Forum Moderators: open
There's one resource on my site that I don't want Google to index, as it is a user registration form. The resource is usually requested with one URL parameter; the URLs look like this:
www.example.org/thatscript.php?key=value
Now Google has indexed 400 different URLs of that kind, resulting from 400 different key=value parameters.
My robots.txt (untouched for months):
User-agent: *
Disallow: /thatscript.php
Is there any problem with my robots.txt, or is it Googlebot's fault?
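For what it's worth, the rule can be sanity-checked locally. A sketch using Python's standard urllib.robotparser (the hostname and key=value parameter are taken from the example above):

```python
# Does "Disallow: /thatscript.php" block the parameterized URLs?
import urllib.robotparser

robots_txt = """\
User-agent: *
Disallow: /thatscript.php
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Disallow rules are prefix matches, so the query string doesn't matter:
print(rp.can_fetch("*", "http://www.example.org/thatscript.php?key=value"))  # False (blocked)
print(rp.can_fetch("*", "http://www.example.org/otherpage.html"))            # True (allowed)
```

Since Disallow matching is by prefix, /thatscript.php should already cover every key=value variant.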
regards
Martin
Googlebot can be a bit slow with housekeeping, though. If no links point to those pages, they might not get visited anytime soon.
I changed the tag to noindex,nofollow for a bunch of pages last year, and they stayed in the index for months until I linked to them (in an inconspicuous way) from one of my regular pages.
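For reference, the tag being described here is presumably the robots meta tag placed in each page's head section, along these lines:

```html
<meta name="robots" content="noindex,nofollow">
```

Note that a crawler has to be able to fetch the page to see this tag, so it works independently of (and can conflict with) a robots.txt Disallow.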
It reminds me a bit of this other problem:
[webmasterworld.com...]
where someone had to resurrect an old server to get Google to spider the new server location. (Even though that was a different technical issue.)
Maybe Googlebot is sentimental, and can't let go of the past easily? ;)
I'm not positive it will work for your case (typically people use it for one or two URLs), but you could check it out.
(I don't recall off-hand if that robots.txt will match your URLs. We do support wildcards in the Disallow field, though. I would check out our webmaster section for some examples.)
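As a sketch of the wildcard syntax mentioned (a Googlebot extension; it is not part of the original robots.txt convention, so other crawlers may ignore it):

```
User-agent: Googlebot
Disallow: /thatscript.php?*
```

Since standard Disallow rules already match by prefix, the plain /thatscript.php rule should cover the parameterized URLs too; the wildcard form just makes the intent explicit for Googlebot.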