Anyway, I have a little hobby site that I put up. It has a robot.txt that disallows a certain directory. I have triple checked the robot.txt and put it through the validator. Everything checks out, but googlebot still crawls the pages in that directory. Nobody else crawls it, just googlebot.
Does googlebot crawl everything and just index what's allowed, or is something wrong here?
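For reference, the file sits at the site root and the rule in it looks roughly like this (the directory name is just a placeholder, not the real one):

    User-agent: *
    Disallow: /hobby-stuff/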
Does google list the pages? If so, how long has the robots.txt been in place?
(no offense, but we've seen a few dozen of these claims and all but 1 [webmasterworld.com] have turned out to be webmaster error).
more reading [google.com]...
Did we actually fetch pages from that directory, or do we just show titles without any cached page link (we can do that for urls that we see referenced but didn't crawl)?
How can we prevent these sites/directories from being listed/spidered at all?
@hannamyluv
It's called robots.txt and not robot.txt. Maybe the problem is the name of the file?
I'll check the file; perhaps the name is singular, so I will take a look at that. Thanks for the help.
*sigh, I feel like a newbie. I kinda wish I had a programmer/techie at home like I do at work.*
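One quick way to double-check the live file after renaming it is Python's built-in robotparser; the host and directory below are only placeholders for the real ones:

    import urllib.robotparser

    # Point the parser at the live robots.txt (plural) on the site.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()

    # Ask whether Googlebot may fetch a page inside the blocked directory.
    print(rp.can_fetch("Googlebot", "http://www.example.com/hobby-stuff/page.html"))
    # False means the Disallow rule is in place and readable; True means it isn't being applied.

If that still prints True after the rename, the file probably isn't at the site root or the Disallow path doesn't match the directory.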