I see all these directories listed, but none of them work. Could they all be a result of Google spidering websites that use www.example.com as an example, and adding the folder used in the example to its index?
Yup yup, just wondering why there are folders off the top level, because when you try to visit one of them, nothing is there. But if you do a search on the path, such as "www.example.com/bmw", you will find a website using it as an example. Google sees that reference and attempts to spider it, finds nothing, and yet still includes it in the index? Or was there at one point a /bmw that is no longer there? Heh, I was just bored and noticed it.
encyclo
3:55 pm on Jul 13, 2007 (gmt 0)
The robots.txt for example.com blocks Googlebot, so it has no way of knowing whether the files or directories exist. It just sees a reference to a resource that it cannot access, and leaves a URL-only result.
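To illustrate the point about robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents below are an assumption (a blanket disallow for all bots); the actual file for example.com is not quoted in this thread. The helper name `is_crawl_allowed` is also made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (assumption: a blanket block for all bots).
# The real file served by example.com is not quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler never requests the URL, so it cannot tell
    # whether /bmw exists, returns a 404, or redirects somewhere else.
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/bmw"))
```

With rules like these, a well-behaved bot stops before making the request, which fits the URL-only listing: the search engine knows the address from other sites' links but nothing about what is (or is not) behind it.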
youfoundjake
10:03 pm on Jul 14, 2007 (gmt 0)
Encyclo, good point. I hadn't thought that far ahead yet, I just kind of raised the original question. Now it appears other questions are raised. Do the links pointing to example.com and its subdirectories count as inbound links even though robots.txt is blocking all bots, and do they count for PageRank in Google, and whatever else in the other search engines? Now, in practical terms: if the directory does not exist, and a search engine spiders it and gets a 404, could that hurt the site? Just some questions that popped up. Hope everyone is having a good weekend. Jake