Why is Google indexing nonexistent pages?

Do I need an index.html in every directory?

         

Powdork

10:41 pm on Aug 29, 2004 (gmt 0)


I am seeing a whole lot of
www.example.com/category/subcategory/ results for my site when doing a site:www.example.com search on Google. These point to index pages which don't exist. Photoshop's web gallery automation creates a directory containing all the HTML pages, but that directory does not include the gallery index page, which resides one level up. The returned pages are actually my server's auto-generated listings of those directories. I'm concerned about whether Google is spidering my site correctly, since the appearance of these URLs means the actual percentage of my real pages they have indexed is lower than I thought. I'm also concerned that this may combine with, or be a result of, any sandboxing.
Anyone?

[edited by: ciml at 10:51 am (utc) on Aug. 30, 2004]
[edit reason] Examplified. [/edit]

doc_z

5:52 pm on Aug 30, 2004 (gmt 0)


Does Google show any backlinks for these 'pages'? (There might be external links to the directories.)

Powdork

5:58 am on Aug 31, 2004 (gmt 0)


>Does Google show any backlinks for these 'pages'? (There might be external links to the directories.)

Funny thing, that.
Some of the internal pages previously did have links to
www.example.com/category/subcategory/index.htm, which would yield a 404. I discovered a number of these problems on the site because msnbot was following the links as coded and they were showing up in the error logs. Googlebot was not showing up in the error logs, because she was requesting the link without the index.htm, even though the link was coded with it. So does that mean that Googlebot always goes to
www.example.com/category/subcategory/ when the link is really to
www.example.com/category/subcategory/index.htm?
If so, does that tell us anything useful?

highman

7:20 am on Aug 31, 2004 (gmt 0)


>So does that mean that Googlebot always goes to
>www.example.com/category/subcategory/ when the link is really to
>www.example.com/category/subcategory/index.htm
>If so, does that tell us anything useful?

This has been the case for years. It is very apparent if you use the 404 model, where no files actually exist: a link to index.htm / html, default.htm / html / asp etc. seems to be ignored, and the default doc for that folder is requested instead.
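
As a rough illustration of the behaviour described here (not Googlebot's actual logic, which isn't public), the sketch below strips a default document name from a URL so that it points at the folder instead. The function name `strip_default_doc` and the `DEFAULT_DOCS` set are hypothetical; the set only contains the names mentioned in this post.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: default document names taken from the ones listed in this thread.
DEFAULT_DOCS = {"index.htm", "index.html", "default.htm", "default.html", "default.asp"}


def strip_default_doc(url):
    """Return the URL with a trailing default document name removed.

    URLs that do not end in one of DEFAULT_DOCS are returned unchanged.
    """
    parts = urlsplit(url)
    # Split the path into its directory and its last segment (the file name).
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in DEFAULT_DOCS:
        parts = parts._replace(path=directory + "/")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(strip_default_doc("http://www.example.com/category/subcategory/index.htm"))
```

Running a list of your internal links through something like this can hint at which folder URLs a crawler that behaves this way might end up requesting.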

HTH

dirkz

10:45 am on Aug 31, 2004 (gmt 0)


I have seen that before (maybe Googlebot is testing the existence of directories) but couldn't derive any useful knowledge from it.

Marcia

8:58 pm on Aug 31, 2004 (gmt 0)


You have to exclude directories without an HTML index file from being accessed yourself. Some hosts have something automatic in the control panel to do it.
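
To find which directories need attention in the first place, something along these lines could be run against a local copy of the site. This is only a sketch: `dirs_without_index` and `INDEX_NAMES` are made-up names, and the list of index file names is an assumption that should match whatever your server treats as its default document.

```python
import os

# Assumption: the file names the server would serve as a directory's default page.
INDEX_NAMES = ("index.html", "index.htm")


def dirs_without_index(doc_root):
    """Return directories under doc_root (relative paths) that have no index file."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        lowered = {name.lower() for name in filenames}
        if not any(name in lowered for name in INDEX_NAMES):
            missing.append(os.path.relpath(dirpath, doc_root))
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical path to a local copy of the site.
    for path in dirs_without_index("public_html"):
        print(path)
```

Each directory it reports is one where the server may fall back to its own listing, so it either needs an index page or needs listings switched off.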

Google will show a file in the index, as a URL-only listing, even when the file or domain is excluded by robots.txt, and even without a link to it. I've got a domain right now that's completely excluded from indexing, and the URL for the root index page is in there.
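
For what it's worth, a robots.txt rule only answers the question "may this crawler fetch this URL?"; it says nothing about whether the bare URL can be listed. A small sketch using Python's standard `urllib.robotparser` shows the fetch side of that. The `may_fetch` helper and the sample rules are assumptions for illustration, not anyone's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a robots.txt that excludes the whole domain for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch(ROBOTS_TXT, "Googlebot", "http://www.example.com/"))
```

A False here means a well-behaved crawler won't request the page, which fits with the listing being URL-only: there is no fetched content to show a title or snippet from.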