Would deleting obsolete index.htm hurt indexing

Site now invisible to Google

majac

10:48 am on Sep 20, 2005 (gmt 0)

10+ Year Member



Hi... recently we deleted some obsolete URLs from Google's listings using the Google removal tool and by changing robots.txt... this seems to have messed up indexing of this long-established site, which had a PR of 4. It's now n/a.

One of the pages deleted was an "index.htm" file. It was there because when we moved servers a couple of years ago, the default had to be changed to "index.html", and some other search engines were still requesting the now defunct "index.htm".

The site is being spidered every few days but is not indexed.

Thanks muchly in advance...
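
One thing worth checking for the robots.txt change described above: Disallow rules match by path prefix, so a rule written for "index.htm" also covers "index.html". The sketch below uses Python's standard urllib.robotparser to show this. The robots.txt contents and the www.example.com host are assumptions for illustration; the thread does not say what rule was actually used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, assumed for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/", "/index.htm", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

If the rule really was a bare "Disallow: /index.htm", the live "index.html" would have been blocked along with the obsolete file.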

Eltiti

12:28 pm on Sep 21, 2005 (gmt 0)

10+ Year Member



Instead of blocking index.htm using robots.txt, I'd set up a permanent redirect (301) from index.htm to the home page of my site, www.example.com (I wouldn't redirect to index.html).
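
A minimal sketch of the 301 suggested above, written with Python's standard http.server purely for illustration. The PERMANENT_REDIRECTS table, the handler name, and the port are hypothetical; on a real site this would normally be done in the web server's own configuration rather than in application code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical table of retired URLs and where they now live.
PERMANENT_REDIRECTS = {"/index.htm": "/"}


def redirect_target(path: str) -> Optional[str]:
    """Return the new location for a retired path, or None if it is live."""
    # Ignore any query string when matching the path.
    return PERMANENT_REDIRECTS.get(path.split("?", 1)[0])


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is not None:
            # 301 tells crawlers the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"home page\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()
```

Note that the old URL points at "/" rather than "/index.html", matching the advice not to redirect to index.html.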

kaled

2:53 pm on Sep 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google typically treats index.htm, index.php, index.asp, index.html, index.shtml, etc. as identical (being the default page). When you removed index.htm you probably removed index.html with all the horrible consequences that would follow.

Kaled.

walkman

3:47 pm on Sep 21, 2005 (gmt 0)



>> Google typically treats index.htm, index.php, index.asp, index.html, index.shtml, etc. as identical (being the default page).

Ummm... not really. If you have both domain.com/ and domain.com/index.htm indexed, you might have problems. They're different pages to Google as far as I know.

Wizard

3:59 pm on Sep 21, 2005 (gmt 0)

10+ Year Member



There was a thread here a long time ago where someone demonstrated an error in Google by removing the microsoft.com home page from Google's index.

Google used to treat '/', '/index.html' and '/index.htm' as the same URL, so because the Microsoft server (using 'default.asp' as its main index) returned 404 for a '/index.html' request, it was possible to remove its main page by removing '/index.html' with the URL Console.

After this incident, the bug was said to have been corrected immediately, but maybe under certain circumstances removing '/index.htm' still causes removal of '/'? There is no doubt that removing 'www.example.com/' also removes 'example.com/', so it's not uncommon for the removal of one URL to cause another to be removed, whether or not they appear as separate ones in search results.

If you're not Microsoft, they won't do anything immediate, and your site is likely out of the index for six months. It will be crawled as often as it always has been; its PageRank will go gray, but it will be remembered internally and will reappear in the toolbar immediately when the page comes back. It will come back with a fresh snippet, as it will be crawled despite the removal. And new links added there will be followed normally, even while the page is out of the index.

Let's hope I'm wrong, but that's how it happens with accidentally removed pages, and many things suggest that is what you did.

301 would have been much safer.
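
To make the collapsing described above concrete, here is a rough sketch of how treating default pages and the www/non-www hosts as equivalent makes several URLs share one key, so that removing one can take the others with it. The DEFAULT_DOCS set and the canonical_key function are assumptions for illustration only and are not a description of what Google actually does.

```python
from urllib.parse import urlsplit

# Assumed list of default-document names, for illustration only.
DEFAULT_DOCS = {"index.htm", "index.html", "index.php", "index.asp", "index.shtml"}


def canonical_key(url: str) -> str:
    """Collapse a URL to a host + path key under the assumed rules."""
    # Accept bare "example.com/..." as well as full "http://..." URLs.
    parts = urlsplit(url if "//" in url else "//" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    # A default document is treated as its containing directory.
    if last.lower() in DEFAULT_DOCS:
        path = head + "/"
    return host + path


if __name__ == "__main__":
    for url in ("http://www.example.com/index.htm",
                "http://example.com/",
                "www.example.com/index.html"):
        print(url, "->", canonical_key(url))
```

Under these assumed rules all three URLs map to the same key, which is the situation where a removal request for '/index.htm' would also hit '/'.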

kaled

4:29 pm on Sep 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Ummm... not really. If you have both domain.com/ and domain.com/index.htm indexed, you might have problems. They're different pages to Google as far as I know.

If you can find an example, I'll be impressed.

The fix to prevent incorrect url removal was, so far as I am aware, unrelated to Google's treatment of default pages.

Kaled.

twebdonny

4:31 pm on Sep 21, 2005 (gmt 0)



Be wary of trying to remove anything. We followed the instructions to the letter, and one week later our main page, index.htm, is missing, creating havoc given all the PR it passes to the rest of our pages. Writing to Google about this has to date proven fruitless. We never once mentioned index.htm in our removal request.

majac

4:18 am on Sep 22, 2005 (gmt 0)

10+ Year Member



Thanks everyone....

Finally got a rather vague reply from Google... it didn't answer the specific index.htm question, but said to just wait for the latest reindexing. I guess they are being deluged and so have a standard reply.

Our client has to have a Google presence, so we are going to use Adsense to at least get a presence back... though if Google doesn't think the site exists (site: returns nothing) it will be interesting to see what happens.