Prior to this error my www.mydomain.com/ and www.mydomain.com/index.htm both had Google toolbar pagerank = 5 across all datacenters. All my internal pages linked to from my homepage had PR4.
I use a website uptime monitor which didn't detect the substitution of my index.htm (it only reports website up or down - not page changes) so I didn't realise the error had occurred for 6 days - during which time Google spidered and indexed the unrelated index.htm. I put the original index.htm back and Google has spidered and indexed it.
My SERPS positions do not appear to be affected and traffic quickly recovered after I restored the correct index.htm.
But here's the problem:
In the last week www.mydomain.com/ and www.mydomain.com/index.htm have both fallen to PR3 across almost all datacenters even though all the internal pages still have PR4.
I think the substitution of the unrelated index.htm has caused Google to think that www.mydomain.com/ and www.mydomain.com/index.htm are distinct pages and is in effect hitting me with a duplicate content penalty by splitting my PR.
How do I put this back together and get my PR5 back?
My first instinct was to:
redirect 301 www.mydomain.com www.mydomain.com/index.htm
redirect 301 www.mydomain.com/ www.mydomain.com/index.htm
in my .htaccess file. Unfortunately this does not work as it does not force
www.mydomain.com/
to redirect to
www.mydomain.com/index.htm
with my hosting provider. I could write a long essay on the technical illiteracy of their helpdesk staff; the online session transcripts make comical reading.
My question: what should I do to get my PR5 back?
Thanks
[edited by: tedster at 5:30 pm (utc) on Aug. 9, 2007]
Domain Root vs. index.html - another kind of duplicate [webmasterworld.com]
As a general rule, I would choose to redirect FROM index.htm to the domain root, and not from the domain root to the index.htm url. Since both urls seem to be indexed by Google for you, this is probably the best approach.
As for seeing PR5 in the toolbar again, that will probably take another PR update, whenever that happens - 3 months or so. But the real PR should go into effect quickly for ranking purposes. So if you have identified the actual reason for this drop, the redirect should fix it.
You might also change any "Home" links on the site to point to the domain root instead of index.htm -- that will also speed things along.
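The link change suggested above could be scripted rather than done by hand. The following is only a minimal sketch: the host name in SITE and the function name point_home_links_at_root are illustrative assumptions, and it assumes a bare "index.htm" link appears only on pages that live in the root directory (in a subdirectory, a bare "index.htm" would mean that directory's own index).

```python
import re

# SITE is an assumed placeholder; substitute your own host name.
SITE = "www.example.com"

# Matches href values that point at the home page through index.htm or
# index.html: "index.htm", "/index.htm", or the absolute URL on SITE.
# Links such as "products/index.htm" are deliberately not matched.
HOME_HREF = re.compile(
    r'href=(["\'])(?:https?://' + re.escape(SITE) + r')?/?index\.html?\1',
    re.IGNORECASE,
)


def point_home_links_at_root(html: str) -> str:
    """Rewrite home-page links so they target the domain root ("/")."""
    # group(1) is the quote character, reused so the quoting style is kept.
    return HOME_HREF.sub(lambda m: f"href={m.group(1)}/{m.group(1)}", html)


if __name__ == "__main__":
    page = '<a href="index.htm">Home</a> <a href="products/index.htm">Products</a>'
    print(point_home_links_at_root(page))
```

Run it over a copy of the site first and review the differences before uploading anything.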
When I use
redirect 301 /index.htm http://www.example.com/
my homepage hangs. Firefox gives this error message: 'The page isn't redirecting properly. Firefox has detected that the server is redirecting the request for this address in a way that will never complete.' I suspect the redirection invokes a recursive loop. And when I use
redirect 301 http://www.example.com/index.htm http://www.example.com/
nothing happens; that is, no redirection occurs.
Any recommendations - anyone?
Many thanks
[edited by: tedster at 7:21 pm (utc) on Aug. 11, 2007]
[edit reason] switch to example.com - it can never be owned [/edit]
This redirect loop can be easily avoided by checking to make sure that the request for "index.html" is coming from a client, rather than being the result of the internal DirectoryIndex rewrite function. Checking the server variable %{THE_REQUEST} (as demonstrated in the cited thread) is the key to making this work.
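To see why the client-request check matters, here is a toy Python model of the situation. It is not Apache code, only a sketch under stated assumptions: respond() and follow() are hypothetical names, and the model assumes that a request for "/" is mapped internally to "/index.htm" and that an unguarded redirect rule also fires on that internally mapped path.

```python
import re

# Python equivalent of testing THE_REQUEST: the raw request line sent by the client.
CLIENT_ASKED_FOR_INDEX = re.compile(r"^[A-Z]{3,9} /index\.html? HTTP/")


def respond(request_line: str, guarded: bool):
    """Return (status, location) for one client request in the toy model."""
    path = request_line.split(" ")[1]
    # Assumed DirectoryIndex behaviour: "/" is served from "/index.htm" internally.
    internal_path = "/index.htm" if path == "/" else path
    if internal_path == "/index.htm":
        if guarded:
            # Redirect only when the client itself asked for the index file.
            redirect = bool(CLIENT_ASKED_FOR_INDEX.match(request_line))
        else:
            # The unguarded rule cannot tell an internal mapping from a client request.
            redirect = True
        if redirect:
            return 301, "/"
    return 200, None


def follow(path: str, guarded: bool, max_hops: int = 5):
    """Follow redirects as a browser would; return the statuses seen."""
    statuses = []
    for _ in range(max_hops):
        status, location = respond(f"GET {path} HTTP/1.1", guarded)
        statuses.append(status)
        if status != 301:
            break
        path = location
    return statuses


if __name__ == "__main__":
    print("unguarded:", follow("/index.htm", guarded=False))
    print("guarded:  ", follow("/index.htm", guarded=True))
```

In the unguarded case every hop is another 301 until the hop limit is reached, which corresponds to the "will never complete" message from Firefox. In the guarded case the model answers with one 301 and then serves the page.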
Jim
Many thanks for your reply but is there a potential problem with that approach?
Will this technique permanently inform Google that www.example.com/ and www.example.com/index.htm are exactly the same location and that the PR should be permanently merged?
What I need is a cast-iron method of informing Google that this is the case. For that I think I need a 301 and I'm uncertain that a 301 on-the-fly will do the trick.
In my first post I said that my index.htm was substituted with a completely unrelated (i.e. default) index.htm. In fact this is not quite the case. Looking again, I see that my index.htm was substituted with a default index.html. I think the index.htm vs index.html issue has confused Google.
The external backlinks mostly point to www.domain.com/ and the internal links point to www.domain.com/index.htm.
Frankly I'd much prefer to migrate to another host *if* that will solve the problem - it's too important to be left as it is.
So my question is what's the most foolproof way out of this mess?
Thanks
Graham
[edited by: tedster at 7:48 pm (utc) on Aug. 11, 2007]
[edit reason] switch to example.com - it can never be owned [/edit]
Will this technique permanently inform Google that www.example.com/ and www.example.com/index.htm are exactly the same location and that the PR should be permanently merged?
Yes.
In my first post I said that my index.htm was substituted with a completely unrelated (i.e. default) index.htm. In fact this is not quite the case. Looking again, I see that my index.htm was substituted with a default index.html. I think the index.htm vs index.html issue has confused Google.
The external backlinks mostly point to www.domain.com/ and the internal links point to www.domain.com/index.htm.
So my question is what's the most foolproof way out of this mess?
With your added problem description, you'll need to:
1) Correct all internal home-page links to point to the URL "/".
2) Declare a DirectoryIndex citing only your 'real' index file.
3) Add the code in the second post of the cited thread.
That is, in example.com/.htaccess:
DirectoryIndex index.htm
# Redirect only direct client requests for /index.htm or /index.html to the root
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/
RewriteRule ^index\.html?$ http://www.example.com/ [R=301,L]
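If your regex is rusty, it can help to try the RewriteCond pattern against a few sample request lines before touching the live site. The sketch below is a Python rendering of that pattern (the "\ " in the .htaccess version is just a literal space); would_redirect and the sample lines are illustrative assumptions, and Apache's own regex engine is what counts on the server.

```python
import re

# Same pattern as the RewriteCond above, with the escaped spaces written plainly.
THE_REQUEST_PATTERN = re.compile(r"^[A-Z]{3,9} /index\.html? HTTP/")


def would_redirect(request_line: str) -> bool:
    """True if the pattern matches this raw client request line."""
    return THE_REQUEST_PATTERN.search(request_line) is not None


if __name__ == "__main__":
    samples = [
        "GET /index.htm HTTP/1.1",      # direct request for the index file
        "GET /index.html HTTP/1.1",     # the stray default file name
        "HEAD /index.htm HTTP/1.0",     # any 3-9 letter method is accepted
        "GET / HTTP/1.1",               # the root itself: must not redirect
        "GET /sub/index.htm HTTP/1.1",  # index file in a subdirectory
        "GET /index.htm?x=1 HTTP/1.1",  # index file with a query string
    ]
    for line in samples:
        print(f"{line!r:35} -> {would_redirect(line)}")
```

Note that, as written, the pattern covers only the index file in the root directory and does not match when a query string is attached. Whether that matters depends on how your pages are linked.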
Jim
However, since the root document should always be available by simply requesting "/" there's not much incentive for you to take exceptional measures to hide it, or for anyone else to try to find it... Unless you want to cloak the page, and the hypothetical attacker wants to find your cloaked page(s). However, a version of the "index.htm"-to-"/" redirect code posted above, modified to be sensitive to user-agent strings and/or requestor IP addresses, might come in handy in that case.
Jim
If they do find the document location (using a method noted above, by Jim), as soon as the document supplying information to the root is requested from the server, your server will send a header stating that the requested document has been permanently relocated and that the new location is http://www.example.com/.
The requested document will not open for an 'external' (browser, search engine, text reader, etc.) request, but the information will be served from an 'internal' (your server opens the file and serves the information from index.htm OR MyNewDirectoryIndex.html OR blah.php to http://www.example.com/) request.
Basically it should not matter if someone has the URL of the document supplying the information for the directory index... no one except your server can open the document anyway. Any other attempt will be redirected to the new location. This includes attempts via inbound links. (Link weight will be passed through one redirect, so you should get SE credit for your inbound links if your .htaccess is structured properly.)
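One way to confirm the behaviour described above is to request both URLs without following redirects and look at the status codes. This is a rough sketch only: http_fetch and check_home_page are hypothetical helper names, www.example.com stands in for your own host, and the check assumes plain HTTP on port 80.

```python
import http.client


def http_fetch(host: str, path: str):
    """Issue one GET without following redirects; return (status, Location header)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def check_home_page(host: str, index_name: str = "index.htm", fetch=http_fetch):
    """Return a list of problems found; an empty list means the setup looks right."""
    problems = []
    root = f"http://{host}/"
    # The root should be served directly (the internal request for the index file).
    status, _ = fetch(host, "/")
    if status != 200:
        problems.append(f"/ returned {status}, expected 200")
    # An external request for the index file should be a single 301 to the root.
    status, location = fetch(host, f"/{index_name}")
    if status != 301:
        problems.append(f"/{index_name} returned {status}, expected 301")
    elif location != root:
        problems.append(f"/{index_name} redirects to {location}, expected {root}")
    return problems


if __name__ == "__main__":
    for problem in check_home_page("www.example.com") or ["no problems found"]:
        print(problem)
```

The fetch parameter exists so the check can be tried against a stand-in function before pointing it at a real server.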
Justin
Many thanks for your code snippet. After switching the RewriteEngine on it worked fine out of the box - which is just as well as my mod_rewrite and regex are (ahem) rusty.
I notice that prior to the substitution, my site was showing as www.example.com/index.htm in Google's SERPS, but that after the substitution it is listed as www.example.com/ which I think lends weight to the view that Google has become confused about my homepage and has split my PR accordingly. Traffic and SERPS positions remain unaffected however.
Anyhow, I owe you a bucketful of beer.
Thanks
Graham