Forum Moderators: Robert Charlton & goodroi


Split pagerank on index.htm


grahame54

11:36 am on Aug 9, 2007 (gmt 0)

10+ Year Member



A few weeks ago, my webhost committed an incredible error by resetting my subdirectory mapping. This substituted my index.htm with a completely unrelated index.htm.

Prior to this error my www.mydomain.com/ and www.mydomain.com/index.htm both had Google toolbar pagerank = 5 across all datacenters. All my internal pages linked to from my homepage had PR4.

I use a website uptime monitor which didn't detect the substitution of my index.htm (it only reports website up or down - not page changes) so I didn't realise the error had occurred for 6 days - during which time Google spidered and indexed the unrelated index.htm. I put the original index.htm back and Google has spidered and indexed it.

My SERPS positions do not appear to be affected and traffic quickly recovered after I replaced the correct index.htm.

But here's the problem:

In the last week www.mydomain.com/ and www.mydomain.com/index.htm have both fallen to PR3 across almost all datacenters even though all the internal pages still have PR4.

I think the substitution of the unrelated index.htm has caused Google to treat www.mydomain.com/ and www.mydomain.com/index.htm as distinct pages, and that it is in effect hitting me with a duplicate content penalty by splitting my PR between them.

How do I put this back together and get my PR5 back?

My first instinct was to:

redirect 301 www.mydomain.com www.mydomain.com/index.htm
redirect 301 www.mydomain.com/ www.mydomain.com/index.htm

in my .htaccess file. Unfortunately this does not work with my hosting provider: it does not force www.mydomain.com/ to redirect to www.mydomain.com/index.htm.

I could write a long essay on the technical illiteracy of their helpdesk staff; the online session transcripts make comical reading.

My question: what should I do to get my PR5 back?

Thanks

[edited by: tedster at 5:30 pm (utc) on Aug. 9, 2007]

tedster

3:00 am on Aug 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's a thread in our Hot Topics area [webmasterworld.com] that may help you:

Domain Root vs. index.html - another kind of duplicate [webmasterworld.com]

As a general rule, I would choose to redirect FROM index.htm to the domain root, and not from the domain root to the index.htm url. Since Google seems to have indexed both urls for your site, this is probably the best option.

As for seeing PR5 in the toolbar again, that will probably take another PR update, whenever that happens - 3 months or so. But the real PR should go into effect quickly for ranking purposes. So if you have identified the actual reason for this drop, the redirect should fix it.

You might also change any "Home" links on the site to point to the domain root instead of index.htm -- that will also speed things along.

grahame54

1:40 pm on Aug 11, 2007 (gmt 0)

10+ Year Member



tedster - many thanks for your reply. In my .htaccess, when I use

redirect 301 /index.htm http://www.example.com/

my homepage hangs. Firefox gives this error message: 'The page isn't redirecting properly. Firefox has detected that the server is redirecting the request for this address in a way that will never complete.' I suspect the redirection invokes a recursive loop. When I instead use

redirect 301 http://www.example.com/index.htm http://www.example.com/

nothing happens, that is, no redirection occurs.

Any recommendations - anyone?

Many thanks

[edited by: tedster at 7:21 pm (utc) on Aug. 11, 2007]
[edit reason] switch to example.com - it can never be owned [/edit]

jdMorgan

5:12 pm on Aug 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Review the cited thread carefully, especially the second post. You must use mod_rewrite to avoid an "infinite" loop that will otherwise occur between the DirectoryIndex directive rewriting "/" to "index.html" and your mod_alias Redirect directive attempting to redirect "index.html" to "/". These two functions will interact, resulting in a loop.

This can be easily avoided by checking to make sure that the request for "index.html" is coming from a client, rather than being the result of the internal DirectoryIndex rewrite function. Checking the server variable %{THE_REQUEST} (as demonstrated in the cited thread) is the key to making this work.
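To make the loop concrete, here is a minimal sketch of the kind of configuration that triggers it, assuming the real index file is index.htm and the site is www.example.com as elsewhere in this thread. The comments trace the request flow described above; it illustrates the interaction and is not a configuration to deploy.

```apache
# Looping configuration: shown for illustration only, do not use.

# mod_dir: a request for "/" is internally mapped to /index.htm
DirectoryIndex index.htm

# mod_alias: any request for /index.htm, including the internal one
# produced by DirectoryIndex, is answered with a 301 back to "/"
Redirect 301 /index.htm http://www.example.com/

# Resulting cycle for a client requesting http://www.example.com/ :
#   "/" -> (internal) /index.htm -> 301 to "/" -> (internal) /index.htm -> ...
```

%{THE_REQUEST} holds the request line exactly as the client sent it, for example "GET /index.htm HTTP/1.1", and it is not altered by internal rewrites. That is why testing it lets a rule tell a client request for index.htm apart from the internal DirectoryIndex lookup.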

Jim

grahame54

7:37 pm on Aug 11, 2007 (gmt 0)

10+ Year Member



Jim

Many thanks for your reply but is there a potential problem with that approach?

Will this technique permanently inform Google that www.example.com/ and www.example.com/index.htm are exactly the same location and that the PR should be permanently merged?

What I need is a cast-iron method of informing Google that this is the case. For that I think I need a 301 and I'm uncertain that a 301 on-the-fly will do the trick.

In my first post I said that my index.htm was substituted with a completely unrelated (i.e. default) index.htm. In fact this is not quite the case. Looking again, I see that my index.htm was substituted with a default index.html. I think the index.htm vs index.html issue has confused Google.

The external backlinks mostly point to www.example.com/ and the internal links point to www.example.com/index.htm.

Frankly I'd much prefer to migrate to another host *if* that will solve the problem - it's too important to be left as it is.

So my question is what's the most foolproof way out of this mess?

Thanks

Graham

[edited by: tedster at 7:48 pm (utc) on Aug. 11, 2007]
[edit reason] switch to example.com - it can never be owned [/edit]

jdMorgan

9:47 pm on Aug 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Will this technique permanently inform Google that www.example.com/ and www.example.com/index.htm are exactly the same location and that the PR should be permanently merged?

Yes.

> In my first post I said that my index.htm was substituted with a completely unrelated (i.e. default) index.htm. In fact this is not quite the case. Looking again, I see that my index.htm was substituted with a default index.html. I think the index.htm vs index.html issue has confused Google.

> The external backlinks mostly point to www.example.com/ and the internal links point to www.example.com/index.htm.

> So my question is what's the most foolproof way out of this mess?

With your added problem description, you'll need to:

1) Correct all internal home-page links to point to the URL "/".
2) Declare a DirectoryIndex citing only your 'real' index file.
3) Add the code in the second post of the cited thread.

That is, in example.com/.htaccess:


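# Serve only the real index file for directory requests
DirectoryIndex index.htm
# Match only client requests for /index.htm or /index.html, not the internal DirectoryIndex lookup
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/
# No capture group is used here, so the target is simply the domain root
RewriteRule ^index\.html?$ http://www.example.com/ [R=301,L]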

Note that this RewriteRule differs slightly from that posted in the cited thread: It redirects only your site's home page (example.com/index.htm or example.com/index.html), rather than all index.htm/index.html files in *all* (sub)directories. Given both code examples, you should be able to adapt to suit your needs.

Jim

blend27

1:02 am on Aug 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One Big question.

Is there a way for someone to tell the name of the default document for a directory or the root?

Let's say I set ksjhdfjshkfkjsdf.html as the default document, and never tell anybody or link to it as such.

I mean an OUTSIDER.

jdMorgan

1:23 am on Aug 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are three primary ways the root document could be exposed:
  • An error in the order of execution of your external redirect and internal rewrite directives (mod_rewrite, mod_alias, DirectoryIndex, etc.). If any redirects *follow* the rewrites, then they can potentially expose the internally-rewritten addresses.
  • Errors in the way you handle 403, 404, and 410 errors, etc. I've seen this before, but unfortunately, I cannot remember the exact exposure mechanism.
  • A determined person (or a script) could perform a dictionary attack, requesting any and all possible URLs - A "fishing expedition." Assuming you didn't have measures in place to put a stop to that sort of thing and that you didn't notice all the bogus-URL requests, he/she/it might eventually be successful.

However, since the root document should always be available by simply requesting "/", there's not much incentive for you to take exceptional measures to hide it, or for anyone else to try to find it... Unless you want to cloak the page, and the hypothetical attacker wants to find your cloaked page(s). However, a version of the "index.htm"-to-"/" redirect code posted above, modified to be sensitive to user-agent strings and/or requestor IP addresses, might come in handy in that case.
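The first exposure route above (directive ordering) can be sketched as follows. This is a minimal sketch of the ordering only; the /products/ URL pattern and the catalog.php script are hypothetical names used for illustration and do not come from this thread.

```apache
RewriteEngine On

# 1) External redirects first: they test the client's original request line
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/
RewriteRule ^index\.html?$ http://www.example.com/ [R=301,L]

# 2) Internal rewrites last (hypothetical example): the client keeps seeing
#    /products/42 while the server internally runs /catalog.php?id=42
RewriteRule ^products/([0-9]+)$ /catalog.php?id=$1 [L]
```

A redirect that is placed after an internal rewrite and is not guarded by %{THE_REQUEST} could match the rewritten path and hand it to the client in the Location header, which is the exposure described in the first point.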

Jim

jd01

6:49 pm on Aug 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



By using the 301 redirect with %{THE_REQUEST}, the only way someone should be able to find the directory index file (e.g. index.htm, MyNewDirectoryIndex.html, or blah.php) is through a hack or an error.

If they do find the document location (using one of the methods Jim noted above), then as soon as that document is requested from the server, your server will send a header stating that the requested document has been permanently moved and that the new location is http://www.example.com/.

The requested document will not open for an 'external' request (browser, search engine, text reader, etc.), but its content will still be served for an 'internal' request, where your server opens index.htm, MyNewDirectoryIndex.html, or blah.php and serves the content at http://www.example.com/.

Basically it should not matter if someone has the URL of the document supplying the information for the directory index... no one except your server can open the document anyway. Any other attempt will be redirected to the new location. This includes attempts via inbound links. (Link weight will be passed through one redirect, so you should get SE credit for your inbound links if your .htaccess is structured properly.)

Justin

grahame54

6:48 pm on Aug 15, 2007 (gmt 0)

10+ Year Member



Jim

Many thanks for your code snippet. After switching the RewriteEngine on it worked fine out of the box - which is just as well as my mod_rewrite and regex are (ahem) rusty.
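For reference, a self-contained version of the file might look like the sketch below. It closely follows Jim's snippet and assumes that mod_rewrite is available on the host, that index.htm is the real index file, and that nothing earlier in the configuration has already turned the rewrite engine on.

```apache
# .htaccess in the document root of www.example.com
# Assumes mod_rewrite is available on the host.

# Serve only the real index file for directory requests
DirectoryIndex index.htm

# mod_rewrite rules are not processed until the engine is switched on
RewriteEngine On

# Redirect client requests for /index.htm or /index.html to the root
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/
RewriteRule ^index\.html?$ http://www.example.com/ [R=301,L]
```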

I notice that prior to the substitution, my site was showing as www.example.com/index.htm in Google's SERPS, but that after the substitution it is listed as www.example.com/ which I think lends weight to the view that Google has become confused about my homepage and has split my PR accordingly. Traffic and SERPS positions remain unaffected however.

Anyhow, I owe you a bucketful of beer.

Thanks

Graham

g1smd

7:07 pm on Aug 18, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I redirect index.(html?|php|asp|cfm) to "/" for all folders and root, keeping any folder path information intact in the redirect.

I then don't need to think about which index file filename needs redirecting on the current site. All index filenames get redirected on all sites.
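A minimal sketch of that approach, adapting the home-page rule posted earlier in this thread. The exact patterns are an assumption and not g1smd's actual code. The capture group in the RewriteRule carries any folder path into the redirect target.

```apache
RewriteEngine On

# Only act on index files the client explicitly asked for, at any folder depth
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php|asp|cfm)\ HTTP/
# $1 holds the folder path (empty for the root), so /a/b/index.php -> /a/b/
RewriteRule ^(([^/]+/)*)index\.(html?|php|asp|cfm)$ http://www.example.com/$1 [R=301,L]
```

As written, the condition does not match requests that carry a query string (for example /index.php?id=1), so those would be left alone; whether that is acceptable depends on the site.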