Forum Moderators: Robert Charlton & goodroi
However, my site disappeared for good three weeks ago and hasn't come back. I have PR5 on the index page and PR2 on the other pages, and all 102 of my pages are indexed.
After reading here for possible solutions I stumbled across the canonical url problem and think it may apply to my site, but before I do anything drastic I'd like some advice! :)
If I search for site:example.com my home page is returned as www.example.com/ and when I click on the cache link it was crawled 31st March.
If I search for site:example.com -inurl:www my home page is returned as example.com/index.html and when I click on the cache it was crawled 29th March.
Also, typing http://example.com/index.html returns PR0 whereas http://www.example.com/ returns PR5.
Am I right in thinking Google treats the two pages as separate and is penalizing me for duplicate content? If so, what would be my best course of action? And if not, does anyone know why my site keeps disappearing from the SERPs?!
Thanks for any input you can give.
[edited by: engine at 5:04 pm (utc) on April 13, 2007]
[edit reason] Please use example.com [/edit]
Why Does Google Treat "www" & "no-www" As Different? [webmasterworld.com]
I registered the site in Google Webmaster Tools, created a sitemap, set my preference to www and put a 301 redirect with htaccess from non-www to www. Then I sat back and waited...
My site has just today returned to the top two pages for numerous key phrases and I'm hoping it'll be here to stay and hopefully recover its many first page positions. The http://example.com/index.html is still in Google's index but is now a supplemental - I assume it'll drop out naturally in the next few months.
Fingers crossed that'll be the end of the problem. :)
My site's on the last page of every search it was ranking well for yesterday. Guess if I'm still having this problem when Google finally drops http://example.com/index.html I'll have to start going through the -950 threads.
While http://example.com/index.html and http://www.example.com/ are both still in the index, though, I have to assume that's the problem and rule it out first. In Webmaster Tools most of my internal pages show two internal backlinks, one from http://example.com/index.html and one from http://www.example.com/ :(
You need to redirect all versions to www.example.com/ with a 301 redirect. You have handled the non-www requests; now you need to cater for index.html requests. The index.html redirect tests for index.html in the request (it works for both www and non-www requests), preserves the folder names (if there are any), strips the index.html part from the URL, and forces the www to be added if it is not already there. It needs to be placed before the non-www redirect in your .htaccess file in order to avoid a redirection chain.
The other "versions" of your index page will hang around in the index as Supplemental Results for perhaps six months to a year, but that is NOT a problem. If anyone does click those results, then they will still arrive at the correct page of the site via the redirect. I would let Google updates handle this automatically so that PR is correctly calculated and redistributed.
I wouldn't use WMT to do anything at all at this stage, especially not the XML sitemap file stuff.
Fix the real problems and then forget about them and go back to site building.
[edited by: SteveWh at 3:39 pm (utc) on April 13, 2007]
Here is my current htaccess file (it also has a 404 redirect as you can see):
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
ErrorDocument 404 http://www.example.com/404.htm
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
ErrorDocument 404 http://www.example.com/404.htm
Thanks. :)
The second one looks like it takes any non-www URL and adds the www to that URL. By then, the redirected "index to /" URLs have already been forced to be www.
That should do the job.
But make sure you test it, and confirm that all four possibilities (www and non-www, each with and without index.html) are fixed, by using an HTTP header checker of some sort.
For any given URL, non-www with index.html, or www with index.html, or any other non-www URL without index.html, only one of the redirects will be run. This avoids a redirection chain occurring.
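For anyone who wants to reason through this before touching the live server, here is a rough Python sketch that mimics the two rules from the .htaccess above and counts how many redirects each starting URL goes through. It is only a model, not Apache: the function names (redirect, hops) are made up for illustration, the THE_REQUEST condition is left out, and the dot in the host pattern is escaped here. It does not replace checking the real response headers.

```python
import re

CANONICAL = "http://www.example.com/"

# Rule 1: index.html at any folder depth -> same folder on the www host.
INDEX_RULE = re.compile(r"^(([^/]+/)*)index\.html$")
# Rule 2: host starts with example.com (no www), case-insensitive like [NC].
BARE_HOST = re.compile(r"^example\.com", re.IGNORECASE)


def redirect(host, path):
    """Return the 301 target for a request, or None if no rule matches.

    `path` has no leading slash, which is how RewriteRule patterns
    in .htaccess see the requested path.
    """
    m = INDEX_RULE.match(path)
    if m:
        # Rule 1 is listed first and [L] stops further processing.
        return CANONICAL + m.group(1)
    if BARE_HOST.match(host):
        # Rule 2: keep the path, add the www.
        return CANONICAL + path
    return None


def hops(host, path):
    """Follow redirects until none applies; return (hop count, final URL)."""
    count = 0
    url = "http://" + host + "/" + path
    while True:
        target = redirect(host, path)
        if target is None:
            return count, url
        count += 1
        url = target
        host, path = target[len("http://"):].split("/", 1)


if __name__ == "__main__":
    for host in ("example.com", "www.example.com"):
        for path in ("", "index.html"):
            print(host, repr(path), hops(host, path))
```

If the two rules were listed in the opposite order, a request for example.com/index.html would be sent to www.example.com/index.html first and then to www.example.com/, which is the two-step chain the ordering is meant to avoid.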
Personally, I set things up so that any request for index.html, index.htm, index.php, index.cfm, default.asp, default.cfm, and so on, is redirected to /. The redirect always preserves any folder names that were present in the originally requested URL too.
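As a rough illustration of that broader pattern, the Python sketch below strips any of the default-document names listed above from the end of a path while keeping the folder part. The name canonical_path and the exact list of filenames are assumptions for the example; the same alternation could be adapted into a RewriteRule pattern, but test it on your own server before relying on it.

```python
import re

# Default-document names from the post above; extend the list as needed.
INDEX_FILE = re.compile(
    r"^((?:[^/]+/)*)(?:index\.(?:html?|php|cfm)|default\.(?:asp|cfm))$"
)


def canonical_path(path):
    """Drop a trailing index/default filename, keeping any folder names.

    `path` has no leading slash, as in an .htaccess RewriteRule.
    Paths that do not end in one of the listed names are returned as-is.
    """
    m = INDEX_FILE.match(path)
    return m.group(1) if m else path


if __name__ == "__main__":
    for p in ("index.php", "shop/widgets/default.asp", "about.html"):
        print(repr(p), "->", repr(canonical_path(p)))
```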
ErrorDocument 404 /404.htm
Jim
Because your ErrorDocument line uses a full URL (starting with http://), a request for a missing page will actually return a 302 response instead of a 404. This is documented on the Apache website. Returning a 302 response is a very bad idea here, as your error page can then be indexed under an infinite number of duplicate content URLs.
This needs fixing before any damage is done.
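To make the difference concrete, here is a tiny Python sketch of the rule of thumb: a full URL as the ErrorDocument target means the client is redirected (302), while a local path keeps the 404 status. The function name status_seen_by_client is made up for the example, and it only models this one distinction, not everything Apache does.

```python
from urllib.parse import urlparse


def status_seen_by_client(target):
    """Status a client gets for a missing page with `ErrorDocument 404 <target>`.

    Simplified model: a target with a scheme (http://...) makes the
    server send a redirect, so the original 404 status is lost.
    A local path such as /404.htm is served with the 404 status intact.
    """
    if urlparse(target).scheme:
        return 302
    return 404


if __name__ == "__main__":
    print(status_seen_by_client("http://www.example.com/404.htm"))
    print(status_seen_by_client("/404.htm"))
```

Whatever the sketch says, confirm the real behaviour by requesting a page that does not exist and looking at the status line in a header checker.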
g1smd & jdMorgan - Thanks for the heads up about the 404, I've now corrected it (as you can tell I'm pretty clueless about all of this stuff!)
BTW g1smd, I didn't use the XML sitemap in WMT, I just did a txt file.
Fix the real problems and then forget about them and go back to site building.
Good idea, think I'll do that now. :)