Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
www.example.com and example.com both in Google - is this hurting me?
soccer_star




msg:3300657
 1:25 am on Apr 3, 2007 (gmt 0)

First some background in case it's relevant (please excuse the long post but I want to give as much detail as possible). My site launched in June 2006 and started getting traffic from Google in October. By February I was first page for a number of search terms, but I would disappear from the SERPs completely only to come back a few days later. This was frustrating but I put it down to being a new site.

However my site disappeared for good three weeks ago and hasn't come back. I have PR5 on the index page and PR2 on the other pages and all 102 of my pages are indexed.

After reading here for possible solutions I stumbled across the canonical url problem and think it may apply to my site, but before I do anything drastic I'd like some advice! :)

If I search for site:example.com my home page is returned as www.example.com/ and when I click on the cache link it was crawled 31st March.

If I search for site:example.com -inurl:www my home page is returned as example.com/index.html and when I click on the cache it was crawled 29th March.

Also, typing http://example.com/index.html returns PR0 whereas http://www.example.com/ returns PR5.

Am I right in thinking Google thinks both pages are separate and is penalizing me for duplicate content? If so, what would be my best course of action? And if not, does anyone know why my site keeps disappearing from the SERPs?!

Thanks for any input you can give.

[edited by: engine at 5:04 pm (utc) on April 13, 2007]
[edit reason] Please use example.com [/edit]

 

tedster




msg:3300702
 2:56 am on Apr 3, 2007 (gmt 0)

You're right to think that it can be a problem. In the "Hot Topics" listed at the top of this forum's home page, you can find several threads that bear on this "canonical" issue, which is also a duplicate content issue. Perhaps the most specific to your question is:

Why Does Google Treat "www" & "no-www" As Different? [webmasterworld.com]

soccer_star




msg:3300705
 3:04 am on Apr 3, 2007 (gmt 0)

Thanks tedster, I'll give that a read.

soccer_star




msg:3309036
 3:44 am on Apr 12, 2007 (gmt 0)

I just wanted to give some feedback in case it may help anyone else having the same problem.

I registered the site in Google Webmaster Tools, created a sitemap, set my preference to www and put a 301 redirect with htaccess from non-www to www. Then I sat back and waited...

My site has just today returned to the top two pages for numerous key phrases and I'm hoping it'll be here to stay and hopefully recover its many first page positions. The http://example.com/index.html is still in Google's index but is now a supplemental - I assume it'll drop out naturally in the next few months.

Fingers crossed that'll be the end of the problem. :)

[edited by: engine at 5:05 pm (utc) on April 13, 2007]
[edit reason] Please use example.com [/edit]

soccer_star




msg:3310266
 1:56 pm on Apr 13, 2007 (gmt 0)

...and now I'm gone again, lol.

My site's on the last page of every search it was ranking well for yesterday. Guess if I'm still having this problem when Google finally drops http://example.com/index.html I'll have to start going through the -950 threads.

While http://example.com/index.html and http://www.example.com/ are still in the index though I have to assume that's the problem and rule that out first (in Webmaster Tools most of my internal pages have two internal backlinks, http://example.com/index.html and http://www.example.com/) :(

[edited by: engine at 5:06 pm (utc) on April 13, 2007]
[edit reason] Please use example.com [/edit]

g1smd




msg:3310310
 2:35 pm on Apr 13, 2007 (gmt 0)

You have four URLs for your root index page: both www and non-www, each with / and index.html.

You need to redirect all versions to www.example.com/ with a 301 redirect. You have done the non-www; now you need to cater for index.html requests. That redirect tests for index.html in the request (it works for both www and non-www requests), preserves the folder names (if there are any), strips the index.html part from the URL, and forces the www to be added if it is not already there. The index.html redirect needs to be placed before the non-www redirect in your .htaccess file in order to avoid a redirection chain.

The other "versions" of your index page will hang around in the index as Supplemental Results for perhaps six months to a year, but that is NOT a problem. If anyone does click those results, then they will still arrive at the correct page of the site via the redirect. I would let Google updates handle this automatically so that PR is correctly calculated and redistributed.

I wouldn't use WMT to do anything at all at this stage, especially not the XML sitemap file stuff.

[edited by: engine at 5:08 pm (utc) on April 13, 2007]

spina45




msg:3310366
 3:25 pm on Apr 13, 2007 (gmt 0)

- g1smd

> now you need to cater for index.html requests.

Do you have example code for this? Thank you very much!

SteveWh




msg:3310380
 3:32 pm on Apr 13, 2007 (gmt 0)

Also, the day-to-day fluctuations in Google for your site are meaningless. IMO, watching them closely on a daily basis isn't time well spent. It is extremely unlikely that any change you see in your SERPs tomorrow has anything to do with a change you made today, and lots of people have wasted lots of time chasing after those fluctuations thinking that they were doing useful tweaking.

Fix the real problems and then forget about them and go back to site building.

[edited by: SteveWh at 3:39 pm (utc) on April 13, 2007]

notme




msg:3310397
 3:53 pm on Apr 13, 2007 (gmt 0)

g1smd,
> now you need to cater for index.html requests

How do you do that? Please give us the code.

I found some examples on different forums but for some reason they do not work correctly.
Please help.

g1smd




msg:3310421
 4:19 pm on Apr 13, 2007 (gmt 0)

The code is posted in the Apache forum here at WebmasterWorld several times each month.

Check it out.

jdMorgan




msg:3310470
 4:55 pm on Apr 13, 2007 (gmt 0)

One example: [webmasterworld.com...]

Jim

soccer_star




msg:3310471
 4:56 pm on Apr 13, 2007 (gmt 0)

g1smd - Thanks for the advice.

Here is my current htaccess file (it also has a 404 redirect as you can see):


Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

ErrorDocument 404 http://www.example.com/404.htm


After doing a search as you suggested I found the following redirect:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]

So to follow your suggestion, my htaccess file now looks like this:

Options +FollowSymLinks
RewriteEngine on
# Strip index.html from direct client requests and force www in the same hop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
# Redirect any remaining non-www request to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
ErrorDocument 404 http://www.example.com/404.htm

Could you please confirm that would work before I upload it?

Thanks. :)

[edited by: engine at 5:08 pm (utc) on April 13, 2007]
[edit reason] Please use example.com, thanks [/edit]

g1smd




msg:3310477
 5:00 pm on Apr 13, 2007 (gmt 0)

The first redirect looks like it takes any URL containing index.html and strips that off while forcing a www on to the front whether it had one before or not.

The second one looks like it takes any non-www URL and adds the www to that URL. By then, the redirected "index to /" URLs have already been forced to be www.

That should do the job.

But make sure you test it, and confirm all four possibilities as fixed, by using an HTTP header checker of some sort.

For any given URL, non-www with index.html, or www with index.html, or any other non-www URL without index.html, only one of the redirects will be run. This avoids a redirection chain occurring.

Personally, I set things up so that any request for index.html, index.htm, index.php, index.cfm, default.asp, default.cfm, and so on is redirected to /. The redirect always preserves any folder names that were present in the originally requested URL too.
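
For anyone wanting the broader version described above, here is a minimal sketch of the same index.html rule widened to several index filenames. The list of names and extensions is an assumption for illustration (it also matches combinations such as default.html that were not named above), so trim it to what the site actually serves and confirm the result with a header checker before relying on it.

```apache
# Sketch only, not a drop-in file: widen the index.html redirect to other
# common index filenames. The (index|default) names and the extension list
# are assumptions; keep only the ones your site really uses.
RewriteEngine on

# Match only the original client request line (e.g. "GET /folder/index.php HTTP/1.1").
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*(index|default)\.(html?|php|cfm|asp)\ HTTP/
# $1 holds the folder path (if any), so /folder/index.php goes to /folder/ on www.
RewriteRule ^(([^/]+/)*)(index|default)\.(html?|php|cfm|asp)$ http://www.example.com/$1 [R=301,L]
```

As with the index.html rule, this would sit before the non-www rule so that each request is redirected only once. The THE_REQUEST condition looks at the request line the browser sent, so the server's own internal lookup of the index file for a / request does not match and cannot cause a redirect loop.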

jdMorgan




msg:3310488
 5:11 pm on Apr 13, 2007 (gmt 0)

Also be aware that your ErrorDocument syntax is incorrect, and will generate a 302-Found response, not a 404. The correct syntax is:

ErrorDocument 404 /404.htm

See Apache ErrorDocument [httpd.apache.org] and read the notes for details.

Jim

g1smd




msg:3310489
 5:13 pm on Apr 13, 2007 (gmt 0)

>> ErrorDocument 404 http://www.example.com/404.htm

Your error document will actually return a 302 response. This is documented on the Apache website. Returning a 302 response is a very bad idea here, as your error page can then be indexed under an infinite number of duplicate content URLs.

This needs fixing before any damage is done.

soccer_star




msg:3310521
 5:45 pm on Apr 13, 2007 (gmt 0)

I checked that all four possibilities were fixed using an HTTP header checker and they're fine.

g1smd & jdMorgan - Thanks for the heads up about the 404, I've now corrected it (as you can tell I'm pretty clueless about all of this stuff!)

BTW g1smd, I didn't use the XML sitemap in WMT, I just did a txt file.

> Fix the real problems and then forget about them and go back to site building.

Good idea, think I'll do that now. :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved