
Google SEO News and Discussion Forum

    
What to do when Google has both www & non-www pages listed?
I thought Google would be smarter than this!
crazybrain (5+ Year Member)
posted 6:20 pm on Jun 7, 2005 (gmt 0)

First question to WebmasterWorld - I have been lurking for a while!

My site stats are down 50% this month, and of course it hurts a little. Most of this is due to my Google traffic dropping dramatically.

I have done some investigation and read a lot about the recent Bourbon update. There seems to be a lot of focus on redirecting yourdomain.com/ to www.yourdomain.com/.

I did a search in Google and about 15% of my pages have a Google listing of both www and non-www. For example the same page is listed as www.example.com/page.html and example.com/page.html with a slightly older title for the non-www example.

I would think the Google algorithm should be smart enough to determine that these are indeed the same pages - ugh. The only reason I can see why it does not is that they have slightly different cache results (I updated the pages and changed the content slightly).

My first thought was to give Google some time to fix it on its own - maybe that is naive on my part. Instead I tried some things and got into more of a mess.

Has anyone else seen a similar issue - and if so, how did you resolve it? I tried updating my root .htaccess file with the following syntax, but it generates a 404 error when I access the non-www URL:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^mydomain\.com [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [L,R=301]

Has anyone using Aplus webhosting successfully redirected their non-www requests to www?

Also, my domain itself does not have a dual listing - it is only certain pages. I currently use relative linking, meaning I do not use the full URL in my HTML pages. If I change all my internal pages to full-URL linking, would this also solve the problem?

 

Stefan (WebmasterWorld Senior Member, 10+ Year Member)
posted 8:51 pm on Jun 7, 2005 (gmt 0)

Welcome to WW.

You might want to check the syntax of the redirect at the Apache forum. It's somewhat different from what I'm using successfully.

I would think the Google algorithm should be smart enough to determine that these are indeed the same pages - ugh. The only reason I can see why it does not is that they have slightly different cache results (I updated the pages and changed the content slightly).

Google saw different content when it crawled the two URL versions at two different times. That's likely part of the problem - it correctly recorded two different pages, as it saw things.

My first thought was to give Google some time to fix it on its own

You might wait an unacceptably long time. It's an easily solved problem, so best to take care of it now.

Absolute linking is the way to go, yes, but get the 301 happening first.

Barleycorn (10+ Year Member)
posted 8:59 pm on Jun 7, 2005 (gmt 0)

I once had a site temporarily kicked out of G for the same thing. The first thing I did to correct the problem was set up the 301 redirect. After that I went through all my incoming links and, lo and behold, found 2 backlinks pointing to my non-www address. I emailed the other webmasters and had them point the links to the correct URL.

After about a month, I believe, the site was crawled and re-included in the Google index.

Good luck!

theBear (WebmasterWorld Senior Member, 10+ Year Member)
posted 12:41 am on Jun 8, 2005 (gmt 0)


RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*) http://www.example.com/$1 [L,R=301]

Please note the forum software may mangle this.

This is valid only for servers on port 80 (the default).
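
If you have access to the main server configuration rather than just .htaccess, the same redirect can also be done without mod_rewrite. Here is a minimal sketch, assuming Apache with mod_alias and a dedicated virtual host for the bare domain (example.com and /path/to/site are placeholders):

<VirtualHost *:80>
    ServerName example.com
    # mod_alias: permanently redirect every request to the www host
    Redirect permanent / http://www.example.com/
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /path/to/site
</VirtualHost>

Redirect matches by prefix and appends the remainder of the URL, so example.com/page.html answers with a 301 to http://www.example.com/page.html.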

crazybrain (5+ Year Member)
posted 10:46 am on Jun 8, 2005 (gmt 0)

Thanks for the help. I got the .htaccess working so that all non-www requests redirect to www. Also, Google is still hitting my site; in fact, I had 6 new pages added to the index yesterday.

Since the redirect is now working, is it still necessary to change all links on my site to full URL links? (i.e. href="http://www.mydomain.com/examplepage.html" instead of examplepage.html)

It seems the redirect should fix the problem, but I can't see any harm in changing the links, so maybe I will go ahead and change them anyway.
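
For anyone weighing that choice, the difference is only whether the link carries a hostname. A minimal illustration (examplepage.html stands in for a real page):

<!-- relative link: inherits whatever host the page was reached on,
so a crawl that enters via mydomain.com keeps producing non-www URLs -->
<a href="examplepage.html">Example page</a>

<!-- absolute link: always points at the canonical www host -->
<a href="http://www.mydomain.com/examplepage.html">Example page</a>

That inheritance is how one bad incoming link can spread the wrong hostname through an entire site of relative links, and why absolute links stop the spread at the first page.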

wiseapple (5+ Year Member)
posted 2:30 pm on Jun 8, 2005 (gmt 0)

After doing the redirect from non-www to www: someone mentioned in an earlier post that it is important to keep at least one non-www URL so that Googlebot can discover the change and re-spider.

Has anyone done this?

Where do you put the one non-www?

Link to one page with non-www from the homepage?

Basically, is this a good idea or not?

Thanks.

crazybrain (5+ Year Member)
posted 3:39 pm on Jun 8, 2005 (gmt 0)

Interesting thought, wiseapple.

It was my understanding that Google will eventually revisit indexed pages even if no links still point to them. If not, Google would get bloated with old, stale pages.

But you may have a point, because we all know that a site without backlinks will not get picked up by Google.

I would also like to see an answer to wiseapple's question.

wiseapple (5+ Year Member)
posted 4:15 pm on Jun 8, 2005 (gmt 0)

If Google did in fact index the www and non-www, this could be the reason for inflated page counts.

Example: Our site has about 20,000 pages... Google reports that we have 80,000 pages. Not sure why.

Keeping a link to the non-www version may be a key to getting Google back to correct page counts.

Anyone have any thoughts?

Thanks.

Stefan (WebmasterWorld Senior Member, 10+ Year Member)
posted 9:31 pm on Jun 8, 2005 (gmt 0)

Someone mentioned in an earlier post that it is important to keep at least one non-www URL so that Googlebot can discover the change and re-spider.

When G visits the non-www URL that it has in its index, it will immediately discover the change because of the 301 redirect. When it visits the www version of the page (assuming it has both versions listed), it will find what it was looking for. I can't see any need to leave non-www URLs to help it find things.

crazybrain (5+ Year Member)
posted 11:06 pm on Jun 8, 2005 (gmt 0)

Stefan, are you confirming that Google will re-visit non-www listings even if nothing is linking to the non-www pages?

Stefan (WebmasterWorld Senior Member, 10+ Year Member)
posted 11:33 pm on Jun 8, 2005 (gmt 0)

Stefan, are you confirming that Google will re-visit non-www listings even if nothing is linking to the non-www pages?

If the non-www versions are indeed listed by G and Y (presumably because of a bad incoming link in the past), the bots will continue to look for those pages. If the server is configured so that both www.example.org and example.org lead to the index/default page, then the bots will always get a 200 and continue to list both versions of the page, even with no existing incoming links to the non-www URL. Once it's in, it's in, unless you tell the bots otherwise - G likes to boast about how many pages it has listed, and that includes phantom pages like the ones we're talking about.

The non-linked page will have no PR, so it should always be overridden by the linked-to version in the SERPs, but it will continue to exist in the index. Of course, if someone happens to link to the page with the non-www URL in the future, from a high-ranking PR page, then you're back into it all again.

When you do the 301 in the .htaccess, the bots will continue to look for a while. G catches on fairly quickly, a month or so, but Y is very poor at correcting things (doesn't like 301s, or something). But yes, G will continue looking for them for a period, even with the 301, assuming they are actually in its index.

I went through this a couple of years ago and saw it in action. Fortunately, I figured out what was happening before it had spread too far through the site by following relative links. I changed all my internal links to absolute at the same time as I did the 301 in the mod_rewrite. It affected maybe 5 pages out of 250, but the first one to go was the index/default page. It was all because of one link to the home page without the www (ironically, from a research centre I'm a director of). It took G a month or two to get it all sorted out, and since then, no problems.


Vimes (10+ Year Member)
posted 1:43 am on Jun 9, 2005 (gmt 0)

Hi,

I've got a similar problem. Instead of the duplicate pages being http://mysite.com, I've got https://www.mysite.com and http://www.mysite.com.

Now I've tracked the offending link down and sent the webmaster a note asking for the change.

But how can I stop the bots crawling my secure pages? Without those pages I can't make a living, and if Googlebot hits my site with a duplicate filter and drops me from the index, I'm in the same boat. G has been indexing my site with page-count bloat for a while, and I couldn't figure out where it was coming from until yesterday morning.

Vimes.

Stefan (WebmasterWorld Senior Member, 10+ Year Member)
posted 2:09 am on Jun 9, 2005 (gmt 0)

Vimes, that's a bad one, eh? It's not like you want to eliminate traffic to those URLs - you can't redirect them like the canonical www stuff...

I have vague recollections of reading threads addressing that problem in the Apache forum. If no one jumps in here, perhaps you could try there. Check the robots.txt forum too. There's a way to do it, somehow.
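
One commonly suggested approach, sketched here on the assumption that the site runs Apache with mod_rewrite and serves HTTPS on the standard port 443: serve a different robots.txt on the secure port only (robots_ssl.txt is a made-up name for a second file you would create yourself).

RewriteEngine On
# on the HTTPS port only, answer requests for robots.txt with the blocking file
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ /robots_ssl.txt [L]

where robots_ssl.txt contains:

User-agent: *
Disallow: /

The ordinary robots.txt is untouched, so the bots keep crawling the plain-HTTP pages but are told to stay out of everything served over https.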

ilikeyou (5+ Year Member)
posted 5:52 am on Jun 9, 2005 (gmt 0)

I went through this problem for a good six months (lost hundreds of visitors; only 30 pages indexed on a 50,000-page megasite).

The site had external links to both www.domain.com and domain.com. All pages were valid HTML, not one piece of JavaScript anywhere, or any of that nonsense.

I set up a 302 redirect in .htaccess and it eventually solved the problem. I set all pages to go to non-www, i.e. www.domain.com was redirected to domain.com.

So 30 pages were indexed, mixed between www.domain.com and domain.com, for several months, but after setting up the redirect for all www requests to point to domain.com, things got better.

But it's fairly random whether or not your site will get re-indexed soon, even if you take this .htaccess step. Google is not predictable. You could wait 2 weeks or 4 months for things to get better.

I set up the redirects and didn't change a thing for several months; then finally, about 4 months later, things started getting indexed. In fact, I'm not even sure the .htaccess redirect had anything to do with it, but that's what I did.

Having 30 pages indexed on a 50,000-page megasite is quite annoying though, and again I don't think it's due just to the redirects but to all sorts of bizarre random things that are completely unpredictable.

The thousands of pages that finally did get indexed were all old pages from 4 months ago. Checking Google's cache, I noticed all the pages were 4 months old, so it's as if Googlebot hit me up 4 months ago, saved all the data, then 4 months later posted the data up on the net.

ilikeyou (5+ Year Member)
posted 6:06 am on Jun 9, 2005 (gmt 0)

If I said 302 anywhere, I actually meant 301. All the work I did was with 301 redirects, not 302.

By the way, guys/gals, regarding incoming links:

On the megasite I discussed above, there are still several links out there that point to both www and non-www.

Having the 301 redirect should really set things straight, because I still have both www and non-www links in all sorts of places. There is not one www.domain.com result listed in the search results now that I have a 301 set up in .htaccess.

So basically what I'm saying is, you should be able to leave your www.domain.com and domain.com links up; as long as you have a 301 redirect, it will eventually be figured out (it may take 6 months, or 1 week).

The fact is, some people will decide to reference your website as www and some won't, no matter what steps you take (if you have people linking to your website).

So it's best to build a solution that works no matter what links are out there. In my experience, a 301 redirect in .htaccess eventually solved my problem, or at least it appeared to. No more www links appear in the results, which is what I wanted. But again, it did take several months, and results are always random, bizarre and unpredictable with Google.

In my experience, there is no better URL to choose; www.domain.com is no better than domain.com. I chose domain.com because shorter is better. Some people say that Google likes www better, but in my experience Google likes domain.com better. So basically it's split 50/50; anything you hear is just a rumor. Either www.domain.com or domain.com will work.

Here is the code from my .htaccess that solved my problem:

(If you don't have Apache, you'll have to look into other methods.)

# redirect all http://www.domain.com requests to http://domain.com (trim www); NC stands for nocase
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]

Armi (10+ Year Member)
posted 12:09 pm on Jun 14, 2005 (gmt 0)

I've got the same problem with duplicate content on domain.com vs. www.domain.com.

I solved this a long time ago with a 301 redirect.

But: thousands of URLs are still in the index, because they have no backlinks, and because of this Google doesn't visit those URLs to pick up that information.

Is it a good idea to put these URLs into a Google Sitemap?

I think Google would then visit these URLs and get the 301 information, right?
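
For what it's worth, such a sitemap would just be a plain list of the stale URLs in the Google Sitemaps XML format. A minimal sketch, assuming the 0.84 schema that Google's documentation currently shows (the URLs are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://domain.com/old-page-1.html</loc>
  </url>
  <url>
    <loc>http://domain.com/old-page-2.html</loc>
  </url>
</urlset>

Each listed URL would answer with the 301 when fetched, which is exactly the information Googlebot needs to record.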

ramachandra (5+ Year Member)
posted 2:52 pm on Jun 14, 2005 (gmt 0)

Can anyone help me create a .htaccess file for my website with a 301 redirect from non-www to www? I would like to know what syntax I have to write, and where the file should be uploaded on the server.

One more thing I want to know: apart from creating a .htaccess file to redirect non-www to www.domainname.com, is there any other method that can fix this problem?
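
For reference, a typical non-www to www rule set, pieced together from the snippets already posted in this thread, would look something like the sketch below (assuming Apache with mod_rewrite enabled; example.com stands in for your own domain). The file is plain text, named .htaccess, and goes in the web root, i.e. the same directory that holds your home page:

RewriteEngine On
# ignore requests that arrive with no Host header at all
RewriteCond %{HTTP_HOST} !^$
# match any host that is not exactly www.example.com (NC = case-insensitive)
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# permanently redirect to the same path on the www host
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

As for other methods: if the server is not Apache, the same 301 can usually be sent from the application side instead, as the PHP example later in this thread shows.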

ramachandra (5+ Year Member)
posted 2:58 pm on Jun 14, 2005 (gmt 0)

Hello ilikeyou,

Can this code be copied and pasted into the .htaccess file, and will it work after changing the domain to my site name?

# redirect all http://www.domain.com requests to http://domain.com (trim www); NC stands for nocase
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]

(If you don't have Apache, you'll have to look into other methods.)

If there is no Apache server available, what would be another method?

Thanks

g1smd (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)
posted 6:09 pm on Jun 15, 2005 (gmt 0)

I fixed a case where Google continued to list hundreds of non-www URLs that were now redirected to the www version, simply by building a normal sitemap listing them all, and then putting that fake sitemap on another site. It took Google about a month to remove all of the URLs from their index.

Richie0x (10+ Year Member)
posted 2:51 am on Jul 1, 2005 (gmt 0)

My webhost decided long ago not to allow their clients to use .htaccess. They say the restriction is for security reasons, but after years of trying to persuade them, I'm beginning to think they do it purely to annoy the heck out of me.

Since all the pages on my site are PHP files, and they all include the same text file for the navigation etc., I thought I could use PHP to solve the problem. This is the code I came up with; I've tested it and it works, but could you tell me if it's okay to use and whether it works exactly the same as the .htaccess version?

<?php
// redirect any request that arrives on the bare domain to the www host
// (note: this must run before any output, or header() will fail)
$server = $_SERVER["SERVER_NAME"];
$page = $_SERVER["REQUEST_URI"];
if ($server == "domain.com") {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://www.domain.com" . $page);
    exit; // stop here so nothing is output after the redirect headers
}
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>Blah blah blah</title>
ETC ETC ETC

Nitrous
posted 1:46 pm on Jul 1, 2005 (gmt 0)

Could someone who understands this stuff reverse this for me, so everything goes to www.domain.com, please? I cannot make it work!

# redirect all http://www.domain.com requests to http://domain.com (trim www); NC stands for nocase
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]

Thank you!

claus (WebmasterWorld Senior Member, 10+ Year Member)
posted 10:44 pm on Jul 1, 2005 (gmt 0)

Richie0x, I believe your PHP header example will work exactly like it should, and is equivalent to a solution using the .htaccess file.

Nitrous:

-------------------------------
RewriteEngine On
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
-------------------------------

The first line with "HTTP_HOST" (the one with the lone dot) checks that the request has a "Host" field at all. Without that test, a request with no Host field would fail the www check every time and be redirected endlessly, i.e. your server would go into a loop.

guitaristinus (10+ Year Member)
posted 11:17 pm on Jul 1, 2005 (gmt 0)

crazybrain,

I don't worry about whether the domain has a www or not. Surely all those smart people at Google can get their computers to see that the domain is the same, and not rank a site lower because it has a www or not.

Stefan (WebmasterWorld Senior Member, 10+ Year Member)
posted 2:53 am on Jul 2, 2005 (gmt 0)

I don't worry about whether the domain has a www or not. Surely all those smart people at Google can get their computers to see that the domain is the same, and not rank a site lower because it has a www or not.

It's not the same: www is a subdomain, and G sees it that way. Having two versions of your website indexed can cause problems. This isn't always the case - it depends on the incoming links and their PR value, and who knows what else. But you shouldn't so blithely dismiss other people's concerns about this issue. Better safe than sorry, and best to clean up any potential canonicalization problems (hope I spelled that right).

SEOtop10 (10+ Year Member)
posted 4:56 am on Jul 2, 2005 (gmt 0)

G1smd, you mentioned that you put the fake sitemap on another domain. How could you do that?

Google insists that the sitemap (the XML or TXT one) has to be in the highest-level directory of the same domain.

guitaristinus (10+ Year Member)
posted 8:20 am on Jul 2, 2005 (gmt 0)

I have a page that is indexed with and without the www. It ranks #1 for a search that has 1,390,000 results.

crazybrain (5+ Year Member)
posted 12:45 pm on Jul 2, 2005 (gmt 0)

Just to clarify things: I did email Google, and they did respond. When Google has both non-www and www pages listed, you should create a 301 redirect in your .htaccess file and change all your links to http://www.example.com/page.html instead of /page.html.

It took about 3 weeks, but eventually Google recognized my 301 redirect. Now when I check site:www.example.com in Google, I get the same results as for site:example.com.

And my stats are steadily improving. When this change happened, I did see an increase in visits from Google.

larryhatch (WebmasterWorld Senior Member, 10+ Year Member)
posted 1:06 pm on Jul 2, 2005 (gmt 0)

Hello guitaristinus:

With SERPs like you describe, I'd be hesitant to make changes too. On the other hand, you could actually benefit by consolidating PR. Which URL shows up first for your keyword(s) in that particular search? Is it the www one, or the non-www URL? That's important. If they BOTH show up, I would strongly consider pointing (301) the weaker one to the higher-ranking URL. Just my 2 cents' worth. -Larry
