Forum Moderators: Robert Charlton & goodroi
I've read a fair bit about Google crawling 2 "versions" of a site, one with and one without the "www.". After doing some research I can't seem to get a clear answer as to which "version" to redirect to which, i.e. which is the best to keep in Google and which to effectively trash (and why).
Is there a definitive answer to this?
To get hypertechnical, in certain cases there is no particularly good reason to use the www. Let's say I own example.com. If I have an FTP server at ftp.example.com and the mail server is mail.example.com, then I SHOULD use www.example.com instead of example.com for the web server, to be consistent. However, if all that is at example.com is a website, the www is superfluous.
HOWEVER, in ALL cases if there is a website on example.com (some people have domains used just for e-mail, etc.), then it is positively brain dead not to have www.example.com also resolve. I've seen webmasters who actually have made this error. In my case, I redirect all calls to root to www on my sites, and recommend this practice to all.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.site\.com [NC]
RewriteRule ^(.*)$ http://www.site.com/$1 [R=301,L]
Can anyone comment if this is indeed the best way to handle non www?
-Patrick
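To make the decision those three lines encode concrete, here is a toy sketch in shell of what the server ends up doing for a given Host header. This only models the host test, not mod_rewrite itself; www.site.com is just the placeholder host from the snippet, and where the real [NC] match is case-insensitive, this toy uses an exact match:

```shell
#!/bin/sh
# Toy model of the non-www -> www redirect logic (not mod_rewrite itself).
# Given a Host header value and a request path, print what the server
# would do. "www.site.com" stands in for your canonical host.
redirect_target() {
  host="$1"
  path="$2"
  if [ -z "$host" ]; then
    # Blank Host header: the first RewriteCond fails, so no redirect.
    echo "serve $path"
  elif [ "$host" != "www.site.com" ]; then
    # Wrong host: the second RewriteCond matches, so send a 301.
    echo "301 http://www.site.com$path"
  else
    echo "serve $path"
  fi
}

redirect_target "site.com" "/index.html"      # 301 http://www.site.com/index.html
redirect_target "www.site.com" "/index.html"  # serve /index.html
```

The point of the first branch is the same "ounce of prevention" discussed later in the thread: a request with no Host header at all should be served as-is rather than bounced into a redirect loop.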
I wish sites didn't use that since it's just a waste of 4 characters, but since it's what people are used to, I use it on most sites.
Personally, I do it in PHP at the top of the global include file. The mod_rewrite way is more elegant and probably preferred.
<?php
// Send a 301 redirect to the www host for any non-www request.
if (!preg_match('/^www\.site\.com/i', $_SERVER['HTTP_HOST'])) {
    header('HTTP/1.1 301 Moved Permanently');
    if (!empty($_SERVER['QUERY_STRING'])) {
        header('Location: http://www.site.com' . $_SERVER['PHP_SELF']
            . '?' . $_SERVER['QUERY_STRING']);
    } else {
        header('Location: http://www.site.com' . $_SERVER['PHP_SELF']);
    }
    exit;
}
I'm with you on that, Stinkfoot. But it would only be true if we knew that Google IS affected by double listings; it's all speculation otherwise. I suppose it's hard to know whether setting up a redirect made any difference unless you later disable it and see what the result is. Has anyone done that?
One of my doubly listed sites is on an IIS server. I have a "DNS Zone Editor" as part of the hosting control panel, and one of the "Record Names" amongst "www" and "ftp" is the wildcard "*". I presume that if I disabled the "*" I would in effect block access to mydomain.co.uk? Not the same as a redirect, I realise, but maybe equally effective? Otherwise, how can I redirect on IIS?
I can't get G to show me all the pages they think we have, so I cannot see which pages need to be removed via .htaccess...
"Sorry, Google does not serve more than 1000 results for any query."
Any ideas on how to get around that result limit, or is it something not to worry about?
So I am assuming G has the duplicate content indexed and we are being penalized, as that site does not get any traffic from G at all...
Will putting in this htaccess code sort Google out?:
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.site\.com [NC]
RewriteRule ^(.*)$ http://www.site.com/$1 [R=301,L]
Will G drop the non-www pages automatically, or do I have to set up a "gone" (410) notice for all the non-www pages to get G to drop them?
No, just redirect them.
It's been said already, but if you permit two copies of each of your pages to exist, then you are relying on Google (and other engines) to post-process their crawl results and "figure out" that you have two (or more) URLs resolving to each page. So this means that if this post-processing fails --or if they don't have time to complete it-- you are risking having your PageRank/Linkpop split across two pages. So why risk it?
The reason for "www" is historic, and goes back to when a company might have two (or more) different servers for their domain -- One for internal private use (non-www) and one for public use (www). This was often done for security reasons and/or because the old servers didn't have the speed and disk capacity of today's machines that can often support dozens or hundreds of virtual servers. Today, most hosting companies use a one-size-fits-all approach, and simply set up both variants to resolve to the same host.
As to which variant you should use, that depends on whether you're setting up a new domain or 'fixing' an old one. If you're setting up a new domain, then pick one based on aesthetics and other factors such as existing branding in print media, etc. -- I think the above-mentioned point about browsers auto-prepending "www" on type-in domains is a good one too, but what's most important is that you pick one version and stick to it.
If you've got an existing domain, and incoming links are split across the www and non-www variants, then a careful analysis of the PageRank and link popularity passed by those incoming links is in order. Check the PageRank of the pages linking to you, and weight them according to PR and the "stability" of the linking site/page: an incoming link with PR6 that you're sure will be maintained may be better than a link from a PR7 page that you don't expect to last more than a year, for example. Also factor in the link text on those pages; again, an incoming link from a PR7 page with good, specific link text may give you better results than a PR8 link with "good stuff" as the link text. Count the links on those pages that link to you, too; a PR7 page with five outgoing links, one of which goes to your site, may be better than a PR8 page with one hundred links, one of which goes to your site. You need to do this same kind of analysis with Yahoo and MSN as well, although it's a bit harder without the toolbar PR meter to give basic guidance; you have to count the incoming links of the pages that link to your site and guesstimate the "quality" of those links.
Anyway, the question is --and especially for the Google-bashers-- "Do you want to rely on an extra search engine processing step to sort out your canonical domain, or do you want to declare it unambiguously?" If the latter, do a 301 redirect and, over time, get as many of the "wrong" links to your site corrected as possible.
Jim
I'm trying to imagine what caused that error... I can't see the httpd.conf because it's a reseller account... I'm on an Apache server.
I think the guys who say to use it mostly say that because all their pages are configured that way and they can't change them now.
Seems to me that with a fresh site you can just skip the www, and put a link on your home page pointing to itself. That'll tell even the dumbest robot which version it's meant to prioritize.
Using the code posted in msg #19 above as a basis, simply add the line:
Options +FollowSymLinks
As stated in the error message and in the mod_rewrite documentation [httpd.apache.org], the FollowSymLinks or SymLinksIfOwnerMatch option is required in order to use mod_rewrite.
Jim
BELIEVE ME... I know; I have spent two weeks undoing a bunch of stuff. Putting the .htaccess redirect code in place opened several additional cans of worms in my hosting company's tech support department: problems had to be solved that no one was even aware of, and it all took time, even though I get preferential treatment from them and they worked like crazy to get everything functioning correctly.
After a great deal of additional searching online, in forums and on Usenet, it seems no one has found a reasonable FP and .htaccess redirect workaround.
When in place, the redirect from non-www to www worked just fine, but FP publish didn't, and the extensions on the server had to be re-installed several times before we had it all working right in the end. Usually I had to have everything on the servers deleted (everything) and start over as if it were a new site, letting FP rebuild all its quirky special files.
I had done the redirects on a bunch of sites and in some cases I ended up having to rebuild some old sites in newer versions of FP....lots of bother and we all learned a lot.
I know many people poo-poo FrontPage, but honestly, I, by myself, couldn't possibly manage as many sites as I do without it. I know about the bloated code and all that; I just strip the extra code out and use it for the bare-bones stuff I need.
Anyway, there is not much info online about the FP and mod_rewrite problems, so I thought I had better mention it and save people some real trouble.
You can find the answer to mod_rewrite/FrontPage extension compatibility problems in the Apache forum, courtesy of members Bumpski and chopin2256. See message #43 of this thread [webmasterworld.com].
Jim
i did have to add the line "RewriteEngine On" to get the redirection to work in the first place, but that's probably not relevant to this issue.
www vs. non-www? Test that on any major website and you'll see it redirect to the www version: Google, CNN, MSN, etc.
front page tip appreciated as well!
Good point about RewriteEngine on. Assuming that there is no other mod_rewrite code already present in .htaccess, the entire code snippet would look like this:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
The first RewriteCond causes the rewrite to be skipped if the HTTP_HOST variable is blank. This can occur if the client (browser or robot) is a true HTTP/1.0 client and does not send the HTTP Host header with each request. Most HTTP/1.0 clients are now "Extended HTTP/1.0" and do send the header, since it's impossible to access most name-based virtual hosts without it. However, without this line, it would be possible for an HTTP/1.0 client (or a malicious user-agent) to cause errors on name-based virtual hosts which were assigned to unique IP addresses by putting the host into an "infinite" redirection loop. It's simply "an ounce of prevention."
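To see what such a request looks like: an HTTP/1.0 request really can consist of nothing but the request line and a blank line, with no Host header at all. You can build one by hand and pipe it into a server with netcat (example.com below is just a placeholder):

```shell
# A minimal, complete HTTP/1.0 request. Note there is no "Host:" header;
# HTTP/1.0 never required one, while HTTP/1.1 does. To try it against a
# live server, pipe it into netcat:
#     printf 'GET / HTTP/1.0\r\n\r\n' | nc example.com 80
printf 'GET / HTTP/1.0\r\n\r\n'
```

A name-based virtual host receiving that request has no way to know which site was wanted, which is exactly why the first RewriteCond skips the redirect when HTTP_HOST is blank.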
The second RewriteCond says, "If the requested host is NOT www.example.com" (possibly followed by an optional port number). Literal periods in the hostname should be escaped as shown. I've changed it so that the match is no longer case-insensitive as well, but that's just a personal preference.
The RewriteRule then rewrites any request for any page in the "wrong" domain to the same page in the "right" domain. The same code can be used to do the rewrite from www to non-www simply by changing the "www.example.com" in both lines to "example.com". So, as before, I'm leaving that part of this discussion open.
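Spelled out, that www-to-non-www variant would read (same caveats as above, with example.com standing in for your domain):

```apache
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule (.*) http://example.com/$1 [R=301,L]
```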
Jim
RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
The problem is with links that I've already made static (where I've removed the ?, =, etc.). Using the code above, the ? and = return at the end of the URL when the non-www version is used. Any thoughts? Thanks!
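A guess at what's happening, offered as a sketch rather than a definitive fix: by default, mod_rewrite re-appends the original query string to the redirect target, which would explain the ? and = reappearing. If none of the redirected URLs need their query strings preserved, you can end the substitution with a bare ?, which tells mod_rewrite to discard the original query string (example.com is a placeholder):

```apache
RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
# The trailing "?" on the substitution discards the original query string.
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

Be careful with this if any pages on the site legitimately take query strings, since those would be dropped by the redirect as well.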