Forum Moderators: open
Google sees both www.example.com and example.com. When I type in site:example.com, Google shows the domain but also has this below:
"In order to show you the most relevant results, we have omitted some entries very similar to the 1 already displayed. If you like, you can repeat the search with the omitted results included."
When I click that, it proceeds to show me the www and the non-www example domains on top of one another. I'm really concerned this may trip some duplicate content filter.
Right now when I type example.com and www.example.com separately, both names stick in the browser and I can browse without either changing. I really want Google, and people visiting, to only see the example.com version. Is there some way, using .htaccess perhaps, to force Google to only see example.com and to redirect people who type or visit www.example.com to example.com only?
Thanks for your time
Mark
You can use a 301 redirect to correct this for ALL SEs, and it will also consolidate your PR which is probably split between the domains presently.
I use this code in my .htaccess file to accomplish the redirect:
RewriteEngine on
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule ^(.*)$ http://example.com/$1 [R=301]
If memory serves, it can take several months for Google to get this all sorted out. Using the Server Header Check [webmasterworld.com] is a convenient way to verify that your redirect is doing what you intend.
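If you'd rather script the check yourself, the same verification takes only a few lines. This is a sketch of my own (not part of the tool above): it stands up a throwaway local server that mimics the 301, then reads back the status and Location header, which is exactly what a server header check reports. Point the http.client call at your real hostname instead to test a live redirect.

```python
# Sketch: verify that a server answers with a 301 and the expected
# Location header. A throwaway local server stands in for the real
# site so the example is self-contained.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Mimic the .htaccess rule: send everything to the bare domain.
        self.send_response(301)
        self.send_header("Location", "http://example.com" + self.path)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/page.html", headers={"Host": "www.example.com"})
resp = conn.getresponse()
print(resp.status, resp.getheader("Location"))
# A correct setup reports: 301 http://example.com/page.html
server.shutdown()
```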
Note the [R=301,L] flag in the RewriteRule in DaveAtIFG's second code post - you should use that form as well.
rfgdxm1,
One of the other advantages of doing this kind of redirect is that it makes it a lot harder for other webmasters to link to you at the "wrong address." Since most of us cut-n-paste, the fact that the browser address bar gets "corrected" can help a lot.
Jim
I have a .htaccess file with this in it
# -FrontPage-
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName mysite.fp.domain.net
AuthUserFile /home/sc81/www/_vti_pvt/service.pwd
AuthGroupFile /home/sc81/www/_vti_pvt/service.grp
(whatever that is)
Can I just add the code to this file as well? If so, should I add it to the top or bottom, or doesn't it matter?
Thx.
CAUTION: IF YOU ARE USING THE FRONTPAGE EXTENSIONS IN YOUR PAGES, TO PROCESS A FORM FOR EXAMPLE, THESE MOD_REWRITE DIRECTIVES WILL NOT WORK AS ADVERTISED.
RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
Jim
Yep. It looks for the .htaccess in the directory where the requested file resides and then looks in all parent directories and processes them all.
But most servers are configured to do this anyway, so it doesn't make a difference.
In the end it's only peanuts ...
I happen to know someone whose husband was struck by lightning, so don't give up hope! www is just a subdomain, sure they can pick it up. They could even pick this up as a subdomain:
Aghhhhhhhhh;)
Googlebot will pick up anything that can't run away on its own two feet.
[webmasterworld.com...]
> Are you using an HTTP/1.0 client? If so, you'll need to add another RewriteCond [...]
Um, why?
> RewriteCond %{HTTP_HOST} .
This is saying "If there's an HTTP_HOST header AND [more rewrites...]", correct?
When trying to figure out what this looping condition is about, I did notice this in RFC 2616 (and 1945 & 2068):
Note: When automatically redirecting a POST request after
receiving a 301 status code, some existing HTTP/1.0 user agents
will erroneously change it into a GET request.
This has nothing to do with my question, right? (And can't be controlled at the server-side, I should think...) Actually, that seems to be a stupid question, but I just thought it was something worth noting...
> Um, why?
HTTP/1.0 clients do not provide a Host: request header.
Therefore, the %{HTTP_HOST} environment variable will be empty if the request comes from or through an HTTP/1.0 client or proxy.
If the condition "RewriteCond %{HTTP_HOST} !^your_hostname_here" is used alone without checking for the existence of a hostname first, then that condition will match a blank hostname, and the redirect will occur for every request. The client and server will end up in a 301 redirect loop, until one or the other reaches its redirection limit.
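To make that failure mode concrete, here's a small Python model of the two rule variants (my own illustration - mod_rewrite itself is not involved, and the hostnames are the placeholders from this thread). The single-condition version keeps matching when HTTP_HOST is empty, so every HTTP/1.0 request gets redirected again; the two-condition version serves the page instead:

```python
import re

def naive_rule(host):
    r"""Only: RewriteCond %{HTTP_HOST} !^www\.example\.com"""
    if not re.match(r"^www\.example\.com", host):
        return "redirect"   # fires even when host is empty
    return "serve"

def safe_rule(host):
    r"""RewriteCond %{HTTP_HOST} .   (host must be non-empty)
        RewriteCond %{HTTP_HOST} !^www\.example\.com"""
    if re.search(r".", host) and not re.match(r"^www\.example\.com", host):
        return "redirect"
    return "serve"

# HTTP/1.1 client on the wrong hostname: both variants redirect once. Fine.
print(naive_rule("example.com"), safe_rule("example.com"))  # redirect redirect

# HTTP/1.0 client: no Host header, so %{HTTP_HOST} is empty. The naive
# rule redirects again on every follow-up request (the loop); the safe
# rule just serves the page.
print(naive_rule(""), safe_rule(""))                        # redirect serve
```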
Jim
The advantage of mod_rewrite is that it can test the requested Host HTTP header, and redirect only if it is "wrong" -- i.e. not the one you want to be known by. The problem comes in where a true HTTP/1.0 client is involved in the transaction. In that case, no Host header is sent with the request and as a result, there is nothing there for mod_rewrite to test. So the above code modification prevents a loop in that case.
The Host header was added in HTTP/1.1 in order to support shared IP addresses. Without the Host header, the server has no idea which virtual server to steer the request to. As a result, HTTP/1.0 is almost dead today, because it won't work with shared IP addresses. But there are still a few holdouts.
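A toy sketch (my own, with made-up document roots) of what name-based virtual hosting does with that header: the server chooses a site purely by Host, so a request that arrives without one can only fall back to a default.

```python
# Toy name-based virtual hosting: several sites share one IP, and the
# server picks the document root from the Host request header alone.
VHOSTS = {
    "www.example.com": "/var/www/example",
    "www.example.org": "/var/www/other-site",
}
DEFAULT = "/var/www/default"  # the fallback site

def docroot_for(host_header):
    # host_header is None when a true HTTP/1.0 client sends no Host.
    if host_header is None:
        return DEFAULT
    return VHOSTS.get(host_header, DEFAULT)

print(docroot_for("www.example.com"))  # /var/www/example
print(docroot_for(None))               # /var/www/default - HTTP/1.0 can't choose
```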
It is one thing if you're on a shared IP and HTTP/1.0 requests can't access your site. But it's potentially worse if HTTP/1.0 requests *can* access your site but your code sends them into a loop and possibly crashes your server!
Note that although Google and several other search engines "advertise" in access logs as HTTP/1.0 clients, they *do* send a Host header, and they do support many HTTP/1.1 functions. I think they advertise as HTTP/1.0 in order to at least function with servers that don't support HTTP/1.1, while not giving up all of the advantages of HTTP/1.1.
Jim