
New To Web Development Forum

This 62-message thread spans 3 pages.
Canonical problems?
How does this happen?
miki99




msg:3092991
 2:27 pm on Sep 22, 2006 (gmt 0)

I've been looking into why my site took a steep nosedive in the Google rankings recently (after Sept. 15), and just discovered that I have many pages listed with www at the beginning of the URL, and many more not beginning with www.

How much of a problem is this, how does it happen, and what can I do about it? Please keep in mind that until a couple of days ago I had never heard of "canonical problems," and am not even really sure what the term means (though I did Google it, and ended up back on another WebmasterWorld forum, where the discussions were over my head). :-)

Many thanks for any and all enlightenment.

 

Quadrille




msg:3093044
 3:04 pm on Sep 22, 2006 (gmt 0)

Do a search for 301 redirects.

It's a big problem; essentially, SEs see www.domain.com and domain.com as two different sites. This divides your ranking between two domains, and induces Google to list half the pages as 'supplemental' results. And if other issues apply (such as identical meta tags), pages could be dropped completely.

It does not apply to all sites - if a site has always used one form (www.domain.com or domain.com), and no internal or external links exist to the alternate, then peace reigns.

But links to both forms precipitate the problem: Google follows the link, respiders the site, and adds it to the index. It then discovers that it duplicates the other form, and the problems go on from there.

The solution is alarmingly simple: Choose one - the convention is www.domain.com but the choice is yours - and '301 redirect' from the other to the chosen domain.
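The redirect logic itself comes down to one decision per request. Below is a minimal Python sketch of that decision, purely to illustrate it. On Apache the same thing is done in .htaccess with mod_rewrite, as later posts show. www.example.com is a placeholder for whichever form you choose, and plain http:// matches the thread's examples.

```python
from typing import Optional

CANONICAL_HOST = "www.example.com"  # placeholder: substitute the form you chose


def canonical_redirect(host_header: str, path: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    # The Host header may carry a port ("example.com:80") and any letter case.
    hostname = host_header.split(":", 1)[0].lower()
    if hostname == CANONICAL_HOST:
        return None
    # Keep the requested path unchanged so one rule covers every page on the site.
    return "http://" + CANONICAL_HOST + path
```

Because the requested path is carried over unchanged, this one test covers every page on the site. There is no need for a separate redirect per page.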

[edited by: Quadrille at 3:06 pm (utc) on Sep. 22, 2006]

miki99




msg:3093096
 3:45 pm on Sep 22, 2006 (gmt 0)

Thank you very much, Quadrille. Unfortunately I know nothing about 301 redirects as yet, but I see there are many articles about them on the Web, and I'm currently reading one of them.

I have been renaming files and reorganizing my directory structure a lot, so I guess I've discovered part of the problem.

I have never done anything with the .htaccess file directly, but my website's Cpanel provides a form for doing redirects -- this I assume will accomplish the same thing?

I see a 301 is a "permanent redirect." So far I've only done "temporary redirects" even though the moves were permanent -- maybe that was a mistake too? (Duh!)

"It does not apply to all sites - if a site has always used one form (www.domain.com or domain.com), and no internal or external links exist to the alternate, then peace reigns."

I have been trying to wrap my brain around this issue but am still not getting it. Nearly all my internal links are root-relative URLs. The only exceptions are some absolute URL links to main sections of my site in a footer at the bottom of each page -- PLUS --

--until recently, all the links in my site directory were absolute. I changed them all to root-relative just to make things simpler for myself. By any chance, could this alone account for my website's nosedive?

If so, would creating and submitting a proper Google sitemap help correct this or make it worse?

As for external links, I don't see how I have much control over whether other sites link to me using www or no-www?

Welcome to the (mis)adventures of a newbie. :-)

miki99




msg:3093168
 4:35 pm on Sep 22, 2006 (gmt 0)

One more question, sorry.

"The solution is alarmingly simple: Choose one - the convention is www.domain.com but the choice is yours - and '301 redirect' from the other to the chosen domain."

Do you mean I should do a separate redirect for every page of my site that is listed w/o www, say, to point to the www version?

Quadrille




msg:3093171
 4:36 pm on Sep 22, 2006 (gmt 0)

Don't worry about how it happened; it's happened, and the 301 permanent redirect will sort it out (it's such a common and easily fixed problem that most SEOs would suggest doing it for every site, 'just in case').

Your host's cPanel may do the job (mine does), but you need to be sure, as not all will do a 301; if in doubt, ask them.

You may or may not have other problems, but you know you have this one, and others will be easier to find once this is out of the way. Having said that, I'd certainly replace any other redirects you have with 301s. Redirects are often a source of problems, and you've nothing to lose by being safe.

You'd do well to check all your site navigation. Look up Xenu; it will become your best friend, once you've learned to stop it giving too much information!

A Google site map is worth doing for two reasons: 1. it does what it says on the packet and 2. It demands you sort out your site if it encounters problems, so it can be a valuable fault-finder.
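A sitemap should probably list each page once, and only under the canonical host, so that it does not advertise the duplicate form. Below is a minimal Python sketch, assuming the sitemaps.org 0.9 XML schema and a placeholder host of www.example.com. In practice the page list would come from your own site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CANONICAL_ROOT = "http://www.example.com"  # placeholder: your chosen www or non-www form


def build_sitemap(paths):
    """Return sitemap XML listing each path once, always on the canonical host."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in sorted(set(paths)):  # set() drops duplicate entries
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = CANONICAL_ROOT + path
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["/", "/about.html", "/about.html"]))
```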

[edited by: Quadrille at 4:43 pm (utc) on Sep. 22, 2006]

Quadrille




msg:3093176
 4:38 pm on Sep 22, 2006 (gmt 0)

Do you mean I should do a separate redirect for every page of my site that is listed w/o www, say, to point to the www version?

Shouldn't be necessary; one for the whole domain should work for every page without problems.

miki99




msg:3093188
 4:52 pm on Sep 22, 2006 (gmt 0)

Thank you so much for all your time and trouble! I sure appreciate it.

I will ask my host about the redirect. In the meantime, I found this "fix" you're supposed to add to your .htaccess file. Will this work?

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]

And I will look up Xenu. Wow, good thing this stuff is interesting....

[edited by: Brett_Tabke at 2:32 pm (utc) on Sep. 24, 2006]
[edit reason] added space before ! per poster request [/edit]

trader




msg:3093239
 5:40 pm on Sep 22, 2006 (gmt 0)
Thanks to some direct feedback from one of the Mods we decided to go to the non-www using htaccess for all our sites, even though it's much more common practice to forward to the www.

I did that for two reasons: we have so many long domains that it makes the URLs shorter and better-looking, and it just seems to make sense, since so many people (myself included) no longer type in the www anyway.

Before doing that we check the current Page Rank. If the GPR is higher for the www version we do not forward to the non-www but that is relatively rare as most of the sites have the same PR for both versions. In fact, the non-www occasionally has better PR even though we never forwarded to the non-www in the past.

Here is the htaccess code we are using (which works well and seems simple vs some other methods):

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.name\.com [NC]
RewriteRule ^(.*)$ http://name.com/$1 [L,R=301]

ErrorDocument 404 http://name.com/index.html
ErrorDocument 401 http://name.com/index.html
ErrorDocument 402 http://name.com/index.html
ErrorDocument 400 http://name.com/index.html

miki99




msg:3093323
 6:31 pm on Sep 22, 2006 (gmt 0)

Thanks, trader! I assume that's the code that redirects from www to non-www? (Rather than the reverse.) Looks like it, but just checkin'.

I asked previously whether the "Redirect" form in my Cpanel would create a 301 redirect. I'm still waiting to hear back from my web host, but I just created a "permanent" redirect through the cpanel form, then looked at my .htaccess file to see what kind of code it created. This is the code:

RedirectMatch permanent ^/mypage.htm$ [mysite.com...]

Since this looks nothing like any of the other code, I assume it's not a 301 Redirect?

(I tried to edit the example link so it wouldn't appear as a link, but couldn't get it right--hope it's not a problem.)

[edited by: miki99 at 6:36 pm (utc) on Sep. 22, 2006]

jdMorgan




msg:3093336
 6:35 pm on Sep 22, 2006 (gmt 0)

trader,

Your ErrorDocument directives are malformed, and will lead to a 302-Moved Temporarily response for any of those errors. See the notes in the ErrorDocument documentation [httpd.apache.org] for details on this.

To return the correct response code for each error, use:

ErrorDocument 404 /index.html
ErrorDocument 401 /index.html
ErrorDocument 402 /index.html
ErrorDocument 400 /index.html

I also strongly suggest that each of these errors be directed to a page that describes the specific error, which in turn provides a text link to your index page and/or site map to aid the visitor. You may also add a 5-to-10-second meta-refresh on those error pages to send the browser to your home page or site map if you so desire.

miki,
To avoid a double redirect on client index.html requests, I'd suggest:

RewriteEngine on
RewriteBase /
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]
#
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

This reverses your rules, so that the more specific case of a request for /index.html is taken care of first, regardless of whether the canonical domain is requested.

It also removes the end-anchor from the domain name test, so it won't break if a port number is appended, and removes the unneeded anchoring on the ".*" RewriteRule pattern.
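You can try out the two regular expressions jdMorgan mentions in Python. Its re module treats simple patterns like these the same way Apache does. The sketch assumes the placeholder host www.example.com. THE_REQUEST is the request line exactly as the client sent it (for example GET /folder/index.html HTTP/1.1). %1 in the RewriteRule target refers to the first group captured by the preceding RewriteCond, which here is the folder path.

```python
import re
from typing import Optional

# Host tests as they appear in the thread: the first ends with "$", jdMorgan's does not.
HOST_ANCHORED = re.compile(r"^www\.example\.com$")
HOST_UNANCHORED = re.compile(r"^www\.example\.com")

# The THE_REQUEST pattern from the thread; "\ " matches a literal space.
INDEX_REQUEST = re.compile(r"^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/")


def index_redirect_target(request_line: str) -> Optional[str]:
    """Mimic the index rule: return the 301 target, or None if the rule does not apply."""
    m = INDEX_REQUEST.search(request_line)
    if m is None:
        return None
    # group(1) plays the role of %1: the folder path, or "" for the root index page.
    return "http://www.example.com/" + m.group(1)


# A Host header with a port appended defeats the end-anchored test only.
print(bool(HOST_ANCHORED.search("www.example.com:80")))    # False
print(bool(HOST_UNANCHORED.search("www.example.com:80")))  # True
print(index_redirect_target("GET /folder/sub/index.html HTTP/1.1"))
```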

Jim

[edited by: jdMorgan at 6:48 pm (utc) on Sep. 22, 2006]

miki99




msg:3093347
 6:46 pm on Sep 22, 2006 (gmt 0)

Thanks so much, Jim! (Though I don't begin to understand the technical explanation.) Do I need that ErrorDocument stuff on the end of that code?

ErrorDocument 404 /index.html
ErrorDocument 401 /index.html
ErrorDocument 402 /index.html
ErrorDocument 400 /index.html

jdMorgan




msg:3093349
 6:50 pm on Sep 22, 2006 (gmt 0)

Please see the corrected post above.

miki, take this one step at a time. If you have not yet created custom error documents, then using those directives will cause a problem on your site. Read the documentation cited in my first post to find out more.

Jim

miki99




msg:3093390
 7:25 pm on Sep 22, 2006 (gmt 0)

"Please see the corrected post above."

It's OK Jim -- that's what I figured you meant.

"miki, take this one step at a time. If you have not yet created custom error documents, then using those directives will cause a problem on your site. Read the documentation cited in my first post to find out more."

Thanks very much again, Jim. Frankly, I don't even know what programming language the .htaccess code is--ASP? I will read the documentation you cited (missed it the first time, sorry).

I just heard back from my host that using the form in my Cpanel WILL create a 301 redirect. Maybe I should just do it that way?

miki99




msg:3093398
 7:32 pm on Sep 22, 2006 (gmt 0)

I've just discovered that I can actually create custom error pages through my Cpanel as well.

trader




msg:3093402
 7:40 pm on Sep 22, 2006 (gmt 0)

Thanks jd for correcting my code. I am not an expert, so I don't really understand why my ErrorDocument code was wrong, but you are correct: I just implemented the change, with these results.

Using the code from my earlier post, I tested a made-up .htm page name with the free header checker at webconfs.com. It forwards to the index page with this: HTTP/1.1 302 Found =>

After the change you suggested, it still displays the index page, but with this: HTTP/1.1 404 Not Found =>

So it does work either way but the 404 error code now is accurate. Thanks again JD, most appreciated.

BTW, why does it make a difference if the error code 404 is indicated correctly or not since they both go to the index page? Sorry if that is a dumb question.
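A header checker like the one at webconfs.com makes a request and reports the status line without following any redirect. Below is a minimal sketch of the same check using Python's standard http.client, which never follows redirects on its own. check_status and split_target are hypothetical helper names, and the URLs in the usage note are placeholders.

```python
import http.client
from urllib.parse import urlsplit


def split_target(url):
    """Break a URL into (scheme, host, port, request-target) for http.client."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.scheme, parts.hostname, parts.port, target


def check_status(url, timeout=10):
    """Request url once and report (status, reason, Location header).

    http.client does not follow redirects, so a 301 or 302 is reported as-is.
    """
    scheme, host, port, target = split_target(url)
    conn_cls = http.client.HTTPSConnection if scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        conn.close()
```

For example, check_status("http://example.com/made-up-page.htm") should report 404 with correctly written ErrorDocument lines, and 302 with the http:// form. Once the canonical redirect is in place, check_status("http://example.com/") should report 301, with a Location on the www host.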

miki99




msg:3093414
 7:58 pm on Sep 22, 2006 (gmt 0)

Ummmm....I have another clueless question as well. I'm not sure if I can do a redirect of my whole domain through my Cpanel or not. The Cpanel form would only let me redirect my index page. That is, I can't redirect:

[mysite.com...] to [mysite.com...]

Just:

[mysite.com...] to [mysite.com...]

That wouldn't work for redirecting my whole domain, or would it?

miki

miki99




msg:3093416
 8:02 pm on Sep 22, 2006 (gmt 0)

If not, I'll use your redirect, Jim. I'm just so scared of messing something up because I'm such a babe in the woods here. Just stick the code in the .htaccess file, that's all, huh?

miki99




msg:3093453
 8:38 pm on Sep 22, 2006 (gmt 0)

YES! I got brave and just tried it your way, Jim. All the non www urls are now reverting to www! Thank you so much.

jdMorgan




msg:3093520
 9:41 pm on Sep 22, 2006 (gmt 0)

miki,

Glad you got it working!

Trader,

> BTW, why does it make a difference if the error code 404 is indicated correctly or not since they both go to the index page? Sorry if that is a dumb question.

HTTP/1.1 [w3.org] server status codes [w3.org] have very specific meanings to clients -- i.e. browsers and search engine robots. By returning a 302 status, you are telling search engines to keep the error URL (the one that caused a 404, for example) in their index, and ascribe the content of the redirect target page (your home page, in this instance) to the error URL.

Therefore, duplicate copies of your home page will appear to exist at these error URLs, in addition to your actual home page URL. This opens you up to duplicate-content problems (do a search [google.com] for dozens of threads here on WebmasterWorld) and to exploits intended to create duplicate content problems for you...

Returning incorrect server status codes means your site is broken at the most fundamental level -- The HTTP protocol used to request and transmit Web pages.

Since you're asking for an explanation, let me offer this advice: Do not feel free to toy with the HTTP protocol, post an untested robots.txt file, or leave canonicalization problems unremedied. If you confuse the search engine robots in the slightest way, your site is likely to suffer. At the very best, you leave yourself at their mercy to "figure out what I meant, not what I said."

Jim

trader




msg:3093722
 2:13 am on Sep 23, 2006 (gmt 0)

Thanks for the feedback and advice jd. I really did not realize how important it is to do the ErrorDocument code the right way (even though I am not really new to web development).

I will also be hesitant to post any code in the future for fear of giving bad code. Will be studying the 7,100 search results. Much appreciated and glad I asked the dumb question!

miki99




msg:3094137
 3:12 pm on Sep 23, 2006 (gmt 0)

I also had no idea how important custom error pages are, and have added those to my to-do list. Can't believe how much I have learned in the past couple days due to Google's sudden dislike of my website! I guess that's the up side.

Thanks again, JD.

jdMorgan




msg:3094143
 3:32 pm on Sep 23, 2006 (gmt 0)

Custom error pages aren't the critical factor. What's important is that if implemented, they must be implemented correctly.

Also, there's no such thing as a "dumb question" -- There's only lots of broken, poorly-ranked Web sites because documentation wasn't read, questions weren't asked, or broken code wasn't posted.

Jim

miki99




msg:3094312
 7:06 pm on Sep 23, 2006 (gmt 0)

Points well taken. Thanks yet again. :-)

twebdonny




msg:3094540
 1:02 am on Sep 24, 2006 (gmt 0)

Anyone: is this properly written code, or am I missing something? I see different characters in the various posts I am looking at. Thanks

RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule (.*) [mysite.com...] [R=301]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?
RewriteRule ^index\.html?$ [mysite.com...] [R=301,L]

Following this I have several simple redirects listed.

Some users are using [NC] after the first entry, some not;
some users are using a $ after the first .com, some not.

lmo4103




msg:3094587
 2:43 am on Sep 24, 2006 (gmt 0)

apache manual...
Despite the tons of examples and docs, mod_rewrite is voodoo...

Additionally you can set special flags for CondPattern by appending

[flags]

as the third argument to the RewriteCond directive. Flags is a comma-separated list of the following flags:

* 'nocase|NC' (no case)
This makes the test case-insensitive, i.e., there is no difference between 'A-Z' and 'a-z' both in the expanded TestString and the CondPattern. This flag is effective only for comparisons between TestString and CondPattern. It has no effect on filesystem and subrequest checks.
* 'ornext|OR' (or next condition)

And $ there is regex "end of line" anchor.
The apache manual has a lot of helpful information, but it is just helpful.
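Both the [NC] flag and the $ anchor are easy to test outside Apache. Below is a small Python sketch using the re module, which behaves the same as mod_rewrite's regex engine for simple patterns like these. Here re.IGNORECASE plays the part of [NC]. The hostnames are placeholders taken from the thread's examples.

```python
import re

# Without [NC] a pattern is case-sensitive; re.IGNORECASE stands in for [NC] here.
case_sensitive = re.compile(r"^www\.example\.com")
no_case = re.compile(r"^www\.example\.com", re.IGNORECASE)

# "$" anchors the end of the string; without it, anything may follow ".com".
end_anchored = re.compile(r"^mysite\.com$")
open_ended = re.compile(r"^mysite\.com")

for label, pattern, host in [
    ("case-sensitive", case_sensitive, "WWW.Example.com"),
    ("[NC]-style", no_case, "WWW.Example.com"),
    ("with $", end_anchored, "mysite.com:80"),
    ("without $", open_ended, "mysite.com:80"),
]:
    print(f"{label:15} {host:16} matches: {bool(pattern.search(host))}")
```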

I think you have to hammer it, then try it out.
Then hammer, and try some more.

When it works like a charm, you need to test it a lot -- before google finds out what you forgot.

As a rewrite newbie, I would like to see a collection of (or references to) tested and proven rewrite rules for the different situations that have recently become necessary.

I think I'll read this too [webmasterworld.com]

g1smd




msg:3094846
 2:03 pm on Sep 24, 2006 (gmt 0)

When it works like a charm, you need to test it a lot
-- before google finds out what you forgot.

You were a poet,
-- and didn't know it. :-)

g1smd




msg:3094854
 2:09 pm on Sep 24, 2006 (gmt 0)

The very first example redirect code on the previous page, and again just above, contains one problem.

For index pages it creates a Redirection Chain. You do not want a chain.

For domain.com/index.html you first get redirected to www.domain.com/index.html and then finally to www.domain.com/.

It is far better to do things in one step.

For */index.html ---> www.domain.com/

For domain.com/* ---> www.domain.com/*

To correct the problem, check for index redirection first, before detecting for non-www redirection.

This will work because the index redirection code does not test the source domain, but always specifies www as the destination.

I see that someone has posted a corrected version too (bottom of previous page). That works fine.
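The difference between the two orderings shows up in a simplified model. Each rule is a Python function, and, as with [R=301,L], the first rule that matches ends processing and sends the client off to make a new request. This is a sketch, not mod_rewrite itself. The host is the placeholder www.example.com, and the index rule checks the path directly instead of THE_REQUEST.

```python
from typing import Callable, List, Optional, Tuple

CANON = "www.example.com"  # placeholder canonical host

# A rule looks at (host, path) and returns the (host, path) to redirect to, or None.
Rule = Callable[[str, str], Optional[Tuple[str, str]]]


def host_rule(host: str, path: str) -> Optional[Tuple[str, str]]:
    # Wrong host: keep the path, switch to the canonical host.
    return (CANON, path) if host != CANON else None


def index_rule(host: str, path: str) -> Optional[Tuple[str, str]]:
    # Any ".../index.html": drop the file name and always target the canonical host.
    if path.endswith("/index.html"):
        return (CANON, path[: -len("index.html")])
    return None


def redirect_hops(rules: List[Rule], host: str, path: str, limit: int = 10) -> List[str]:
    """Follow redirects, applying the first matching rule to each new request."""
    hops: List[str] = []
    for _ in range(limit):
        for rule in rules:
            target = rule(host, path)
            if target is not None:
                host, path = target
                hops.append(f"http://{host}{path}")
                break
        else:
            return hops  # no rule matched: this URL is served directly
    raise RuntimeError("redirect loop")


print(redirect_hops([host_rule, index_rule], "example.com", "/index.html"))
print(redirect_hops([index_rule, host_rule], "example.com", "/index.html"))
```

With the host rule first, example.com/index.html takes two hops. With the index rule first it takes one, which is the ordering the corrected version uses.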


Just wading into the discussion on Error pages:

ErrorDocument 404 http://name.com/index.html

As noted above, this always produces a "302" response code, because you included "http://" in the URL. That is the wrong response code. It needs to produce a "404" code in the HTTP header for the bot to know that it really is a 404 page.

ErrorDocument 404 /index.html

This is still problematical.

Although it now returns the correct "404" response code, it is very bad form to serve an exact copy of your root index page in response to an error.

You are much better off making your own custom error pages containing two essential elements:
- the fact that an error has occurred.
- some basic site navigation to get the user on their way.

I always put those custom pages in their own folder.

The .htaccess directive then becomes:

ErrorDocument 404 /error.pages/error.404.html

Those custom error pages also contain <meta name="robots" content="noindex"> to stop them being indexed at their "real" URL.

That, too, is important.
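Apache gives you this behaviour when ErrorDocument points at a local path such as /error.pages/error.404.html. The sketch below only illustrates what a correct response consists of, as a minimal Python WSGI app: a 404 status line, a body that says an error happened and offers navigation, and a noindex robots tag. The page content and the /sitemap.html link are hypothetical.

```python
# Hypothetical site content; only "/" exists in this sketch.
PAGES = {"/": b"<html><body>Home page</body></html>"}

# Hypothetical custom 404 page: says what went wrong, offers navigation,
# and asks robots not to index the error page itself.
ERROR_404 = b"""<html><head>
<meta name="robots" content="noindex">
<title>Page not found</title>
</head><body>
<h1>Sorry, that page does not exist.</h1>
<p><a href="/">Home page</a> | <a href="/sitemap.html">Site map</a></p>
</body></html>"""


def app(environ, start_response):
    """Minimal WSGI app: known pages get 200, everything else gets the custom page with 404."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [PAGES[path]]
    # The body is the friendly error page, but the status line still says 404,
    # which is what ErrorDocument 404 with a local path gives you on Apache.
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [ERROR_404]
```

To look at it with a header checker, wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() will serve it locally.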

g1smd




msg:3094889
 2:40 pm on Sep 24, 2006 (gmt 0)

>> Some users are using a $ after the first .com, some not

I assume that you are talking about the $1 notation.

If you are referring to the RewriteRule line then I hope I can explain some things here.

In that line you are taking one URL and turning it in to some other.


The first bit specifies what the original URL was, the second bit specifies what it becomes.

You can collect up a chunk of URL in a (.*) and then re-use it in a $1 later.

If you had a second (.*) you could reuse it in a $2 later.

You can place ^ and $ around a (.*), or around a whole expression, to anchor the match at the start and end, like: ^(.*)$ or even ^/somefolder(.*)$ or in this case perhaps ^(.*)index\.html$
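Capture groups and $1/$2 work the same way as numbered groups in other regex tools. Below is a Python sketch with a hypothetical path. expand is an illustrative helper that substitutes $1..$9 the way a RewriteRule target does. Inside .htaccess the pattern is matched against the path without its leading slash, which is why the thread's targets are written as http://www.example.com/$1. Note also that the first (.*) is greedy, so it takes everything up to the last slash.

```python
import re


def expand(target, match):
    """Replace mod_rewrite-style $1..$9 in target with the groups captured by match."""
    return re.sub(r"\$([1-9])", lambda ref: match.group(int(ref.group(1))) or "", target)


# In .htaccess the rule sees the path without its leading slash (hypothetical path).
path = "catalog/widgets/blue.html"
m = re.match(r"^(.*)/(.*)\.html$", path)
print(m.group(1), m.group(2))  # catalog/widgets blue
print(expand("http://www.example.com/$1/$2/", m))
```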


Some examples for your index redirection:

RewriteRule ^index\.html?$ http://www.mysite.com/ [R=301,L]

This takes only the root index.html page and rewrites to the root www.domain.com/ without the index.html appended.
It does not cater for index pages in folders on the site.


RewriteRule ^(.*)index\.html?$ http://www.mysite.com/ [R=301,L]

This takes any index.html page (even one in a sub-folder - that's the (.*) bit) and rewrites to the root www.domain.com/ without the index.html appended. It does NOT preserve the folder name - BAD!


RewriteRule ^(.*)index\.html?$ http://www.mysite.com/$1 [R=301,L]

This one takes any index.html page (even one in a sub-folder - that's the (.*) bit) and rewrites the URL to the same folder (that's the $1 bit) www.domain.com/folder/ but without the index.html appended.


A note about the ? in the RewriteRule ^index\.html?$ part.

RewriteRule ^index\.htm$ - rewrites only for index.htm

RewriteRule ^index\.html$ - rewrites only for index.html

RewriteRule ^index\.html?$ - rewrites both for index.htm and for index.html
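The three index rules above, and the effect of the optional "l", can be compared side by side. Below is a Python sketch that mimics a RewriteRule match followed by $1 substitution. apply_rule is a hypothetical helper, and www.mysite.com is the placeholder domain from the thread.

```python
import re

TARGET_HOST = "http://www.mysite.com/"  # placeholder domain from the thread

RULES = {
    "root only": (r"^index\.html?$", TARGET_HOST),
    "any folder, folder lost": (r"^(.*)index\.html?$", TARGET_HOST),
    "any folder, folder kept": (r"^(.*)index\.html?$", TARGET_HOST + "$1"),
}


def apply_rule(pattern, target, path):
    """Return the redirect target if pattern matches path (no leading slash), else None."""
    m = re.search(pattern, path)
    if m is None:
        return None
    # Substitute $1..$9 with the captured groups, as the RewriteRule target does.
    return re.sub(r"\$([1-9])", lambda ref: m.group(int(ref.group(1))) or "", target)


for name, (pattern, target) in RULES.items():
    for path in ("index.htm", "folder/index.html"):
        print(f"{name:24} {path:18} -> {apply_rule(pattern, target, path)}")
```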


This stuff is very very powerful. It is logical, but it is not always obvious.

twebdonny




msg:3094938
 3:24 pm on Sep 24, 2006 (gmt 0)

Still a bit hazy on this, can anyone comment on whether this appears to be coded properly?

RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule (.*) [mysite.com...] [R=301]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?
RewriteRule ^index\.html?$ [mysite.com...] [R=301,L]

Thank you

g1smd




msg:3094941
 3:31 pm on Sep 24, 2006 (gmt 0)

As I said above, you need the index page redirect to be the FIRST one to avoid a redirection chain.

Additionally, your index page rewrite does not cater for index pages in any folders; only the root.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved