
Forum Library, Charter, Moderators: brotherhood of lan & mack

New To Web Development Forum

This 62 message thread spans 3 pages (page 2 of 3).
Canonical problems?
How does this happen?
miki99




msg:3092991
 2:27 pm on Sep 22, 2006 (gmt 0)

I've been looking into why my site took a steep nosedive in the Google rankings recently (after Sept. 15), and just discovered that I have many pages listed with www at the beginning of the URL, and many more not beginning with www.

How much of a problem is this, how does it happen, and what can I do about it? Please keep in mind that until a couple of days ago I had never heard of "canonical problems," and I am not even really sure what the term means (though I did Google it, and ended up back on another WebmasterWorld forum, where the discussions were over my head). :-)

Many thanks for any and all enlightenment.

 

trader




msg:3094960
 3:54 pm on Sep 24, 2006 (gmt 0)

...ErrorDocument 404 /index.html

This is still problematical.

Although it now returns the correct "404" response code, it is very bad form to serve an exact copy of your root index page in response to an error.

You are much better off making your own custom error pages containing two essential elements:
- the fact that an error has occurred.
- some basic site navigation to get the user on their way.

I always put those custom pages in their own folder.

The .htaccess directive then becomes:

ErrorDocument 404 /error.pages/error.404.html

Those custom error pages also contain <meta name="robots" content="noindex"> to stop them being indexed at their "real" URL.

A question for g1 please. Can you post a copy of error.404.html as I do not know what to put in it?

Ideally I would also like visitors to get redirected after several seconds to the index page so I do not lose the visitors who may not navigate there, but I do not have the delayed forwarding code. Do you know where I could get that code?

Also, my evidence indicates that places like AdSense, YPN and parking firms do not count home-page traffic that arrives via redirects as valid uniques, though I am not positive about that. In fact, parking firms say in their TOS that they do not want forwarded traffic.

If I do it this way: ErrorDocument 404 /index.html, would the traffic be more likely or less likely to be counted as a unique visitor by my stats or PPC provider than with: ErrorDocument 404 /error.pages/error.404.html?

[edited by: trader at 3:55 pm (utc) on Sep. 24, 2006]

twebdonny




msg:3094961
 3:55 pm on Sep 24, 2006 (gmt 0)

Gotcha, so just swapping the 2 rules should place them in the proper positions. I think I'll leave the root domain issue alone, as I would fear it might affect index.htm files within other folders on my site?

Thanks

g1smd




msg:3094965
 4:00 pm on Sep 24, 2006 (gmt 0)

Yes, swap the two rules (index rule first; non-www rule last), and get yourself an index redirect that caters for all index files on the site - root index and folder index. My code above gives you all that you need. Just swap out the required line and test it on the server.

g1smd




msg:3094971
 4:07 pm on Sep 24, 2006 (gmt 0)

>> Can you post a copy of error.404.html as I do not know what to put in it? <<

It needs a statement that an error has occurred: "That page no longer exists" or whatever, and it also needs some basic site navigation back to your index page, and main content sections.

You could take a copy of your index page, save it as error.404.html and then change it around a bit to suit.

.

>> Ideally I would also like visitors to get redirected after several seconds to the index page so I do not lose the visitors who may not navigate there <<

No! Don't redirect. Put the words and the links that you want the visitor to see, ON the error page itself. They can click off from there.

.

>> By doing it this way: ErrorDocument 404 /index.html Would the traffic be more likely or possibly less likely to be counted as a unique visitor by my stats or ppc provider vs: ErrorDocument 404 /error.pages/error.404.html <<

Do not serve your root index page as an error page. That will confuse the heck out of Google: your root index page looks the same as your error page. Don't do it. Serve a customised error page for your errors. It can be BASED on the same sort of content as your index page but do not serve an exact copy of it.

In this way, the origin page for the click will be seen as being the URL of the page that didn't exist.

trader




msg:3094993
 4:35 pm on Sep 24, 2006 (gmt 0)

Thanks G1, most appreciated. Makes good sense. Will try to implement all that asap.

P.S. A reason we wanted to install a delayed redirect of about 7 seconds is a percentage of visitors will likely not click on the link but instead navigate away using their X or back button. With the delayed redirect I would not lose that valuable traffic.

g1smd




msg:3095018
 5:01 pm on Sep 24, 2006 (gmt 0)

By supplying a customised copy of your index page as the error page you lose nothing compared to serving the index page itself.

However you gain a lot by not confusing the bots.

trader




msg:3095032
 5:27 pm on Sep 24, 2006 (gmt 0)

OK, now I get it: by making a custom copy of the index page, similar to a regular content page, with some links to other pages along with the error information, visitors should be unlikely to exit! Thanks again.

g1smd




msg:3095041
 5:43 pm on Sep 24, 2006 (gmt 0)

Yep. Give 'em what they wanted, or an easy way to get to it, in the error page.

Glad we got there in the end. :-)

Best of luck with your site. Most people miss these basics; you are now ahead of them.

AndyA




msg:3095278
 10:19 pm on Sep 24, 2006 (gmt 0)

RewriteRule ^(.*)index\.html?$ [mysite.com...] [R=301,L]

This one takes any index.html page (even one in a sub-folder - that's the (.*) bit) and rewrites the URL to the same folder (that's the $1 bit) www.domain.com/folder/ but without the index.html appended.

This isn't working on my server. Everything else is, but when I enter http://mydomain.com/folder/index.html, that's what I get in the address bar.

I should see http://mydomain.com/folder/, correct?

g1smd




msg:3095295
 10:41 pm on Sep 24, 2006 (gmt 0)

You do have that paired with a RewriteCond statement (goes on the line before the RewriteRule) to select when that RewriteRule will be called, don't you?

What Condition are you testing for there? It should be the presence of "index.html" in THE_REQUEST.
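
As a rough illustration of that condition, here is a small Python sketch (Python's `re` module accepts the same pattern syntax for this case, but it is not Apache's regex engine). The request lines are made up, and the pattern is written with balanced parentheses as an assumption. THE_REQUEST holds the original request line sent by the browser, so it still reads /folder/ when Apache internally maps a folder to its index file; the condition therefore only fires when the visitor literally asked for index.html.

```python
import re

# Condition pattern, written with balanced parentheses (an assumption for
# this sketch). "\ " is a literal space, as in the .htaccess syntax.
THE_REQUEST_COND = re.compile(r"^[A-Z]{3,9}\ /([^/]*/)*index\.html?")


def client_asked_for_index(request_line: str) -> bool:
    """True when the original request line names index.htm or index.html."""
    return THE_REQUEST_COND.search(request_line) is not None


# Hypothetical request lines, as a browser would send them.
print(client_asked_for_index("GET /folder/index.html HTTP/1.1"))
print(client_asked_for_index("GET /folder/ HTTP/1.1"))
```

The second request line never mentions index.html, so a rule guarded by this condition would leave it alone.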

jdMorgan




msg:3095303
 10:45 pm on Sep 24, 2006 (gmt 0)

Yes, that code may also cause a redirection loop, because it will interact with DirectoryIndex if index.htm or index.html is defined as the directory index file.

Use the code twebdonny posted above, modified per the discussion that followed:

RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*)/)*index\.html?
RewriteRule ^([^/]*)/)*index\.html?$ http://www.example.com/ [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301]

Jim

[edited by: jdMorgan at 10:46 pm (utc) on Sep. 24, 2006]

g1smd




msg:3095320
 11:01 pm on Sep 24, 2006 (gmt 0)

I already use something like:

RewriteEngine on

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/ [NC]
RewriteRule ^(.*)index\.html?$ http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

There are several similar ways to get the same job done.

I hope that there are no errors in my code. It seems to work.

jdMorgan




msg:3095332
 11:20 pm on Sep 24, 2006 (gmt 0)

Yes, that will work too. However, the correction you showed will indeed be necessary to the code I posted.

RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*)/)*index\.html?
RewriteRule ^([^/]*)/)*index\.html?$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301]

Having made that correction, the only remaining difference is the use of the more-specific "([^/]*)/)*" pattern, which will require fewer retries to match than a ".*" pattern, because it looks for a "/" to end each subpattern matching attempt, rather than matching the entire string to start, and then having to back up through the "index\.html" substring to get a match.
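
A quick way to compare the two styles outside Apache is a short Python sketch. It does not measure the number of retries, and Python's `re` is only a stand-in for Apache's regex engine; the folder-by-folder pattern is written here with balanced parentheses, and the sample paths are invented. It shows that both patterns capture the same folder prefix, and that the ".*" version also matches a file name that merely ends in "index.html".

```python
import re

LOOSE = re.compile(r"^(.*)index\.html?$")          # ".*" style
STRICT = re.compile(r"^(([^/]*/)*)index\.html?$")  # folder-by-folder style


def captured(pattern, path):
    """Return what would become $1, or None when the rule does not match."""
    m = pattern.match(path)
    return m.group(1) if m else None


# Paths as seen in .htaccess (no leading slash); all hypothetical.
for path in ("index.html", "folder/sub/index.htm", "myindex.html"):
    print(path, repr(captured(LOOSE, path)), repr(captured(STRICT, path)))
```

In this sketch "myindex.html" matches only the ".*" version, which would then redirect it to a URL ending in "my".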

Jim

g1smd




msg:3095357
 12:01 am on Sep 25, 2006 (gmt 0)

Is that right, with one left bracket and two right brackets?

AndyA




msg:3095362
 12:05 am on Sep 25, 2006 (gmt 0)

I still can't get it to work for /index.html pages. Here's the code I have:

RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\/([^/]*)/)*index\.html?
RewriteRule ^([^/]*)/)*index\.html?$ [mydomain.com...] [R=301]
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ [mydomain.com...] [R=301]

I have my domain redirected to the non-www version.

[edited by: AndyA at 12:06 am (utc) on Sep. 25, 2006]

g1smd




msg:3095372
 12:24 am on Sep 25, 2006 (gmt 0)

You're missing a space between the \ and the / in the first line of the first redirect.

You're also going to need [R=301,L] in place of just the [R=301] that you already have.
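
The effect of the missing space can be checked with a few lines of Python (a sketch only: the request line is invented, the patterns are written with balanced parentheses, and Python's `re` stands in for Apache's regex engine). Without the space, the pattern expects the "/" to follow the method name directly, which never happens in a real request line.

```python
import re

REQUEST_LINE = "GET /folder/index.html HTTP/1.1"  # hypothetical request

# "\/" with no space: expects "GET/..." and so cannot match.
NO_SPACE = re.compile(r"^[A-Z]{3,9}\/([^/]*/)*index\.html?")
# "\ /" with the escaped space: matches "GET /...".
WITH_SPACE = re.compile(r"^[A-Z]{3,9}\ /([^/]*/)*index\.html?")

no_space_matches = NO_SPACE.search(REQUEST_LINE) is not None
with_space_matches = WITH_SPACE.search(REQUEST_LINE) is not None
print(no_space_matches, with_space_matches)
```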

[edited by: g1smd at 12:28 am (utc) on Sep. 25, 2006]

miki99




msg:3095374
 12:26 am on Sep 25, 2006 (gmt 0)

I'm kind of glad at the moment I didn't even know enough to HAVE index.html files in folders. Is there an advantage to having these, rather than calling your main page in a folder something else?

g1smd




msg:3095377
 12:30 am on Sep 25, 2006 (gmt 0)

What happens when you try to access http://www.yourdomain.com/folder/ on your site then?

I hate it when I get an error message. I expect to see the index for that section of the website there.

twebdonny




msg:3095382
 12:39 am on Sep 25, 2006 (gmt 0)

The [R=301,L] should be after his last entry correct, and [R=301] after the first?

AndyA




msg:3095383
 12:41 am on Sep 25, 2006 (gmt 0)

Still not working.

RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*)/)*index\.html?
RewriteRule ^([^/]*)/)*index\.html?$ [mydomain.com...] [R=301,L]
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ [mydomain.com...] [R=301,L]

AndyA




msg:3095384
 12:45 am on Sep 25, 2006 (gmt 0)

OK - just checked again with another browser. It IS working for [mydomain.com...] - that is dropping the index.html part.

However, if you type in [mydomain...] it doesn't drop the index.html.

Progress, but still not where it should be. Any ideas?

g1smd




msg:3095386
 12:49 am on Sep 25, 2006 (gmt 0)

I'd wait for jdMorgan to throw some light on that.

It isn't obvious to me why that is happening. I suspect the unmatched brackets as I said above.

Bedtime here...

[edited by: g1smd at 1:02 am (utc) on Sep. 25, 2006]

jdMorgan




msg:3095394
 12:59 am on Sep 25, 2006 (gmt 0)


RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?
RewriteRule ^(([^/]*/)*)index\.html?$ http://example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]

One too many right parentheses in both lines of the first rule...
Or rather, one too few left parentheses in the second line of the first rule.
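
For anyone wanting to check bracket balance before uploading, one option is to try compiling the pattern in Python first. This is only a sanity check under the assumption that a pattern Python rejects as unbalanced is also wrong for Apache; the two regex engines are not identical.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True when Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


UNBALANCED = r"^([^/]*)/)*index\.html?$"   # one "(" and two ")"
BALANCED = r"^(([^/]*/)*)index\.html?$"    # corrected form

print(compiles(UNBALANCED), compiles(BALANCED))
```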

Sorry, I'm posting while tired, and my typing was never very good.

Jim

[edited by: jdMorgan at 1:11 am (utc) on Sep. 25, 2006]

AndyA




msg:3095430
 1:58 am on Sep 25, 2006 (gmt 0)

Thanks jdMorgan and g1smd,

The final version worked - for all /index.html I tried. Now Googlebot will have something to think about next time he/she/it stops by!

Both of you get some rest, you've earned it!

twebdonny




msg:3095961
 2:31 pm on Sep 25, 2006 (gmt 0)

anyone on this:

The [R=301,L] should be after his last entry correct, and [R=301] after the first? or do I need the R=301,L after both entries?

This is the way I have it currently:

RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?
RewriteRule ^index\.html?$ [mysite.com...] [R=301]

RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule (.*) [mysite.com...] [R=301,L]

Thanks

g1smd




msg:3095982
 2:48 pm on Sep 25, 2006 (gmt 0)

The [L] says it is the last rule to be considered, at the time; so both need the [L] I think.

By the way, your index redirect does not cater for folders. See the code above that does.

AndyA




msg:3096038
 3:22 pm on Sep 25, 2006 (gmt 0)

twebdonny,

Since you're attempting to do two different things here (two different conditions), add the [R=301,L] at the end of the second line in both conditions. I agree with you that it's a bit confusing. The last post from jdMorgan works beautifully on my site, and should accomplish exactly what you want as well. I used the L on both.

twebdonny




msg:3096074
 3:41 pm on Sep 25, 2006 (gmt 0)

Thanks all for the help. As far as
>>>index redirect does not cater for folders<<<

I would worry that my index.htm files that are the defaults in folders other than the root would be redirected to the main root index.htm, which wouldn't work in my case?

Thanks

g1smd




msg:3096085
 3:47 pm on Sep 25, 2006 (gmt 0)

No, you can redirect /whatever/index.html for both www and non-www at the same time, redirected to www.domain.com/whatever/ and preserve the folder names in the redirect too.

The code above does do that, and automatically works for folders, sub-folders, sub-sub-folders, sub-sub-sub-fols, s-s-s-s-fols, s-s-s-s-s-fols, etc.

.

You collect up the bit of the URL that isn't the letters "index.html" and put it in a container, with this:

^(([^/]*/)*)index\.html?$

and then you re-use that data you stored in the container, and append it to the target URL by using $1 here:

http://mydomain.com/$1

The stuff in the container was all of the URL except for the domain name, and the "index.html" part.
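
The "container" idea can be tried out in a few lines of Python. This is a sketch, not the server's behaviour: the function name and sample paths are made up, the paths carry no leading slash (as the rule sees them in .htaccess), and Python's `re` stands in for Apache's regex engine.

```python
import re
from typing import Optional

INDEX_RULE = re.compile(r"^(([^/]*/)*)index\.html?$")
TARGET_PREFIX = "http://mydomain.com/"


def redirect_target(path: str) -> Optional[str]:
    """Mimic the rule: return the redirect URL, or None if it does not apply."""
    m = INDEX_RULE.match(path)
    if m is None:
        return None
    # group(1) plays the role of $1: everything before "index.html".
    return TARGET_PREFIX + m.group(1)


for path in ("index.html", "folder/index.htm", "a/b/c/index.html", "page.html"):
    print(path, "->", redirect_target(path))
```

The folder names survive the redirect at any depth because they are all inside the outer pair of brackets.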

sc112




msg:3096117
 4:16 pm on Sep 25, 2006 (gmt 0)

In one of the earlier posts, after

RewriteEngine on

there is this line:

RewriteBase /

Is this line needed? What does it do?

g1smd




msg:3113912
 11:14 am on Oct 9, 2006 (gmt 0)

Try it both with and without, for a number of test cases, and you'll see...

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved