404 - possible to use a generic redirect

and tell the bots the file's moved permanently?

         

lorax

7:00 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a new version of a site that had about 50 pages in the old version. When I publish the new site I want to catch all the existing links to these outdated pages and redirect them to the new home page. So I'm thinking of using something like:

ErrorDocument 404 / [R=301]

in hopes that this will tell the bots that the link they're checking (which no longer exists) has moved to the top page of the site.

What problems will this cause if any?

Birdman

7:15 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For fifty pages, I would think you could redirect them individually.

Search engine wise, I think it would be wiser to 301 each old page to its counterpart, rather than to the home page. Just my 2c, however.
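
For what it's worth, a one-to-one mapping could look roughly like the sketch below. It assumes mod_alias is available and that the lines go in the .htaccess file in the web root; the old and new file names are made up for illustration.

```apache
# Hypothetical old pages mapped to their closest new counterparts.
# Redirect (mod_alias) takes a status, an old URL-path, and a full target URL.
Redirect 301 /about-us.html http://www.example.com/about.html
Redirect 301 /products.html http://www.example.com/widgets.html
Redirect 301 /contact-form.html http://www.example.com/contact.html
```

Fifty lines like these are tedious to type, but each old URL then points at a page that actually replaces it.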

BTW, I was just reading Shopping Carts 101 [webmasterworld.com]...Nice!

Birdman

7:28 pm on Dec 17, 2003 (gmt 0)

Also, after reading:

httpd.apache.org/docs/mod/core.html#errordocument

I don't think the ErrorDocument directive allows for optional flags (R, L, etc.).

Birdman

lorax

7:42 pm on Dec 17, 2003 (gmt 0)

Re: individual pages - yeah, I suppose so, but I was hoping to avoid it. Boring at best, and it just fattens up the .htaccess file.

Just tested it and you're right - the errordocument line does not like the flag.

Birdman

7:54 pm on Dec 17, 2003 (gmt 0)

If you prefer to just let the spiders find the new pages via your new homepage, then a standard errorDocument 404 / should do the job.

Are the old pages in folders or root?

jdMorgan

8:00 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



301-redirecting that many files to your home page may look like a scam to our friends at the 'plex, too.

I'd suggest adding some explanatory (and apologetic) text, plus a text link to your home page on your custom 404-page, and also a 5-to-15 second meta-refresh redirect to your home page. Search engines generally treat a delayed meta-refresh like a 302 (temporary) redirect, so it makes no implicit claim that the home page is a replacement for the custom error page or for the missing pages.

I've used that technique for years without ill effect in the search engines.

As Birdman says, it would be better to redirect each page to a close counterpart if one is available, and then use the method above for those that have no reasonable replacement.

You can minimize the number of code lines needed for the redirect by using the regex inline 'OR' function:


RewriteRule ^(file1|file2|file3|file4)\.html$ http://www.example.com/newwidgets.html [R=301,L]
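
The same pattern extends to several groups of old pages, each with its own target. The following is only a sketch for a .htaccess file in the web root; the page names and target URLs are hypothetical.

```apache
RewriteEngine On

# Hypothetical old widget pages all go to one replacement page.
RewriteRule ^(widgets|widget-list|widget-prices)\.html$ http://www.example.com/newwidgets.html [R=301,L]

# Hypothetical old contact pages go to a different replacement.
RewriteRule ^(contact|email|feedback)\.html$ http://www.example.com/contact.html [R=301,L]
```

Old pages that match none of the groups fall through to whatever rules follow, and finally to the custom 404 page.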

Jim

lorax

8:39 pm on Dec 17, 2003 (gmt 0)

Thanks for the help folks.

One last follow up. I mapped the missing pages and then added my 404 line:

ErrorDocument 404 /404.php

But the server chokes when I add this line. I've tried it immediately after the 'RewriteEngine on' and after my last mask (hiding dynamic pages) 'RewriteRule ^(.*)\.html$ $1.php [L] [T=application/x-httpd-php]'. When I first tested this without the additional mappings it worked. Any idea why it's not working now?

Birdman

9:05 pm on Dec 17, 2003 (gmt 0)

Have you tried moving the line above "rewriteEngine on"?

lorax

9:30 pm on Dec 17, 2003 (gmt 0)

Doesn't it need to be within the Rewrite directive?

jdMorgan

9:45 pm on Dec 17, 2003 (gmt 0)

No,

ErrorDocument is an Apache 'core' directive, while the Rewrite* directives (RewriteEngine, RewriteRule, and so on) belong to mod_rewrite. They are handled separately.

Also, look at your server error log to see what the problem was. You may need to add an AddHandler directive in order to parse php error pages if you *only* use that RewriteRule to map requests to php files, and do not reference them in any other way. In that case, you may not have a handler defined for php files, and any php file accesses (including ErrorDocument) not done using the RewriteRule will not work.

This directive would take the form:


AddHandler application/x-httpd-php .php

You could also add

AddType application/x-httpd-php php

Jim

Birdman

9:45 pm on Dec 17, 2003 (gmt 0)

I don't think so. It is an Apache core feature.

too quick for me jd

lorax

10:02 pm on Dec 17, 2003 (gmt 0)

Understood.

I think the issue is the wildcarded rules for *.htm and *.html which were designed to serve up php files. These rules seem to be applied before a 404 is detected (which makes sense). So the test file foo.html initiates a rewrite to foo.php, but foo.php does not exist, and the server delivers a generic 404 error. Seems like I may have to nail down the specific files I'm depending on the rewrite rule to handle in order to use the errordocument directive.
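
One way to keep the wildcard rule from firing on files that don't exist is to test for the target first. This is only a sketch: it assumes the .htaccess file sits in the web root, so that %{DOCUMENT_ROOT}/$1.php is the file the rule would map to, and it leaves out the T flag on the assumption that the server already has a handler for .php files.

```apache
RewriteEngine On

# Rewrite foo.htm or foo.html to foo.php only when foo.php exists on disk.
# $1 in the RewriteCond refers to the group captured by the RewriteRule pattern.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.*)\.html?$ $1.php [L]
```

With the condition in place, a request for a missing foo.html is left alone, so the 404 is reported for /foo.html rather than /foo.php.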

jdMorgan

10:47 pm on Dec 17, 2003 (gmt 0)

Hmmm...

The process does indeed go as you have surmised, but the server should not deliver a 'generic' 404 error page, it should deliver your custom 404.php page. You are not redirecting 404.php, you are redirecting <anything>.html.

Is it possible that you have some other Redirect, RedirectMatch, or RewriteRule directives that are interfering?

Jim

lorax

11:31 pm on Dec 17, 2003 (gmt 0)

Other than the long regex to capture the old pages the 2 lines that follow it are:

RewriteRule ^(.*)\.htm$ $1.php [T=application/x-httpd-php]
RewriteRule ^(.*)\.html$ $1.php [L] [T=application/x-httpd-php]

I verified the 404.php file is there and have tried both a relative and absolute path.

jdMorgan

12:40 am on Dec 18, 2003 (gmt 0)

That last line has a problem - you should combine both flags within one set of square brackets, separated by a comma. As a matter of fact, you can combine both RewriteRules into one by making the "l" in "html" optional (follow it with "?") :

RewriteRule ^(.*)\.html?$ $1.php [T=application/x-httpd-php,L]

I don't see why that would cause your problem, though... :(

What kind of error are you getting?

Jim

lorax

12:56 am on Dec 18, 2003 (gmt 0)

re: flags - got it

re: errors. The bogus file I'm asking for - which I know doesn't exist - is foo.html.

The requested URL /foo.php was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

jdMorgan

1:08 am on Dec 18, 2003 (gmt 0)

This says that /404.php does not exist in your web root directory. (?)

Jim

lorax

1:09 am on Dec 18, 2003 (gmt 0)

Well that's what I thought but I can see it right there in the root. Should I use a file system path or a webserver path?

jdMorgan

4:31 am on Dec 18, 2003 (gmt 0)

Use a webserver path. ErrorDocument requires a local URL-path, such as /404.php, in order to return the proper status. If you put a full URL in there, you'll get a 302 Moved Temporarily status, which can be a search engine nightmare (it's also the single most common mistake made with ErrorDocument).

Your ErrorDocument code is correct!
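
To make the difference concrete, here are the two forms side by side. A sketch only; www.example.com stands in for the real host name.

```apache
# Local URL-path: Apache serves /404.php itself and the client still
# receives a 404 status for the URL it originally asked for.
ErrorDocument 404 /404.php

# Full URL (commented out on purpose): Apache answers with a redirect to
# that URL instead, so the client never sees the 404 status.
# ErrorDocument 404 http://www.example.com/404.php
```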

What happens if you request /404.php directly (from your browser address bar)?

If you're going crazy, be assured I'm going with you! :o

Jim

lorax

2:34 pm on Dec 18, 2003 (gmt 0)

HA! You're not going to believe this but here goes the sordid truth.

I entered the URI for the 404 page and got it - but with errors. Seems I forgot to remove the references to my development directory for things like the CSS, includes, and images.

Changed the paths on these and the file was delivered fine. Tested for foo.html and voila - the 404 page appears.

What do you make of that?

jdMorgan

5:33 pm on Dec 18, 2003 (gmt 0)

> What do you make of that?

Success!

lorax

6:02 pm on Dec 18, 2003 (gmt 0)

Ayup - I owe you two beers now.