The use case is:
1) client lands on 404 url like /dogsnot.php?x=2&y=1
2) client is redirected to /index.php?x=2&y=1
The variables are not fixed and vary widely, so they can't be hard-coded. It must just pass everything after the ?.
This is a must. Without that data, everything breaks.
The problems I'm not understanding are:
1) how do I pass querystring data?
2) how do I match *all* 404 errors on the site?
I could easily do this with php, but I discovered that this can lead to duplicate listings in search engines, etc, which is all bad.
Can anyone tell me how to do this? I'm sorry to ask, but I've put a good couple of hours into it, and can't seem to get a working solution. I've looked everywhere, including this handy url:
[engelschall.com...]
However, nothing. I know this is simple, but I can't seem to stitch it together.
Thanks in advance.
Jonathan
Welcome to WebmasterWorld!
I'm sorry, it's not clear what you want to do.
If you wish to have all 404's "redirect" to your index page, then you'd use
ErrorDocument 404 /index.html
If you want the visitor to actually see a specific 404 page before "redirecting," then put a link and/or a meta-refresh on your 404 page, pointing to your index page. You can include the query parameters using SSI and the QUERY_STRING (I think that's the name) server variable.
If you want to detect missing files and handle those in some special way, then the RewriteCond directive of mod_rewrite, used with %{REQUEST_FILENAME} and the -f and -d flags, might be of some use.
However, if you wish to declare a 404 based on a missing database entry for a dynamic page, then you'll need to handle that inside php, by outputting the correct Status response header.
Before going any further, I'd like to know how you decide that a 404 response is needed, and then how and when you want to "redirect." Part of the cause of confusion is semantic, and part is technical -- a specific solution requires a very specific description of the problem.
Jim
Thank you Jim for your quick reply. Apparently, I didn't provide nearly enough information! ;-(
> I'm sorry, it's not clear what you want to do.
> If you wish to have all 404's "redirect" to your index page, then you'd use
>ErrorDocument 404 /index.html
Yes, of course. But this approach loses the Querystring arguments on our application, which kills things. ;-)
> If you want to detect missing files and handle those in some special way, then the RewriteCond directive of mod_rewrite, used with %{REQUEST_FILENAME} and the -f and -d flags, might be of some use.
Yes, some variation on the approach is what I'm looking for.
> However, if you wish to declare a 404 based on a missing database entry for a dynamic page, then you'll need to handle that inside php, by outputting the correct Status response header.
No, this is simpler than that. I could do this with php, but we handle a lot of 404s, and I think mod_rewrite would be more efficient.
> Before going any further, I'd like to know how you decide that a 404 response is needed, and then how and when you want to "redirect." Part of the cause of confusion is semantic, and part is technical -- a specific solution requires a very specific description of the problem.
Well, we have campaigns that need to be periodically removed, such as /campaign6x9/?blah=1&q=2
What I want to do is redirect when campaigns are removed so that it will go back into our round robin scripts in index.php, so something like:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{QUERY_STRING} ^p=([0-9]*)$
RewriteRule ^\.*$ [mysite.com...] [R=301,L]
Some of that was stolen from your post.
The idea is that whatever comes in as a qs argument is appended to the redirect to index.php.
Jonathan
So the entire /campaign6x9/ directory will be removed?
It is critical to get a definition of the basis upon which you "declare" a 404. Is it removed files, removed directories, or just data missing from a database?
In the case of a removed "campaign" directory, you could use something like this:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
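# In .htaccess the leading slash is stripped before matching, so the pattern must not start with one
RewriteRule ^campaign.+\.php$ http://mysite.com/index.php [R=301,L]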
Unless you need to modify the query string, the above code should pass it through unchanged. Be aware that you will not return a 404-Not Found response, so search engines will not know that the resource has been removed.
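For the archives, here is a minimal sketch of the query-string behaviour described above, assuming the rules live in an .htaccess file at the site root. When the substitution contains no "?" of its own, mod_rewrite carries the incoming query string over to the target; the QSA flag only matters when the substitution adds parameters. The src=legacy parameter and the old.php file name are made-up examples, not something from this thread.

```apache
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# No "?" in the substitution, so the original query string is kept:
#   /campaign6x9/old.php?blah=1&q=2 -> http://mysite.com/index.php?blah=1&q=2
RewriteRule ^campaign.+\.php$ http://mysite.com/index.php [R=301,L]

# Alternative (use instead of the rule above, not together with it):
# the substitution adds its own (hypothetical) parameter, so QSA is
# needed to append the original parameters after it:
#   /campaign6x9/old.php?blah=1&q=2 -> http://mysite.com/index.php?src=legacy&blah=1&q=2
# RewriteRule ^campaign.+\.php$ http://mysite.com/index.php?src=legacy [R=301,QSA,L]
```

Without QSA, the alternative rule would replace the original query string with src=legacy alone, which is exactly the data loss the original poster wants to avoid.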
One approach you could take would be to rewrite (in the RewriteRule) to a small php script that would generate the proper 404 Status response, and put the query parameters into the "Location:" header of that response, along with "index.php".
Jim
Good question. Ok, so the problem is that there is a lot of legacy traffic (hundreds of conditional matches to old campaigns) that we're trying to match individually, so matching a particular directory is infeasible, and it applies to both files and directories.
There are campaigns called /campaign06, but also /6v2, /3ss, /abc, as well as about another million variations. So, I was trying to match all occurrences of 404 in the VirtualHost, as in a 404 handler, in order to regain that lost traffic and improve visitor experience.
In terms of using php -- yes, we could, but it's a very high traffic site, and I want to gain the efficiency of a mod_rewrite solution. Loading up php for every 404 on the site will really add a lot of urls to our php caching, which may drag down the performance of the main applications.
We're very close. Your solution is ideal. The remaining question I have is how to match all 404 errors on a site. We're using index.php as a factory file (a file through which all others are called), but there are quite a few (again, hundreds of) exceptions where content still exists for various purposes, so I can't use a catch-all arrangement that sends every request except index.php through to index.php. It has to be 404s specifically and nothing else.
I hope I'm being clear. I do truly appreciate the help.
Jonathan
Going forward, I strongly suggest you fix this problem by strictly categorizing "temporary" and "permanent" content, and putting them into specific subdirectories. This will ease the burden of handling removed content in the future. You can further improve the performance of your sites by categorizing cacheable and uncacheable content, and setting appropriate cache-control headers in those directories.
The following code will check each requested php file to be sure that it exists. It is indiscriminate, except for requiring a php extension, and is therefore not very efficient. Anything you can do to make it more selective -- based upon your directory/file naming conventions -- will improve efficiency.
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.php$ http://mysite.com/index.php [R=301,L]
[edited by: jdMorgan at 11:50 pm (utc) on Nov. 16, 2004]
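As a rough sketch of what "more selective" could look like: mod_rewrite tests the RewriteRule pattern first and only evaluates the RewriteCond lines if the pattern matches, so a narrower pattern spares most requests the two filesystem checks. The campaigns/ subdirectory below is purely hypothetical; it assumes temporary content has been gathered under one prefix as suggested above, which is not how the site in this thread is currently laid out.

```apache
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
# Conditions are evaluated only for URLs that match the pattern below,
# so requests outside campaigns/ never trigger the -d / -f lookups.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# "campaigns/" is an assumed directory name, used for illustration only.
RewriteRule ^campaigns/.+\.php$ http://mysite.com/index.php [R=301,L]
```

With a layout like this, php files elsewhere on the site are never examined by the rule at all, whether they exist or not.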
I read your post carefully. It looks as though that would also redirect existing php pages, as it matches /*.php.
Is that not the case? For the sake of the archives, would you explain it if not? I'd love to have it on record for the next person who comes looking for this answer.
Jonathan
The first RewriteCond detects that the requested resource does not exist as a directory, the second detects that it does not exist as a file. The RewriteRule is applied only when both conditions are true, so requests for existing php pages are left alone. Unless you need to modify the query string, the above code should pass it through unchanged. Be aware that you will not return a 404-Not Found response, so search engines will not know that the resource has been removed.
I've also just finished testing this, and it doesn't work. The errors are logged:
File does not exist: /home/sites/foo.com_ssl/public_html/boo
However, the rules we are discussing are not working.
I have tried:
[foo.com...]
[foo.com...]
[foo.com...]
[foo.com...]
Nada. ;-(
Jonathan
Apache Version Apache/2.0.51 (Fedora)
Apache API Version 20020903
It's on FC2.
This is what's in the httpd.conf:
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
#
# AllowOverride controls what directives may be placed in .htaccess files.
# It can be "All", "None", or any combination of the keywords:
# Options FileInfo AuthConfig Limit
#
AllowOverride None
Do I need to allow overrides? And does AcceptPathInfo have to be set?
Sorry, I'm asking basic questions, but I don't know the answers. ;-(
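On the override question, a hedged sketch: with AllowOverride None, Apache ignores .htaccess files entirely, which would explain rules that silently do nothing. Assuming the rules are in an .htaccess file under the document root shown in the error log above, something along these lines in httpd.conf should let them take effect after a server restart. As far as I know, AcceptPathInfo has no bearing on whether mod_rewrite rules run.

```apache
# httpd.conf, or the VirtualHost section for this site.
# The path is taken from the error log line quoted above; adjust it
# if the .htaccess file lives somewhere else.
<Directory "/home/sites/foo.com_ssl/public_html">
    # FileInfo permits the mod_rewrite directives in .htaccess;
    # Options permits the "Options +FollowSymLinks" line.
    AllowOverride FileInfo Options
</Directory>
```

AllowOverride All would also work, but it grants more than these rules need.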
Thanks for all the help. I really appreciate it.
Unfortunately, nothing worked. So, I'm abandoning the project.
I wrote this in about 3 minutes in php, and it works great. Of course it should be wrapped in an OO framework, but this could help someone who's confronting the same problem, so I'm posting it here (GPL):
<?php
/**************************************************
 *
 * Author: Jonathan Dillon
 * Use: passes Querystrings around on redirect
 *
 * License: GPL, have at, but redistribute
 * and that's legal
 *
 **************************************************/
// fall back in case REQUEST_URI is not set (some machines have issues, especially on Windows)
if (!isset($_SERVER['REQUEST_URI'])) {
    $_SERVER['REQUEST_URI'] = $_SERVER['SCRIPT_NAME'].'?'.$_SERVER['QUERY_STRING'];
}
$myURL = $_SERVER['REQUEST_URI'];
// default to an empty string so the redirect still works when there is no qs
$outputURL = '';
// position of the "?" that starts the query string
$pos = strpos($myURL, "?");
// make sure that the manipulation happens only if there is a qs
if ($pos !== false) {
    // keep everything from the "?" to the end of the url
    $outputURL = substr($myURL, $pos);
}
// now tell them we've moved to avoid duplicate entries in search engines
header("HTTP/1.0 301 Moved Permanently");
// and go; the leading slash keeps requests for missing subdirectories
// from being redirected to an index.php inside that missing directory
header("Location: /index.php$outputURL");
exit;
?>