Recently, I redesigned my site, and this has led to a lot of 404 error pages.
To quote an example, the url http://www.example.com/biology/students/john.html on my site is a 404 now.
What I would like to do is to redirect the user one level higher until there is an existing page (200 OK).
For eg: when he visits the url http://www.example.com/biology/students/john.html (which is a 404), the server would redirect him to the url http://www.example.com/biology/students/
In case the url http://www.example.com/biology/students/ is also a 404 error page, then it would take him one level up further to http://www.example.com/biology/ and so on until it comes across a real existing page.
Now I do not want to do a 301 redirect as this may reflect negatively in the search engines.
Is there any way I can do this using the 404 (ErrorDocument) directive, or is there any other solution that is safe with the search engines?
Thanks.
[edited by: jdMorgan at 4:42 pm (utc) on June 11, 2006]
[edit reason] Example.com [/edit]
However, to address your question directly, you may be able to use the "check for file exists" function of RewriteCond [httpd.apache.org]. Because this is an interesting and unusual question, I'll post some code -- something we try not to do here, since our focus [webmasterworld.com] is on helping you to write your own code, rather than providing a (clearly unsustainable) free coding service:
# If requested URL exists with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1 -f
# serve contents from higher-directory file (Note that this is an internal rewrite, not an external redirect)
RewriteRule ^[^/]+/(.+)$ /$1 [L]
#
# Else if requested URL exists with two directory levels removed
RewriteCond %{DOCUMENT_ROOT}/$1 -f
# serve contents from higher-directory file
RewriteRule ^[^/]+/[^/]+/(.+)$ /$1 [L]
#
# Else if requested URL exists with three directory levels removed
RewriteCond %{DOCUMENT_ROOT}/$1 -f
# serve contents from higher-directory file
RewriteRule ^[^/]+/[^/]+/[^/]+/(.+)$ /$1 [L]
However, as I stated, the proper response is a 301-Moved Permanently. This will eliminate the old URLs from the search indexes over time, and avoid duplicate-content issues. Taking the first piece of code above as an example, modify it to:
# If requested URL exists with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1 -f
# redirect to higher-directory URL (external redirect)
RewriteRule ^[^/]+/(.+)$ http://www.example.com/$1 [R=301,L]
The above code is not tested. And again, whether it works will also be determined by your existing server URL-to-filepath mapping configuration.
Jim
I forgot a step in the logic. Also, in the following code, I've added a query string that will help you debug this if it doesn't work. The server filepath corresponding to the originally-requested URL will be appended to the redirect URL, so you can read it in your address bar. If it's correct and the rule works, then comment out the first RewriteRule and use the second. Otherwise, that query string will give a good indication of how the filepath needs to be tweaked in order for the 'file exists' checks to function properly.
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$2 -f
# redirect to higher-directory URL (external redirect)
# (rule with debug data in query string)
RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$2?original_path=%{DOCUMENT_ROOT}/$1/$2 [R=301,L]
# (rule with debug data removed)
# RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$2 [R=301,L]
Now I do not want to do a 301 redirect as this may reflect negatively in the search engines.
Interesting approach. Seems like you plan to tell the search engines "well, the URL still exists, but the content has completely changed -- in fact, it's a duplicate of another page you already have in your index". No worries that that "may reflect negatively"?
I tried all this. Still it's not working.
I have modified the code in the sense : It should be $1 and not $2 as I understand. (see in bold)
###################################
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1 -f
# redirect to higher-directory URL (external redirect)
# (rule with debug data in query string)
RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$1?original_path=%{DOCUMENT_ROOT}/$1/$2 [R=301,L]
# (rule with debug data removed)
# RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$1 [R=301,L]
#######################################
For eg:
url http://www.example.com/abc/xyz.html is a 404 error.
url http://www.example.com/abc/ is a 200 status ok page.
When I load the page http://www.example.com/abc/xyz.html using the following code in .htaccess file
################################################
CODE 1
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1 -f
# redirect to higher-directory URL (external redirect)
# (rule with debug data in query string)
RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$1?original_path=%{DOCUMENT_ROOT}/$1/$2 [R=301,L]
# (rule with debug data removed)
# RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$1 [R=301,L]
################################################
I get the following in the rewrite.log file
############################################
REWRITE LOG FOR OUTPUT OF CODE 1
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] strip per-dir prefix: /www/htdocs/example/abc/xyz.html -> abc/xyz.html
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] applying pattern '^([^/]+)/(.+)$' to uri 'abc/xyz.html'
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/abc/xyz.html' pattern='!-f' => matched
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/abc' pattern='-f' => not-matched
xx.xx.xx.xx - - [date] [deleted] (1) [per-dir /www/htdocs/example/] pass through /www/htdocs/example/abc/xyz.html
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] strip per-dir prefix: /www/htdocs/example/abc/xyz.html -> abc/xyz.html
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] applying pattern '^([^/]+)/(.+)$' to uri 'abc/xyz.html'
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/abc/xyz.html' pattern='!-f' => matched
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/abc' pattern='-f' => not-matched
xx.xx.xx.xx - - [date] [deleted] (1) [per-dir /www/htdocs/example/] pass through /www/htdocs/example/abc/xyz.html
############################################
When I load the page http://www.example.com/abc/xyz.html using the following code in .htaccess file
################################################
CODE 2
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$2 -f
# redirect to higher-directory URL (external redirect)
# (rule with debug data in query string)
RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$2?original_path=%{DOCUMENT_ROOT}/$1/$2 [R=301,L]
# (rule with debug data removed)
# RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$2 [R=301,L]
################################################
I get the following in the rewrite.log file
############################################
REWRITE LOG FOR OUTPUT OF CODE 2
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] strip per-dir prefix: /www/htdocs/example/abc/xyz.html -> abc/xyz.html
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] applying pattern '^([^/]+)/(.+)$' to uri 'abc/xyz.html'
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/abc/xyz.html' pattern='!-f' => matched
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/xyz.html' pattern='-f' => not-matched
xx.xx.xx.xx - - [date] [deleted] (1) [per-dir /www/htdocs/example/] pass through /www/htdocs/example/abc/xyz.html
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] strip per-dir prefix: /www/htdocs/example/abc/xyz.html -> abc/xyz.html
xx.xx.xx.xx - - [date] [deleted] (3) [per-dir /www/htdocs/example/] applying pattern '^([^/]+)/(.+)$' to uri 'abc/xyz.html'
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/abc/xyz.html' pattern='!-f' => matched
xx.xx.xx.xx - - [date] [deleted] (4) RewriteCond: input='/www/htdocs/example/xyz.html' pattern='-f' => not-matched
xx.xx.xx.xx - - [date] [deleted] (1) [per-dir /www/htdocs/example/] pass through /www/htdocs/example/abc/xyz.html
Nonetheless, I tried all the code that you provided and tried working around it by debugging, but with no success.
I just got it. In the second RewriteCond statement, if I use "-d" instead of "-f", it works, since it's a directory and not a file.
Now the case is that a file or a directory at any level may not exist and may have to be replaced by a higher-level directory (the 301 would always go to a directory, but the 404 may be a file or a directory).
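The file-versus-directory distinction can be reproduced outside Apache. The sketch below is Python rather than mod_rewrite, and it assumes that `-f` corresponds to a regular-file test and `-d` to a directory test; the temporary directory `abc` and the file name `xyz.html` are stand-ins for the real site layout.

```python
import os
import tempfile

def exists_as(path):
    """Return which mod_rewrite-style tests would pass for this path."""
    return {
        "-f": os.path.isfile(path),  # regular-file test
        "-d": os.path.isdir(path),   # directory test
    }

with tempfile.TemporaryDirectory() as docroot:
    category = os.path.join(docroot, "abc")       # hypothetical category directory
    os.mkdir(category)
    missing = os.path.join(category, "xyz.html")  # never created, so it is a 404

    dir_result = exists_as(category)
    missing_result = exists_as(missing)

print(dir_result)      # the directory fails -f but passes -d
print(missing_result)  # the missing file fails both tests
```

This is why a condition that tests the parent path with `-f` never matches when the parent is a directory.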
I have around 5 levels of categories like
www.example/a/b/c/d/e/xyx.html OR
www.example/a/b/c/d/abc.html and so on.
How do I write code that will check for a 404 on either a directory or a file? (I understand that we may use only one option at a time, for example "-f" or "-d", but not "-fd" together.)
Any idea as to how I may proceed?
Thanks again.
# (debug rule with file-exists path in query string)
RewriteCond %{QUERY_STRING} !docroot_path=
RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$1/$2?docroot_path=%{DOCUMENT_ROOT}/$1/$2 [R=301,L]
# If requested URL does not exist as a file
RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-f
# and does not exist as a directory
RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-d
# but it does exist as a file or directory with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$2 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$2 -d
# Then redirect to higher-directory URL (external redirect)
# RewriteRule ^([^/]+)/(.+)$ http://www.example.com/$2 [R=301,L]
But the key is to look at the filepaths that are being tested by the RewriteConds, and to be sure they resolve to the correct directories and files for each URL tested. That's what the query string trick is for. If not, then adjust the path construction of the RewriteConds until they produce the correct paths to be tested.
A tested-path error might be obvious to you from looking at the RewriteLog info, but since I'm not familiar with your site or server and what the correct paths are, and since I have maybe 30 minutes a day to devote to posting here at WebmasterWorld, it's not easy for me to spot path errors. The data from the query string trick should make it obvious to you, though.
Also, remember that this is only one part of the solution. The above code is only intended to work on URLs that specify a file or directory one level below the Web root. Once this is tested and working, then additional rules or modifications can be developed to handle deeper directories.
Jim
I am testing the latest information that you provided. By doing the query string check, I found out that the url matches. Also there are no other special rewrites that we are doing through httpd.conf or other method.
For example, the query string showed the following info:
http://www.example.com/abc/xyz.html?docroot_path=/www/htdocs/example/abc/xyz.html
I actually almost succeeded in doing the 301 redirect, but there is a problem. For eg, using the code below:
############################################
RewriteEngine ON
# 6th Level Category
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1/$2/$3/$4/$5/$6 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1/$2/$3/$4/$5/ -d
# redirect to higher-directory URL (external redirect)
# (rule with debug data removed)
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/(.+)$ http://www.example.com/$1/$2/$3/$4/$5/ [R=301,L]
######################################
I am able to do a 301 redirect to the correct path.
For eg: If there is a 404 error page at www.example/a/b/c/d/e/xyx.html, then it redirects perfectly to the path www.example/a/b/c/d/e/ (which is a 200 ok page).
But if I add one more condition like the following :
################################
RewriteEngine ON
# 6th Level Category
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1/$2/$3/$4/$5/$6 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1/$2/$3/$4/$5/ -d
# redirect to higher-directory URL (external redirect)
# (rule with debug data removed)
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/(.+)$ http://www.example.com/$1/$2/$3/$4/$5/ [R=301,L]
# 5th Level Category
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1/$2/$3/$4/$5 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1/$2/$3/$4/ -d
# redirect to higher-directory URL (external redirect)
# (rule with debug data removed)
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/(.+)$ http://www.example.com/$1/$2/$3/$4/ [R=301,L]
##########################################
In such a case, it is following the conditions which I have given for the # 5th Level Category. It does not obey the conditions for the 6th level category.
So when I go to the url www.example/a/b/c/d/e/xyx.html (404), then instead of redirecting to www.example/a/b/c/d/e/ (200 ok page), it redirects to www.example/a/b/c/d/ (even though the page www.example/a/b/c/d/e/ exists)
Moral of the story is that it is following the last set of conditions and applying the 301 redirect.
Is there any way to stop it as soon as it encounters a 200 ok page?
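One possible reading of this behavior, not confirmed in the thread, is that the 6th-level rule does fire first, and the 5th-level rule then fires on the follow-up request for the directory URL, because a directory is not a regular file and so passes the `!-f` test. The Python sketch below simulates that chain. The directory set, the `skip_existing_dirs` switch, and the function names are all hypothetical and only model the two rules as posted.

```python
import re

# Hypothetical layout: these directories exist and there are no files at all.
EXISTING_DIRS = {"a/", "a/b/", "a/b/c/", "a/b/c/d/", "a/b/c/d/e/"}
EXISTING_FILES = set()

# (rule pattern, number of leading path elements kept in the redirect target)
RULES = [
    (re.compile(r"^([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/(.+)$"), 5),  # 6th level
    (re.compile(r"^([^/]+)/([^/]+)/([^/]+)/([^/]+)/(.+)$"), 4),          # 5th level
]

def apply_rules(path, skip_existing_dirs=False):
    """Return the redirect target for one request, or None if no rule fires."""
    for pattern, keep in RULES:
        match = pattern.match(path)
        if match is None:
            continue
        parts = match.groups()
        requested = "/".join(parts)
        parent = "/".join(parts[:keep]) + "/"
        if requested in EXISTING_FILES:
            continue  # models "!-f" failing: the request is a real file
        if skip_existing_dirs and requested in EXISTING_DIRS:
            continue  # models an extra "!-d" test
        if parent in EXISTING_DIRS:  # models "-d" on the parent directory
            return parent
    return None

def follow(path, skip_existing_dirs=False):
    """Collect every redirect hop, as a browser following 301s would."""
    hops = []
    target = apply_rules(path, skip_existing_dirs)
    while target is not None:
        hops.append(target)
        target = apply_rules(target, skip_existing_dirs)
    return hops

without_dir_test = follow("a/b/c/d/e/xyx.html")
with_dir_test = follow("a/b/c/d/e/xyx.html", skip_existing_dirs=True)
print(without_dir_test)  # two hops: the 5th-level rule fires on the second request
print(with_dir_test)     # one hop: the existing directory is left alone
```

If this is what is happening on the real server, adding a `!-d` test next to each `!-f` test (as in the earlier file-or-directory conditions) would keep an existing directory from being redirected again.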
You're also doing this the 'hard way' by using so many variables, but we'll get to that later after you get it working.
If you test $1/$2/$3/$4/$5 and it doesn't exist, then the next step is to check (and possibly redirect to) $1/$2/$3/$5, not $1/$2/$3/$4. You'll need to check/redirect to the same file, but one directory up. The fact that you're accessing the file "/" doesn't matter -- it's still the index file of a directory.
So, taking a URL-view, if
/a/b/c/d/e/foo.html doesn't exist, then check/redirect to
/a/b/c/d/foo.html, and not to
/a/b/c/d/e
I hope that's clear.
Now in case I'm not back here for awhile, let me just demonstrate an easier way to do this, with the warning that it will be easier to do this later after you get all the bugs worked out:
Instead of working all the way through this with one set of rules for each possible directory depth, you can 'nest' the parentheses and do it with just one rule. Something like:
# If requested URL does not exist
RewriteCond %{DOCUMENT_ROOT}/$1$3/$4 !-f
# but it does exist with one directory level removed
RewriteCond %{DOCUMENT_ROOT}/$1$4 -f
# redirect to higher-directory URL (external redirect)
RewriteRule ^(([^/]+/)*)([^/]+)/([^/]*)$ http://www.example.com/$1$4 [R=301,L]
Again, I suggest not trying this until you get everything else working with say, up to three levels deep, because it may be *much* harder to debug. Get all issues with correct redirect URLs and with missing trailing slashes resolved first, then go for the 'elegant' solution.
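When nesting parentheses like this, where the `*` sits matters: a capture group that is itself repeated keeps only its last repetition, while an outer group wrapped around the repetition keeps the whole leading path. The sketch below compares the two forms using Python's `re` module, which behaves the same way as PCRE for this construct. The sample path is an assumption for illustration.

```python
import re

path = "a/b/c/d/e/foo.html"  # assumed sample request path, leading slash stripped

# Quantifier applied to the capture group itself: group 1 is overwritten on
# every repetition, so only the last repeated directory survives.
repeated_group = re.compile(r"^(([^/]+)/)*([^/]+)/([^/]*)$")

# Quantifier wrapped by an outer group: group 1 spans every repetition.
outer_group = re.compile(r"^(([^/]+/)*)([^/]+)/([^/]*)$")

m1 = repeated_group.match(path)
m2 = outer_group.match(path)

print(m1.group(1), m1.group(3), m1.group(4))
print(m2.group(1), m2.group(3), m2.group(4))
```

With the first form, group 1 holds a single directory; with the second, group 1 holds the full path above the last directory, which is what a back-reference such as `$1$4` needs in order to rebuild the one-level-up URL.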
Jim
If you test $1/$2/$3/$4/$5 and it doesn't exist, then the next step is to check (and possibly redirect to) $1/$2/$3/$5, not $1/$2/$3/$4. You'll need to check/redirect to the same file, but one directory up. The fact that you're accessing the file "/" doesn't matter -- it's still the index file of a directory.
So, taking a URL-view, if
/a/b/c/d/e/foo.html doesn't exist, then check/redirect to
/a/b/c/d/foo.html, and not to
/a/b/c/d/e
Hi Jim,
Thanks for the information. The reason that I am doing so is because of the file structure that I have.
For eg: If /a/b/c/d/e/foo.html doesn't exist, then I need to first check whether there is an index.html file at the path /a/b/c/d/e/index.html (which is the same as /a/b/c/d/e/). If it does not exist, then it should look for the path /a/b/c/d/index.html (which is the same as /a/b/c/d/).
Hence I was doing a $1/$2/$3/$4/$5 to $1/$2/$3/$4/
In my case,
/a/b/c/d/e/foo.html has not been moved to /a/b/c/d/foo.html
The reason I am doing so is that foo.html no longer exists because it is outdated, and hence I would like to redirect the user to the category (directory) under which it used to reside.
If that directory also doesn't exist anymore, then it should go one level higher, but look for index.html and not for foo.html in that higher-level directory.
I hope I have been able to clear it up.
Thanks a lot.
# If requested URL *does not* exist
RewriteCond %{DOCUMENT_ROOT}/$1$3 !-f
# and next-higher directory level *does* exist
RewriteCond %{DOCUMENT_ROOT}/$1 -d
# Strip filename or lowest-level directory and externally redirect to next-higher-level directory-index URL
# (the outer parentheses make $1 hold the whole leading path, not just its last element)
RewriteRule ^(([^/]+/)*)(.+)$ http://www.example.com/$1 [R=301,L]
#
# Else if requested URL *does not* exist
RewriteCond %{DOCUMENT_ROOT}/$1$3 !-f
# Strip filename or lowest-level directory, internally rewrite to next-higher directory-index URL, and restart mod_rewrite
RewriteRule ^(([^/]+/)*)(.+)$ /$1 [N,L]
Otherwise, the code will internally rewrite to the trimmed URL, and restart mod_rewrite processing. Therefore, the code effectively loops, trimming off one URL element per pass until an existing directory-level is reached. Then a 301 redirect is invoked.
This approach eliminates multiple 301 redirects as the filepath is traversed.
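The looping behavior described here can be sketched in Python, outside Apache. The function name, the set of existing directories, and the assumption that the web root always exists are all hypothetical; the sketch also assumes the requested URL has already been found missing, and it only models the "trim one element per pass, redirect once" idea.

```python
def nearest_existing_directory(path, existing_dirs):
    """Trim one trailing URL element per pass until an existing directory is left.

    `path` is a request path without its leading slash; `existing_dirs` is a
    set of directory paths with trailing slashes. The web root is assumed to
    exist. Returns the redirect target and the number of passes it took.
    """
    passes = 0
    while True:
        passes += 1
        trimmed = path.rstrip("/")                  # drop a trailing slash, if any
        parent = trimmed[: trimmed.rfind("/") + 1]  # "" once no slash remains
        if parent == "" or parent in existing_dirs:
            return "/" + parent, passes             # the single external redirect
        path = parent                               # internal rewrite, loop again

target, passes = nearest_existing_directory("a/b/c/d/e/foo.html", {"a/", "a/b/"})
print(target, passes)
```

However many levels are trimmed internally, the client sees only one redirect, to the first directory that exists.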
Note that this code should be placed as high as possible in your .htaccess file. Because it is recursive, all mod_rewrite code between the top of the file and this code will be re-executed every time a path-part is removed, and all non-mod_rewrite code will also be parsed (but ignored) by mod_rewrite. The point is that if there's a lot of code between the top of the file and this code, it will make this code run slowly and inefficiently. However, you may have code that does need to execute before this code, and that should be left in place.
Jim