For some reason, on the old box, if someone entered an href or image location without the leading slash in the path, Apache would magically look in the current directory, realize that the directory or file does not exist, and then start over at the DocumentRoot. For example, with CWD=/somepath, href="somepath/something.html" would actually take someone to /somepath/something.html. On the new version, if one enters href="somepath/something.html" while already in "/somepath", it takes them to "/somepath/somepath/something.html" instead of "/somepath/something.html". This is exactly what I would expect, as it simply follows the Unix filesystem structure.
I've tried a bunch of RewriteRules, but I'm fairly new to them, so I've had no luck. Any help would be greatly appreciated.
A useful test would be to compare the HTTP headers exchanged between your browser and the server on both boxes, using the Live HTTP Headers add-on for Firefox and Mozilla browsers.
Some background:
It is the client (browser or robot) that resolves relative links such as <a href="somepath/something.html"> or <img src="../images/logo.gif">. It must do this, because it is the agent looking at the HTML page, and it must send a canonical (absolute) URL-path to the server to comply with the HTTP specifications.
However, since it was your server that changed and not your browser, this suggests a mismatch between the actual URL-path and the URL-path requested by your client, or a mismatch between the filepath the server resolves for that URL-path and the actual filepath. The goal is to find out which, and why.
It could be that there is some basic misconfiguration on the new server, so that a correct URL does not resolve to the proper server filepath. Or it could be that the site design has been incorrect for a while (on the old server), but the old server had content negotiation or mod_speling enabled and was "fixing up" these requests, unbeknownst to you.
Looking at the HTTP request headers from your browser, correlated with the contents of your server error log (not just the access log), should give you some useful information about this problem. If the URLs look OK, look very carefully at the filepath in the error log entry, and compare it with the actual location of the file on the server.
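If you can still get at the old server's configuration, these are the directives worth searching for in its httpd.conf, any included vhost files, and any .htaccess files, since they switch on the two fix-up mechanisms mentioned above. This is a checklist sketch, not a diagnosis: finding one of them in a scope covering the affected pages only suggests it may have been responsible.

```apache
# Directives to look for on the OLD server (httpd.conf, vhost files, .htaccess).
# Either one, in a scope covering the affected pages, could have been
# silently "fixing up" requests for files that don't exist.

# mod_speling: tries to correct misspelled or miscapitalized URLs
# (has no effect unless mod_speling is loaded)
CheckSpelling On

# mod_negotiation: MultiViews lets a request for /images/abc be
# satisfied by /images/abc.gif, /images/abc.png, and so on
Options +MultiViews
```

One way to tell them apart: when mod_speling finds a single likely correction it answers with a redirect, which would show up in the response headers, whereas MultiViews serves the negotiated file directly without a redirect.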
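To make that comparison easier, a temporary diagnostic log can record the filesystem path Apache mapped each request to, right next to the request line and final status. A minimal sketch, assuming you can edit the server or virtual host config (these directives are not allowed in .htaccess); the "withfile" nickname and the log filename are illustrative only.

```apache
# Temporary diagnostic log (server config or <VirtualHost> only).
# %r = request line, %>s = final status, %f = filename Apache mapped the request to.
LogFormat "%h %t \"%r\" %>s %f" withfile
CustomLog logs/access_withfile_log withfile

# Raise error-log verbosity while debugging; set it back afterwards.
LogLevel debug
```

Running the same page on both boxes and comparing the %f column against where abc.gif actually lives shows directly whether the URL or the URL-to-file mapping differs.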
This process will show you where in the request chain the breakdown is occurring, which in turn gives you a better idea of what the problem might be.
The expected sequence is:
GET /something/otherthing.html HTTP/1.1
Host: example.com
(other request headers here)
So that's what you expect to see, as long as there are no Alias functions, redirects, rewrites, errors, content-negotiation, or mod_speling fix-ups in the transaction.
Jim
So here is where the rub comes in. I installed the Live HTTP Headers add-on for Firefox, and unfortunately, it's telling me exactly what I figured.
Let me give you some more specific examples. Within my DocumentRoot, I have an /images folder that houses all of the images for the site (quite normal).
Now, I have other subsections of the site, such as /resources. We have a homebrewed content management system that allows anyone to enter raw, unfiltered HTML and does no error checking. As a result, people have created pages under /resources that reference the images folder but leave out the leading slash. So, instead of img src="/images/abc.gif", they've put img src="images/abc.gif"...
When I navigate to the old site, Apache somehow knows that "images/abc.gif" actually refers to "/images/abc.gif" and serves the page accordingly, even though "images/abc.gif" is what shows up in the page source.
The new site, however, does not do this. On the new site, "images/abc.gif" goes to "/resources/images/abc.gif".
The headers don't mention anything about redirection on either site. The error and access logs also don't show anything of value, other than that the browser is requesting files that do not exist. I even turned my log level up to debug.
I compared the modules being loaded on the old site versus the new site, and the new site loads quite a few additional modules.
Here is a list of modules that are on the new site:
LoadModule auth_basic_module modules/mod_auth_basic.so
LoadModule auth_digest_module modules/mod_auth_digest.so
LoadModule authn_file_module modules/mod_authn_file.so
LoadModule authn_alias_module modules/mod_authn_alias.so
LoadModule authn_anon_module modules/mod_authn_anon.so
LoadModule authn_dbm_module modules/mod_authn_dbm.so
LoadModule authn_default_module modules/mod_authn_default.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule authz_user_module modules/mod_authz_user.so
LoadModule authz_owner_module modules/mod_authz_owner.so
LoadModule authz_groupfile_module modules/mod_authz_groupfile.so
LoadModule authz_dbm_module modules/mod_authz_dbm.so
LoadModule authz_default_module modules/mod_authz_default.so
LoadModule ldap_module modules/mod_ldap.so
LoadModule authnz_ldap_module modules/mod_authnz_ldap.so
LoadModule include_module modules/mod_include.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule logio_module modules/mod_logio.so
LoadModule env_module modules/mod_env.so
LoadModule ext_filter_module modules/mod_ext_filter.so
LoadModule mime_magic_module modules/mod_mime_magic.so
LoadModule expires_module modules/mod_expires.so
LoadModule deflate_module modules/mod_deflate.so
LoadModule headers_module modules/mod_headers.so
LoadModule usertrack_module modules/mod_usertrack.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule mime_module modules/mod_mime.so
LoadModule dav_module modules/mod_dav.so
LoadModule status_module modules/mod_status.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule info_module modules/mod_info.so
LoadModule dav_fs_module modules/mod_dav_fs.so
LoadModule vhost_alias_module modules/mod_vhost_alias.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule dir_module modules/mod_dir.so
LoadModule actions_module modules/mod_actions.so
LoadModule speling_module modules/mod_speling.so
LoadModule userdir_module modules/mod_userdir.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule cache_module modules/mod_cache.so
LoadModule suexec_module modules/mod_suexec.so
LoadModule disk_cache_module modules/mod_disk_cache.so
LoadModule file_cache_module modules/mod_file_cache.so
LoadModule mem_cache_module modules/mod_mem_cache.so
LoadModule cgi_module modules/mod_cgi.so
LoadModule version_module modules/mod_version.so
LoadModule cern_meta_module modules/mod_cern_meta.so
LoadModule asis_module modules/mod_asis.so
Here is a list of modules being loaded on the old site:
LoadModule access_module modules/mod_access.so
LoadModule auth_module modules/mod_auth.so
LoadModule auth_anon_module modules/mod_auth_anon.so
LoadModule auth_dbm_module modules/mod_auth_dbm.so
LoadModule auth_digest_module modules/mod_auth_digest.so
LoadModule include_module modules/mod_include.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule env_module modules/mod_env.so
LoadModule mime_magic_module modules/mod_mime_magic.so
LoadModule cern_meta_module modules/mod_cern_meta.so
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule usertrack_module modules/mod_usertrack.so
LoadModule unique_id_module modules/mod_unique_id.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule mime_module modules/mod_mime.so
LoadModule dav_module modules/mod_dav.so
LoadModule status_module modules/mod_status.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule asis_module modules/mod_asis.so
LoadModule info_module modules/mod_info.so
LoadModule dav_fs_module modules/mod_dav_fs.so
LoadModule vhost_alias_module modules/mod_vhost_alias.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule dir_module modules/mod_dir.so
LoadModule imap_module modules/mod_imap.so
LoadModule actions_module modules/mod_actions.so
LoadModule speling_module modules/mod_speling.so
LoadModule userdir_module modules/mod_userdir.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
Any thoughts are greatly appreciated!
-Andrew
Some folks accuse me of suffering from "If all you've got is a hammer, everything looks like a nail" syndrome, and I hate to throw mod_rewrite code at every problem before it's clear that code is needed. That's why I asked about this situation in detail.
It's clear that you have a somewhat 'contained' problem here, so the answer now boils down to whether there's an SEO aspect to it. If these page and/or image URLs are indexed (or indexable) and you care about their ranking, then a mod_rewrite solution would be beneficial, because you could signal the error to the search engines with a 301 redirect to the correct URL. These transactions would then show in your log files, and you could take action to correct them if you wished to do so.
So, that would look something like this:
# If requested /resources/images/ URL does not resolve to existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# and if requested URL resolves to an existing file in /images
RewriteCond %{DOCUMENT_ROOT}/images/$1 -f
# Externally redirect the request to remove the /resources URL-path-part
RewriteRule ^resources/images/(.+)$ http://www.example.com/images/$1 [R=301,L]
This also assumes that the code goes into example.com/.htaccess and that you've already got other working mod_rewrite rules in that file.
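If you'd rather put the rule in httpd.conf or a <VirtualHost> block than in .htaccess, a sketch of the equivalent follows. Two things differ in server context: the RewriteRule pattern is matched against the full URL-path including its leading slash, and %{REQUEST_FILENAME} has not yet been mapped to a file, so the exists-checks build the path from %{DOCUMENT_ROOT}%{REQUEST_URI} instead. This assumes no Alias applies to these paths and, as in the rule above, that www.example.com stands in for your canonical hostname.

```apache
# Server-context (httpd.conf / <VirtualHost>) sketch of the same redirect.
RewriteEngine On
# REQUEST_FILENAME isn't mapped to a file yet here, so build the path explicitly
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
# Only redirect if the image really exists in /images
RewriteCond %{DOCUMENT_ROOT}/images/$1 -f
# Leading slash required: server-context patterns see the full URL-path
RewriteRule ^/resources/images/(.+)$ http://www.example.com/images/$1 [R=301,L]
```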
The first two checks are only needed if there is a possibility that the requested resource (image) might actually exist in /resources/images. Get rid of them if you're sure this is not the case and will never be the case. File-exists and directory-exists checks are very server-resource-intensive, and should be avoided when possible. However, this aspect of the rule design may also need to change depending on how many other source-URL-path and target-path variations you've got, so you may want to leave this in place for now.
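If you're sure nothing will ever live under /resources/images, the trimmed .htaccess version would look something like this (again assuming RewriteEngine On is already in that file). It keeps only the check that the target image exists in /images, so a request for an image that exists nowhere is not redirected and still gets a normal 404 at its original URL.

```apache
# Trimmed .htaccess variant: no exists-checks on the requested /resources path.
# Redirect only if the image actually exists in /images
RewriteCond %{DOCUMENT_ROOT}/images/$1 -f
RewriteRule ^resources/images/(.+)$ http://www.example.com/images/$1 [R=301,L]
```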
Jim