Forum Moderators: phranque
I have tried:
Redirect 301 /folder/page-name/ http://www.example.com/folder/page-name

Of course, it simply times out. I tried disabling indexes in the vain hope that this would work (it doesn't) and I've searched for a couple of hours for a solution. Is there a simple one?
URLs, on the other hand, need no extension.
So, in terms of your Links/URLs and your server-internal filenames (don't mix them up), what are you actually trying to accomplish here?
Jim
It's a follow on to an issue I was having on a related topic here, converting from a CMS to static pages. [webmasterworld.com]
I have worked out a way to deal with that issue, but I now have the problem that - whilst I have maintained the exact structure of the CMS - I am being defeated by pages that have the same name as a subdirectory.
E.G. If a file is called widget.ttt, I have used a combination of a custom MIME extension and the following (which I believe is your code from another thread) to remove the display of the file extension, without affecting other files on my site:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^/]+/)*[^./]+)/?$ /$1.ttt [L]

However, if the file sits at the same level as a directory called 'widget' (and probably even if it doesn't), the server appears to assume that a URL without a file extension is a directory.
Am I making any sense?
RewriteCond %{REQUEST_FILENAME}.ttt -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.ttt [L]
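For context, a fuller sketch of that ruleset as it might sit in .htaccess (assuming the .ttt extension from this thread, and that RewriteEngine isn't already enabled elsewhere in the file):

```apache
# .htaccess (sketch) -- map extensionless URLs onto .ttt files,
# but only when the corresponding .ttt file actually exists,
# so other files and directories on the site are unaffected.
Options -MultiViews
RewriteEngine On
RewriteCond %{REQUEST_FILENAME}.ttt -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.ttt [L]
```

The file-exists condition is what lets the rule coexist with real files and directories: the rewrite only fires when a matching .ttt file is actually on disk.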
Jim
> Extensionless file URLs are still seen as paths to the root of a directory.
You're making this difficult for yourself... URLs can be extensionless, files cannot.
Rather than go through all the grief of changing page URLs, I'd suggest you stick with this project a bit longer. Such problems can be solved...
What URLs did you test with? (Several examples would be good... Using example.com as the domain)
To what server file paths should those example URL-paths resolve?
Also, do be sure to completely flush (delete) your browser cache after any change to server-side code.
Jim
It's a shared hosting service, so I will have to check about MultiViews or AcceptPathInfo. I suspect that I wouldn't be able to change it either way though...
OK, so far...
This CMS was a directory type of script (although this is not a directory), hence the reasons that the URLs are organised in this way. Category / Sub Category etc...
So, to simulate the URL structure that this CMS displayed using static html files, I put the files such as 'subject-matter.ttt' in the same location as a sub-directory called 'subject-matter' (which contains other html files from a 'sub-category').
As a result, when the url http://www.example.com/sub-directory/subject-matter is entered, it adds the trailing slash like this http://www.example.com/sub-directory/subject-matter/
I then get a not found error.
However, if I change the sub-directory name to something else - e.g. 'subject-matter-2' - the above URL loads the file 'subject-matter.ttt' without an issue using this url http://www.example.com/sub-directory/subject-matter . I.E. the trailing slash is not added.
This behaviour is consistent with all the urls I have tried. If there is a directory with the same name as the last part of the url, then the url defaults to adding the trailing slash and trying to view the directory contents.
E.G.
top (directory)
    subject-matter.ttt
    subject-matter (directory)
        other-file1.ttt
        other-file2.ttt
If the sub-directory has a different name, then the url displays fine
E.G.
top (directory)
    subject-matter.ttt
    subject-matter-2 (directory)
        other-file1.ttt
        other-file2.ttt
This ship is not sinking. In fact, it's not even leaking as initially reported by crew. The only problem is that the cook left his portal open to get a bit of fresh air, and a wave splashed a bit of water in...
In .htaccess:
Options -MultiViews
AcceptPathInfo Off
DirectorySlash Off

All three of the above can be done on some sites -- and on practically all static sites. However, if your site depends on the function, then obviously you'll have trouble if you disable it. But this trouble will be quite obvious quite quickly, and the change can easily be reverted.
Your 'control' of "what gets rewritten and when" might be better done by location in the filesystem; I don't see a need for a custom MIME-type here. You can put a .htaccess file into a subdirectory, and it will affect only requested URL-paths which resolve to that subdirectory. Or you can prefix all mod_rewrite rule patterns with the subdirectory name, and again, these rules will only apply to URL-paths referring to that subdirectory. However, in the interest of fixing the problem at hand before adding any additional complexity, I suggest fixing what you have first. The result can then serve as a 'solid base' for further modification.
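As a sketch of that per-directory idea (the directory name here is hypothetical, and the directives assume your host's AllowOverride settings permit them):

```apache
# /sub-directory/.htaccess (sketch) -- applies only to URL-paths
# that resolve into this subdirectory, leaving the rest of the
# site's behaviour untouched.
DirectorySlash Off
Options -MultiViews
AcceptPathInfo Off
```

Because .htaccess directives apply per-directory, this scopes the change without having to prefix every rule pattern with the subdirectory name.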
> whatever I do here needs to only affect the files and folders in this part of the site.
Clarifying again: Whatever you do here needs to only affect the URLs in this part of the site's URL-space. RewriteRules look at requested URLs, not files. They then translate URL-paths to filepaths as a *final* step, unless an external redirect is specified by the rule's syntax. Don't mix up URLs and files, or much difficulty will ensue; URL-spaces and filespaces are entirely different things, not even necessarily related. It is only the action of the server that associates files in the filespace with URLs defined on the Web. And mod_rewrite can be used to alter the "default association rules" so that URLs are translated to filepaths in a "non-standard" way.
Jim
OK, putting my cook's hat back on. So far, I put an .htaccess file just in the subdirectory (as I realised that this affected a couple of URLs elsewhere) and I only implemented 'DirectorySlash Off'.
This seems to have done the trick. Fantastic! Thank you very much.
This being the case, is there any reason why I should also disable MultiViews and AcceptPathInfo? I added and then disabled them as they appeared to have no extra effect.
Otherwise, I'm having some slight redirection issues (directories to file urls), but I'll try working on these myself a bit before I take up any more of your time.
Thanks again for your help.
Some links from external sites go to http://www.example.com/sub-directory and, because of the DirectorySlash Off command, this gives a 404 (or rather a 403 because I have switched indexes off). I.E. it is now looking for a file in the root of the web site called 'sub-directory'.
I have tried redirecting this to http://www.example.com/sub-directory/ but all I get is the addition of multiple forward slashes. E.G. http://www.example.com/sub-directory//////////////
The less important redirection issues are (and this really isn't a big deal, I only mention it because it probably has a connection with the above issue) that I'm having problems with redirecting directory url to file url. This works fine for the top level directories/categories, but falls down with the sub-directories…
E.G. Redirect 301 /widget/category1/category2/ http://www.example.com/widget/category1/category2
I eventually worked out that this was because I am doing it the slow and clunky way, by trying to redirect individual urls and so the higher directory level url was cancelling out the deeper subdirectory rewrite. Like this: http://www.example.com/widget/category1category2/
I also tried RedirectMatch permanent but this simply redirected the sub-directory to the higher level directory.
E.G. Redirect 301 /widget/category1/category2/ http://www.example.com/widget/category1
Strangely, this even affects URLs that are not specified in the rewrite. E.G. The url http://www.example.com/widget/blah1/blah2 is also redirected to http://www.example.com/widget/blah1blah2/ even though I have not added a redirection rule for it in .htaccess.
This is not a huge problem, as the main thing is that the links are all correctly working now.
I'm sure others will be looking in horror at this, but it works. >->'
Any suggestions always welcome.
> This being the case, is there any reason why I should also disable MultiViews and AcceptPathInfo? I added and then disabled them as they appeared to have no extra effect.
Each of these options requires processing time and adds complication. If you don't need them, turn them off.
Well, I fixed the main issue by creating a new MIME type for a php file.
Quit it with the MIME-types! You're tinkering with Web standards here in the name of expediency, and those standards exist for a reason: To make the Web work properly and to save you misery, time, and money...
> e.g. Redirect 301 /widget/category1/category2/ http://www.example.com/widget/category1
Do not mix mod_alias code (e.g. Redirect 301) with mod_rewrite code. If you use any mod_rewrite code, then use all-mod_rewrite code, forswearing mod_alias redirects. Otherwise, you can lose control over execution order of your external redirects versus internal rewrites, thus exposing internally-rewritten filepaths to Web clients as URLs, and creating duplicate-content issues.
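For instance, the mod_alias redirect quoted above could be expressed as an all-mod_rewrite rule (a sketch; note that in per-directory .htaccess context the pattern is matched against the URL-path without its leading slash):

```apache
# Instead of the mod_alias form:
#   Redirect 301 /widget/category1/category2/ http://www.example.com/widget/category1/category2
# use the mod_rewrite equivalent, so all redirects and rewrites
# execute in one predictable order:
RewriteRule ^widget/category1/category2/$ http://www.example.com/widget/category1/category2 [R=301,L]
```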
"Now hear this. This is the Captain speaking: Directory and directory-index-page URLs end with slashes. Other 'page' URLs do not. Web sailors who violate this protocol will be subject to disciplinary action and loss of (Page)rank."
You simply cannot play fast and loose with trailing slashes without suffering problems, server performance issues, or both. Although some of this is beyond your control, being subject to the linking errors of other Webmasters, you should unfailingly observe the rules on/within your own site.
You have two choices to fix/resolve the trailing-slash issues: you can add one-off external redirects for the incorrect URLs, or you can get into the fur-ball of doing file- and directory-exists checks, externally redirecting to add or remove slashes as required.
The first approach, using one-off rules, would look like this:
RewriteRule ^sub-directory$ http://www.example.com/sub-directory/ [R=301,L]
The end-anchor "$" on the pattern is the critical difference, and was likely missing from your RedirectMatch directive(s), causing the looping which led to multiple trailing slashes.
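To illustrate, assuming the failing directive looked something like the first (unanchored) form below, each redirect target would match the pattern again and gain another slash; the end-anchored form fires only once (though, per the advice above, a RewriteRule is preferable to mixing in mod_alias at all):

```apache
# Loops: the target "/sub-directory/" still contains the
# substring "/sub-directory", so every redirected request
# matches again and grows another trailing slash:
#   RedirectMatch 301 /sub-directory http://www.example.com/sub-directory/
# Fires once: the anchors exclude the slash-terminated target:
RedirectMatch 301 ^/sub-directory$ http://www.example.com/sub-directory/
```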
This approach will work for a few 'bad' incoming links, but requires on-going vigilance and maintenance as new 'bad' incoming links are found. Unfortunately, the second approach which eliminates this maintenance aspect can have a huge impact on your server performance, so although it "looks easy" it will slow down your site and work your physical disk harder.
In outline, the rules to implement the second approach (which are of medium specificity, and so should precede only a few other external redirects -- perhaps only your domain canonicalization rule, which is generally the last external redirect) would look something like this:
# Externally redirect to add missing trailing slash to directory
# requests, excluding requests for URLs that have "file extensions."
RewriteCond $1 !^([^/]*/)*([^.]*\.)+.+$
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*[^/])$ http://www.example.com/$1/ [R=301,L]
#
# Externally redirect to remove spurious trailing
# slash(es) from requested "extensionless" URLs
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)/+$ http://www.example.com/$1 [R=301,L]
This construct gives priority to directories when extensionless URLs are requested; If a directory exists, then no extensionless "page" URL which "collides" with that directory name can be used on this site. In such a case, the extensionless "page" will be inaccessible, as the directory will get priority and the requests will always be redirected to add the trailing slash.
This code affects server performance and hard-drive life because for every request to your server, either one or two file-exists/directory-exists checks are made. If either redirect rule is invoked, then three file-exists/directory-exists checks will be made because both rules will be invoked -- one before and one after the redirect.
Each exists-check results in a call to the operating system to check the filesystem 'map'. If that map is not currently cached in memory or is stale, then the OS will have to go read the physical disk -- which being mechanical, is quite slow compared to just running code. In addition to reduced server responsiveness and wear-and-tear on the hard drive, there is also the possibility that you'll be forced into an early server upgrade as your site grows in popularity, simply due to all the extra work created by 'exists' checking...
Jim
> Quit it with the MIME-types! You're tinkering with Web standards here in the name of expediency, and those standards exist for a reason: To make the Web work properly and to save you misery, time, and money...
For some reason I read this in my head in the voice of Scotty from Star Trek, warning about me disrupting the space time continuum, or something. :)
You are, of course, quite right. I will give this a try tomorrow when I have had a rest and can try to get my head around it a bit more.
Thanks very much for all the help, it is much appreciated and I really am learning stuff (very slowly).