Handling multiple redirects and avoiding looping issues

Redirect uppercase URLs to lowercase using Apache mod_rewrite, alongside other redirects

coolseo

7:02 am on Feb 24, 2016 (gmt 0)

10+ Year Member



Hi,

Our website runs on a Windows server. Recently we noticed that our URLs are case-insensitive. So, for SEO purposes, we are going to use the 'httpd.conf file' method for uppercase-to-lowercase redirection. (As mentioned here: brianflove.com/2014/08/11/lowercase-your-uris)

My question is,

We also want to do a few other 301 redirects on our site to undo past editorial errors, and some of these old URLs have uppercase letters in them. Do you think that the above-mentioned redirect rule will conflict with these manually set redirects in some way?

On top of that, we are planning to install an SSL certificate, which will create HTTPS versions of our pages and require redirects from HTTP to HTTPS.

We don't want to create issues like redirect loops or similar.

Kindly advise: what would be the best practice for handling these three requirements simultaneously?

phranque

11:07 am on Feb 24, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



your redirects should be ordered from most specific to most general.
the Substitution string of these RewriteRules should specify the full protocol and hostname (e.g. https://www.example.com/...)
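
A minimal sketch of that ordering, assuming a hypothetical site whose canonical host is https://www.example.com; the old-section, plans and archive paths are made up purely for illustration:

```apache
RewriteEngine On

# Most specific first: one page that moved (hypothetical paths).
RewriteRule ^/?old-section/Pricing\.html$ https://www.example.com/plans/pricing.html [R=301,L]

# More general afterwards: everything else under the old section.
RewriteRule ^/?old-section/(.*)$ https://www.example.com/archive/$1 [R=301,L]
```

Because each target is an absolute URL on the final protocol and hostname, a matching request should be redirected once rather than passing through a chain of redirects.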

whitespace

12:49 pm on Feb 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Recently we noticed that our URLs are case-insensitive.


This is presumably for URLs that map directly to the filesystem?

As mentioned here: brianflove.com/2014/08/11/lowercase-your-uris


Ironically, although the linked article states, "This is especially problematic if you are using a case insensitive file system such as Windows.", the example code they have given won't actually work as-is on a case-insensitive file system such as Windows. The code they've given:


RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*)$ ${lc:$1} [R=301,L]


On a case-insensitive file system... if you have a file called "example.html" and request "ExAmPlE.HtMl", the first condition will fail, because "ExAmPlE.HtMl" does exist!

You would need to remove the ! (negation) prefix on the CondPattern of the first 2 conditions (RewriteCond directives), so that only files that exist are rewritten. (Or remove them altogether to redirect all requests?!)

whitespace

2:24 pm on Feb 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



You would need to remove the ! (negation) prefix on the CondPattern of the first 2 conditions (RewriteCond directives), so that only files that exist are rewritten.


AND... if you do that you will need to add the OR flag to the first condition. Either a file or a directory (it can't be both). In other words:


RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
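
Put together with the rest of the article's rule, that might look like the sketch below. It assumes the `lc` map used by the article is declared as `RewriteMap lc int:tolower` in the server or virtual-host config, and that the rule itself runs in a per-directory context for the document root, where REQUEST_FILENAME holds the mapped file system path. In plain server context REQUEST_FILENAME is not yet mapped to the file system, so the -f and -d tests would not behave as described.

```apache
# Server or virtual-host config (RewriteMap cannot be declared in .htaccess).
RewriteMap lc int:tolower

# Per-directory context for the document root (assumption).
RewriteEngine On
# Only act on requests that resolve to an existing file or directory...
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
# ...and whose requested URL contains at least one uppercase letter.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*)$ /${lc:$1} [R=301,L]
```

Once the client follows the redirect, the lowercase URL no longer satisfies the `[A-Z]` condition, so the rule does not fire a second time.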

whitespace

5:11 pm on Feb 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Actually, if you are on (case-insensitive) Windows then you don't even need the RewriteMap. The URL-path matched by the RewriteRule pattern is already the normalized file system path, i.e. the lowercase URL. So, you can simply write:


RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


Assuming the actual file on the file system is "example.html", then a request for "EXAMPLE.HTML" will be redirected to "example.html" (as it appears on the file system).

...some of these old URLs have uppercase letters in them. Do you think that the above-mentioned redirect rule will conflict with these manually set redirects in some way?


The above directives do assume that all your URLs (file system paths) are lowercase (except for the query string, which should be passed through unscathed).

So yes, there could potentially be a conflict, although it depends on the redirect. Are you redirecting to another URL that contains uppercase letters (conflict)? Or is the target URL always lowercase (no conflict)? In the latter case you just need to make sure the specific redirect appears before the general lowercase directives (as phranque states, most specific to most general - in that order).

In order to avoid conflict (the former case) you may need to make exceptions for any valid target URL that contains uppercase letters.
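
One way to write such an exception, as a sketch only: add a negated condition ahead of the others for each valid mixed-case target. The /Downloads/ReadMe.html path is a hypothetical example, not something taken from this thread.

```apache
# Leave this legitimate mixed-case URL alone (hypothetical path).
RewriteCond %{REQUEST_URI} !^/Downloads/ReadMe\.html$
# Otherwise proceed as before: existing file or directory, uppercase in the request.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

The [OR] only joins the two conditions it sits between, so this reads as "not the exception, AND (file OR directory), AND contains uppercase".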

lucy24

6:10 pm on Feb 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We also want to do a few other 301 redirects on our site to undo past editorial errors, and some of these old URLs have uppercase letters in them. Do you think that the above-mentioned redirect rule will conflict with these manually set redirects in some way?

Not if you put your rules in the right order. List your individual, specific-URL redirects first. Then the new URL will already be correct by the time the request reaches your catch-all rule, which would be something like
RewriteRule [A-Z] blahblah
(no anchors) meaning "If the request contains EVEN ONE capital letter in the path, then do stuff". The "stuff" can be either the tolower RewriteMap (if you've got access to the config file, which it sounds as if you do) or a php page (if you don't).

The rule will be your second- or third-to-last external redirect: before domain-name canonicalization, and either before or after the "index.html" redirect. (Before, if you've got directory names containing capital letters so you potentially have to do two things at once; otherwise after.)

If you've got URLs in the form "Index.html", I do not want to hear about it ;)
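
For reference, the "index.html" redirect mentioned above is commonly written along these lines. This is a sketch, assuming index.html is the DirectoryIndex and that https://www.example.com is the canonical host; neither is confirmed anywhere in this thread.

```apache
# Redirect explicit requests for /index.html or /dir/index.html to the bare directory URL.
# THE_REQUEST is checked so that only what the client literally asked for triggers the
# redirect, not the internal DirectoryIndex lookup (which could otherwise loop in
# per-directory context).
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*/)?index\.html[?\s]
RewriteRule ^/?(.*/)?index\.html$ https://www.example.com/$1 [R=301,L]
```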

whitespace

10:50 pm on Feb 24, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month




RewriteRule [A-Z] blahblah


This form doesn't actually work as intended on case-insensitive Windows, where the request maps to the file system and the file that it maps to is all lowercase. So, given a request for "EXAMPLE.HTML", where the file is actually "example.html" (successful because the OS is case-insensitive) then the above RewriteRule fails to match. The RewriteRule appears to match against the normalized filename as it appears on the file system. However, the REQUEST_URI server variable does hold the uppercase URL that was actually requested.

As a little test....


RewriteCond %{REQUEST_URI} .*
RewriteRule .* - [E=ENV_MATCHED_CASE_TEST:$0\ %0]


And output the environment variable in your script of choice:


<?php
echo getenv('ENV_MATCHED_CASE_TEST');

coolseo

8:53 am on Feb 29, 2016 (gmt 0)

10+ Year Member



I guess this is getting too technical for me. :)
Please check whether I have understood this correctly. As per @phranque and @lucy24, we should be doing it in this very sequence:

1) specific redirects (using full protocol and hostname)
2) index.html redirect
3) lowercase redirection rule
4) www canonicalization redirect
5) https canonicalization redirect

So the 'https' will come last? Do we have to use https protocol in lowercase redirection rule as well?

Please guide me.
----------------------------------------------

@whitespace

Thanks for delving deep into this.

This is presumably for URLs that map directly to the filesystem?

Assuming the actual file on the file system is "example.html"

Yes, these are URLs that map directly to the filesystem. We are using a custom CMS. But I am not sure what actual name the file gets when an editor creates a page in the backend and names it '/Sample-Page'. How do I check it? What if it is also 'Sample-Page'?

Are you redirecting to another URL that contains uppercase letters?

No, the target URL will always be lowercase.


RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

So the above rule is good enough to redirect all uppercase/sentence-case URLs to lowercase on a Windows server? Does it work even if only one capital letter is found in the path? Also, as you said, I don't need the RewriteMap for this, right?

lucy24

4:35 pm on Feb 29, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



4) www canonicalization redirect
5) https canonicalization redirect
These two can be combined into a single rule. Use two Conditions, with [OR].

Do we have to use https protocol in lowercase redirection rule as well?
Use the correct protocol for each individual URL. If necessary, have two sets of rules-- one for the HTTP directories and the other for HTTPS.
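
A sketch of that combined rule, assuming www.example.com is the canonical hostname and that TLS terminates at Apache itself (behind a proxy or load balancer the HTTPS variable may not reflect the client's connection):

```apache
# Redirect if the request is not HTTPS, OR is not on the canonical hostname.
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^/?(.*)$ https://www.example.com/$1 [R=301,L]
```

If the earlier, more specific redirects already point at https://www.example.com/..., a request that needs several fixes should still reach its final URL in a single hop.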

lucy24

8:50 pm on Feb 29, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops, forgot about time limits.
I am not sure what actual name the file gets when an editor creates a page in the backend and names it '/Sample-Page'. How do I check it? What if it is also 'Sample-Page'?

Have a look at your site, using ftp or whatever you normally use to access the physical files.

While it is theoretically possible for a real file to be named "Sample-Page" without extension, it is wildly unlikely that your CMS really does this. If the URL ends in /page then the physical file might be "page.php" or it might be "page/index.php" (that is, the index file of a single-page directory) or ... there are other possibilities, but those are the likeliest.

If the URL is different from the physical filename, there's rewriting going on somewhere-- in mod_rewrite, in the CMS itself, or both. Does your CMS involve a -d or -f test? You definitely don't want to set up your rules in such a way that the test has to be done more than once. And if you can prevent it being done at all, as with non-page files, that's even better.
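
One common way to keep non-page files away from the -f/-d tests, sketched here with an assumed list of extensions, is an early rule that stops rewriting for them before any of the redirect rules are reached:

```apache
# Static assets: skip the remaining rules in this set.
# The extension list is an assumption; adjust it to the file types the site serves.
RewriteRule \.(?:css|js|png|jpe?g|gif|ico)$ - [NC,L]
```

The trade-off is that mixed-case requests for these files are then served as-is rather than redirected to lowercase.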