Reduce 404 errors due to case sensitivity

.htaccess RewriteCond

         

thecheeto

6:22 pm on May 25, 2016 (gmt 0)

5+ Year Member



Here is my website setup:
  • A WordPress site installed in the directory /var/www/wordpress. This /wordpress directory is where the domain www.domain.com points.
  • Static HTML pages located at /var/www/arizona, /var/www/alabama, and so on.
  • I have not yet moved all of the static HTML pages into WordPress, so I have created symbolic links within the /wordpress directory that point up one directory. That way a user can type www.domain.com/arizona and be directed to the static HTML pages and NOT a page within WordPress.

My problem is that after looking at my 404 error logs, I see a number of errors having to do with users typing in URLs with the wrong case. For example, a capital A in the URL, like: http://www.domain.com/Arizona displays the WordPress 404 error page.

What I would like to do is make it so that if a user types in any of the following, they are directed to the correct static HTML page:
  • http://www.domain.com/Arizona
  • http://www.domain.com/aRiZoNa
  • Or any other combination of lower case and capital letter spelling for the word arizona
  • A side benefit would be to be able to capture www.domain.com/AZ and www.domain.com/az and direct those users to www.domain.com/arizona as well.

Here is what is in my .htaccess file currently. A few rewrite rules to capture some old URLs from a previous Drupal installation of the site, and then just the stuff that WordPress set there.

RewriteEngine on
RewriteCond %{QUERY_STRING} ^q=contact
RewriteRule ^ /contact-us/? [L,R=301]
RewriteCond %{QUERY_STRING} ^q=node/6
RewriteRule ^ /industry/? [L,R=301]
RewriteCond %{QUERY_STRING} ^q=node/7
RewriteRule ^ /exams/? [L,R=301]
RewriteCond %{QUERY_STRING} ^q=node/5
RewriteRule ^ /company/? [L,R=301]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress


I have tried using a rewrite condition such as this:
RewriteCond %{REQUEST_URI} /arizona [NC]
RewriteRule .* http://www.domain.com/arizona/ [R=301,L]

But if I place it above the WordPress entries in the .htaccess file, I get the error "The page isn't redirecting properly. Firefox has detected that the server is redirecting the request for this address in a way that will never complete." If I place it after the WordPress entries in the .htaccess file, it gets ignored.

Thank you in advance for any help you can offer.

keyplyr

3:02 pm on Jun 4, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld thecheeto.

You can use RedirectMatch in your htaccess to cover both letter cases.

Just do a search for "Apache RedirectMatch." There are plenty of tutorials.

lucy24

6:23 pm on Jun 4, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can use RedirectMatch

But you wouldn't want to, because that's a mod_alias directive and we've already established that the site uses mod_rewrite.

### ! Where are all these unanswered posts coming from? There was one yesterday too that was many days old.

I have tried using a rewrite condition such as this:

RewriteCond %{REQUEST_URI} /arizona [NC]
RewriteRule .* http://www.example.com/arizona/ [R=301,L]

You need a second RewriteCond so the rule only applies to incorrectly cased requests. Otherwise it goes around in circles, as you've found.

RewriteCond %{REQUEST_URI} ^/arizona [NC]
RewriteCond %{REQUEST_URI} !^/arizona/$
RewriteRule .* http://www.example.com/arizona/ [R=301,L]

See how that works? But if the rule is only intended for one specific URI, then that belongs in the body of the rule so the server doesn't waste time evaluating conditions the rest of the time. Make sure the [NC] flag is in the right place:

RewriteCond %{REQUEST_URI} !^/arizona/$
RewriteRule ^arizona http://www.example.com/arizona/ [NC,R=301,L]
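
A slightly tighter variant, offered as a sketch rather than a tested drop-in: the pattern ^arizona also matches deeper paths such as /Arizona/phoenix.html (a hypothetical file name) and sends them all to the directory root. Assuming the rule sits above the WordPress block in the root .htaccess and the static directories may hold further pages, this version keeps whatever follows the directory name:

```apache
# Skip requests that already start with the correctly cased directory name.
# RewriteCond is case-sensitive without [NC], so /Arizona/ still passes.
RewriteCond %{REQUEST_URI} !^/arizona(/|$)
# Match "arizona" in any casing, optionally followed by a path, and
# carry that path over in $1 (empty when nothing follows the name).
RewriteRule ^arizona(?:/(.*))?$ http://www.example.com/arizona/$1 [NC,R=301,L]
```

With this in place /Arizona and /aRiZoNa/ should both land on /arizona/, while /arizona/ itself is left alone. Only the directory name is corrected; wrong casing later in the path is passed through unchanged.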

keyplyr

11:04 pm on Jun 4, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But you wouldn't want to, because that's a mod_alias directive and we've already established that the site uses mod_rewrite
Ya know, I see people saying that but servers nowadays preload most mods so I don't see the issue. Just don't mix the two in the same section of the code. Besides RedirectMatch would only take one line. I don't do it currently but I've used both together without issue.

lucy24

11:52 pm on Jun 4, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



don't mix the two in the same section of the code

? Mods never mix; each one executes independently, and each one goes all the way along-- from config file to the inmost directory-- before handing off to the next mod. The only time things might toggle back and forth is if you've got some redirects tucked inside a <Files> envelope. (I actually do this, because, ahem, cough-cough, I didn't know you're not supposed to, and it's a handy way to separate-out some categories.)

No, combining mod_alias and mod_rewrite won't break either mod. But on shared hosting you have no control over execution order, so you're liable to end up with chained redirects if a given request applies to more than one mod's rules.

Be that as it may: In the specific question that prompted this thread, I think you really have to use mod_rewrite, because the redirect requires a condition "If the request is not suchandsuch". RedirectMatch can do a lot, but it can't use conditions.
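
To make the difference concrete, here is a sketch of the kind of redirect RedirectMatch can handle on its own, using the /az shortcut from the original question. No condition is needed because the target /arizona/ can never match the pattern. The (?i) prefix is the regex engine's inline case-insensitive flag, and the example.com host is a placeholder.

```apache
# mod_alias: send /az or /AZ (with or without a trailing slash) to /arizona/.
RedirectMatch 301 (?i)^/az/?$ http://www.example.com/arizona/

# The same approach for the state name would loop forever, because the
# target /arizona/ matches the pattern too and there is no way to say
# "unless the request is already correct":
# RedirectMatch 301 (?i)^/arizona/?$ http://www.example.com/arizona/
```

On a site that already relies on mod_rewrite, the equivalent RewriteRule ^az/?$ http://www.example.com/arizona/ [NC,R=301,L] keeps everything in one module and sidesteps the execution-order question.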

keyplyr

12:09 am on Jun 5, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Got it :)

LilyTousi

3:29 pm on Jun 13, 2016 (gmt 0)

10+ Year Member



Hi,
I have a similar problem, but in reverse.
GWT periodically reports 404 errors because of capital letters.
For example, an actual file looks like this: map-HotelMartinoSpaResort-CostaRica.html
GWT 404 error: map-hotelmartinosparesort-costarica.html (same file name, only without capital letters)
I used to redirect them manually, but as my site contains nearly a thousand files, I do not want to do it this way.
Is there a way to prevent this in .htaccess?
Thanks!

lucy24

9:50 pm on Jun 13, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you figure out why Google is asking for nonexistent files? (CostaRica != costarica, in exactly the same way that costarica != guatemala) Does it happen with all files, or only selected ones? If it's the latter, see if you can find who's linking to you with incorrect casing. Check WMT/GSC in the "who links to you" tab. You could then redirect those specific files only.
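
A minimal sketch of redirecting one specific file, assuming the page lives at the site root and the rule goes in the root .htaccess (both are assumptions; adjust the path otherwise). RewriteRule patterns are case-sensitive by default, so the correctly cased file name does not match and there is no loop:

```apache
RewriteEngine On
# All-lowercase request -> the real, mixed-case file name.
RewriteRule ^map-hotelmartinosparesort-costarica\.html$ /map-HotelMartinoSpaResort-CostaRica.html [R=301,L]
```

One such line per affected file stays manageable as long as only a handful of URLs are being requested with the wrong casing.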

Not long ago there was a flurry of bingbot requests for wrongly cased files, but mercifully they gave up after a while. In fact the 404 may have helped; if they'd got a 301 they would have kept asking to see if the redirect is still in place.

If you've got mod_speling [sic] you can override case sensitivity, but honestly, you don't want to use it. It just encourages wrong requests and makes more work for the server. This is the rare case where a 404 really is the most appropriate response: "Sorry, but I have no idea what you are talking about."
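
For reference only, this is roughly what the mod_speling route looks like, assuming the module is loaded and the host allows these directives in .htaccess (they require AllowOverride Options):

```apache
<IfModule mod_speling.c>
    # Turn on URL correction for requests that would otherwise 404.
    CheckSpelling On
    # Restrict corrections to case differences only (no guessing at typos).
    CheckCaseOnly On
</IfModule>
```

When exactly one file matches, the server answers with a redirect to the correctly cased URL, which means a directory scan on every mis-cased request.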

LilyTousi

1:49 pm on Jun 14, 2016 (gmt 0)

10+ Year Member



Hi Lucy,
It started about two weeks ago. At first, I thought it was my fault from renaming a hotel, but I realized it is exactly the same name, only without capital letters. I always look at where the link was referred from, but in this case, there is no (who links to you) tab. I check the broken links every week. This is new. I am worried that all my files will show up eventually. It is weird!
Thanks

lucy24

5:37 pm on Jun 14, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



there is no (who links to you) tab

:: detour to look ::

Search Traffic >> Links to Your Site
and then you've got a choice between "Who links the most" and "Your most linked content".

Find the specific file, if you can-- it can be awkward because they're listed in order of frequency, so you have to plow past all those blogs that create a new URL every time the blogger sneezes. (According to google, WebmasterWorld-- that is, this site-- has 982 links to my biography page. I'm pretty sure that count is off by 981. In fact even Google can only come up with 12 specific citations, reducing the overcount to a more manageable 11.)

The googlebot itself will never send a referer on page requests. (On rare occasions they do for non-pages, probably to test whether you serve different versions to different pages.) That's why you have to check "Links to your site" and see if you can figure out who was leading them astray.

I am worried that all my files will show up eventually. It is weird!

Search engine spiders are weird. Nothing you can do about that. But take a closer look at the specific files that are-- so far-- getting requested with wrong casing.