directory letter case and mod rewrite

         

Excalibur

7:19 pm on Mar 1, 2010 (gmt 0)

10+ Year Member



Hello,

I have a website running on a CMS, and I chose to run it in a subdirectory (en-US). The problem with this system is that it automatically converts all URLs to lowercase (I know, it's stupid).
So I'm trying to make sure all URLs of the form www.site.com/en-us/... are changed to www.site.com/en-US/... I tried to use %{QUERY_STRING}, but I don't seem to be able to do it properly.

These are my lines (for everything)


#the line where I try to make sure it's "forced" to be en-US
RewriteCond %{QUERY_STRING} ^en-us/$ [NC]
RewriteRule ^(.+)?$ http://%{HTTP_HOST}/en-US/$1 [L,R=301]

RewriteCond %{REQUEST_URI} !^/?en-US [NC]
RewriteCond %{REQUEST_URI} !^/images
RewriteCond %{REQUEST_URI} !^/fonts
RewriteRule ^(.+)?$ http://%{HTTP_HOST}/en-US/$1 [L,R=301]
RewriteRule ^images(/([a-z0-9/._\-]*))?$ /en-US/templates/theme2010/images/$2 [NC,L]
RewriteRule ^fonts(/([a-z0-9/._\-]*))?$ /en-US/fonts/$2 [NC,L]
############################################################################################
# Force WWW (force www to all urls except for these specific sub-domains
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} !^support.*
RewriteCond %{HTTP_HOST} !^wiki.*
RewriteCond %{HTTP_HOST} !^projects.*
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/en-US/$1 [R=301,L]



Any help is much appreciated.

[edited by: jdMorgan at 4:02 pm (utc) on Mar 2, 2010]
[edit reason] de-linked HTTP_HOST lines for readability [/edit]

g1smd

7:59 pm on Mar 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Personally, I prefer all lower case for URLs.

It makes redirecting and rewriting a lot easier, and it makes linking errors easy to find.

Excalibur

9:43 pm on Mar 1, 2010 (gmt 0)

10+ Year Member



Why thank you, g1smd. But maybe I prefer things differently.
Is it possible to fix that issue I need to fix please?

jdMorgan

4:55 pm on Mar 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Modify the CMS code, or if you're on Apache 2.x, put an output filter on it to parse each out-going page and change the language-directory-path to uppercase. The former may be costly, and will have to be done for each CMS upgrade, while the latter will be slow, and may require a server hardware upgrade if your site is busy.

Or, drop the requirement for uppercase language code aesthetics.

The reason that you must use the above-described methods is that it is the links on your pages that *define* the URLs by "publishing" them to the Web. It matters not whether a particular URL resolves to an existing file on a live server; if the URL appears in a link, then that URL exists and will be spidered by search engines and be clicked by visitors.

So, if your CMS publishes lowercase-language-code URLs, and you add code to redirect those lowercase-language-code URLs to uppercase-language-code URLs, then the result will be a browser redirect for each and every request for those lowercase-language-code URLs. This will essentially double the page-loading time, make a mess of your logs and stats and double their size, throw off your analytics, and increase your server workload and bandwidth usage.

In addition, sites publishing links which always result in a redirect will likely suffer a reduction in ranking factors associated with "technical quality," since this is a glaring error in site implementation. I'm not claiming that there is any specific "penalty" for this, but if something (e.g. these proposed redirects) on your site slows the user experience and wastes Googlebot's time, you cannot expect the results to be good. In fact, Google has announced that "page loading speed" will now be considered in the ranking algorithms, and you're currently planning to almost double yours with your proposed solution.

In order to avoid this, you have to fix the lowercase-language-code URLs where they originate -- in the HTML code for the pages themselves. So that requires a CMS patch or an Apache output filter as described.
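As a rough illustration of the output-filter option, here is a minimal sketch. It assumes Apache 2.2.7 or later with mod_substitute loaded, that your pages are served as text/html, and that every on-page link contains the literal path segment /en-us/. The choice of mod_substitute is only one possibility, and none of this has been tested against your CMS.

```apache
# Assumption: mod_substitute is loaded and pages are served as text/html
# Pass outgoing HTML through the SUBSTITUTE output filter
AddOutputFilterByType SUBSTITUTE text/html
# Replace the lowercase language directory with the mixed-case form
# (the "n" flag treats the pattern as a fixed string, not a regular expression)
Substitute "s|/en-us/|/en-US/|n"
```

Every HTML response gets scanned by this filter, which is the speed cost described above.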

The redirect function is only useful and correct after the lowercase-language-code URLs are corrected in your on-page links, in that it will speed re-indexing of the corrected URLs in search engines, and deliver visitors to the correct page if they use an incorrect/obsolete bookmark or click on an incorrect/obsolete link on a third-party site beyond your control. In these cases, the previously-described client-side delay and server-side workload are unavoidable, but since you cannot control visitor bookmarks or third-party sites, these problems have to be accepted.

If you review the threads here, you'll find that many of our contributors are opinionated. While the reasoning for their opinions may not be fully explained in every thread, you can be assured that most of them are based on hard-won experience dealing with many real-world problems. So, for the reasons outlined above, I also prefer lowercase URLs myself, and especially in the case where a major software package I'm using forces me to use them anyway.

Should you wish to proceed, we may need to see examples of the lowercase-language-code URLs in the links on your pages, and the server filepath to which each of these links is to resolve. In your initial post, you discuss checking QUERY_STRING, but it appears that your URLs are actually in static form, with the language-code represented as the initial URL-path-part (i.e. the "top-level virtual directory"). If this is the case, then there is no need to examine the query string to detect lowercase language-code requests. The pattern for the RewriteRule should be testing that initial URL-path-part.
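To illustrate the difference between the URL-path and the query string, here is a small sketch. It assumes the code sits in example.com/.htaccess. The request URL, the id=5 parameter and the LOWERCASE_LANG variable name are made up for the example.

```apache
# Hypothetical request: http://www.example.com/en-us/blah.php?id=5
#   The RewriteRule pattern is matched against:  en-us/blah.php
#   %{QUERY_STRING} contains only:               id=5
#
# A condition such as
#   RewriteCond %{QUERY_STRING} ^en-us/$ [NC]
# can therefore match only a request like /anything?en-us/
# and never a request for /en-us/blah.php.
#
# Testing the initial URL-path-part needs only a rule pattern. This rule
# changes nothing; it just sets a (hypothetical) environment variable
# when the path starts with the lowercase language code:
RewriteRule ^en-us/ - [E=LOWERCASE_LANG:1]
```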

It's also not clear what your HTTP/HTTPS requirements are, and your last rule seems to redirect everything to HTTPS as well as forcing "www." Since using HTTPS also slows page and included-object loading times, that rule needs careful consideration as well. Things that require HTTPS should be linked-to as HTTPS on your pages, and should be forced to HTTPS in much the same way as described above for language codes -- to "repair" bad third-party links and bookmarks. Things that do not require HTTPS should be linked-to as HTTP, and should be forced back to HTTP if incorrectly bookmarked or linked-to on third party sites.
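A sketch of what that two-way forcing might look like, assuming a hypothetical /account/ area is the only part of the site that needs HTTPS. The path is invented for illustration, and included objects such as images and stylesheets used by the secure pages would need their own exclusions from the second rule.

```apache
# Assumption: /account/ is a made-up path standing in for "things that require HTTPS"
# %{HTTPS} is "on" for SSL/TLS connections and "off" otherwise
#
# Repair plain-HTTP requests for the secure area
RewriteCond %{HTTPS} !=on
RewriteRule ^(account/.*)$ https://%{HTTP_HOST}/$1 [R=301,L]
#
# Send HTTPS requests for everything else back to HTTP
RewriteCond %{HTTPS} =on
RewriteCond $1 !^account/
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
```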

The bottom line here is that you cannot use HTTP client redirects to "correct" bad links published on your site, because doing so will slow your visitors' experience, double the work needed to load a page, and possibly cost you in search engine rankings.

Because of the questions I have about your on-page-link URLs and the HTTP/HTTPS aspects of your site, the following suggested corrections and optimizations are only partial -- I believe that the HTTP/HTTPS aspect is still incorrect and needs much more attention.

#the line where I try to make sure it's "forced" to be en-US
RewriteRule ^en-us(/(.*))?$ http://%{HTTP_HOST}/en-US/$2 [L,R=301]
#
RewriteCond $1 !^(en-US|images|fonts) [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}/en-US/$1 [L,R=301]
#
RewriteRule ^(images|fonts)(/([a-z0-9/._\-]*))?$ /en-US/templates/theme2010/$1/$3 [NC,L]
#
# Force WWW and HTTPS (force www and HTTPS for
# all URL requests except for these specific sub-domains)
RewriteCond %{HTTP_HOST} !^(www|support|wiki|projects)\. [NC]
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/en-US/$1 [R=301,L]

This code, although much shorter, is a complete replacement for everything that you posted. Be sure to delete your browser cache before testing any new server-side code.

Jim

[edit] Correction as noted below. [/edit]

[edited by: jdMorgan at 6:45 pm (utc) on Mar 3, 2010]

Excalibur

5:36 pm on Mar 2, 2010 (gmt 0)

10+ Year Member



Jim,

Thank you for everything. This shortened code works like a charm, except for the part where it changes "en-us" to "en-US". But you know something, you're right. Maybe I should follow the language setting, which is "en-us", all lowercase.

Thanks for everything, and I shall take the advice!

jdMorgan

6:46 pm on Mar 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On review, I found that I had left your "QUERY_STRING" rewritecond in place, instead of deleting it as I intended. I corrected the code above so as to prevent others from copying bad code...

Jim

Excalibur

8:28 pm on Mar 3, 2010 (gmt 0)

10+ Year Member



Jim,

Thanks. But this still won't change "en-us" to "en-US", I'm sorry.

g1smd

8:59 pm on Mar 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Code within .htaccess cannot 'change' URLs. To change a URL, you have to edit the link the user clicks and change it there.

Code in .htaccess can redirect the user to the correct URL when a user clicks an incorrect link, but at the end of the day, the URL in the link itself needs to be corrected.

Excalibur

9:09 pm on Mar 3, 2010 (gmt 0)

10+ Year Member



g1smd,

I don't mean to be rude, but you SERIOUSLY think I counted on .htaccess to change the links? OH COME ON.

What I mean is that when I navigate to www.site.com/en-us/blah.php, I'm not redirected to www.site.com/en-US/blah.php

g1smd

10:06 pm on Mar 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you read back through the last decade of postings in this forum, you'll see many thousands of examples of people who seriously thought all kinds of things, many of them crazy or impossible.

There are also very many times where people failed to explain what they were thinking, or used confusing terminology to try to explain it.

I have NO idea what you are thinking. It's therefore prudent to make sure that you're on the right path.

Excalibur

10:40 pm on Mar 3, 2010 (gmt 0)

10+ Year Member



Oh, I get your reason, and you're right. I think you know now :)
I'm trying to get Apache to change the url to site.com/en-US/ if it's site.com/en-us/ ... I think that makes sense, I hope

jdMorgan

8:25 pm on Mar 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, you're trying to get Apache to redirect requests for example.com/en-us/blah to example.com/en-US/blah.
Mod_rewrite cannot change links, and it cannot change the URLs in those links, so my description is much more accurate than yours. And the reason I state that is because this problem can be much more easily understood if the correct description is used.

Flush your browser cache and re-test. If that doesn't help, then disable MultiViews and AcceptPathInfo if they are enabled; either of those can interfere with mod_rewrite.
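For reference, a minimal sketch of disabling both in .htaccess. It assumes your host allows these directives to be overridden at the .htaccess level (AllowOverride Options and FileInfo).

```apache
# Turn off content negotiation, which can map a request such as /blah
# to blah.php on its own
Options -MultiViews
# Reject requests that carry trailing path information after a real filename
AcceptPathInfo Off
```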

Jim

Excalibur

8:55 pm on Mar 4, 2010 (gmt 0)

10+ Year Member



Agreed,

I don't have either enabled, and this is my complete .htaccess

RewriteEngine On
IndexIgnore *
Options +FollowSymLinks -Multiviews -Indexes
############################################################################################
RewriteRule ^en-us(/.*)?$ http://%{HTTP_HOST}/en-US/$1 [L,R=301]
############################################################################################
RewriteCond $1 !^(en-US|images|fonts) [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}/en-US/$1 [L,R=301]
############################################################################################
RewriteRule ^(images|fonts)(/([a-z0-9/._\-]*))?$ /en-US/templates/digitalfyre2010/$1/$3 [NC,L]

So I don't know anymore...

[edited by: jdMorgan at 5:00 pm (utc) on Mar 5, 2010]
[edit reason] Removed domain name in code comments [/edit]

jdMorgan

5:57 pm on Mar 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Based on the canonicalization problems implied by your code, I'd suggest giving this a try in your example.com/.htaccess file:

IndexIgnore *
Options +FollowSymLinks -MultiViews -Indexes
RewriteEngine On
#
# Externally redirect to canonicalize casing and trailing slashes on /images subdirectory
RewriteCond $1 !^images/$
RewriteRule ^(images/?)$ http://%{HTTP_HOST}/images/ [NC,R=301,L]
#
# Externally redirect to canonicalize casing and trailing slashes on /fonts subdirectory
RewriteCond $1 !^fonts/$
RewriteRule ^(fonts/?)$ http://%{HTTP_HOST}/fonts/ [NC,R=301,L]
#
# Externally redirect to canonicalize casing on /en-US subdirectory requests
RewriteCond $1 !^en-US/
RewriteRule ^(en-us(/(.*))?)$ http://%{HTTP_HOST}/en-US/$3 [NC,R=301,L]
#
# Externally redirect all requests except for /images and /fonts
# subdirectory requests to /en-US subdirectory (unless already done)
RewriteCond $1 !^(en-US|images|fonts)/ [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}/en-US/$1 [R=301,L]
#
# Internally rewrite /images and /fonts subdirectory
# requests to /en-US/templates/company-name/ subdirectory
RewriteRule ^((images|fonts)/[a-z0-9/._\-]*)?$ /en-US/templates/example/$1 [NC,L]

If that doesn't work, it may be time to step back and try some code that's trivially simple, just to see if mod_rewrite is actually working properly on your server. If that's successful, then you can proceed forward from that point, adding and testing one rule at a time.
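A trivially simple test might look like the sketch below, used temporarily as the only content of the .htaccess file. The "rewrite-test" path is made up and is assumed not to match any real file. A 302 is used so that the browser is less likely to cache the result.

```apache
# Assumption: "rewrite-test" is a hypothetical URL-path with no matching file
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^rewrite-test$ http://www.example.com/ [R=302,L]
```

Requesting example.com/rewrite-test should then send the browser to the home page. If you get a 404 instead, mod_rewrite is probably not being applied to that directory.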

As before, delete your browser cache before testing new code...

Jim