Forum Moderators: phranque

.htaccess lowercase url

         

daktau

6:08 pm on Oct 11, 2009 (gmt 0)

10+ Year Member



Hi,

I have the following to rewrite...
http://www.example.com/index.php?cat=Giochi Di Corse

this works and now comes out as ...
http://www.example.com/Giochi-Di-Corse.html

I have replaced the spaces with hyphens and rewritten it to the above.

I need the Giochi-Di-Corse.html to be lowercase. I can do that, but I have other URLs that need to keep their letter casing.

Is there any way of making a variable, e.g. $1, lowercase so that it applies only to a specific rewrite rule?

Is there a way to lowercase the URLs only if they fit a certain pattern?

Many thanks,
George

[edited by: jdMorgan at 12:57 am (utc) on Nov. 4, 2009]
[edit reason] example.com [/edit]

jdMorgan

6:30 pm on Oct 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Case-conversion in .htaccess is horribly inefficient and should be avoided if at all possible. It's one of the functions that you can actually *see* slowing down a server.

However, from your description of the problem, it sounds like this can (and must) be fixed entirely within your script(s) because .htaccess can't change URLs published on your pages, and that is where URLs are defined.

It's not clear how far your case-variation problem goes. If the only problem is that "Giochi-Di-Corse" should always be "giochi-di-corse", then look for exactly "Giochi-Di-Corse" and change it to "giochi-di-corse" before putting the link on your page. This should be trivial in PHP or Perl.

It will likely help you to think about this problem in terms of "what is the URL?" and "what is the filepath?" Realizing that these two things are not equivalent, but are only associated by the action of the server, can be very helpful in understanding problems like this. When a URL is requested from your server, mod_rewrite can be used to change the default URL-to-filename translation to something else, but it cannot change the URL that has already been requested from your server -- the only place that the URL can be changed is in the HTML page on your site (whether that page is static or is generated by a script).
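A minimal sketch of doing the conversion in the script, before the link is written to the page. It is in Python for illustration only (in PHP the same two steps are str_replace() and strtolower()); the helper names category_slug and category_url are hypothetical, and the /name.html and /name,N.html formats are assumed from the rules George posts further down the thread.

```python
import re
from typing import Optional


def category_slug(name: str) -> str:
    """Hypothetical helper: turn a category name into its lowercase URL slug."""
    # Replace each run of whitespace with a single hyphen, then lowercase
    return re.sub(r"\s+", "-", name.strip()).lower()


def category_url(name: str, page: Optional[int] = None) -> str:
    """Build /slug.html, or /slug,N.html for a numbered category page."""
    slug = category_slug(name)
    return f"/{slug},{page}.html" if page is not None else f"/{slug}.html"


# Only category links go through category_slug(); other links (for example
# the /giochi/... game pages) are written out with their casing untouched.
print(category_url("Giochi Di Corse"))   # /giochi-di-corse.html
print(category_url("Abilita", 4))        # /abilita,4.html
```

Because the lowercasing happens only where category links are generated, the other URLs keep their casing and no case-conversion is needed in .htaccess at all.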

Jim

daktau

8:42 pm on Oct 14, 2009 (gmt 0)

10+ Year Member



Hi,
I have done what you suggested and everything seems to work now.
Today, however, I got a lot of 500 Internal Server Errors. It doesn't happen every time. Can a lot of traffic cause intermittent 500 errors?

Here is my .htaccess
Can you spot any bad configs?

RewriteEngine on
Options +FollowSymlinks
RewriteBase /

RewriteCond %{REQUEST_URI} "urlNoQuery" [NC]
RewriteRule (.*) - [F,L]

RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/

RewriteRule ^index\.php$ http://www.example.com/ [R=301,NC]

################################
########Change spaces to hyphens
################################
RewriteRule ([^\ ]+)\ ([^\ ]+)\ ([^\ ]+)\ (.+) http://www.example.com/$1-$2-$3-$4 [R=301,L]
RewriteRule ([^\ ]+)\ ([^\ ]+)\ (.+) http://www.example.com/$1-$2-$3 [R=301,L]
RewriteRule ([^\ ]+)\ (.+) http://www.example.com/$1-$2 [R=301,L]

################################
#######Get Game Data and Rewrite
################################
RewriteRule ^([0-9]+)(.+)/?$ flash-arcade-game.php?gameid=$1&gamename=$2
RewriteRule ^giochi-([0-9]+)(.+)/?$ flash-arcade-game.php?gameid=$1&gamename=$2
RewriteRule ^giochi/([0-9]+)(.+)\.html/?$ flash-arcade-game.php?gameid=$1&gamename=$2

##################################
##################### Custom pages
##################################
#RewriteCond %{THE_REQUEST} ^giochi/(.+)/?$
RewriteRule ^giochi/(.+)/?$ custom_page.php?url=$1 [L]
##################################

##################################
####################Category Pages
##################################
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?cat=([^\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/%1\.html? [NE,R=301,L]
##################################
############ Category Page Numbers
##################################
RewriteRule ^(.+),([0-9]+)\.html$ index.php?cat=$1&blockpage=$2 [NC]
##################################

RewriteRule ^Giochi-di-Corse\.html$ giochi-di-corse\.html [R=301,L]
RewriteRule ^Azione\.html$ azione\.html [R=301,L]
RewriteRule ^Calcio\.html$ calcio\.html [R=301,L]
RewriteRule ^Classici\.html$ classici\.html [R=301,L]
RewriteRule ^Giochi-Bar\.html$ giochi-bar\.html [R=301,L]
RewriteRule ^Giochi-di-Cucina\.html$ giochi-di-cucina\.html [R=301,L]
RewriteRule ^Giochi-Gratis\.html$ giochi-gratis\.html [R=301,L]
RewriteRule ^Giochi-per-Ragazze\.html$ giochi-per-ragazze\.html [R=301,L]
RewriteRule ^Platform\.html$ platform\.html [R=301,L]
RewriteRule ^Sparatutto\.html$ sparatutto\.html [R=301,L]
RewriteRule ^Sport\.html$ sport\.html [R=301,L]
RewriteRule ^Abilita\.html$ abilita\.html [R=301,L]

RewriteRule ^(.+),([0-9]+)\.html$ index.php?cat=$1&blockpage=$2 [NC]
RewriteRule ^(.+)\.html$ index.php?cat=$1 [NC]

##################################

Options +Indexes +MultiViews +FollowSymlinks
IndexIgnore *

#php_flag display_errors on
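The three space-to-hyphen rules above can be modelled to see how many redirects a URL goes through. This is a rough Python sketch, assuming (as mod_rewrite does in .htaccess) that %20 has already been decoded to a space and the leading slash removed before the pattern is tested, and that the first matching rule ends processing because of [L]. The function names are made up for the sketch.

```python
import re
from typing import List, Optional

# The three patterns above, in file order, with the $N references written
# as Python's \N. Like RewriteRule patterns they are unanchored.
SPACE_RULES = [
    (r"([^ ]+) ([^ ]+) ([^ ]+) (.+)", r"\1-\2-\3-\4"),
    (r"([^ ]+) ([^ ]+) (.+)", r"\1-\2-\3"),
    (r"([^ ]+) (.+)", r"\1-\2"),
]


def redirect_target(path: str) -> Optional[str]:
    """Path that one request is 301-redirected to, or None if no rule matches."""
    for pattern, template in SPACE_RULES:
        m = re.search(pattern, path)
        if m:
            # The substitution replaces the whole URL, not just the match
            return m.expand(template)
    return None


def redirect_chain(path: str) -> List[str]:
    """Follow the redirects until no rule matches any more."""
    chain = [path]
    while (nxt := redirect_target(chain[-1])) is not None:
        chain.append(nxt)
    return chain
```

Under this model "Giochi Di Corse.html" needs one redirect, but a name with more than three spaces (for example "a b c d e.html") needs two 301 round trips before it reaches its final form.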

Many thanks,
George

[edited by: jdMorgan at 12:58 am (utc) on Nov. 4, 2009]
[edit reason] example.com [/edit]

jdMorgan

10:21 pm on Oct 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Too much to review, although there are some 'problems' -- but not fatal errors.

What do you see in your server error log file when you get a 500-Server Error?

Jim

daktau

10:45 am on Oct 15, 2009 (gmt 0)

10+ Year Member



I just seem to see lots of file-not-found errors. I can only see the last 500 log entries, and they are all this error. Does this mean that the .htaccess is failing in many cases where it should actually be rewriting to a URL that the website can interpret?

jdMorgan

1:29 pm on Oct 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Only you can determine that. You have to look at the URLs and filepaths reported in the server access and error logs, and determine whether those URLs were 'good' and whether they were resolved to the correct filepath by your rewriterules. The URL is visible in the access log, and the filepath is visible in the error log.

Knowing the answer to this question is necessary to 'fix' any problems with your code as well.

A common cause for getting a 500-Server Error when a 404-Not Found is expected is that your server has declared a custom 404 error document, but that custom error document does not exist. This should be quite obvious if you check the log files as recommended.
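As a starting point for going through the error log, here is a small Python sketch that counts the missing filepaths and the referring pages. It assumes the Apache 2.2-style line layout "[date] [error] [client IP] File does not exist: PATH, referer: URL"; the function name is hypothetical, and where the log file lives depends on the host.

```python
import re
from collections import Counter

# Matches the tail of an Apache 2.2-style error-log line, e.g.
# [date] [error] [client 1.2.3.4] File does not exist: /some/path, referer: http://...
MISSING = re.compile(
    r"File does not exist: (?P<path>\S+?)(?:, referer: (?P<referer>\S+))?$"
)


def tally_missing(lines):
    """Count missing filepaths and the referring pages that asked for them."""
    paths, referers = Counter(), Counter()
    for line in lines:
        m = MISSING.search(line.rstrip())
        if m:
            paths[m.group("path")] += 1
            if m.group("referer"):
                referers[m.group("referer")] += 1
    return paths, referers
```

For example, passing an open log file to tally_missing() and printing paths.most_common(10) shows which files are requested most often; the referer counts then point at the pages that contain the bad links.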

Jim

daktau

7:15 pm on Oct 16, 2009 (gmt 0)

10+ Year Member



Can you please tell me if I am doing this correctly?

I have legacy links such as this:-
http://www.example.com/index.php?cat=Giochi Di Corse

and am changing the query string into the name of the category page like this:-

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?cat=([^.]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/%1\.html? [R=301,L]

Then, to prevent duplicates, I am attempting to replace all spaces with hyphens and convert the name to lowercase. I do this in two ways: rules that replace spaces with hyphens, and then a specific rule for each category to change it to lowercase.

I am having difficulty with the following...
RewriteRule ^Giochi\ di\ Corse\.html$ giochi-di-corse\.html [R=301,L] #does not work
RewriteRule ^Giochi+di+Corse\.html$ giochi-di-corse\.html [R=301,L] #does not work

I think what is happening is that the rule that redirects the query string to a filename is inserting %2520 (the % of %20 re-encoded as %25), so these rules never match. Is there a way to achieve what I am trying to do?
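The %2520 theory can be checked with a small Python model of the redirect rule quoted just above. It assumes the browser sent the spaces as %20, and it approximates mod_rewrite's escaping of the substitution (which happens unless the [NE] flag is set) with urllib.parse.quote(); the request line is a made-up example.

```python
import re
from urllib.parse import quote

# Made-up request line for a legacy link; the browser encodes the spaces as %20
the_request = "GET /index.php?cat=Giochi%20Di%20Corse HTTP/1.1"

# The RewriteCond pattern from above, in Python syntax
m = re.match(r"^[A-Z]{3,9} /index\.php\?cat=([^.]+) HTTP/", the_request)
captured = m.group(1)          # still percent-encoded: Giochi%20Di%20Corse
target = captured + ".html"    # %1 from the RewriteCond, followed by .html

# Without [NE]: every "%" in the substitution is escaped to "%25"
without_ne = quote(target)
# With [NE]: the substitution is sent as-is
with_ne = target

print(without_ne)  # Giochi%2520Di%2520Corse.html
print(with_ne)     # Giochi%20Di%20Corse.html
```

With [NE] (as on the earlier version of this rule in the posted .htaccess), the browser's next request is for Giochi%20Di%20Corse.html, which mod_rewrite decodes to "Giochi Di Corse.html" before matching, so the space-to-hyphen rules can then apply.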

thanks a lot,
George

jdMorgan

9:50 pm on Oct 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You may have to examine THE_REQUEST to catch the percent-encoded (or multiply-encoded) spaces:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /Giochi\%(25)*20di\%(25)*20Regex\.html\ HTTP/
RewriteRule ^Giochi[^.]+\.html$ http://example.com/giochi-di-regex\.html [R=301,L]

The RewriteRule pattern, though appearing somewhat redundant, keeps the RewriteCond from being evaluated unless it's likely to be needed, but without slowing things down too much, or making too many assumptions about why the RewriteRule patterns in your rules above did not match.

The RewriteCond pattern will match when the spaces are singly-encoded as "%20", doubly-encoded as "%2520", or multiply-encoded as "%25252520", for example.
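A Python check of the two patterns above, as a sketch: in Python the backslash before % is not needed, and "Regex" is kept as the placeholder name used in the example rule. It models the order described above, with the RewriteRule pattern tested first against the once-decoded path and the RewriteCond pattern only then against the raw request line. needs_redirect() is a hypothetical name.

```python
import re
from urllib.parse import unquote

# RewriteRule pattern: tested against the URL-path, decoded once and
# without the leading slash (per-directory .htaccess context)
RULE = re.compile(r"^Giochi[^.]+\.html$")
# RewriteCond pattern: tested against the raw request line (THE_REQUEST)
COND = re.compile(r"^[A-Z]+ /Giochi%(25)*20di%(25)*20Regex\.html HTTP/")


def needs_redirect(request_line: str) -> bool:
    """True if both patterns match, i.e. the 301 rule would fire."""
    _method, raw_path, _protocol = request_line.split(" ", 2)
    decoded = unquote(raw_path).lstrip("/")
    # The cheap RULE test runs first; COND is only evaluated if it passes
    return bool(RULE.match(decoded)) and bool(COND.match(request_line))
```

Singly-, doubly- and multiply-encoded spaces all trigger the redirect, while the already-correct lowercase URL fails the cheap RewriteRule test and never reaches the RewriteCond.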

Jim

daktau

6:21 pm on Nov 2, 2009 (gmt 0)

10+ Year Member



Thanks a lot for this help; it seems to have sorted out the SEO needs for the site for the time being.

The site is now however producing a lot of 500 internal server errors. After checking the error logs, the huge majority of these errors follow this pattern...

[Mon Nov 02 18:55:29 2009] [error] [client 95.233.69.184] File does not exist: /home/impester/public_html/meshes, referer: http://www.example.com/resources/Ghoul-Racers.swf

[Mon Nov 02 18:55:28 2009] [error] [client 79.19.85.89] File does not exist: /home/impester/public_html/resources/img/.png, referer: http://www.example.com/giochi/22326-Streetfighter-2-CE.html
[Mon Nov 02 18:55:27 2009] [error] [client 93.45.204.220] File does not exist: /home/impester/public_html/resources/img/.png, referer: http://www.example.com/giochi-di-cucina.html

[Mon Nov 02 18:55:27 2009] [error] [client 151.60.114.166] File does not exist: /home/impester/public_html/resources/img/.png, referer: http://www.example.com/abilita,4.html

Is there an explanation for this? These URLs seem to work OK most of the time, and I cannot understand the /.png part of the error.

Could you please offer some explanation as to what is going on with the server?

Thanks very much,
George.

jdMorgan

8:01 pm on Nov 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A more interesting and critical question is "Why do you get a 500-Server Error whenever a 404-Not Found condition exists?"

What happens if you just type in one of those bad .png image links and look at the server response using the Firefox "Live HTTP Headers" add-on? Do you get a 404 header or a 500 header? If you're getting a 500 response, then you've got a serious problem -- one that may cost you search ranking (and revenue).
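If a browser add-on isn't handy, a short Python function can report the status code for a URL without following redirects. This is a standard-library sketch; status_of() is a hypothetical name and the timeout is an arbitrary choice. A 404 is the expected answer for a missing image; a 500 suggests something is going wrong while handling the missing file, such as the missing custom error document described above.

```python
import http.client
from urllib.parse import urlsplit


def status_of(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code (no redirects followed)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

For example, status_of("http://www.example.com/resources/img/.png") should return 404 if the server handles missing files correctly. Running it several times in a row also shows whether the 500s are intermittent.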

As for why those links might be bad, perhaps the script that builds those links on your pages is occasionally failing for some reason...

Jim

daktau

10:53 pm on Nov 2, 2009 (gmt 0)

10+ Year Member



What I'm finding troubling is that the links that produce the 500 errors work fine when refreshed. The same is true of some of the 404s... they do actually work.

I have tracked down a lot of the 404 errors to a script that wasn't dealing with the legacy links that Google still has indexed.

So is there any reason why the server should be so choosy about when it throws the error? Could the amount of traffic or the workload the server is under be a reason?

cheers,
George

jdMorgan

11:11 pm on Nov 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As for why those links might be bad, perhaps the script that builds those links on your pages is occasionally failing for some reason...

That is really all I can say... It is probably a scripting error -- either a logic error or something that causes the script to consume too much memory and therefore fail.

When you get a bad link on a page, do a "View-Page-Source" in your browser and see if the nature of the error is clarified by what you see in the defective page source.
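The /resources/img/.png entries in the error log suggest that a page sometimes contains an image URL whose file name is empty. A Python sketch for checking a saved page source for such links; the class name is hypothetical, and it only looks at img src attributes.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit


class EmptyImageNameFinder(HTMLParser):
    """Collect <img src> values whose file name is empty or only an
    extension, such as /resources/img/.png."""

    def __init__(self):
        super().__init__()
        self.bad = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        name = posixpath.basename(urlsplit(src).path)
        # ".png" means the part before the extension was never filled in
        if not name or name.startswith("."):
            self.bad.append(src)
```

Feed the saved HTML of a page that showed a broken image to finder.feed() and inspect finder.bad; if it is non-empty, the script that builds the image names failed for that page.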

Jim

TheMadScientist

9:17 am on Nov 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'd personally probably use PHP for the whole shebang... (This is just an example, because your file's a bit longer than I can spend enough time on to be overly accurate here, so I'll let you work out all the details.)

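# If there's a capital letter or a space in the URL-path
RewriteRule [A-Z\ ] /redirect.php [L]

# If the original request was for index.php?cat=...
# (THE_REQUEST is used instead of QUERY_STRING so that internal rewrites
# to index.php?cat=... don't get caught again and loop)
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?cat=
RewriteRule .? /redirect.php [L]

# redirect.php below.

<?php
/* Decode repeatedly until no % remains, to handle single or multiple
   encoding (%20, %2520, ...). Capped at 5 passes so a literal % in a
   URL cannot cause an endless loop. */
function fully_decode($s) {
    for ($i = 0; $i < 5 && strpos($s, '%') !== false; $i++) {
        $s = rawurldecode($s);
    }
    return $s;
}

// Lowercase and turn spaces into hyphens
function normalize_slug($s) {
    return strtolower(str_replace(' ', '-', $s));
}

// Use only the path part of the request; the query string is handled below
$path = fully_decode(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
$newURL = normalize_slug($path);

// Legacy links: /index.php?cat=Giochi Di Corse -> /giochi-di-corse.html
if ($path === '/index.php' && !empty($_GET['cat'])) {
    $newURL = '/' . normalize_slug(fully_decode($_GET['cat'])) . '.html';
}

// Re-encode each path segment, leaving the slashes alone
$newURL = implode('/', array_map('rawurlencode', explode('/', $newURL)));

header('HTTP/1.1 301 Moved Permanently');
header('Location: http://www.example.com' . $newURL);
exit;

# Or something to the effect of the above...
?>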

daktau

5:25 pm on Nov 3, 2009 (gmt 0)

10+ Year Member



It appears that the 500 errors were due to heavy traffic causing the Apache server to produce the errors.

It seems the overhead of processing the .htaccess file was causing the server errors.

thanks a lot,
George

jdMorgan

6:07 pm on Nov 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Look into moving the code to a server config file, where it will be processed much more efficiently.
Look into using a RewriteMap to call the system lowercasing utility 'tolower'.
Look into modifying the script itself to accept, correct, and redirect URLs with spacing and casing errors.
Look into optimizing the rules themselves, for example, the two rules

RewriteRule ^Giochi\ di\ Corse\.html$ giochi-di-corse\.html [R=301,L] #does not work
RewriteRule ^Giochi+di+Corse\.html$ giochi-di-corse\.html [R=301,L] #does not work

can become one rule

RewriteRule ^Giochi[+\ ]di[+\ ]Corse\.html$ http://www.example.com/giochi-di-corse.html [R=301,L]

(Note that this also corrects a regular-expression pattern error: an unescaped "+" outside of a bracketed [character group] is taken as a quantifier meaning "match one or more of the preceding character, character group, or parenthesized sub-pattern". The protocol and domain are now specified in the redirect target as well.)
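The difference between the two patterns is easy to see by testing them in Python (a sketch; Python's re treats "+" the same way here). The third sample is a made-up string chosen to show what the unescaped "+" actually matches.

```python
import re

# As originally written: each unescaped "+" is a quantifier, so "Giochi+"
# means "Gioch" plus one or more "i", and "di+" means "d" plus one or more "i"
broken = re.compile(r"^Giochi+di+Corse\.html$")
# Corrected: "+" or space inside a character class. (In the Apache version
# the space is written "\ " because a bare space would end the argument.)
fixed = re.compile(r"^Giochi[+ ]di[+ ]Corse\.html$")

for s in ["Giochi di Corse.html", "Giochi+di+Corse.html", "GiochiiidiiCorse.html"]:
    print(f"{s!r}: broken={bool(broken.match(s))}, fixed={bool(fixed.match(s))}")
```

The original pattern rejects both real variants and instead accepts odd strings such as "GiochiiidiiCorse.html", which is why neither of the two original rules ever fired.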

Also, if you have problems with characters being double-encoded (e.g. %20 being re-encoded to %2520), then see the [NE] flag for RewriteRule. A regex pattern to 'catch' both singly- and multiply-encoded spaces would be "\%(25)*20".

Jim