Can't Rewrite to .html, only .htm

         

robzilla

2:45 pm on May 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm trying to rewrite paths like "/folder/blue/index.html" to "blue.html", "/folder/blue/index2.html" to "blue2.html", etc, with the following code in my .htaccess file:

Options +FollowSymLinks
RewriteEngine on
RewriteRule blue(.*)\.htm$ /folder/blue/index$1.html

Using that, I can only access the file as blue.htm, not as blue.html.

I've tried..

RewriteRule blue(.*)\.html$ /folder/blue/index$1.html

..and..
RewriteRule blue(.*)\.html /folder/blue/index$1.html

..but they were merely guesses, and didn't work (resulted in an internal server error).

Now, it would be even better if I could rewrite a "blue" folder (e.g. /folder/blue/index.html to blue.html) and, for example, a "red" folder (e.g. /folder/red/index.html to red.html) with one piece of code, but I haven't been able to figure that out.

Can anyone tell me how to fix this?

jdMorgan

2:56 pm on May 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's nothing fatally wrong with your code.

What was in your server error log when you got the internal server error? -- That is the key to your problem.

Jim

robzilla

3:20 pm on May 30, 2006 (gmt 0)


Here's what it says:
[Tue May 30 09:34:14 2006] [error] [client xx.xx.xx.xx] mod_rewrite: maximum number of internal redirects reached. Assuming configuration error. Use 'RewriteOptions MaxRedirects' to increase the limit if neccessary.
[Tue May 30 09:34:14 2006] [error] [client xx.xx.xx.xx] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

jdMorgan

3:37 pm on May 30, 2006 (gmt 0)


The logic of your rule has created a loop. Since the output URL will match the required pattern, the rewriting will continue until the maximum number of redirects is reached.

You should consider making the pattern more specific, or adding an exclusion so that /index.*.html is not rewritten. How you do this depends on where the code is located and what you want its scope to be; that is, should it apply only to /folder, or globally across your whole site?
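
To make the loop concrete, here is a rough Python sketch (not Apache itself) that re-applies a single rule the way a per-directory rewrite is effectively re-run on its own output. The `simulate` helper, the pass limit of 10, and the `\g<1>` template syntax are assumptions made for illustration; mod_rewrite writes the back-reference as $1.

```python
import re


def simulate(request_path, pattern, template, limit=10):
    """Re-apply one rule until it stops matching or the pass limit is hit.

    In .htaccess context the leading slash is stripped before matching,
    and the rewritten path is fed back through the same rules.
    """
    path = request_path
    history = [path]
    for _ in range(limit):
        match = re.search(pattern, path.lstrip("/"))
        if match is None:
            break
        path = match.expand(template)
        history.append(path)
    return history


TEMPLATE = r"/folder/blue/index\g<1>.html"  # mod_rewrite would write $1 here

# Unanchored pattern: "blue" is also found inside the rewritten path.
looping = simulate("/blue.html", r"blue(.*)\.html$", TEMPLATE)
# Anchored pattern: the rewritten path no longer starts with "blue".
anchored = simulate("/blue.html", r"^blue(.*)\.html$", TEMPLATE)
# The .htm variant stops because its output ends in .html, not .htm.
htm_only = simulate("/blue.htm", r"blue(.*)\.htm$", TEMPLATE)

print(len(looping) - 1, "passes for the unanchored rule")
print(anchored)
print(htm_only)
```

In this sketch the unanchored rule keeps matching its own output until the limit stops it, while the anchored rule and the .htm rule each stop after one pass.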

Jim

robzilla

4:29 pm on May 30, 2006 (gmt 0)


All right, this seems to work:
Options +FollowSymLinks
RewriteEngine on
RewriteRule ^blue(.*)\.html$ /folder/blue/index$1.html

Now, as I mentioned earlier, it would be more convenient if I could use one piece of code for files in both the "blue" and the "red" folder, instead of placing the above code in the .htaccess file twice. More convenient for the CPU, that is.

I tried..

RewriteRule ^(.*)\(.*)\.html$ /folder/$1/index$2.html

..and..
RewriteRule ^(.*)(.*)\.html$ /folder/$1/index$2.html

..but once again that was just silly pray-it-works guessing, which it did not.

jdMorgan

4:54 pm on May 30, 2006 (gmt 0)


You really should consider not guessing, and getting after the documentation. The .htaccess file is a server configuration file, and guessing about what to put in it is inadvisable, to say the least.

Any URL on the right side of a rule that can match the pattern on the left side of the rule is going to create a loop.

You can stop a loop in one of two ways: make sure the rewritten URL won't match the pattern, by modifying either the rewritten URL or the pattern; or specifically exclude the rewritten URL from being re-rewritten by using an exclusionary RewriteCond.
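
The second option, an exclusionary condition, can be sketched in Python as a guard that is checked before the rule is applied. The patterns below are hypothetical, loosely modelled on the /folder/<name>/index<number>.html layout from this thread, and the rule pattern is left unanchored on purpose so that only the guard prevents the loop.

```python
import re

# Hypothetical patterns, loosely modelled on the ones discussed in this thread.
ALREADY_REWRITTEN = re.compile(r"^/folder/[^/]+/index[0-9]*\.html$")  # RewriteCond role
RULE_PATTERN = re.compile(r"([a-z]+)([0-9]*)\.html$")                 # unanchored on purpose


def rewrite_once(uri, use_condition):
    """Return the rewritten URI, or None if the rule does not apply."""
    if use_condition and ALREADY_REWRITTEN.search(uri):
        return None  # the negated condition fails, so the rule is skipped
    match = RULE_PATTERN.search(uri.lstrip("/"))
    if match is None:
        return None
    return "/folder/{}/index{}.html".format(match.group(1), match.group(2))


def run(uri, use_condition, limit=10):
    """Apply the rule repeatedly; return the final URI and the pass count."""
    passes = 0
    while passes < limit:
        rewritten = rewrite_once(uri, use_condition)
        if rewritten is None:
            break
        uri, passes = rewritten, passes + 1
    return uri, passes


print(run("/blue2.html", use_condition=True))
print(run("/blue2.html", use_condition=False))
```

With the guard the rule fires once and stops; without it the same rule keeps firing until the pass limit is reached.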

Problems can be minimized if you think in terms of what you don't want to rewrite in addition to thinking about what you do want to rewrite. Being specific and thorough in designing a rule will go a long way toward preventing problems --potentially disastrous problems-- later. Bad rewrites can affect your visitors and your search engine placement in rather dramatic ways.

Where is your code, in your home directory or in /folder? If it's in your home directory, then the following might be better:


RewriteCond %{REQUEST_URI} !^/folder/[^/]+/index[^.]+\.html$
RewriteRule ^folder/(red|blue)/([^.]+)\.html$ /folder/$1/index$2.html [L]

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim

robzilla

8:28 pm on May 30, 2006 (gmt 0)


Thanks for the help so far, Jim. I agree I really should research something before I implement it. I usually do. And rest assured I'm not doing any htaccess guesswork on a live website :-P

Because I needed to have the files in the root of the domain, I altered your code slightly:

RewriteCond %{REQUEST_URI} !^/folder/[^/]+/index[^.]+\.html$
RewriteRule ^([^.]+)([^.]+)\.html$ /folder/$1/index$2.html [L]

Also, this way I won't have to define all the folder names such as "red" and "blue". On the actual website, there are quite a few pages that have to be rewritten, and I'd rather not have that large a htaccess file.

It works, except for when no number comes after index in index.html. So, blue2.html works, but blue.html does not. Using the code you gave me, a URL would be rewritten to, for example, /folder/blue/2.html for /folder/blue/index2.html. For index.html however, there is no value for $2, and you can't have /folder/blue/.html. I've browsed WebmasterWorld and the Apache mod_rewrite guides for a solution, but was not able to come up with anything.

jdMorgan

10:54 pm on May 30, 2006 (gmt 0)


You made no mention of digits in your post, and it's not clear what the requested and rewritten URL map looks like, but your regex wasn't quite right. Hint: [^.]+ means "one or more characters not equal to a period/dot/full stop." Note that within [], most regex special characters, including ".", do not need to be escaped. Exceptions would be space, "^", "]" and "-".

RewriteCond %{REQUEST_URI} !^/folder/[^/]+/index[^.]+\.html$
RewriteRule ^([^/]+)/([^.]+)\.html$ /folder/$1/index$2.html [L]

Look at the regular-expressions tutorial cited in our Charter, and the "info box" on regular expressions included in the mod_rewrite documentation.
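
As a quick illustration of that hint, the Python sketch below shows how two adjacent [^.]+ groups (as in the earlier pattern ^([^.]+)([^.]+)\.html$) divide a filename, compared with a letters-then-digits pattern. The stricter pattern is an assumption about the intended URL layout, and Python's `re` module stands in for Apache's regex engine, which behaves the same way for patterns this simple.

```python
import re

# Both groups accept any non-dot character, so the split point is arbitrary.
loose = re.compile(r"^([^.]+)([^.]+)\.html$")
# Letters first, then zero or more digits (an assumed layout, for comparison).
strict = re.compile(r"^([a-z]+)([0-9]*)\.html$")

names = ["blue2.html", "blue.html", "blue12.html"]
loose_groups = {name: loose.match(name).groups() for name in names}
strict_groups = {name: strict.match(name).groups() for name in names}

for name in names:
    print(name, loose_groups[name], strict_groups[name])
```

The first group is greedy and gives back only one character, so "blue.html" splits as "blu" plus "e" under the loose pattern, which would point at a folder that does not exist.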

Jim

robzilla

11:54 pm on May 30, 2006 (gmt 0)


I did mention it, but I'm afraid I may not have given it too much emphasis.
> I'm trying to rewrite paths like "/folder/blue/index.html" to "blue.html", "/folder/blue/index2.html" to "blue2.html", etc

Thanks for the help. I'll hit the mod_rewrite books and post when I've found the solution.

jdMorgan

2:25 am on May 31, 2006 (gmt 0)


Note the correction in my previous post...

Jim

robzilla

9:27 am on May 31, 2006 (gmt 0)


In retrospect, I'm afraid I did not explain this properly in my first post. Let me try again to explain what I wanted to do.

In, for example, "/folder/red", I have several index files: /folder/red/index.html, /folder/red/index2.html, /folder/red/index3.html, /folder/red/index4.html, etc.

I wanted to rewrite those URLs to /red.html, /red2.html, /red3.html and /red4.html respectively, in the root of the domain (e.g. mysite.com/red2.html).

This morning I managed to do that with:

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/freestuff/[^/]+/index[^.]+\.html$
RewriteRule ^([a-z]+)([0-9]*)\.html$ /freestuff/$1/index$2.html [L]

I had to use an asterisk for the number after "index" in, for example, index9.html. When I used a question mark earlier, that number could not exceed 9 (because the quantifier "?" stands for "0 or 1 of the preceding item"), and the plus ("+") wasn't appropriate either, because there isn't always a number, as in index.html.
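
The difference between the three quantifiers can be checked with a small Python sketch. The filenames are made-up examples in the same style as above, and the `captures` helper is hypothetical.

```python
import re

names = ["red.html", "red9.html", "red12.html"]


def captures(quantifier):
    """Match each name against ^([a-z]+)([0-9]<quantifier>)\\.html$."""
    pattern = re.compile(r"^([a-z]+)([0-9]" + quantifier + r")\.html$")
    result = {}
    for name in names:
        match = pattern.match(name)
        result[name] = match.groups() if match else None
    return result


for quantifier in ("?", "*", "+"):
    print(quantifier, captures(quantifier))
```

Only "*" accepts all three names: "?" rejects red12.html because it allows at most one digit, and "+" rejects red.html because it requires at least one.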

Again, thanks for pointing me in the right direction. I desperately need a break now.

jdMorgan

2:56 pm on May 31, 2006 (gmt 0)


> I desperately need a break now.

;) Understandable. But now you're armed to take on anything else that might come up.

It seems that a small tweak may be needed to the exclusionary RewriteCond, to make it agree with your destination URLs:


Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/freestuff/[^/]+/index[0-9]*\.html$
RewriteRule ^([a-z]+)([0-9]*)\.html$ /freestuff/$1/index$2.html [L]

Glad you got it working!

Jim

robzilla

3:38 pm on May 31, 2006 (gmt 0)


Ah, I forgot about that - thank you very much.

Hopefully someone who is looking for a similar solution will be able to make sense of this topic ;-)

robzilla

3:56 pm on May 31, 2006 (gmt 0)


Argh, now pages that aren't rewritten can no longer be accessed. I can't even access mysite.com/index.html. Nothing to be found in the error log. It's obviously true what they say about mod_rewrite: it's voodoo. Brr.

<added> Fixed by removing the exclamation mark from RewriteCond. </added>

jdMorgan

5:08 pm on May 31, 2006 (gmt 0)


That doesn't sound like a good fix -- It essentially opens you up to looping again, so logically, it's wrong.
You need some way to tell the not-yet-rewritten URLs from the rewritten URLs, so the negative ("!" means NOT) pattern in the RewriteCond should match previously-rewritten URLs, while the positive pattern in the RewriteRule should match the URLs that need to be rewritten.

Jim

robzilla

10:13 am on Jun 10, 2006 (gmt 0)


You're right, it wasn't a good fix. More of a desperate one. I've calmed down and am ready to take this bugger on again.

I've adjusted the site's structure a bit, actually improved it, and now there's only one file that physically remains in the root of the domain, and that's index.html, which I've now excluded from this specific RewriteRule:

RewriteCond %{REQUEST_URI} !^/folder/[^/]+/index[0-9]*\.html$
RewriteCond %{REQUEST_URI} !^/index\.html$
RewriteRule ^([a-z]+)([0-9]*)\.html$ /folder/$1/index$2.html [L]

The last hurdle I'm facing now is that I not only need index files rewritten in, for example, /folder/blue/, but also the other files in that folder. To illustrate, I've now rewritten /folder/blue/index.html (and all other index pages, such as index2.html, index3.html, etc.) to /blue.html (and blue2.html, blue3.html respectively), but I still need to rewrite, for example, /folder/blue/widgets.html to /widgets.html. The problem here is that because I've already rewritten the index pages in that directory, a second RewriteRule to rewrite the other pages in that directory would overrule that RewriteRule because the index pages also fall under that same directory. A RewriteCond to exclude all those rewritten index pages would get pretty long, and would cause extra strain on the CPU, but perhaps that is the only solution.

jdMorgan

2:50 pm on Jun 10, 2006 (gmt 0)


> I still need to rewrite, for example, /folder/blue/widgets.html to /widgets.html.

What happens to "blue" in this example?

And I thought you wanted all files to be in /folder -- so why are you rewriting to /widgets.html?

In case it's not clear, the rule
RewriteRule ^abc$ /def [L]
rewrites *from* old URL-path '/abc' *to* new URL-path '/def', so maybe it's just a 'direction of the rewrite' terminology issue.

You won't need any fancy or difficult code to rewrite files in subdirectories. The trick is all in the regular expressions used to make the RewriteRule pattern. I suspect you can do your additional rewrites by adding only one or a few more rules to your existing code, and you won't need to exclude anything more than /index.html, /robots.txt, /w3c/p3p.xml, and perhaps a few more 'standard location' static files.

Do yourself a favor, and sit down and make a visual 'map' of the rewrites you need to do before starting to code. I find the following general 'style' to be useful:

Old URL-path ........... New URL-path 
/index.html ............ No change
/widgets/blue .......... /folder/widget.php?color=blue
/widgets/<any_color> ... /folder/widget.php?color=<any_color>
/foo.html .............. /bar.html

Arrange the URLs by path and by required type of 'transformation' so that you can do as many rewrites as possible with just a few rules. Be aware of the 'degrees of freedom' of each parameter -- i.e. length of parameter, characters used in parameter ([a-z] or [0-9_] for example), and group common types of rewrites for best results. Then after all the rewrites are sorted into 'classes' you'll be able to code them up much more easily.
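
One way to sanity-check such a map before writing any rules is to encode it as an ordered list of (pattern, replacement) pairs and run sample URLs through it. The Python sketch below does this for the example map above; the `RULES` list and `map_url` helper are hypothetical, and restricting the color to [a-z]+ is an assumption about that parameter's 'degrees of freedom'.

```python
import re

# Ordered like a rule set: the first matching entry wins.
# A replacement of None means "leave this URL-path unchanged".
RULES = [
    (r"^/index\.html$", None),
    (r"^/widgets/([a-z]+)$", r"/folder/widget.php?color=\g<1>"),
    (r"^/foo\.html$", "/bar.html"),
]


def map_url(path):
    """Return the new URL-path for an old URL-path, per the map above."""
    for pattern, replacement in RULES:
        match = re.match(pattern, path)
        if match:
            return path if replacement is None else match.expand(replacement)
    return path  # no rule applies


for old in ("/index.html", "/widgets/blue", "/widgets/green", "/foo.html"):
    print(old, "->", map_url(old))
```

Each row of the map becomes one test case, so a pattern that is too broad or too narrow shows up before it reaches a live .htaccess file.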

Jim

robzilla

3:47 pm on Jun 10, 2006 (gmt 0)


> What happens to "blue" in this example?

Nothing. "blue", a category name, is only used when rewriting the paths of the index files to the root (e.g. /widgets/blue/index.html becomes /blue.html). All pages in the "blue" folder other than the index pages also have to be rewritten to the root of the domain, but since they all have unique filenames there's no need to alter their filenames; they only have to be rewritten to the root of the domain, basically eliminating the directories in the path (/widgets/blue/azure.html becomes /azure.html). The index pages are different: /widgets/blue/index.html would conflict with /widgets/red/index.html if I were to rewrite both of them to /index.html, so I rewrite them to /blue.html and /red.html instead.

To illustrate with a map:

Old URL-path .................................................... New URL-path 
/index.html ..................................................... No change
/widgets/blue/index.html ........................................ /blue.html
/widgets/blue/index2.html ....................................... /blue2.html
/widgets/<anycolor>/index<anynumberornonumber>.html ............. /<anycolor><anynumberornonumber>.html
/widgets/blue/azure.html ........................................ /azure.html
/widgets/blue/sapphire.html ..................................... /sapphire.html
/widgets/<anycolor>/<anypageexceptindexpages>.html .............. /<anypageexceptindexpages>.html

> In case it's not clear, the rule
> RewriteRule ^abc$ /def [L]
> rewrites *from* old URL-path '/abc' *to* new URL-path '/def', so maybe it's just a 'direction of the rewrite' terminology issue.

Isn't it the other way around when working in .htaccess?

RewriteRule ^([a-z]+)([0-9]*)\.html$ /widgets/$1/index$2.html [L]

The latter (/widgets/blue/index4.html for example) would be (and currently is being) rewritten to the former (/blue4.html in this example).

jdMorgan

10:57 pm on Jun 10, 2006 (gmt 0)


No, sorry. URLs are rewritten after a request arrives at your server and before any content is served or any scripts are invoked. You rewrite the client-requested URL either to a different filepath or redirect it to a new URL, depending on the rule syntax.
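
The direction can be illustrated with a small Python sketch of the rule quoted above. The `internal_path` helper is hypothetical and only mimics the pattern-and-substitution step, with the leading slash stripped as it is in .htaccess context.

```python
import re

# The pattern is tested against what the client requested.
RULE = re.compile(r"^([a-z]+)([0-9]*)\.html$")


def internal_path(requested):
    """Map a client-requested URL-path to the path the server would use."""
    match = RULE.match(requested.lstrip("/"))
    if match is None:
        return requested  # rule does not apply; the request is left as it is
    return "/widgets/{}/index{}.html".format(match.group(1), match.group(2))


print(internal_path("/blue4.html"))
print(internal_path("/widgets/blue/index4.html"))
print(internal_path("/index.html"))
```

The short URL /blue4.html is what the client asks for, and /widgets/blue/index4.html is where the server looks. A request for the long path does not match the pattern at all. Note that /index.html does match it, which is why a separate exclusion for that file appears earlier in the thread.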

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim