redirecting to SEF URLs

trying to get .htaccess to help me out

         

baze22

3:23 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Without really thinking things through, I recently modified my forum to use more search-engine-friendly URLs. Since one search engine already has a fair number of threads indexed, I need to redirect the old URLs to the new ones, and I'm having problems accomplishing that. In my clip below, the last line works for converting the SEF URLs back for the program to use; it is the two lines before it that I am trying to get to redirect the old style (showthread.php?t=999) to the new style (t999.html). To me (regex and mod_rewrite novice that I am) it looks like it should work, but obviously I've got something wrong. If anybody has any suggestions, I'd sure appreciate it. BTW, .htaccess and showthread.php are both in the /forum directory, if that matters.

RewriteEngine On
Options +FollowSymLinks
RewriteCond %{REQUEST_URI} showthread.php?t=([0-9]+)
RewriteRule showthread.php?t=([0-9]+)$ t$1.html [L,R=permanent]
RewriteRule ^t([0-9]+).html$ showthread.php?t=$1 [L]

thanks,

baze

valder

4:32 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Hi,

yep, a couple of familiar errors there. :)

I'm not an expert myself yet, but I believe I could give you a few starters.

First of all, I don't believe R=permanent is valid syntax; you must use R=status code [w3.org], which in this case is 301.

Secondly, when you use parentheses, it is in most cases because you want to capture the contents into a variable. The first captured group can later be referenced as %1 if it was captured in a RewriteCond, or as $1 if it was captured in the RewriteRule pattern itself.

The variable in this line,

RewriteCond %{REQUEST_URI} showthread.php?t=([0-9]+)

could be referenced by the following line (but not further following lines) as %1, like this:
RewriteCond %1 123

It seems you got the use of $1 right. :)
My point is that t=([0-9]+) is unnecessary in this case, and could be rewritten to t=[0-9]+ (but it doesn't matter really)

Further, regex has some special characters, like . and ? and several others, that have a special meaning to the regex engine.

. means any one character, so .* would mean any character repeated zero or more times.
? means maybe, so abc? means "maybe there's a 'c' in there or maybe not".
+ means 1 or more.

If you want to use any of these characters as a literal value, you should escape it, which means putting a back-slash in front of it. Like this "\." or "\?".
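To make the escaping point concrete, here is a minimal sketch in Python. It assumes that Python's `re` module treats `.`, `?` and `+` the same way as the regex engine behind mod_rewrite for patterns this simple; the sample strings are made up for illustration.

```python
import re

# Unescaped "." matches any single character, so this pattern is too loose.
loose = re.compile(r"^t[0-9]+.html$")
# Escaped "\." matches only a literal dot.
strict = re.compile(r"^t[0-9]+\.html$")

print(bool(loose.match("t999.html")))    # True
print(bool(loose.match("t999xhtml")))    # True, because "." accepted the "x"
print(bool(strict.match("t999.html")))   # True
print(bool(strict.match("t999xhtml")))   # False

# "?" makes the preceding character optional: "abc?" matches "ab" and "abc".
print(bool(re.fullmatch(r"abc?", "ab")))    # True
print(bool(re.fullmatch(r"abc?", "abc")))   # True

# "+" needs at least one occurrence, so an empty string does not match.
print(bool(re.fullmatch(r"[0-9]+", "")))    # False
```

The loose pattern still matches the intended URL, which is why an unescaped dot often goes unnoticed; it only shows up as accepting strings it should reject.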

In your case however, it may be as simple as replacing permanent with 301, but I can't say for sure.

If you want to learn more regex, there are several good tutorials on it. I have collected some at my own web site, see my profile's website if you're interested (then click regex).

Eivind

valder

4:37 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Just noticed another thing that is not crucial to your code, but that could be useful. If you use the flag NC in RewriteCond, it means ignore case, which is often better.

For instance, it wouldn't matter if people typed showthread.php or ShowThread.php etc.

<added>
Oops, my instructions could easily be misunderstood.
I meant that the parentheses in the line

RewriteCond %{REQUEST_URI} showthread.php?t=([0-9]+)

were not necessary, because you weren't using %1 in the following line.

Obviously, the other case of t=([0-9]+) was necessary.
</added>

jdMorgan

5:35 pm on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



baze,

You didn't say specifically what the problem is, but a common result of doing that in .htaccess is an "infinite" loop, since one rule rewrites from A to B and the next rule rewrites from B to A. This is because from a practical standpoint, .htaccess is recursive -- and this is true despite the use of the [L] flag. It must be remembered that .htaccess works in a per-directory context. That means that any time a URL is changed, the server must re-run httpd.conf to check for URL-related restrictions and rewrites at that level, and then it will re-run .htaccess. So, for practical purposes, .htaccess is run and re-run until there are no more URL changes as a result.

If my guess is correct, you should be seeing a 500-Server Error, and your server error log should contain a message stating that the maximum redirect limit has been reached. During the time that the server is repeatedly redirecting, you should see your browser "trying to do something" but failing to load the page.

The trick to doing what you apparently want to do is to use the right server variables. While the URL tested by {REQUEST_URI} and by RewriteRule is updated on each pass through .htaccess, the value in {THE_REQUEST} is not; {THE_REQUEST} always contains the original request line sent by the client (browser, search engine robot, etc.). For example:

GET /forum/showthread.php?t=2005 HTTP/1.1

So that fact can be used to avoid the rewriting loop:


Options +FollowSymLinks
RewriteEngine on
#
# If the client requests a dynamic URL, externally redirect the client to a static URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /forum/showthread\.php?t=([0-9]+)\ HTTP/
RewriteRule ^showthread\.php$ http://www.example.com/forum/t%1.html [R=301,L]
#
# Internally rewrite static URLs to the page display script
RewriteRule ^t([0-9]+)\.html$ /forum/showthread.php?t=$1 [L]

The first rule will detect and externally redirect dynamic URLs, but only if those URLs were directly-requested by the client. It will ignore dynamic URLs created as the result of any previously-invoked rewrite. (When I say previously-invoked, I mean within the context of the current HTTP request. An external redirect ends the current HTTP request and causes the client to start a new request. Note the use of the terminology "redirect" and "rewrite" and the corresponding differences in the RewriteRule syntax.)

Note that the RewriteRule substitution now uses "%1" to back-reference the thread number matched in the RewriteCond. Note also for future reference that you cannot directly test a query string in a RewriteRule; query strings are data attached to a URL, and not part of the URL. Therefore, query strings are not visible to RewriteRule or {REQUEST_URI}, so a RewriteCond must be used to test the query string in either {QUERY_STRING} or {THE_REQUEST}.
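As a rough illustration of which piece of the request each variable sees, the following Python sketch splits the example request line into its parts. It is only a model of the description above, not of Apache's internals; the variable names and the `/forum/` per-directory prefix are assumptions for the example.

```python
from urllib.parse import urlsplit

# The full request line, which is what {THE_REQUEST} holds.
request_line = "GET /forum/showthread.php?t=2005 HTTP/1.1"

method, target, protocol = request_line.split(" ")
parts = urlsplit(target)

# In a /forum/.htaccess file the directory prefix is stripped before the
# RewriteRule pattern is tested (prefix assumed to be /forum/ here).
per_dir_prefix = "/forum/"
if parts.path.startswith(per_dir_prefix):
    rule_input = parts.path[len(per_dir_prefix):]
else:
    rule_input = parts.path

print(parts.path)    # /forum/showthread.php  (the URL-path, no query string)
print(parts.query)   # t=2005                 (the query string)
print(rule_input)    # showthread.php         (what the RewriteRule pattern tests)
```

Only the request line contains both the path and `t=2005` in one string, which is why the thread number has to be captured in a RewriteCond rather than in the RewriteRule pattern.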

The second rule will internally rewrite any request for a static URL to a dynamic one, invoking your script.

This code has been adjusted to run in your /forum subdirectory as before, but to use server-rooted paths. Special characters have been escaped as required.

Because of the way that this code works, the order of the two rules doesn't matter.

This may not be exactly what you need, but if I understood your problem, it should be close.

Jim

baze22

5:44 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



I appreciate your responses. I have been using the R=permanent in the other lines that are redirecting and it works, but I have gone ahead and changed it here. I know how to debug my programming stuff but this is still new to me. I've been playing around trying to find my problem by making changes like this:
RewriteEngine On
Options +FollowSymLinks
#RewriteCond %{REQUEST_URI} showthread.php?t=[0-9]+
RewriteRule showthread\.php /gs/forum/justphp.html?t=$1 [L,R=301]

trying to trim things out until I can zero in on my problem. Using my log file the lines above give me this:

127.0.0.1 - - [24/Jan/2005:11:33:53 -0600] "GET /gs/forum/showthread.php?t=6 HTTP/1.1" 301 323
127.0.0.1 - - [24/Jan/2005:11:33:53 -0600] "GET /gs/forum/justphp.html?t= HTTP/1.1" 404 293

which is what I want. When I make it

RewriteEngine On
Options +FollowSymLinks
#RewriteCond %{REQUEST_URI} showthread.php?t=[0-9]+
RewriteRule showthread\.php\?t /gs/forum/justphp.html?t=$1 [L,R=301]
or // both of these lines aren't in htaccess
RewriteRule showthread\.php\?t\=([0-9]+) /gs/forum/completetest.html?t=$1 [L,R=301]

I just get

127.0.0.1 - - [24/Jan/2005:11:36:46 -0600] "GET /gs/forum/showthread.php?t=6 HTTP/1.1" 200 33463

I've changed the permanent and escaped the characters that I think need escaping. Can you spot anything else I'm missing here?

thanks again,

baze

edit: jd, I posted this while you were posting above. Just wanted to let you know before you responded to this. I will go over your post now. thanks.

baze22

6:16 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Although my problem isn't infinite-loop related, I did get some experience with those possibilities in my testing. :)

My problem is that it isn't redirecting. It just doesn't seem to be matching the condition with the '?t=([0-9]+)'. Since I'm on my testing machine the URL is:

[localhost...]

(Yes I realized that the dir is /gs/forum and I made those changes. It is /forum on production server)

I plugged yours in and still not getting any redirection from that line. Rewrite is working because this:

[localhost...]

rewrites fine.

I do appreciate the time you've taken and will be implementing the infinite loop prevention in my htaccess mods where applicable. I may not have it working yet, but I'm learning a lot. ;)

thanks,

baze

edit: this is what I'm trying now:

RewriteEngine On
Options +FollowSymLinks
# If the client requests a dynamic URL, externally redirect the client to a static URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /gs/forum/showthread\.php?t=([0-9]+)\ HTTP/
RewriteRule ^showthread\.php$ http://localhost/gs/forum/t%1.html [R=301,L]
#
# Internally rewrite static URLs to the page display script
RewriteRule ^t([0-9]+)\.html$ /gs/forum/showthread.php?t=$1 [L]

jdMorgan

6:28 pm on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This may be it... I forgot to escape the literal "?" in the RewriteCond:

Options +FollowSymLinks
RewriteEngine on
#
# If the client requests a dynamic URL, externally redirect the client to a static URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /forum/showthread\.php\?t=([0-9]+)\ HTTP/
RewriteRule ^showthread\.php$ http://www.example.com/forum/t%1.html [R=301,L]
#
# Internally rewrite static URLs to the page display script
RewriteRule ^t([0-9]+)\.html$ /forum/showthread.php?t=$1 [L]
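A small Python check shows why the unescaped "?" stopped the condition from matching. This is a sketch that assumes Python's `re` behaves like mod_rewrite's regex engine for these patterns; the request line is a made-up example in the same shape as the one above.

```python
import re

the_request = "GET /forum/showthread.php?t=6 HTTP/1.1"

# Unescaped "?": reads as "ph", an optional "p", then "t=" immediately after.
unescaped = re.compile(r"^[A-Z]+\ /forum/showthread\.php?t=([0-9]+)\ HTTP/")
# Escaped "\?": a literal question mark between "php" and "t=".
escaped = re.compile(r"^[A-Z]+\ /forum/showthread\.php\?t=([0-9]+)\ HTTP/")

print(unescaped.search(the_request))           # None
print(escaped.search(the_request).group(1))    # 6

# The unescaped pattern only matches when there is no "?" in the URL at all.
print(bool(unescaped.search("GET /forum/showthread.phpt=6 HTTP/1.1")))  # True
```

Because the condition never matched, the redirect rule was skipped silently and the request was served with a plain 200.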

Jim

valder

6:42 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



RewriteCond %{THE_REQUEST} ^[A-Z]+\ /forum/showthread\.php\?t=([0-9]+)\ HTTP/

I don't get what the back-slash and space does, is it a typo? If not, could you please explain? I thought spaces weren't allowed with mod_rewrite regex.

baze,
I didn't know that R=permanent was valid syntax, but you're right, it obviously is. I seem to learn something new every day, and I love it. Thanks. :)

Eivind

baze22

6:44 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Thank you very much. That fixed it.

I appreciate your time,

baze

valder

6:53 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



One thing I find very handy for debugging is a simple server-side script that outputs the processed values. It makes errors easier to spot, instead of having to guess what comes out or look in the server log all the time.

In my case, I use a php script that would look something like this:

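<?php
// Show the request values PHP sees after mod_rewrite has run.
// THE_REQUEST is not a standard $_SERVER key, so fall back if it is missing.
echo "The_Request: " . htmlspecialchars($_SERVER['THE_REQUEST'] ?? '(not set)');
echo "<br />Request_Uri: " . htmlspecialchars($_SERVER['REQUEST_URI']);
echo "<br />Query_String: " . htmlspecialchars($_SERVER['QUERY_STRING'] ?? '');
?>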

I call the file test.php, and when testing rewrites, I direct them to this file so it outputs the result. If used correctly, this could save you some time; I know it has for me. :)

valder

8:10 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Err, never mind my earlier question:
"I don't get what the back-slash and space does, is it a typo?"

I'm still a newbie you know. :)

THE_REQUEST has spaces inside, so you needed to escape the space when comparing it. I should start thinking before talking I guess. :)

baze22

9:09 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Guess I was in a hurry before. I just noticed a problem. The url:

[localhost...]

is getting rewritten as:

[localhost...]

help? thanks,

baze

jdMorgan

9:34 pm on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you want to clear the query string? If so:

Options +FollowSymLinks
RewriteEngine on
#
# If the client requests a dynamic URL, externally redirect the client to a static URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /forum/showthread\.php\?t=([0-9]+)\ HTTP/
RewriteRule ^showthread\.php$ http://www.example.com/forum/t%1.html? [R=301,L]
#
# Internally rewrite static URLs to the page display script
RewriteRule ^t([0-9]+)\.html$ /forum/showthread.php?t=$1 [L]

Otherwise, please state what the problem is.
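The effect of the trailing "?" can be sketched with a toy Python model of mod_rewrite's default query-string handling (no QSA flag). The function name `redirect_target` is hypothetical and this is not Apache's implementation. It assumes the problem was the original query string being carried over to the redirected URL.

```python
def redirect_target(substitution: str, original_query: str) -> str:
    """Toy model of default query-string handling, without the QSA flag."""
    if "?" in substitution:
        # A "?" in the substitution replaces the original query string;
        # a bare trailing "?" therefore leaves nothing behind.
        path, _, new_query = substitution.partition("?")
        return f"{path}?{new_query}" if new_query else path
    # No "?" in the substitution: the original query string is carried over.
    if original_query:
        return f"{substitution}?{original_query}"
    return substitution


print(redirect_target("http://www.example.com/forum/t6.html", "t=6"))
# http://www.example.com/forum/t6.html?t=6
print(redirect_target("http://www.example.com/forum/t6.html?", "t=6"))
# http://www.example.com/forum/t6.html
```

Without the trailing "?", the old `t=6` rides along on the new static URL; with it, the redirect target is clean.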

Jim

baze22

10:14 pm on Jan 24, 2005 (gmt 0)

10+ Year Member



Do you want to clear the query string? If so:
...
Otherwise, please state what the problem is.

It seems so clear to me when I post these things... :)

That was exactly what I was getting at. Thanks again for your patience.

baze