Perm. redirect dynamic urls to new static urls

how to redirect dynamic urls to static urls

         

indev

4:11 am on Jun 26, 2011 (gmt 0)

10+ Year Member



Hi,

I'm trying to get .htaccess / mod_rewrite to function correctly but have hit a wall and am hoping that someone will be able to help me.

I currently have tons (over 1,000) of dynamic urls on my site that are named stuff.php?date=2011-06-25 and want to rewrite them to be /daily/2011-06-25/ (each "url" is essentially a different date of the year for the past 5+ years)

This is a new change to the site and so far I've been able to set everything up so the rewrites work, but I would like .htaccess to automatically redirect users if they go to one of the "old" links.

So, going to domain.com/stuff.php?date=2011-06-25 should automatically forward the user to domain.com/daily/2011-06-25/

The code I have in my .htaccess right now:

RewriteEngine On
RewriteRule ^daily/([^/]*)/$ /oldstuff.php?date=$1 [L]
RewriteRule ^daily/([^/]*)$ /oldstuff.php?date=$1 [L]

I've been searching all over the Internet for a solution, but nothing has really made sense/worked yet.

Thank you for taking the time to read this, I really hope someone has a solution. Thanks again!

g1smd

6:57 am on Jun 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use a RewriteCond to detect the parameter value, by looking at THE_REQUEST. This ensures that only direct client requests with parameters are redirected.
Use a RewriteRule for the redirect, the date being in %n (where n is some number from 1 to 9, and depends on the exact pattern matching used).
The redirect target will need the domain name and the rule will end with the [R=301,L] flags.
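The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: www.example.com is a placeholder hostname, oldstuff.php is taken from the rewrite rules in the first post, and the date pattern is one possible choice.

```apache
RewriteEngine On

# Match only direct client requests such as
# "GET /oldstuff.php?date=2011-06-25 HTTP/1.1"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /oldstuff\.php\?date=([0-9]{4}-[0-9]{2}-[0-9]{2})\ HTTP/
# %1 is the date captured by the RewriteCond above;
# the trailing "?" drops the old query string from the target
RewriteRule ^oldstuff\.php$ http://www.example.com/daily/%1/? [R=301,L]
```

Because the condition looks at the original request line, a request that was internally rewritten to oldstuff.php does not trigger the redirect.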

There are several thousand prior examples of this type of code in the WebmasterWorld Apache forum.

g1smd

8:14 am on Jun 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In your existing code, ([^/]*) allows blank date. You might want ([^/]+) instead.

I would use ^([12][0-9]{3}-[0-9]{2}-[0-9]{2})$, or similar, to partially pre-filter requests.

Whatever you do, make sure the PHP script sends a 404 header for invalid date (like 20A5-890-LA) and for date with no content.

Your two rules allow duplicate content. I would redirect requests "with trailing slash" to "without trailing slash".
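As a rough sketch of those suggestions, assuming the no-trailing-slash form is chosen as the single canonical URL and using www.example.com as a placeholder hostname:

```apache
RewriteEngine On

# Redirect the trailing-slash form to the no-slash form,
# so only one URL serves each date
RewriteRule ^daily/([12][0-9]{3}-[0-9]{2}-[0-9]{2})/$ http://www.example.com/daily/$1 [R=301,L]

# Internally rewrite only date-shaped requests to the script;
# other /daily/ requests are not rewritten by this rule
RewriteRule ^daily/([12][0-9]{3}-[0-9]{2}-[0-9]{2})$ /oldstuff.php?date=$1 [L]
```

The pattern only checks the shape of the date, so the PHP script still has to return a 404 for values such as 2099-98-46.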

lucy24

8:24 am on Jun 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^daily/([^/]*)/$ /oldstuff.php?date=$1 [L]

You have it backward. The original version (the part your users ask for) comes first, and then the version you want to change it to. Here, you want to take

stuff.php?date=2011-06-25
and change it into
daily/2011-06-25/

You saved yourself a lot of aggravation by keeping the date in the same format, so you can capture it all with ([-\d]+). Is each of those dates a directory (trailing slash)?

You probably want your users to see and remember the new name format. So make it [R=301,L].

:: shuffling papers ::

Ah. I knew Apache had to be hiding it somewhere [wiki.apache.org]. Second item from bottom: "Making the Query String Part of the Path". Note the question mark at the end of the output. That's not your regex question mark meaning "this character may or may not be present", it's an Apache question mark that prevents your address from being turned into

daily/2011-06-25/?date=2011-06-25

because queries are ignored in rewrite unless you specifically say to do something about them.

g1smd

8:44 am on Jun 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No. That's not backwards. The bit in the OP is the rewrite. It accepts the URL request and rewrites it to the internal script to serve the content.

What is needed is an additional ruleset for the redirect part of the question. That ruleset will work "in the other direction" as explained in my first post.

Query string data is automatically re-appended unless you append a different query string or add a question mark to clear the query string.
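A small sketch of that query string behaviour, looking at one rule in isolation. The hostname is a placeholder and the date is hard-coded purely for illustration.

```apache
RewriteEngine On

# Request: /oldstuff.php?date=2011-06-25

# Variant A (commented out): no "?" on the target, so the original
# query string is re-appended.
# Result: http://www.example.com/daily/2011-06-25/?date=2011-06-25
# RewriteRule ^oldstuff\.php$ http://www.example.com/daily/2011-06-25/ [R=301,L]

# Variant B: the trailing "?" clears the query string.
# Result: http://www.example.com/daily/2011-06-25/
RewriteRule ^oldstuff\.php$ http://www.example.com/daily/2011-06-25/? [R=301,L]
```

On Apache 2.4 and later the QSD flag offers another way to discard the query string, if that version is available.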

lucy24

10:54 am on Jun 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"queries are ignored in rewrite unless you specifically say to do something about them."

"Query string data is automatically re-appended unless you append a different query string or add a question mark to clear the query string."

indev, try not to be bewildered. We both meant exactly the same thing, but one of us said it more coherently than the other ;)

indev

2:36 pm on Jun 26, 2011 (gmt 0)

10+ Year Member



Hi g1smd & lucy24,

Thank you both for the quick replies and help. I'm still very new at editing .htaccess files, so if either of you are able to reply back with the exact code (from top to bottom) that I should put into my .htaccess file it would be really appreciated.

Also, in regards to the "trailing slash" - I would like the links to be in the format: domain.com/daily/2011-06-26/ and have the site re-direct/forward users that type in domain.com/daily/2011-06-26 to the version with the trailing slash. (just to keep everything in-order and not have duplicate pages.)

These "trailing slash directories" aren't real directories; instead they would serve the content from oldstuff.php?date=2011-06-26

Thanks again for your help!

indev

2:46 pm on Jun 26, 2011 (gmt 0)

10+ Year Member



"In your existing code, ([^/]*) allows blank date. You might want ([^/]+) instead."

What does + do in this case?

"I would use ^([12][0-9]{3}-[0-9]{2}-[0-9]{2})$, or similar, to partially pre-filter requests."

Can you explain what this pre-filter does? Is pre-filtering recommended for better performance?

"Whatever you do, make sure the PHP script sends a 404 header for invalid date (like 20A5-890-LA) and for date with no content."

Is it better to send a 404 header? I was simply going to re-direct a user to the most recent /daily/todays-date page if what they entered doesn't exist without sending any headers in php.

"Your two rules allow duplicate content. I would redirect requests "with trailing slash" to "without trailing slash"."

Lastly, is it better to have it without the trailing slash?


Thanks again.

g1smd

4:48 pm on Jun 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The URL for a folder, or for the index page in a folder, ends with a trailing slash.

URLs for pages do not end in a trailing slash and may or may not have an extension.

URLs for stylesheets and images will have an extension.

.* is "zero or more characters" (i.e. allows blank), and .+ is "one or more characters" (i.e. does not allow blank).

The prefilter means that only things like 2011-06-26 and 2099-98-46 get passed to your script, and stuff like 2F45-B6-9K or BDTHH7FHH3234HH0005 gets bounced with a 404 error at the .htaccess stage.

indev

9:10 pm on Jun 26, 2011 (gmt 0)

10+ Year Member



Thank you for the response g1smd. Is there any way you could tell me what RewriteCond and other .htaccess code I need to use?

g1smd

9:23 pm on Jun 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This forum focusses more on self-tuition. You need to understand your code because you will need to maintain it. Unfortunately there aren't enough volunteers here to provide a free code-writing service. However, since WebmasterWorld has been here almost a decade you can be sure there are several hundred threads with the code you need right here in the archives. Let's see your code and specific questions on fixing it.

indev

1:18 am on Jun 27, 2011 (gmt 0)

10+ Year Member



Well, using the code below I can get one specific date (2011-06-25) to forward/redirect to the correct path, but not sure what to replace the date with to create a wildcard for all dates....

RewriteCond %{QUERY_STRING} ^date=2011-06-25$
RewriteRule ^oldstuff\.php$ http://www.domain.com/daily/2011-06-25/? [R=301,L]


Also, when using this code alone I get a 404 error (because no url rewrite is being done.) And, when trying to combine this with my original .htaccess code,

RewriteEngine On
RewriteRule ^daily/([^/]*)/$ /oldstuff.php?date=$1 [L]
RewriteRule ^daily/([^/]*)$ /oldstuff.php?date=$1 [L]


Going to the new url, domain.com/daily/2011-06-25/, gives me a browser error that says "Firefox has detected that the server is redirecting the request for this address in a way that will never complete."

So while I'm all for learning, a little more direction would help (rather than me just doing trial and error.)

Thanks again!

lucy24

3:11 am on Jun 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



2011-06-25

That one's buried in one of g1's posts. Replace the date with
(\d\d\d\d-\d\d-\d\d)
in the RewriteCond, and replace the date in the second half of the rule with %1 (a capture made in a RewriteCond is referenced as %1; $1 refers to a capture in the RewriteRule pattern). Not sure you even need to make a Cond, but it won't do any harm. Now you're capturing any set of numerals in the 4-2-2 form.

"Also, when using this code alone I get a 404 error (because no url rewrite is being done.)"

Careful, please. You are about to send g1 into a screaming rage. You cannot change the actual url (the physical location of the page) via .htaccess. You can only change what happens to the user's request.

"Firefox has detected that the server is redirecting the request for this address in a way that will never complete."

Clever Firefox. It has detected that you are sending yourself around in circles by changing oldstuff into newstuff into oldstuff into newstuff...

"rather than me just doing trial and error"

Nothing special about you. I just spent a couple hours swearing at my computer because I couldn't figure out how to get a dropdown to work, even with the combined efforts of html and javascript. (It's supposed to send you to another page. Is that too much to ask? Instead it kept sticking on a wholly unwanted query string. And I still haven't figured out how to make it use the "value" instead of the displayed text.)

indev

4:57 am on Jun 27, 2011 (gmt 0)

10+ Year Member



Okay, after doing much searching I found a post by jdMorgan from 2006 that has helped me to get it working. The code is below and included in the top portion is some code for redirecting index.php traffic.

Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.domain.com/ [R=301,L]
#
RewriteRule ^daily/([^/]*)/?$ /oldstuff.php?date=$1 [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /oldstuff\.php\?date=([^\ ]+)\ HTTP/
RewriteRule ^oldstuff\.php$ http://domain.com/daily/%1/? [R=301,L]


My only question, if a user manually types in domain.com/daily/2011-06-25 (without the trailing slash "/") what's the best way to append this trailing slash "/" to the url with my current setup? (I would like to redirect http://domain.com/daily/2011-06-25 to http://domain.com/daily/2011-06-25/)

Thanks!

indev

5:17 am on Jun 27, 2011 (gmt 0)

10+ Year Member



I added:
RewriteCond %{REQUEST_URI} !(\.|/$)
RewriteRule (.+) http://www.domain.com/$1/ [R=301,L]


This goes directly after the RewriteEngine on line.

This now adds a trailing slash as I wanted, but am wondering how I can restrict it so a trailing slash is only added to urls/pages off of the domain.com/daily/ directory? Ideas?

lucy24

5:25 am on Jun 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"My only question, if a user manually types in domain.com/daily/2011-06-25 (without the trailing slash "/") what's the best way to append this trailing slash "/" to the url with my current setup? (I would like to redirect http://example.com/daily/2011-06-25 to http://example.com/daily/2011-06-25/)"

You don't really need to. The server does that automatically. In fact servers worldwide probably spend more time adding final slashes than all other work put together :)

And, ahem, you've forgotten Rule #1 ;)

g1smd

8:19 am on Jun 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The reason for the error in the browser was that you were sending yourself round in a loop from old to new to old to new until the server or browser gave up. As you have discovered, you needed to test THE_REQUEST to ensure it was a direct client request and not a previously rewritten request.

To get from "single URL redirected" code to "any URL redirected" code, all you needed to do was drop in my "pre-filter" RegEx pattern where the date was detected and %1 where the date was going to be re-used (%1 because the date is captured in the RewriteCond, not in the RewriteRule pattern).
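The difference between the looping and non-looping versions can be sketched like this. The hostname is a placeholder, and the looping variant is left commented out so the block is safe to read as a whole.

```apache
RewriteEngine On

# Looping variant (commented out). In .htaccess the rules run again after
# the internal rewrite to /oldstuff.php?date=..., and by then QUERY_STRING
# matches, so the rewritten request is redirected back to /daily/... again.
# RewriteCond %{QUERY_STRING} ^date=([0-9]{4}-[0-9]{2}-[0-9]{2})$
# RewriteRule ^oldstuff\.php$ http://www.example.com/daily/%1/? [R=301,L]

# Non-looping variant. THE_REQUEST always holds the request line the client
# originally sent, so an internally rewritten request does not match.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /oldstuff\.php\?date=([0-9]{4}-[0-9]{2}-[0-9]{2})\ HTTP/
RewriteRule ^oldstuff\.php$ http://www.example.com/daily/%1/? [R=301,L]

# Internal rewrite from the friendly URL to the script
RewriteRule ^daily/([0-9]{4}-[0-9]{2}-[0-9]{2})/$ /oldstuff.php?date=$1 [L]
```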

You have several more problems to solve. Look at your code and imagine a request with parameters but for the non-www version of your URL.

Your first rule redirects to www.
Your last rule redirects to the correct date format but at non-www.
Your first rule redirects to www again.

This three step redirection chain is very bad news. You need to change the order of the rules.

List all redirects before all rewrites.
List the redirects in order of priority with the most specific first (affects the smallest number of URLs) and the most general last (affects the largest number of URLs, in this case the non-www to www rule).
List the internal rewrites last.

Add a blank line AFTER every RewriteRule. Add a plain text comment describing what each rule does.

indev

11:35 pm on Jun 27, 2011 (gmt 0)

10+ Year Member



Thanks for the feedback and additional info g1. Below is what I hope to be the "final" code- please let me know if the formatting and order of rewrites/redirects is correct.


# Enable mod_rewrite, start rewrite engine
Options +FollowSymLinks
RewriteEngine on
#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /oldstuff\.php\?date=([^\ ]+)\ HTTP/
RewriteRule ^oldstuff\.php$ http://www.domain.com/daily/%1/? [R=301,L]
#
# rewrite url without index.php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.domain.com/ [R=301,L]
#
# append "/" if requested URI contains no filetype and does not end in "/"
RewriteCond %{REQUEST_URI} !(\.|/$)
RewriteRule (.+) http://www.domain.com/$1/ [R=301,L]
#
# add "www" to beginning of domain name
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]
#
# Internally rewrite search engine friendly static URL to dynamic filepath and query
RewriteRule ^daily/([^/]*)/?$ /oldstuff.php?date=$1 [L]


Thanks!

g1smd

11:50 pm on Jun 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



# rewrite url without index.php

should be:
# Redirect index.php requests to slash

which more correctly describes what is happening.

RewriteBase /

is the default and is not required.

RewriteCond %{HTTP_HOST} ^domain\.com [NC]

should be:
RewriteCond %{HTTP_HOST} !^(www\.domain\.com)?$

which redirects more types of non-canonical URLs.

RewriteRule ^daily/([^/]*)/?$

should be
RewriteRule ^daily/([^/]+)/$

With a * in the pattern, example.com/daily// is a valid URL. The + replaces the * here. Additionally, you must allow only ONE URL format, with slash OR without slash, to serve the content. Allowing both promotes duplicate content. Delete the question mark. The slash should be required, or should be missing. You decide, though the HTTP specs suggest that "missing" is the correct choice.
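Put together, the last two corrections might look something like the sketch below. It assumes the trailing-slash form is the one kept, and www.example.com stands in for the real hostname.

```apache
RewriteEngine On

# Redirect every hostname other than the canonical one to www.
# The "?" after the group lets an empty Host header through unredirected.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Internal rewrite: the date must be non-empty and the slash is required
RewriteRule ^daily/([^/]+)/$ /oldstuff.php?date=$1 [L]
```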

lucy24

2:27 am on Jun 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /oldstuff\.php\?date=([^\ ]+)\ HTTP/

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php

What does the space mean in these two lines?

To be safe, tack a [NC] onto any line containing alphabetics. (But that's not the part I don't understand.)

g1smd

7:12 am on Jun 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"What does the space mean in these two lines?"

Which space? Do you mean the escaped "\ " space?

It is a literal space, to match the "GET /index.php HTTP/1.1" request that the browser sends.

lucy24

6:49 pm on Jun 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"It is a literal space, to match the "GET /index.php HTTP/1.1" request that the browser sends."

Aha! The magic word was GET. Is this the only situation where you would legitimately have a space in the test string? (Assuming that "illegitimately" is when a page address contains a space character ;)) And, for that matter, the only situation where you would not need to worry about lower-case equivalents?

g1smd

7:43 pm on Jun 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is not the only place you might find spaces.

They appear after "GET" and again before "HTTP/1.1" in THE_REQUEST.

Spaces can also occur in HTTP_USER_AGENT. For those patterns you need to escape spaces as "\ " as well as escaping at least "(" and "." and ")" too.

Yes, spaces in URLs are invalid.
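For the HTTP_USER_AGENT case mentioned above, a sketch of the escaping might look like this. The user agent string "Example Bot (v1.0)" is made up for illustration, and returning 403 is just one possible action.

```apache
RewriteEngine On

# Spaces, parentheses and the dot are escaped so they match literally
RewriteCond %{HTTP_USER_AGENT} ^Example\ Bot\ \(v1\.0\)
# "-" means no substitution; [F] answers with 403 Forbidden
RewriteRule .* - [F]
```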