Forum Moderators: phranque
I need help with some regex for redirecting URLs with query strings from .htaccess.
The old URLs look like this:
/index.php?REQ=view&id=2629&page=&cat=2
So ID 2629 identifies the particular page.
I am moving the domain to a new platform where the same page will have this URL:
/better-looking-wordpress-url-id2629.html
So I need a solution that will redirect URLs from the old platform to the matching URLs on the new platform.
I think I will need something like this:
RewriteCond %{QUERY_STRING} ^(id=NEEDSOMHELPHERE)(.*)$ [NC]
RewriteRule ^(index\.php)$ [mydomain.com...] [R=301,L]
I don't know how to make the RewriteCond with the query string understand that it must look for an id number with 1-4 digits.
Also I don't know how to make it match the id(THE-SAME-NUMBER).html in the RewriteRule - my current solution has the home page as the destination URL.
Hope I'm on the right track here and someone can point me in the right direction...
Thanks!
Create a back-reference to those matched digits by adding parentheses:
^REQ=view&id=([0-9]{1,4})&page=&cat=2$
Now use that back-reference in the rule:
RewriteCond %{QUERY_STRING} ^REQ=view&id=([0-9]{1,4})&page=&cat=2$ [NC]
RewriteRule ^index\.php$ http://example.com/%1.html? [R=301,L]
See the concise regular expressions tutorial cited in our Apache Forum Charter for more information on regex patterns.
Be aware that if you also have an internal rewrite rule to 'deliver' requests for "example.com/1234.html" to your script, then that rule and this rule will conflict, and this rule will need to be further modified to prevent an 'infinite' redirect/rewrite loop. I suggest defining the *exact* query string pattern requirements before moving on to that subject: Specifically:
The answers to these questions can have a big impact on the required code.
Jim
Thanks for looking into my problem and helping me ask the right question.
I understand the regex you use to match the 1-4 digit number. I also understand you create a back-reference and put that behind the destination URL.
The actual URLs look like this:
index.php?REQ=view&id=1094&page=&cat=6&subcat=57&subsubcat=0
index.php?REQ=view&id=671&page=&cat=11&subcat=95&subsubcat=0
The subsubcat at the end is always =0. The numbers next to cat and subcat vary, depending on what categories the pages belonged to. So I'm not sure how to deal with these value pairs. I suppose I'm looking for an intelligent way to have them ignored and prevent an infinite loop?
Only the id= is used to identify and match the correct destination page (if I get the pages right I think I will be able to make a redirect rule from old-category-pages to new-category-pages myself).
The value pairs in my examples above seem to be consistent: they exist in every page URL and in the same order.
Also - should I deal with the REQ part of the URL some way?
And will the RewriteRule line you suggest deal with this part of my new URLs: "better-looking-wordpress-url-id"? This part of the URL varies depending on the post name. They will all end with -id followed by the id number. If this is not possible I could also make the destination URLs look like this: ID1234-here-comes-my-post-name.html, putting the matching part first.
This is by far the most complicated redirect task I have ever tried to solve, and I will learn a lot from it if you can show me how to accomplish this. Also I will definitely check the regular expressions tutorial you mention.
I can't believe how elaborate your answer to my question is - it is highly appreciated!
/Michael
The actual URLs look like this:
index.php?REQ=view&id=1094&page=&cat=6&subcat=57&subsubcat=0
index.php?REQ=view&id=671&page=&cat=11&subcat=95&subsubcat=0
It's critical to understand that you have broken the illusory equivalence of a URL and a filepath, and that they were in fact never equivalent, only 'associated' by the action of your server.
The subsubcat at the end is always =0. The numbers next to cat and subcat vary, depending on what categories the pages belonged to. So I'm not sure how to deal with these value pairs. I suppose I'm looking for an intelligent way to have them ignored and prevent an infinite loop?
Only the id= is used to identify and match the correct destination page (if I get the pages right I think I will be able to make a redirect rule from old-category-pages to new-category-pages myself).
The value pairs in my examples above seem to be consistent: they exist in every page URL and in the same order.
Also - should I deal with the REQ part of the URL some way?
If in fact the proper page can be generated based only on the id and "REQ=view" being the function requested, then these name/value pairs can in fact be ignored in the RewriteCond pattern, which will devolve to ^REQ=view&([^&]*&)*id=([0-9]{1,4})(&.*)?$
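As a rough sketch of how that pattern could sit in a complete rule: the comments break the pattern into its parts. The host example.com is a placeholder, the bare-id target (1234.html) is an assumption made for illustration, and the rule assumes the .htaccess file lives in the document root.

```apache
RewriteEngine On
# ^REQ=view&       the query string must start with REQ=view
# ([^&]*&)*        group 1: zero or more "name=value&" pairs of any kind before id=
# id=([0-9]{1,4})  group 2: the 1-4 digit id, available to the rule as %2
# (&.*)?$          group 3: whatever follows (cat, subcat, subsubcat), or nothing
RewriteCond %{QUERY_STRING} ^REQ=view&([^&]*&)*id=([0-9]{1,4})(&.*)?$ [NC]
# The trailing "?" on the target clears the old query string from the redirect
RewriteRule ^index\.php$ http://example.com/%2.html? [R=301,L]
```

Because the id is captured by the second pair of parentheses, the target uses %2 rather than %1. The middle group only matters if other parameters ever appear between REQ=view and id=; with the URLs quoted above it simply matches nothing.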
However, it's very likely that my caveat above applies, and that you do in fact have an internal rewrite, so the form of the RewriteCond and the rule *will* need to change: You'll need to examine "THE_REQUEST" in order to prevent looping, and that is discussed in the thread I cite in the next paragraph.
I'll be back later. In the meantime, do a thorough read of this thread [webmasterworld.com] in our Apache Forum Library. It will (hopefully) explain the whole process you're embarking on, and in more detail than I care to repeat here. Be aware that the wording is as technically precise as I could make it, so URL means URL, and filepath means filepath, and never the twain shall meet... :)
Jim
If you require the new URL to be in the form www.example.com/witty-post-title-23456 then a different approach might be necessary.
That solution sees you rewrite requests for old URLs to an internal script that "looks up" (using the item number as the key) in the database what the new title for the item will be and uses that to build the new URL which is then sent out by the script within a set of 301 redirect HTTP headers.
That is, it is a special script that generates the redirect to the new URL if the old URL is requested.
Discuss.
I'm not in a hurry - please enjoy your holidays.
Please forgive me if some of the questions below should be obvious - mod_rewrite is opening a new door for me when it comes to managing redirects, and ten days ago I couldn't spell my own name with regex (or at least I didn't know I could :-)
I have read the thread you refer to several times now. So just to make sure that I understand the environment correctly:
1)
I'm moving a domain from one server to another and at the same time I'm switching from one CMS to another (WordPress). On the old CMS a URL would look like this:
[mydomain.com...]
The purpose of the redirects is of course to preserve link value and user experience, and to have the new URLs quickly indexed in search engines.
To accomplish this, will I need to use the filepath rewrite and RewriteCond %{THE_REQUEST} instead of RewriteCond %{QUERY_STRING}?
2)
I have read the regex tutorial you referred to earlier and I see that it clearly explained my first regex question, matching 1-4 digits - and it also gave me some basics about the back-reference.
I am having trouble understanding all of this one:
^REQ=view&([^&]*&)*id=([0-9]{1,4})(&.*)?$
My problem is this middle part ^REQ=view --> &([^&]*&)* <-- id=([0-9]{1,4})(&.*)?$
The rest is clear but what is the purpose of not doing:
^REQ=view&id=([0-9]{1,4})(&.*)?$
3) I'm closer to understanding how you match the ID from the old URL (or filepath) to the same ID in the new URL. I still don't know what to do with the part of the new URL that is now "any-new-post-name" with no correlation to the old URL. In the example from the thread you referred to, each element of the old filepath is reused in the new URL.
And on the same issue - will it make it a lot "easier" for the server if I use this format for the new URLs: id1234-new-post-name.html rather than new-post-name-id1234, because it finds the match at the beginning?
/Michael
Thanks for your suggestion.
I have so many future redirect tasks that I feel I need to understand what I can accomplish with mod_rewrite without a script and a database lookup.
If your suggestion is the only solution or if it is a better/easier way to accomplish my goal I will of course follow that path. Basically I'm not able to determine what is the better way from what I know at this time...
Thanks!
/Michael
So in this case, there is no way for mod_rewrite to 'know' the post title, if the only thing in the requested URL is the post id number. mod_rewrite has no built-in mechanism to 'invent' the post title itself.
So that's why a script is required, either via RewriteMap or perhaps by adding a 'wrapper' around your CMS (not as hard as it may sound) that can accept a request for a 'post id' URL and generate a redirect to a 'post title' URL. It would do this by accessing the main CMS database, looking up the title by using the requested id.
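One way the RewriteMap route could look, as a sketch only. It swaps in a plain-text map (txt:) instead of a script, which avoids the database lookup but means the id-to-title list has to be exported by hand. The map name idtoslug, the file path, and the map entries are all made up for illustration. Note that RewriteMap must be declared in the server or virtual-host configuration; it cannot be declared in .htaccess.

```apache
# Server or virtual-host configuration, not .htaccess.
# Hypothetical map file, one "key value" pair per line, for example:
#   1094 some-post-title-id1094
#   671  another-post-title-id671
RewriteEngine On
RewriteMap idtoslug txt:/etc/apache2/maps/id-to-slug.txt

# First condition captures the id as %2
RewriteCond %{QUERY_STRING} ^REQ=view&([^&]*&)*id=([0-9]{1,4})(&.*)?$ [NC]
# Second condition looks the id up; an unknown id yields an empty string,
# so the pattern fails and the rule is skipped. On success the title is %1.
RewriteCond ${idtoslug:%2} ^(.+)$
RewriteRule ^/?index\.php$ http://example.com/%1.html? [R=301,L]
```

In the rule, %1 refers to the last matched condition, which is the map lookup, so it holds the title rather than the id.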
Jim
So I now understand that I can't match only a part of the destination URL without finding a way to construct the rest. So instead I found a WordPress plugin that will allow me to do search and replace on URL and Post Title with regex support (http://urbangiraffe.com/plugins/search-regex/).
By default WordPress constructs the URL from the post title, but with this plugin I can afterwards remove the ID from the post title (looks much better) and I can remove everything but the ID in the destination post URLs.
So to redirect:
ht*p://mydomain.com/index.php?REQ=view&id=1094&page=&cat=6&subcat=57&subsubcat=0
to ht*p://mydomain.com/1094.html
Would this approach work:
RewriteCond %{QUERY_STRING} ^REQ=view&([^&]*&)*id=([0-9]{1,4})(&.*)?$ [NC]
RewriteRule ^index\.php$ http://example.com/%2.html? [R=301,L]
...or will I have to go through RewriteCond THE_REQUEST referred to above?
If I can't use the RewriteCond %{QUERY_STRING} I don't fully understand where the QUERY_STRING solution will apply, but hopefully you can point me in the right direction...
/Michael
I suspect that your new plugin changes the on-page URL, you've got a RewriteRule that rewrites those (or all) requests to your script, and now you've got a bunch of old-style dynamic URLs cited in links out on the Web that you're trying to get rid of by redirecting them. My impression is that we haven't built up the vocabulary sufficiently to discuss this, and I'm afraid that'll be up to you.
If my theory above is correct, then you will need to use a RewriteCond checking %{THE_REQUEST} and only redirect if the old-style dynamic URL (now only the script filepath) is being directly requested by a client. If you redirect for all cases instead of doing this test, then this rule and the one that rewrites requests to your script will countermand each other creating an 'infinite' redirect/rewrite loop.
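A minimal sketch of such a check, assuming the .htaccess file is in the document root, that example.com stands in for the real host, and that the new URLs are the bare-id form (1094.html):

```apache
RewriteEngine On
# THE_REQUEST holds the original request line sent by the client, e.g.
#   GET /index.php?REQ=view&id=1094&page=&cat=6&subcat=57&subsubcat=0 HTTP/1.1
# Internal rewrites do not change it, so this condition is true only when
# the old dynamic URL was requested directly.
# Literal spaces in the pattern are escaped with a backslash.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?REQ=view&([^&\ ]*&)*id=([0-9]{1,4})(&[^\ ]*)?\ HTTP/ [NC]
RewriteRule ^index\.php$ http://example.com/%2.html? [R=301,L]
```

The id is again the second captured group, so the target uses %2. A request that reaches index.php through an internal rewrite of /1094.html still carries "GET /1094.html" in THE_REQUEST, so it is not redirected again.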
Friendly URLs in three steps:
1) Change on-page URLs to 'static-looking' by editing HTML files or modifying scripts
2) Add or modify RewriteRule to internally rewrite new static-looking URL to pre-existing script filepath+query string.
3) (Optional) Redirect (only) direct client requests for old dynamic URLs to new static URLs.
In both step 2 and step 3, if you're coding this in .htaccess, the output URL or filepath can be constructed using only the data in the input URL or filepath. Step 1, being done by the script, can do database accesses to 'associate' IDs with titles and vice-versa.
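For illustration, step 2 on its own might be sketched like this. It assumes the static-looking URLs are the bare-id form (1094.html) and that the old script can build the page from REQ=view and the id alone; neither assumption is confirmed for this site.

```apache
RewriteEngine On
# Step 2 sketch: an internal rewrite, so there is no R flag and no
# scheme or host in the target. The browser keeps showing /1094.html
# while the server runs index.php with the reconstructed query string.
RewriteRule ^([0-9]{1,4})\.html$ /index.php?REQ=view&id=$1 [L]
```

Everything in the target comes from the requested URL itself ($1), which is why this works in .htaccess without any lookup. A step 3 redirect would normally be placed above this rule and test THE_REQUEST, so that the rewritten index.php request is not redirected back out.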
Jim