Specific and catch-all redirect for CMS and site structure changes

Changed CMS and restarted blog with clean slate

         

xinqinyan

12:23 pm on Apr 27, 2010 (gmt 0)

10+ Year Member



Hi, I've read articles in the Library and some of the forum threads that already exist on changing CMS and did not find something that worked for me.

I have changed my blog from Nucleus CMS to Wordpress. This is not a pure migration as I only want to retain a handful (~20) blog items and start the blog afresh. The Nucleus CMS installation relied on dynamic URLs and did not use any pretty URLs. Old blog item URLs all have the following identifier:

http://example.com/index.php?itemid=N


where N can be 1 to 3 digits in length.

Nucleus CMS was installed in the root public directory while all the core Wordpress files were installed in the subdirectory /wordpress/ but with the blog configured to be at the root public directory.

I am trying to use mod_rewrite to:
1. Redirect the old URLs of the blog items I am keeping to their new URLs. I know the specific item IDs and I will be manually reposting them in Wordpress so I will know the specific new URLs.
2. Redirect all other old URLs of blog items I am NOT keeping to a single, specific Wordpress page (not the main page).
3. I also want this to play nice with the Wordpress .htaccess optimisation jdMorgan kindly posted at [webmasterworld.com...]
4. I would also like to ensure anything with a "www" in the URL gets automatically directed to the address with no "www".

My attempt so far:
RewriteEngine on

# Catch example items and redirect to new URL
RewriteCond %{QUERY_STRING} ^itemid=123$
RewriteRule ^itemid=[0-9]+$ http://example.com/new-blog-item-URL/ [R=301,L]

# Catch all other old items and redirect to specific new URL
RewriteCond %{QUERY_STRING} ^itemid=[0-9]+$
RewriteRule ^itemid=[0-9]+$ http://example.com/specific-new-page-URL/ [R=301,L]

# BEGIN jdMorgan's WordPress rewrite
RewriteCond $1 ^(index\.php)?$ [OR]
RewriteCond $1 \.(gif|jpg|ico|css|js)$ [NC,OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*)$ - [S=1]
RewriteRule . /index.php [L]
# END wordpress

# Strip "www" from the URL
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]


This seems to result in

http://example.com/index.php?itemid=123

being redirected to

http://example.com/?itemid=123

and not

http://example.com/new-blog-item-URL/

as I intended. I would be grateful if someone could point me in the right direction!

g1smd

12:45 pm on Apr 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Your first two rules will never match a request. RewriteRule cannot see query string data.

Specifically, your first two rules are looking for requests like
example.com/itemid=222?itemid=222
which will fail to match your actual requests.

The RewriteRule pattern needs to match the path part of the request. That's the bit after the domain name (minus the leading slash) and before the question mark.

Once you fix that, you'll then find you have created a redirection chain as there is further work to do.

Use Live HTTP Headers to verify the chain of events for your requests. It will redirect multiple times.

Now, the fixes for that.

First you'll want to strip the query string in the 'specific page' and 'catch all' redirects. Do that by appending a question mark to the end of the target URL.

Secondly, you are currently exposing rewritten internal filepaths as URLs for all www requests because you have listed the non-www redirect AFTER the rewrites. That rewrite code MUST be the last block of code in order to avoid this happening.
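To illustrate the trailing question mark mentioned above, here is a minimal sketch. The paths old-a, old-b, new-a and new-b are hypothetical and only serve to show the difference; they are not part of the site discussed in this thread.

```apache
RewriteEngine on

# Hypothetical paths, for illustration only.
# Without a trailing "?", the original query string is re-appended:
#   /old-a?itemid=5  ->  http://example.com/new-a/?itemid=5
RewriteRule ^old-a$ http://example.com/new-a/ [R=301,L]

# With a trailing "?", the query string is replaced by an empty one:
#   /old-b?itemid=5  ->  http://example.com/new-b/
RewriteRule ^old-b$ http://example.com/new-b/? [R=301,L]
```

The lone question mark itself should not appear in the redirected URL; it only tells mod_rewrite to discard the old query string.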

xinqinyan

1:17 pm on Apr 27, 2010 (gmt 0)

10+ Year Member



Thanks for the quick response. I changed it to the following (I hope I read your response correctly):


RewriteEngine on

# Strip "www" from the URL
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# BEGIN jdMorgan's WordPress rewrite
RewriteCond $1 ^(index\.php)?$ [OR]
RewriteCond $1 \.(gif|jpg|ico|css|js)$ [NC,OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*)$ - [S=1]
RewriteRule . /index.php [L]
# END wordpress

RewriteCond %{QUERY_STRING} ^index\.php\?itemid=554
RewriteRule ^index\.php\?itemid=554 http://example.com/new-blog-item-URL/? [R=301,L]

RewriteCond %{QUERY_STRING} ^itemid=[0-9]+$
RewriteRule ^itemid=[0-9]+$ http://example.com/specific-new-page-URL/? [R=301,L]


This is still giving me the same result and not dropping the query string. The Live HTTP Headers gave me the following information:


http://example.com/index.php?itemid=554

GET /index.php?itemid=554 HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 301 Moved Permanently
Date: Tue, 27 Apr 2010 13:08:36 GMT
Server: Apache
X-Pingback: http://example.com/wordpress/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Last-Modified: Tue, 27 Apr 2010 13:08:36 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Location: http://example.com/?itemid=554
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Via: 1.1 vhost.phx4.nearlyfreespeech.net:3128 (squid/2.7.STABLE7)
----------------------------------------------------------
http://example.com/?itemid=554

GET /?itemid=554 HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 27 Apr 2010 13:08:36 GMT
Server: Apache
X-Pingback: http://example.com/wordpress/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Last-Modified: Tue, 27 Apr 2010 13:08:36 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Via: 1.1 vhost.phx4.nearlyfreespeech.net:3128 (squid/2.7.STABLE7)
----------------------------------------------------------


It looks like it's still going in a loop, so I must still have the wrong order. Should I be going in this order instead?

Wordpress rewrites
non-www redirect
specific page redirect
catch all redirect

g1smd

2:06 pm on Apr 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Based on the very last thing you wrote, you need to change the order to 3 - 4 - 2 - 1.

The patterns in 3 and 4 are still wrong. A request consists of domainname/path-part?query-string and you need to test parts of that.

The QUERY_STRING RewriteCond pattern can see ONLY the "query string" data.

The RewriteRule pattern can see ONLY the "path part" of the URL request.
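As a rough illustration of that split, the comments below show what each test would see for one sample request in a root-level .htaccess file. The www host in the sample request, the itemid value 7 and the target /some-target/ are assumptions made up for this sketch.

```apache
# Sample request:  GET /index.php?itemid=7 HTTP/1.1
#                  Host: www.example.com
#
#   RewriteRule pattern   sees   index.php        (no leading slash in .htaccess)
#   %{QUERY_STRING}       sees   itemid=7
#   %{REQUEST_URI}        sees   /index.php
#   %{HTTP_HOST}          sees   www.example.com
#   %{THE_REQUEST}        sees   GET /index.php?itemid=7 HTTP/1.1

# So the query string is tested in the RewriteCond,
# and the path part is tested in the RewriteRule pattern.
RewriteCond %{QUERY_STRING} ^itemid=7$
RewriteRule ^index\.php$ http://example.com/some-target/? [R=301,L]
```

Note that THE_REQUEST holds the whole request line, including the method and protocol, so a pattern anchored as ^itemid=... cannot match it.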

xinqinyan

3:14 pm on Apr 27, 2010 (gmt 0)

10+ Year Member



Hrmm...since I am trying to do a rewrite within the same domain on the same server, I just need to match everything from path-part?query-string?

I've gone back and looked at [httpd.apache.org...] which seems to suggest I don't need the RewriteCond lines for the specific page redirects? I've left it in there for the catch-all since it seemed the right thing to do.

I also had trouble finding the right server variable to use, since I tried not using server variables and didn't get anywhere with that. I could only find the list at [httpd.apache.org...] which didn't explain which parts of the request were returned by which server variables.

The most recent iteration I have is:


RewriteEngine on

RewriteRule ^index\.php\?itemid=554 http://example.com/new-blog-item-URL/? [R=301,L]

RewriteCond %{THE_REQUEST} ^itemid=[0-9]+$
RewriteRule ^(.*)$ http://example.com/new-specific-page-URL/? [R=301,L]

# Strip "www" from the URL
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# BEGIN jdMorgan's WordPress rewrite
RewriteCond $1 ^(index\.php)?$ [OR]
RewriteCond $1 \.(gif|jpg|ico|css|js)$ [NC,OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*)$ - [S=1]
RewriteRule . /index.php [L]
# END wordpress


which is still giving me the same result thus far.

I am sorry for taking so long to understand, I feel like I am not anywhere near close!

g1smd

3:34 pm on Apr 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Again, the pattern part of the RewriteRule examines only the path part of the incoming URL request. It cannot see the query string part of the request. Test only the path part of the request in the RewriteRule.

Add a preceding RewriteCond to test the query string value. The RewriteCond will only see the query string part of the request. It cannot see the path part of the request. Test only the query string part with the RewriteCond.

RewriteEngine on

RewriteCond %{QUERY_STRING} &?itemid=554&? [NC]
RewriteRule ^(index\.php)?$ http://example.com/new-blog-item-URL/? [R=301,L]

RewriteCond %{QUERY_STRING} &?itemid=[0-9]+&? [NC]
RewriteRule ^(index\.php)?$ http://example.com/specific-new-page-URL/? [R=301,L]

# Strip "www" from the URL
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]

# BEGIN jdMorgan's WordPress rewrite
RewriteCond $1 ^(index\.php)?$ [OR]
RewriteCond $1 \.(gif|jpg|ico|css|js)$ [NC,OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*)$ - [S=1]
RewriteRule . /index.php [L]
# END wordpress


Your first guess was the best, where the instructions were to move the rewrite last (so that 1 - 2 - 3 - 4 would become 1 - 2 - 4 - 3) and to edit the very first two items to use the right pattern matching.
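One possible refinement, offered as a sketch rather than as part of the solution above: because the query string patterns are not anchored, a rule written for a short ID would also match longer IDs that begin with the same digits (a pattern for itemid=55 would match itemid=554). Anchoring on the start of the string or an ampersand avoids that. The ID 55 below is a hypothetical example; the target URLs are the placeholders already used in this thread.

```apache
RewriteEngine on

# Anchored: matches itemid=55 as a whole parameter, but not itemid=554.
RewriteCond %{QUERY_STRING} (^|&)itemid=55(&|$)
RewriteRule ^(index\.php)?$ http://example.com/new-blog-item-URL/? [R=301,L]

# Catch-all for any other numeric itemid.
RewriteCond %{QUERY_STRING} (^|&)itemid=[0-9]+(&|$)
RewriteRule ^(index\.php)?$ http://example.com/specific-new-page-URL/? [R=301,L]

# On Apache 2.4 and later, the [QSD] flag is an alternative to the trailing "?".
```

As in the solution above, the specific rules need to come before the catch-all, and both need to come before the www redirect and the WordPress rewrite.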

xinqinyan

11:49 am on Apr 28, 2010 (gmt 0)

10+ Year Member



Oh, I see now. Thank you so much for taking the time to explain it to me. I didn't fully understand how RewriteRule links with RewriteCond. I kept thinking for some reason that it was constructed as:

RewriteCond where {QUERY_STRING} matches <this pattern>
RewriteRule <this pattern> needs to be rewritten as <<end result>>

which is obviously a logical failure since, as g1smd has pointed out much more nicely, RewriteRule has no way of knowing what on earth <this pattern> is, because it can't see the QUERY_STRING.

The correct logic flow in g1smd's solution above goes something like:

RewriteCond where {QUERY_STRING} matches <this pattern>
RewriteRule <<the part of the old URL I don't want anymore>> needs to be rewritten as <<<end result>>>

Thank you once again for your patience and helping me to understand!

g1smd

12:12 pm on Apr 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Even simpler than that.

If the pattern in the RewriteRule matches the path part of the URL request, examine the RewriteCond(s) if there are any. Those can examine the requested domain name, requested port number, requested parameters, the entire literal GET line and/or several other things.

If all of the RewriteConds also match, then process/generate the target of the RewriteRule.
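The evaluation order described above can be sketched against a trimmed copy of the WordPress block from earlier in the thread. The numbered comments are an illustration of the order in which mod_rewrite processes the lines, which differs from the order in which they are written.

```apache
# Evaluation order for the first rule below:
#   1. "^(.*)$" is matched against the path part, and $1 is captured.
#   2. The RewriteCond lines are then tested, and may refer to $1.
#   3. If the conditions pass, the substitution is processed
#      ("-" leaves the URL unchanged; S=1 skips the next rule).
RewriteCond $1 \.(gif|jpg|ico|css|js)$ [NC,OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ - [S=1]
RewriteRule . /index.php [L]
```

This ordering is also why a RewriteCond can use $1 at all: the capture comes from the RewriteRule pattern, which has already been matched by the time the conditions are examined.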