Redirect wiki queries to new domain, but keep old domain active

         

MortenBlaabjerg

12:36 pm on Jun 29, 2010 (gmt 0)

10+ Year Member



I have moved an old MediaWiki install to a new domain, and I wish to use .htaccess and mod_rewrite to redirect all incoming links aimed at the old wiki to the new domain, keeping all parameters intact.

I do not want to send inbound traffic pointing to the main domain or its future subdomains to the new wiki, as I plan to use the old domain for a new site.

In other words, what I need is a Rewrite or Redirect that sends all 'wiki-like' links to their new domain locations, permanently, and keeps the rest of the traffic on the site.

I've searched high and low and all my attempts to construct a rewrite rule that works right have failed so far, so I'm asking you for guidance.

This is the working code I have

# If URL = (www.)domain1.com send to domain2.dk/$1
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.com$ [NC]
RewriteRule ^(.*)$ http://domain2.dk/$1 [R=301,L]


But this sends all traffic to domain1 to domain2, and keeps all the parameters intact.

I imagined I could do something like this :

# If URL = (www.)domain1.com/index.php?title= etc send to domain2.dk/$1
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.com/index\.php?title=$ [NC]
RewriteRule ^(.*)$ http://domain2.dk/index.php?title=$1 [R=301,L]


But this fails miserably. I have tried several other methods, but all of them seem to ignore the RewriteCond, and nothing gets rewritten at all.

Anyone who can help?

g1smd

10:12 pm on Jun 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You need to define "what a mediawiki URL looks like", and then construct RegEx patterns that only match those features.

Be aware that HTTP_HOST sees only the requested domain name. Nothing else.

You can test only the path part of the URL request with REQUEST_URI. Nothing else.

Requested parameters can be tested by looking at QUERY_STRING. Nothing else will match.

All of the above are used with RewriteCond.

The RewriteRule pattern can ONLY see the path part of the URL. Only the path part.
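As a rough illustration of that split (a sketch, not a finished rule), each part of a request such as http://domain1.com/index.php?title=Some_Page is visible to a different test. The hostnames are the placeholders used earlier in this thread, "Some_Page" is a made-up title, and the sketch assumes the rules live in the .htaccess file in the document root:

```apache
RewriteEngine On

# HTTP_HOST holds only the hostname from the request,
# e.g. "domain1.com" or "www.domain1.com". No path, no query string.
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.com$ [NC]

# QUERY_STRING holds only what follows the "?", e.g. "title=Some_Page".
RewriteCond %{QUERY_STRING} ^title=

# In .htaccess the RewriteRule pattern sees only the path, without the
# leading slash and without the query string, e.g. "index.php".
# With no "?" in the target, the original query string is passed along.
RewriteRule ^index\.php$ http://domain2.dk/index.php [NE,R=301,L]
```

This is why a hostname test that also contains "/index.php?title=" can never match: that text is never part of HTTP_HOST.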

The code is a trivial two line directive once the requirements are fully defined.

What does a "to be redirected" URL look like?

MortenBlaabjerg

11:09 pm on Jun 29, 2010 (gmt 0)

10+ Year Member



Great! - Thank you very much for those very helpful insights. If only one could learn so much from trimming the (otherwise helpful) mod_rewrite documentation. Thanks for taking the time to reply :-)

I tried using REQUEST_URI, but had no luck.

Here's a typical MediaWiki-URL in my old install :

http://domain1.com/index.php?title=wiki_article_title_here


The new ones are just as straightforward :

http://domain2.dk/index.php?title=wiki_article_title_here


So I believe something along these lines is needed :

RewriteCond {REQUEST_URI} index.php
RewriteCond {QUERY_STRING} ?title=.*$
RewriteRule .*$ http://domain2.dk/$1


Have not the slightest idea how to test QUERY_STRING, but will look that up.

g1smd

11:41 pm on Jun 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since index.php is optional at the old domain, you'll need a pattern like
^(index\.php)?$
which matches
example.com/index.php?param=value
as well as
example.com/?param=value
so that BOTH requests will be redirected to the new URL.
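A minimal sketch of how that pattern might sit in a rule, assuming the rule is in the root .htaccess of the old domain and that wiki requests are recognised by a "title" parameter (the "/sub/" path below is purely hypothetical):

```apache
# The pattern matches an empty path ("/") or exactly "index.php", nothing else:
#   /index.php?title=Foo     -> matches
#   /?title=Foo              -> matches
#   /sub/index.php?title=Foo -> no match (the path is "sub/index.php")
RewriteCond %{QUERY_STRING} ^title=
RewriteRule ^(index\.php)?$ http://domain2.dk/index.php [NE,R=301,L]
```

Both request forms end up on the same new URL, so only one version of each page is exposed at the new domain.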

However, before coding even begins you have to define your requirements. This includes a list of URL patterns that will be redirected and a list of URL patterns that will not be redirected.

Are there other types of pages? What do those URLs look like? What do image URLs look like? CSS? JS? Are those to be redirected too?

Moving to coding too early will see ineffective code with flaws that you only find when your rankings tank... and that's way too late to be fixing the problem.

MortenBlaabjerg

8:37 pm on Jun 30, 2010 (gmt 0)

10+ Year Member



Thanks, sounds like very sound advice. You definitely have me thinking about this a bit deeper than I first anticipated.

The old wiki has at most about 220 inbound links, according to Google, and unfortunately much harm has been done already, as it has taken a while to finalize its move from its old webhost, where the site was dysfunctional for some time. The rewrite/redirect I want to do now is the final lifeboat for whatever inbound links may remain. In other words, it should be no big deal.

Yet now I've spent a couple of days looking into the workings of mod_rewrite (which has always been a great mystery to me), and I've decided to put some time into learning the basics, as I believe it will make things a lot easier for me down the road.

Thanks again :-) I'll check back here when I have something working, in case someone else with a similar problem comes by.

g1smd

8:41 pm on Jun 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Requirements - write them out here and you'll get some pointers as to things you might have missed.

It's a mistake to start coding before you know *exactly* what the code should do - for all types of valid requests and for all types of non-valid requests.

MortenBlaabjerg

9:28 pm on Jun 30, 2010 (gmt 0)

10+ Year Member



I will have to check what kind of URLs my future WordPress install generates and what kind of rewrites it performs, before I can be sure there won't be any pattern conflicts.

I suspect there might be, but am not quite ready to build the new site on the domain.

For now, I've created a simple redirect for the entire site, which works much better than the RewriteRule I posted previously :

Redirect / http://domain2.dk/


Thanks again for helping out. I will post my requirements for the rewrites here when I know more about what URLs WordPress creates and needs, in contrast with those I need to redirect/rewrite.

MortenBlaabjerg

9:14 am on Jul 1, 2010 (gmt 0)

10+ Year Member



Requirements :

Wordpress first - as it's important patterns don't collide with these URLs.

Besides requests for the root domain (with or without the www) or subdomains, Wordpress generates and needs URL's like these :

domain1.com/index.php?p=123
subdomain.domain1.com/index.php?p=123
domain1.com/index.php?attachment_id=123
subdomain.domain1.com/index.php?attachment_id=123


These seem to be rewritten in .htaccess to

domain1.com/?p=123
subdomain.domain1.com/?p=123
domain1.com/?attachment_id=123
subdomain.domain1.com/?attachment_id=123


with this :

RewriteBase /
RewriteRule ^index\.php$ - [L]


Wordpress seems to handle further rewriting of URLs internally, so that they get the format set in WP's admin settings, for instance :

domain1.com/2010/07/01/sample-post/


Wordpress also needs access to the wp-admin folder to access a number of .php files :

domain1.com/wp-admin/all-kinds-of-settings.php


And to uploaded file folder :
domain1.com/wp-content/uploads/2010/06/image_filename123.jpg


The full contents of the WP .htaccess are these :

# BEGIN WordPress

RewriteEngine On

RewriteBase /
RewriteRule ^index\.php$ - [L]

# uploaded files
RewriteRule ^files/(.+) wp-includes/ms-files.php?file=$1 [L]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule . index.php [L]

# END WordPress


I don't understand the rewrite being done for uploaded files here.

Incoming links for the old wiki are of the form :

domain1.com/index.php?title=Article_Title_Here
domain1.com/index.php?title=You%27ve_Got_Mail%21
domain1.com/index.php?title=Maxim_maskingev%C3%A6r


Where new URLs are

domain2.dk/index.php?title=Article_Title_Here
domain2.dk/index.php?title=You%27ve_Got_Mail%21
domain2.dk/index.php?title=Maxim_maskingev%C3%A6r


where %27 is a ' and %21 is a !
The %C3%A6 (æ) part troubles me a little. When I do a simple redirect of the entire site as posted above the wiki has no trouble translating to the right article, but the URL is displayed right in the address bar of the browser with the Danish character æ : domain2.dk/index.php?title=Maxim_maskingevær

This is as everything should be.

But when rewriting the URL with the code in my first post in this thread, the wiki says the link was malformed and gives an empty page, as it doesn't seem to recognize the %C3%A6 as an æ. I don't get it, as the redirect and the rewrite seem to do nearly the same thing. The reason I mention this is that I may need to rewrite these characters if they are not natively picked up by the wiki.

The wiki also uses image paths such as these :

/images/f/ff/image.png


where f and ff are hashed upload directories and can be anything from a-z and 0-9

However, as I doubt there are many inbound links directly to the files, I can live with them not being redirected. They are used and intended to be used internally by the wiki, at which point the user has already been redirected.

jdMorgan

1:43 am on Jul 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Before getting to 'code' I would like to strongly recommend that you 'install' your wiki in a subdirectory named '/wiki' or put WordPress in a subdirectory named /wp. Otherwise, you will have no simple way to detect their URLs separately.

To make your code domain-specific and fix the "/index.php" == "/" and encoded-character problems, look at this example:

RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.com
RewriteCond %{QUERY_STRING} ^title=([^&]+)
RewriteRule ^(index\.php)?$ http://domain2.dk/?title=%1 [NE,R=301,L]

The [NE] flag should take care of the encoded "ae" character problem. If it doesn't, then you'll need to manually extract the query string title parameter using THE_REQUEST instead of QUERY_STRING :

RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.com
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(index\.php)?\?title=([^&\ ]+)(&[^\ ]*)?\ HTTP/
RewriteRule ^(index\.php)?$ http://domain2.dk/?title=%2 [NE,R=301,L]

Jim

MortenBlaabjerg

1:48 pm on Jul 22, 2010 (gmt 0)

10+ Year Member



Thanks a lot for those suggestions, Jim, and sorry for the late reply. I appreciate very much that you took a look at this.

Re: subdirectories : Could a solution be to put the new install in a subdirectory (in which case I need to move it) and then rewrite those URLs so that they look like the root? (I wouldn't want overly long URLs.)

Neither the first nor the second suggested RewriteRule seems to work. Nothing gets redirected at all. I tried a page with and without the www. as well as without the index.php?title bits.

"Page not found" for all articles and no redirection of anything it seems. Trying to analyze why, but find it very difficult. Does QUERY_STRING automatically pick up the requested query part after "index.php?" ? Why exclude the & from the character class [^&] in the first example?

Doublechecked that I put in all the right domain names.

g1smd

7:12 pm on Jul 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



and then rewrite those URLs so that they look like the root?

Mod_Rewrite does not change URLs. What it does is accept URL requests from the outside world and then either tell the outside world to make a new request for a different URL OR it modifies the implied internal server path (implied by the path part of the URL request) so that it points to some other place inside the server.

So, when user requests www.example.com/somepage on your server there has to be some clue in the URL as to whether that URL should be dealt with by your wiki or by Wordpress. Putting one or both of them in a folder fixes that, just as long as that folder name appears in the URL as "the clue".

QUERY_STRING picks up everything after the question mark.

The pattern ([^&]+) matches all characters "until the next ampersand" (or the end of the query string, if that occurs first).
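To make the capture concrete, here is a small sketch using one of the titles posted earlier. The extra "action=history" parameter is only an assumed example of a second parameter, and the rule is assumed to sit in the root .htaccess of the old domain:

```apache
# For the query string "title=Maxim_maskingev%C3%A6r&action=history":
#   ([^&]+) captures "Maxim_maskingev%C3%A6r" and stops at the "&".
#   %1 therefore holds only the title; "action=history" is not carried over,
#   because a target containing "?" replaces the original query string.
RewriteCond %{QUERY_STRING} ^title=([^&]+)

# %1 refers to the group captured by the RewriteCond above
# ($1 would refer to a group in the RewriteRule pattern instead).
# [NE] stops the "%" signs in the captured value from being escaped again.
RewriteRule ^(index\.php)?$ http://domain2.dk/index.php?title=%1 [NE,R=301,L]
```

If the other parameters should survive the redirect, leaving the "?" out of the target lets mod_rewrite pass the original query string through unchanged.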

MortenBlaabjerg

8:50 pm on Jul 22, 2010 (gmt 0)

10+ Year Member



Thanks for pointing that out to me. I can see that now, and that what I wrote hardly made any sense ;-) Sorry. Just as soon as I think I understand something, I really don't.

For various reasons I don't find it optimal in my case. Shouldn't it be simple to detect whether "index.php?title=" or "index.php?p=" is used, and rely solely on this? As long as true articles are redirected, I can live with links to files etc. not being redirected.

EDIT : On second thought, incoming requests will certainly be different: requests for the WP site will always be of the "(subdomain.)domain1.com/2008/11/23/post-title-here" sort, or of the "/category/categoryname" or "/tag/tagname" sort, etc. Incoming requests for the wiki will be of the old full type "/index.php?title=". So no incoming requests for the WP site will carry the index.php. Won't this make life a lot easier?

Thanks for clearing up the character class. Still trying to decipher what's going on with the help of a guidebook and your forum, but progress is slow.

g1smd

10:27 pm on Jul 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Shouldn't it be simple to detect whether "index.php?title=" or "index.php?p=" is used, and rely solely on this?

No. Your RewriteRule selects what internal resource will be used, so the clue has to be in the URL... either a specific "word" or a particular format, like your date part prefix.

You can make "the clue" whatever you want it to be - and then you design your RewriteRule so that it operates when only that type of URL request arrives at the server.

So, you need to make a list of all of the different "types" of URL that can exist, as well as what internal filepath will deal with the request to deliver the content.

Only then can you code the rewrite rules with any chance of success.
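One possible way to keep such a list is as comments directly above the rules. The following is only a sketch built from the URL types posted in this thread. It assumes that WordPress never receives a "title" parameter on "/" or "/index.php", and that the redirect is placed ahead of the WordPress block so it is evaluated first. Both assumptions would need checking against the real site:

```apache
RewriteEngine On
RewriteBase /

# URL types on domain1.com, as listed earlier in this thread:
#   /index.php?title=...  or  /?title=...    old wiki   -> 301 to domain2.dk
#   /?p=123  and  /?attachment_id=123        WordPress  -> stays here
#   /2010/07/01/sample-post/                 WordPress  -> stays here
#   /wp-admin/...  and  /wp-content/...      WordPress  -> stays here
#   /images/f/ff/image.png                   old wiki files, not redirected

# Old wiki articles: only "/" or "/index.php" carrying a title parameter.
# The host test leaves any future subdomains untouched.
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.com$ [NC]
RewriteCond %{QUERY_STRING} (^|&)title=[^&]
RewriteRule ^(index\.php)?$ http://domain2.dk/index.php [NE,R=301,L]

# BEGIN WordPress
# (the WordPress block quoted earlier goes here, unchanged)
# END WordPress
```

Any URL type that turns up later and is missing from the comment list is a sign that the rules need another look before going live.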