Some advanced rewriting issue

         

gregra

9:20 pm on Mar 22, 2010 (gmt 0)

10+ Year Member



Hi everyone.
I'm not very experienced with mod_rewrite, so please bear with me.
I'm running a MediaWiki (PHP) farm with all my PHP files inside a main folder named "wikis" (so all the files are under http://example.com/wikis/). MediaWiki can present those files under a "dummy path" that is set with a special variable ($wgArticlePath).
In a simple MediaWiki installation, $wgArticlePath equals the actual folder containing the files, i.e. "wikis". In my case, $wgArticlePath will be different for every wiki (there will be hundreds or maybe thousands). So, for example, if $wgArticlePath is site1, I need these rewrite rules:

RewriteRule ^site1/(.*)$ /wikis/index.php?title=$1 [PT,L,QSA]
RewriteRule ^site1/*$ site1/ [L,QSA]
RewriteRule ^site1w/(.*)$ /wikis/$1 [PT,L,QSA]

The last line is needed for another variable that has a similar purpose to $wgArticlePath and differs from it only by a "w" at the end.

Now, I can't maintain these rewrites manually and add new rewrite rules for every new wiki (I can, but then my .htaccess will be HUGE).

How can I translate these rules into something general?
Instead of "site1", I want to match any path and rewrite it to the "wikis" folder, and do the same for "site1w", following the rules I mentioned above.

Thanks

g1smd

10:16 pm on Mar 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is the syntax of the second line correct? I don't think /* is what you want.

Also what is the *exact* URL request format that should match this pattern?

It is likely that nothing will ever match the second rule, as the first rule will already match the request.

gregra

7:11 am on Mar 23, 2010 (gmt 0)

10+ Year Member



Yes, it's correct.

There are several request formats:
1) http://example.com/site1
2) http://example.com/site1/"some page"
3) http://example.com/site1w/index.php?title="some page"

The 3 rewrite rules above handle all these types of requests perfectly.

"It is likely that nothing will ever match the second rule, as the first rule will already match the request."

If the user asks for http://example.com/site1, the first rule isn't enough, and they will get a 404.
The second rule rewrites http://example.com/site1 to http://example.com/site1/, and http://example.com/site1/ is then rewritten to http://example.com/wikis/index.php?title=$1.

g1smd

8:06 am on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Surely, the pattern
^site1/*$
matches URL requests for
example.com/site1
and
example.com/site1/
and
example.com/site1//
and
example.com/site1///
and
example.com/site1////
etc?

I'm assuming that at least one of the rewrites is supposed to be a redirect, correcting URL requests to add a trailing slash.
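
For illustration only, a trailing-slash redirect of the kind described here might look something like the sketch below. It assumes the rule lives in an .htaccess file in the document root with the rewrite engine already enabled; the 301 status is an assumption, not something stated in the thread.

```apache
# Sketch: externally redirect /site1 (no trailing slash) to /site1/.
# The pattern matches only the exact path "site1", so /site1/, /site1//
# and longer paths are left for the other rules to handle.
RewriteRule ^site1$ http://example.com/site1/ [R=301,L]
```

Because this is an external redirect, the browser re-requests /site1/, which the first rule in the original post can then rewrite to the wiki script.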

jdMorgan

12:24 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is no doubt that the pattern is incorrect, and we will save time by stating that directly.

Also, unless the request is to be passed to mod_proxy or needs to be handled by a subsequently-executed module, the [PT] flag is most likely not needed.

Jim

gregra

1:09 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



You know a picture is worth (at least) a thousand words, so here is one of my dev wikis:
<snip>

My .htaccess has:

RewriteRule ^test/(.*)$ /wikis/index.php?title=$1 [PT,L,QSA]
RewriteRule ^test/*$ test/ [L,QSA]
RewriteRule ^testw/(.*)$ /wikis/$1 [PT,L,QSA]

[edited by: jdMorgan at 1:13 pm (utc) on Mar 23, 2010]
[edit reason] Please see TOS and Apache Forum Charter [/edit]

jdMorgan

1:16 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please address the previously-posted comments. We cannot proceed until everyone understands --in terms of client-requested URL-paths and their desired disposition to server filepaths-- exactly what it is that you're trying to accomplish.

In addition, adding comments to your rules stating the intended purpose of each might prove quite helpful to all.

Thanks,
Jim

gregra

1:31 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



As I mentioned before:
rewrite everything with http://example.com/site1/"some page" to /wikis/index.php?title=$1 and rewrite everything with http://example.com/site1w/"some page" to /wikis/$1. It would also be nice to rewrite http://example.com/site1 to http://example.com/site1/index.php

BTW: If you go to my dev wiki site, you can see how the rewriting works...

gregra

1:33 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



And I forgot to mention that site1 is only an example. It can be anything that the user types in the request URL after http://example.com

jdMorgan

2:47 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We cannot allow invitations to "go to my site" here. Please understand that given the number of threads dealt with here, we can have no idea if the sites are completely free of malware or not. I cannot afford to "find out the hard way," and neither can most users.

Many coding errors and omissions in posts create obvious exploitation opportunities. Posting the URL publicly is an open invitation. Posting your domain will see this thread rank highly for searches on your domain -- often within 20 minutes of the post. 'nuf said -- It can be bad.

The description in your previous post illustrates the problem I mentioned quite well, and the reason that I asked for details of URL-paths to be rewritten and their desired dispositions, and for comments in the code. I can easily give you a rule that will 'rewrite everything with http://example.com/site1/"some page" in the URL' and the result will likely be a very slow server, search rankings destroyed within the next week, and a site that doesn't work due to infinite looping on "index.php" requests.

That's because "everything" includes index.php itself, robots.txt, sitemap.xml, all images, css and external JavaScript files, documents, spreadsheets, and multimedia files -- and likely many other URLs that should not be rewritten to the wiki script.

What we need first is a solid specification. Only after that is done can code be safely and efficiently discussed. You'll likely end up with something like:

# Declare "index.php" as directory index page
DirectoryIndex /index.php
#
# Bypass two following internal rewrite rules for infrastructure
# files and known css, script, and media filetypes
RewriteCond $1 ^(index\.php|robots\.txt|sitemap\.xml)$ [OR]
RewriteCond $1 \.(gif|jpe?g|png|css|js|ico|swf|flv|wmv|mp3|pdf|doc)$
RewriteRule ^(.+\.[a-z0-9]+)$ - [S=2]
#
# Rewrite /test/<whatever> URL requests to /wikis/index.php?title=<whatever>,
# retaining any existing client-requested query parameters.
RewriteRule ^test/(.*)$ /wikis/index.php?title=$1 [QSA,L]
#
# Rewrite /testw/<whatever> URL requests to /wikis/<whatever>
RewriteRule ^testw/(.*)$ /wikis/$1 [L]

Jim

gregra

2:58 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



OK, but
RewriteRule ^test/(.*)$ /wikis/index.php?title=$1 [QSA,L] 

is for a specific path, "/test". What I need is a general rule that handles any path and not just "test"; that is, whatever the first segment of the user's request URI happens to be.
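
To make the request concrete, here is a rough sketch of what such a general rule set could look like. It is not a tested solution. It assumes an .htaccess file in the document root, wiki names made only of lowercase letters, digits and hyphens (the character class is a hypothetical choice), and that MediaWiki can tell which wiki is meant without these rules passing the name along.

```apache
# Sketch only; the name pattern [a-z0-9-]+ is an assumption.
RewriteEngine On

# Never rewrite requests that already point at the real script folder.
RewriteRule ^wikis(/|$) - [L]

# /<name>w/<file> -> /wikis/<file>  (must come before the broader rule below)
RewriteRule ^[a-z0-9-]+w/(.*)$ /wikis/$1 [L]

# /<name>/<page> -> /wikis/index.php?title=<page>, keeping any query string
RewriteRule ^[a-z0-9-]+/(.*)$ /wikis/index.php?title=$1 [QSA,L]
```

Two caveats apply to this sketch. A wiki whose name itself ends in "w" would be caught by the "w" rule, and any real top-level folder (an images folder, for example) also matches the name pattern and would need its own exclusion. A bare /<name> request with no trailing slash is not handled here.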

jdMorgan

3:22 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The pattern ^test/(.*)$ matches any requested URL-path which starts with "test" followed by a slash, and zero or more of any additional characters... So I'm not sure what you mean, or if you are calling this a "specific" path, or how/where you might want any "non-specific path" to be rewritten.

Sorry, but the devil is in the details, and mod_rewrite is 99.999% pure details... Lots and lots of devils, IOW.

Jim

gregra

3:29 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



OK, what if every wiki is under http://example.com/sites, so that one wiki is at http://example.com/sites/site1, another at http://example.com/sites/site2, and so on? Would this work:

RewriteRule ^sites/(.*)$ /sites/wikis/index.php?title=$1 [PT,L,QSA]
RewriteRule ^sites/(.*)w$ /wikis/$1 [PT,L,QSA]

?

jdMorgan

5:43 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, because the first rule matches anything the second might ever match, and therefore the second rule will never execute. It also appears that I may be wasting my time trying to show you better methods and describing their advantages... :(

Jim

jdMorgan

5:53 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Reversing the rules would help.

Lose the [PT] flags unless these requests need to go through mod_proxy after being rewritten...
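
As a sketch of that ordering, with the [PT] flags dropped: the patterns below are tightened from the ones posted (an assumption on my part) so that the "w" suffix applies to the wiki name rather than to the end of the whole path, and both rules target /wikis/ as in the first post. It assumes an .htaccess file in the document root with the rewrite engine already on.

```apache
# More specific rule first: /sites/<name>w/<file> -> /wikis/<file>
RewriteRule ^sites/[^/]+w/(.*)$ /wikis/$1 [L]

# More general rule second: /sites/<name>/<page> -> /wikis/index.php?title=<page>
RewriteRule ^sites/[^/]+/(.*)$ /wikis/index.php?title=$1 [QSA,L]
```

With this order, a request for /sites/site1w/index.php reaches the first rule before the general one can claim it. A wiki whose name ends in "w" would still be ambiguous under this naming scheme.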

Jim

gregra

6:08 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



OK, I will try it.
Thanks