Forum Moderators: phranque

URL re-writing for MediaWiki in htaccess

/kb/Main_Page to /kb/

         

badbadmonkey

9:59 am on Nov 8, 2008 (gmt 0)

10+ Year Member



Hi, brief question cos this stuff does my head in, and I can't see how to do this without getting an infinite loop.

I have a MediaWiki install which is using Rewrites as below to generate friendly URLs (the scripts are in /wiki/ ). So, the URL /kb/Article_Name actually calls: /wiki/index.php?title=Article_Name

All well and good and working fine.

RewriteCond %{REQUEST_FILENAME} !-f 
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^kb/(.+)$ /wiki/index.php?title=$1 [L,QSA]
RewriteRule ^kb*$ /kb/ [r=301,L]
RewriteRule ^kb/*$ /wiki/index.php?title=Main_Page [L,QSA]

What I would like to do is hide the Main_Page (default index) from the root URL, i.e. the URL to resolve to just /kb/ instead of /kb/Main_Page, the latter being what it does now.

So a request for /kb/ would stay as /kb/ in the address bar while serving Main_Page, and an explicit request for /kb/Main_Page would redirect to /kb/ (if possible?).

I tried

RewriteRule ^kb/Main_Page$ /kb/ [L,QSA]

appended to the existing rules above, but it did not work.

I think I need to modify the last line with the ^kb/*$, but I can't think what. Some help would be much appreciated.

jdMorgan

5:12 pm on Nov 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What is this line intended to do -- What is the input URL from the client, and what is the expected output?

RewriteRule ^kb*$ /kb/ [r=301,L]

Jim

badbadmonkey

3:34 am on Nov 9, 2008 (gmt 0)

10+ Year Member



Well it is a virtual directory, so that means that /kb redirects to /kb/

Don't want multiple URLs presenting the same content.

jdMorgan

2:26 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your regular-expression pattern "^kb*$" says, "match URL-paths consisting of a "k" followed by zero or more "b" characters."

Therefore, the URL-paths "k", "kb", "kbb", and "kbbbbbbbbbb" will all match your pattern.

See the regular expressions tutorial cited in our Forum Charter.

Furthermore, the protocol and domain name are missing from the substitution URL, and your external redirect rules should all be placed ahead of your internal rewrites to avoid "exposing" the internally-rewritten filepaths.
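
For illustration, here is a minimal sketch of those three points applied to the original rules. It assumes the .htaccess file sits in the document root with the rewrite engine already enabled, and www.example.com is only a placeholder for the real canonical hostname:

```apache
# External redirects go first, ahead of any internal rewrites.
# "^kb$" matches the URL-path "kb" exactly; "^kb*$" would also match "k" and "kbb".
# www.example.com is a placeholder; substitute your own canonical hostname.
RewriteRule ^kb$ http://www.example.com/kb/ [R=301,L]
#
# Internal rewrites follow the redirects.
# The two RewriteConds apply only to the single RewriteRule directly after them.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^kb/(.+)$ /wiki/index.php?title=$1 [QSA,L]
RewriteRule ^kb/$ /wiki/index.php?title=Main_Page [QSA,L]
```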

Jim

badbadmonkey

2:54 pm on Nov 9, 2008 (gmt 0)

10+ Year Member



As I said above it all works fine, and I cannot think of a failure case that illustrates your points, correct though I'm sure they are (can you provide one?). What does the [domain...] absence matter if it works?

Anyway, I have changed the existing code to:

RewriteCond %{REQUEST_FILENAME} !-f 
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^kb$ /kb/ [r=301,L,NC]
RewriteRule ^kb/(.+)$ /wiki/index.php?title=$1 [L,QSA]
RewriteRule ^kb/$ /wiki/index.php?title=Main_Page [L,QSA,NC]

This gives the following desired behavior:

  1. domain.com/kb redirects to domain.com/kb/ without case sensitivity
  2. domain.com/kbb* 404s
  3. domain.com/kb/Article returns real script at domain.com/wiki/index.php?title=Article
  4. domain.com/kb/ resolves to domain.com/kb/Main_Page, presenting real script at domain.com/wiki/index.php?title=Main_Page

It is #4 I wish to alter: I want the URL to remain presented as the naked directory, but the obvious ways I can think of do not work. In fact I cannot see why it is not already doing this. Perhaps MediaWiki is internally redirecting to an article page which then follows #2? If so, how to do an exception...?

jdMorgan

3:52 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If your host sets "UseCanonicalName on" or if you move your site to a server with that setting, then any redirect that you have specified will fail if you did not provide a protocol or domain in the substitution URL. The usual problem is that UseCanonicalName is "on" and the canonical name for the server is configured as example.com, whereas the domain used in all (or most) links on your site and on the Web is www.example.com. The opposite can also occur, but is much less likely.

The result is that all redirects go to the "wrong" domain (example.com instead of www.example.com), the search engines see duplicate content and get confused about which is your canonical domain, and your search engine rankings tank.
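
As a hypothetical illustration of that failure, consider the sketch below. The ServerName value and both hostnames are assumptions made up for the example, not taken from any real setup:

```apache
# Server-level configuration (httpd.conf), typically controlled by the host:
ServerName example.com
UseCanonicalName On

# .htaccess: a redirect with no protocol or hostname in the substitution
RewriteRule ^kb$ /kb/ [R=301,L]
# A request for http://www.example.com/kb is then redirected to
# http://example.com/kb/ because Apache takes the hostname from ServerName.

# Spelling out the target removes the dependency on that server setting:
# RewriteRule ^kb$ http://www.example.com/kb/ [R=301,L]
```

With "UseCanonicalName Off" the first form would normally reuse the hostname the client asked for, which is why the problem tends to stay hidden until the server setting changes.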

In server config, everything matters, and I would not have bothered to point out that problem if it didn't. When we get this code working, you'll also need to add directives to canonicalize the protocol, domain, FQDN, port numbers, script URLs, and other sources of duplicate content.

---

Another problem was with your uppercase or mixed-case "/kb/" requests. To the extent possible, rewrite only exactly-correct URLs to filepaths, and redirect any URLs which have anything wrong with them. This avoids duplicate-content vulnerabilities (e.g. malicious linking), and loss of search rankings due to duplicate-content problems. To be clear, to a search engine, "/kb/" and "/Kb" are two entirely-different URLs, and each will be evaluated and ranked separately. Allowing this situation to exist can result in the page's rank being split across the two URLs, or the "wrong" URL appearing in search results. Clever competitors might detect this vulnerability and intentionally arrange for links to your site using non-canonical URLs in order to trigger this flaw and demote your pages in search results. Nice game, huh?

---

I don't know if your wiki does a redirect. Use the Live HTTP Headers add-on for Firefox/Mozilla to "watch" a request for such a URL, and see if an unexpected redirect response is happening.

---

I'm not clear on what URL you are trying to obscure, so the rules below for "Main_Page" are just examples, and you may need to tweak the URL-paths to do what you want.


# Externally redirect direct client requests (only) for Main_Page back to "friendly" URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /wiki/index\.php\?title=Main_Page(&([^\ ]*))?\ HTTP/
RewriteRule ^wiki/index\.php$ http://www.example.com/kb/?%2 [R=301,L]
#
# Externally redirect direct client requests (only) for script URLs back to "friendly" URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /wiki/index\.php\?title=([^&\ ]+)(&([^\ ]*))?\ HTTP/
RewriteRule ^wiki/index\.php$ http://www.example.com/kb/%1?%3 [R=301,L]
#
# Externally redirect requests for /kb/Main_Page URL to /kb/
RewriteRule ^kb/Main_Page$ http://www.example.com/kb/ [NC,R=301,L]
#
# Externally redirect requests for upper- or mixed-case /kb/ to lowercase /kb/
RewriteCond $1 [KB]
RewriteRule ^(kb)/(.*)$ http://www.example.com/kb/$2 [NC,R=301,L]
#
# Externally redirect to canonicalize the domain name, including case, FQDN, or port number
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite requested /kb/ URL-paths which do not resolve to existing files to /wiki/index.php script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^kb/(.+)$ /wiki/index.php?title=$1 [QSA,L]
#
# Internally rewrite requests for exact /kb/ URL-path to /wiki/index.php with Main_Page as calling parameter
RewriteRule ^kb/$ /wiki/index.php?title=Main_Page [QSA,L]

Jim

[edited by: jdMorgan at 3:55 pm (utc) on Nov. 9, 2008]