mod rewrite - sub directory to query string with exceptions

loveunit

1:43 am on Mar 19, 2010 (gmt 0)

10+ Year Member



I've read a million tutorials and posted in this forum a number of times before - I'm sorry to say, I simply cannot compute reg_exp...

long story short:

I need to find a solution to rewrite selected non-existent sub directories to a querystring - the only exceptions are that the home page of the site loads if no sub directory is passed, and that a selection of existing sub-directories containing system files are not rewritten.

I'd like to be able to allow users to access their files at domain.com/username - which should be rewritten to domain.com/users?user=username

here's an example structure, this is the root folder

system_1
system_2
includes
tools
users

here's my code - it re-writes to the user files fine, but does not allow the home page to show and rewrites requests inside the system folders - basically, it barely works!

RewriteCond %{HTTP_HOST} (www\.)?domain\.com
RewriteCond %{REQUEST_URI} !^/System_1/ [NC]
RewriteCond %{REQUEST_URI} !^/System_2/ [NC]
RewriteRule ^(.+)$ /users?user=$1 [L]

any help would be very much appreciated - thanks in advance.
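
One possible starting point for the sub-directory version, offered as a sketch rather than a tested fix: let the file-system checks supply the exceptions instead of naming each system folder. It assumes the rules live in the root .htaccess, that usernames contain no slashes or periods, and that /users/ has an index script that reads the "user" parameter. None of those assumptions is confirmed in the post above.

```apache
RewriteEngine On

# Leave real files and directories alone
# (system_1, system_2, includes, tools, users)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Match exactly one path segment. The "+" requires at least one
# character, so the empty path (the home page) is never rewritten.
RewriteRule ^([^/.]+)/?$ /users/?user=$1 [L]
```

A side effect of this approach is that a username identical to an existing folder name can never be reached, so such names would need to be reserved at sign-up.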

loveunit

4:27 pm on Mar 19, 2010 (gmt 0)

10+ Year Member



using sub domains would also be a good solution: I've set up wild card DNS and my host has edited my virtualhosts files accordingly:

here's my code:

# rewrite rules ##
<IfModule mod_rewrite.c>

# switch on mod_rewrite ##
Options +FollowSymlinks
RewriteEngine On
RewriteBase /

# For non-www subdomain requests, the query string parameter 'user' is taken from the requested subdomain (%1)
rewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com [NC]
rewriteCond %1 !^(www|wiki|forum|login)$ [NC]
rewriteCond $1 !^HTML/ [NC]
rewriteRule ^([^/.]*)(.*)$ users.php?user=%1 [NC,L]

</IfModule>

I'm waiting for the DNS to propagate, but wonder if anyone can give some feedback on this - will it work, what's missing?

thanks.

jdMorgan

10:08 pm on Mar 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code will loop because "users.php" is not excluded from being rewritten (again).

The second set of parentheses in the RewriteRule is unnecessary because $2 is never back-referenced. Therefore, the enclosed ".*" subpattern and trailing end-anchor are also unnecessary...

And therefore, this rule will create "duplicate content" unless users.php checks that both the URL-path-part back-referenced as $1 and anything that follows it are valid. Otherwise, your 'friends' may cause you problems by linking to such URLs as "joe.example.com/sleazy-merchant-with-shoddy-products/true-junk", and your code will happily rewrite that to /users.php?user=joe, and produce a page which will then be associated in search with the terms in that URL...

Jim
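
A sketch of what an end-anchored version might look like, assuming only the root of each user subdomain should be rewritten (an assumption, not something stated in the thread). With the pattern anchored to the empty path, the rewritten "users.php" no longer matches the rule, so there is no loop, and longer paths such as the "sleazy-merchant" example are left alone and would normally end in a 404.

```apache
RewriteEngine On
RewriteBase /

# %1 captures the subdomain from the Host header
RewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com
# Skip subdomains that are real sites
RewriteCond %1 !^(www|wiki|forum|login)$
# Only the bare "/" of the subdomain is rewritten
RewriteRule ^$ users.php?user=%1 [L]
```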

loveunit

9:37 am on Mar 21, 2010 (gmt 0)

10+ Year Member



Hi Jim - and thanks for your reply.. amazing after all these years you're still willing to offer help to people lost in rewrite voodoo!

I made a mistake in the code I copied in, the 3rd condition should have taken out the loop - but I simplified the code for example's sake - in fact it's more or less copied from something you posted before on WebmasterWorld - here's the updated version:

--------------------------------

# rewrite rules ##
<IfModule mod_rewrite.c>

# switch on mod_rewrite ##
Options +FollowSymlinks
RewriteEngine On
RewriteBase /

# For non-www subdomain requests, the query string parameter 'user' is taken from the requested subdomain (%1)
rewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com [NC]
rewriteCond %1 !^(www|wiki|forum|login)$ [NC]
rewriteCond $1 !^users.php [NC]
rewriteRule ^([^/.]*) users.php?user=%1 [NC,L]

</IfModule>

-------------------------

is that better - the code, with the change to the 3rd condition works very well, but I cannot see the logical faults that you seem to perceive so easily.

I will do some very secure checking and send the required headers for existing and non-existent user accounts - pulled from a DB.

thanks again for your generous help.

jdMorgan

2:25 pm on Mar 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That looks OK, other than an un-escaped literal period and incorrect casing of the directives and the duplicate-content problem previously mentioned.

You also do not need the <IfModule> container unless you want this code to fail silently if mod_rewrite is not installed on this server. Otherwise, it is a waste of filespace and CPU time.

--------------------------------
#
# rewrite rules
#
# switch on mod_rewrite
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
#
# For non-www subdomain requests, the query string parameter 'user' is taken from the requested subdomain (%1)
RewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com
RewriteCond %1 !^(www|wiki|forum|login)$
RewriteCond $1 !^users\.php
RewriteRule ^([^/.]*) users.php?user=%1 [L]
#
-------------------------


You should also almost never use [NC] in internal rewrites. Any casing errors, if present in the requested URL, should have already been corrected by redirecting the request to the properly-cased URL. Otherwise, you are again creating duplicate content.

Bear in mind that for any 'page' or 'content' on the Web, that page should be directly reachable with one and only one unique, canonical URL. *Any* difference in the requested URL from that canonical URL should trigger a 301 redirect to that canonical URL. This includes the protocol (http vs. https), the hostname (www vs. non-www, etc.), FQDN vs. non-FQDN format (trailing period after the hostname), appended port numbers, "/" vs. "/index.xyz", character-casing, and query strings. If any of these changes, then that is a different and non-canonical URL.

If these non-canonical URL problems are not addressed, it is often possible to have 16 or more "home page" URLs competing with each other for links and for search results ranking... and if query strings are not canonicalized (or removed), it is possible to have a practically-infinite number of "home pages" all competing with each other -- You are competing with yourself! This is not good. It is an exploitable weakness in your site, and it is not a situation that you want to create or to allow to exist.

If you cannot 301 redirect a non-canonical URL to a corrected URL, then let that non-canonical URL request go 404. This is far better than rewriting it to content and returning a 200-OK response.

Jim
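
As a rough illustration of the hostname part of that advice, here is a minimal sketch. It assumes that http://www.domain.com is the canonical host and protocol (the thread does not say which variant is canonical) and it covers only the bare domain, a trailing period, and an appended port. Casing, index files, and query strings would each need their own redirect.

```apache
RewriteEngine On

# Bare domain, or the www host followed by a trailing period or a port
RewriteCond %{HTTP_HOST} ^domain\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.com(\.|:) [NC]
# External 301 redirect to the one canonical hostname, keeping the path
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
```

External redirects like this are usually placed ahead of the internal rewrites, so that a non-canonical request is redirected before it can be rewritten to content.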

loveunit

3:11 pm on Mar 21, 2010 (gmt 0)

10+ Year Member



thank you very much.

I will look further into addressing all the duplicate content issues you have highlighted as I develop the code further - for now the theory works, which is a relief, just needs to be fine-tuned.

thanks again.

loveunit

9:41 am on Apr 16, 2010 (gmt 0)

10+ Year Member



well, the code is now working on the test site, but now there are a couple more requirements, not envisioned before.

basically, to make the site SEO happy and bookmarkable, we need to pass the querystring, ideally also hash anchors - however, I think that sub directories would look nicer.

so - an example:

the current code to redirect sub domains to username, via the querystring - where the sub domains do not exist - thanks to wild-card DNS:

rewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com
rewriteCond %1 !^(www|wiki|forum|login)$
rewriteCond $1 !^folder/
rewriteRule ^([^/.]*) folder/user.php?user=%1 [L]

this works fine, but has some odd behaviour, the code added to add hash anchors for browser history no longer manipulates the browser address bar - no hashes are added..

so, I now need to find a way to pass page names, ideally via folders, such as:

username.domain.com/pagename/

rewrites to

domain.com/folder/user.php?user=username&page=pagename

but without breaking the working conditions above.. I've tried several million variations, but no success yet

my best effort is:

rewriteRule ^([^/.]*)(.*)$ folder/user.php?user=%1&folder=$2 [L]

but that simply shows the folder of the rewrite rule

any help would be great - thanks in advance.

jdMorgan

4:15 pm on Apr 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need two rules, one to work with "page names" and other (your original) to work without them.

"Hashes" are never passed from the browser to the server, so the server cannot manipulate them. They are considered to be "on-page" navigation elements, and therefore, since servers don't navigate within pages, they mean nothing to a server -- it only serves "whole pages."

This is one reason that Google is pushing to use a different "hash" indicator for AJAX -- The current dual-usage of "#" for both AJAX and for "named anchors" a.k.a. "URL-fragments" is incompatible with current HTML design rules and servers.

Jim
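
A sketch of the two-rule arrangement, under some assumptions: page names are a single path segment with no slashes or periods and always end in a trailing slash, the script is folder/user.php as in the post above, and the parameter is called "page". It has not been tested against the poster's site.

```apache
RewriteEngine On
RewriteBase /

# Rule 1: username.domain.com/pagename/ -> user and page
# Do not treat the real "folder" directory as a page name
RewriteCond $1 !^folder$
RewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com
RewriteCond %1 !^(www|wiki|forum|login)$
RewriteRule ^([^/.]+)/$ folder/user.php?user=%1&page=$1 [L]

# Rule 2: username.domain.com/ -> user only
RewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com
RewriteCond %1 !^(www|wiki|forum|login)$
RewriteRule ^$ folder/user.php?user=%1 [L]
```

Because both patterns are anchored at each end, the rewritten path "folder/user.php" matches neither rule, so no separate loop-prevention condition is needed for it.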

loveunit

7:46 pm on Apr 17, 2010 (gmt 0)

10+ Year Member



HI Jim, thanks as ever for taking the time to reply

I've now got my 2 rules just about sorted, but I've got an issue with non-www requests, which need to go to the same place as www, but at the moment simply load the domain root - for now I'm doing this via a redirect in php to the www version of the site

here's the code:

# send [www...] || www requests to user "home"
rewriteCond %{HTTP_HOST} ^(|www)\.domain\.[a-z]{2,3}
rewriteCond $1 !^folder/
rewriteRule ^([^/.]*) folder/user.php?user=home [QSA,L]

# For non-www subdomain requests, the query string parameter 'user' is taken from the requested subdomain (%1)
rewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.[a-z]{2,3}
rewriteCond %1 !^(www|wiki|forum|login)$
rewriteCond $1 !^folder/
rewriteRule ^([^/.]*) folder/user.php?user=%1 [QSA,L]

this is the line that is not working:

rewriteCond %{HTTP_HOST} ^(|www)\.domain\.[a-z]{2,3}

as it does not recognise the empty http_host - I also tried:

rewriteCond %{HTTP_HOST} ^$

but that did not seem to help..

how can I manage this all from .htaccess without needing to rely on an ugly php redirect?

any ideas.. thanks again.

jdMorgan

10:22 pm on Apr 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest:
RewriteCond %{HTTP_HOST} ^(www\.domain\.[a-z]{2,3}\.?(:[0-9]+)?)$

Jim

loveunit

7:18 pm on Apr 18, 2010 (gmt 0)

10+ Year Member



thanks Jim, but that actually stops the www rewrite working as well - now both www and non-www (but not other sub domains) simply load the domain root directory.

loveunit

7:21 pm on Apr 18, 2010 (gmt 0)

10+ Year Member



also, I'm unclear if "http://" is a part of HTTP_HOST or not - could you please clarify?

if not, is it possible to run a check to see if there is no sub domain ( HTTP_HOST? ) and combine that with the rule for the www sub domain?

Cheers again.

g1smd

8:33 pm on Apr 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



HTTP_HOST holds just the hostname itself - no http, no path.

jdMorgan

12:41 am on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Typo in suggested pattern above... Should be:

RewriteCond %{HTTP_HOST} ^(www\.domain\.[a-z]{2,3}\.?(:[0-9]+)?)?$

to allow a match on www- and blank hostnames.

If you also want to match the non-www hostname, then

RewriteCond %{HTTP_HOST} ^((www\.)?domain\.[a-z]{2,3}\.?(:[0-9]+)?)?$

Jim
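
For context, this is roughly how that condition might sit in the "home" rule. It is a sketch only: it assumes the optional trailing period is written as "\.?", that only the home page path should be sent to user "home", and that the script is folder/user.php as in the earlier post.

```apache
RewriteEngine On
RewriteBase /

# Pattern pieces, left to right:
#   (www\.)?       optional "www." prefix, so the bare domain also matches
#   \.?            optional trailing period (FQDN form)
#   (:[0-9]+)?     optional appended port number
#   outer (...)?   lets a blank Host header match as well
RewriteCond %{HTTP_HOST} ^((www\.)?domain\.[a-z]{2,3}\.?(:[0-9]+)?)?$
# Only the empty path is rewritten, so folder/user.php cannot loop
RewriteRule ^$ folder/user.php?user=home [QSA,L]
```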

loveunit

7:53 pm on Apr 19, 2010 (gmt 0)

10+ Year Member



oddly, neither of those solutions seems to work, they are both ignored.. is there a problem due to the rules that follow?

thanks.

g1smd

8:39 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Did you completely clear the browser cache before testing again?

loveunit

9:00 pm on Apr 19, 2010 (gmt 0)

10+ Year Member



I did, and I can see that it's clear as once I return to the semi-working code, it goes back to semi working..

Any ideas - thanks?

jdMorgan

1:08 pm on Apr 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It might prove helpful to re-post the relevant parts of your code, as we cannot be sure where you've made changes, or whether they are correct.

Jim

loveunit

7:34 pm on Apr 20, 2010 (gmt 0)

10+ Year Member



good point - the code has not really changed, here is the entire mod-rewrite part of the root .htaccess - there are no other .htaccess files in sub directories:

# switch on mod_rewrite ##
Options +FollowSymlinks
RewriteEngine On
RewriteBase /

# send http:// ( no sub domain ) OR http://www requests to user "home"
rewriteCond %{HTTP_HOST} ^(www)\.example\.[a-z]{2,3}
rewriteCond $1 !^HTML/
rewriteRule ^([^/.]*) HTML/user.php?user=home [QSA,L]

# For non-www subdomain requests, the query string parameter 'user' is taken from the requested subdomain (%1)
rewriteCond %{HTTP_HOST} ^([^.]+)\.example\.[a-z]{2,3}
rewriteCond %1 !^(www|wiki|forum|login)$
rewriteCond $1 !^HTML/
rewriteRule ^([^/.]*) HTML/user.php?user=%1 [QSA,L]

[edited by: jdMorgan at 2:03 am (utc) on Apr 21, 2010]
[edit reason] Changed domain to "example" and de-linked. [/edit]