
Apache Web Server Forum

    
adding a trailing slash in some rewrite rule
adrianTNT




msg:4030058
 2:09 pm on Nov 23, 2009 (gmt 0)

Hello.

I have a domain that I use for domain parking, let's say parking.com, in root of this site I have a folder "domains" and I will keep all content of my parked domains inside this folder; actual directory structure on disk is:
parking.com/domains/somesite.com/
parking.com/domains/someothersite.com/

I am doing this by htaccess so that by accessing some-site.com it will show the content of parking.com/domains/some-site.com
I have a trailing slash problem with my code:
If I enter somesite.com/sample/ in the browser it works OK, but if I enter somesite.com/sample the address bar changes to http://somesite.com/domains/somesite.com/sample/ (the correct content is still shown).
Is there a way to maybe redirect somesite.com/sample to somesite.com/sample/ ?

Any advice or fixes are welcome.
Let me know if I was unclear.
Thank you.

=================================================
Options +FollowSymLinks
RewriteEngine On

# Rewrite requests for <anything>.<domain>.<tld> to /domains/<domain>.<tld> subdirectory
# except for domain name parking.com, this will load from root
#
RewriteCond %{HTTP_HOST} !parking.com
RewriteCond %{HTTP_HOST} !208.xyz.209.186
RewriteCond $1 !^domains

# with or without www redirect to a folder without www
RewriteCond %{HTTP_HOST} ^(www.)?([^.]*)\.(.*)$ [NC]
RewriteRule ^(.*)$ /domains/%2.%3/$1 [L,QSA]

# redirect to index without any "if file exists" exception, otherwise root "/" exists and directory list will be printed
RewriteRule ^domains/([^/]*)/$ /domains/?domain=$1

# allow for existent files, redirect otherwise
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^domains/([^/]*)/(.*)$ /domains/?domain=$1&page=$2 [L,QSA]

[edited by: jdMorgan at 2:41 pm (utc) on Nov. 23, 2009]
[edit reason] Obscured specifics [/edit]

 

jdMorgan




msg:4030144
 4:28 pm on Nov 23, 2009 (gmt 0)

Disable MultiViews and Indexes to simplify this code, and do not use two rules in a row to rewrite to the script in the /domains/ subfolder. Using two rules in a row triggers a known bug in mod_rewrite that re-injects part of the path into the req_rec variable (causing the extra "/sample" in your path). Instead, do the whole process all in one rule or the other:

Options +FollowSymLinks -Indexes -MultiViews
RewriteEngine on
#
# If the requested parked domain URL-path resolves to an existing file
# or directory, rewrite the request to the parked domain's subfolder
RewriteCond $1 !^domains/
RewriteCond %{HTTP_HOST} !parking\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
RewriteCond %{DOCUMENT_ROOT}/%2/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/%2/$1 -d
RewriteRule ^(.*)$ /domains/%2/$1 [L]
#
# Else if the requested parked domain URL-path does not resolve to an existing file or
# directory, rewrite the request to the index script in the parked domain subfolder
RewriteCond $1 !^domains/
RewriteCond %{HTTP_HOST} !parking\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ /domains/?domain=%2&page=$1 [QSA,L]

Note that it is not necessary to check for your server's IP address, because an IP-address hostname will not match the pattern of the hostname RewriteCond.

The hostname pattern has been modified to accept additional hostname variations such as www.parked.co.uk, FQDN-format hostnames (with a trailing period, as in "example.com."), and hostnames with appended port numbers (as in "www.example.com:80"), all of which are valid.
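
To make the capture groups concrete, here is a small annotated sketch of the hostname condition on its own. The environment variable name PARKED_DOMAIN is made up for this illustration and is not part of the rules above; the group values in the comments were worked out by hand for the example Host header.

```apache
RewriteEngine on
# Example Host header: www.parked.co.uk:80
#   %1 = "www."           optional "www." prefix
#   %2 = "parked.co.uk"   domain without "www.", trailing dot, or port
#   %3 = ".uk"            last repetition of the inner (\.[a-z]+) group
#   %4 = ":80"            optional port
# An IP-address Host such as 192.0.2.10 does not match, because every label
# after the first must consist of letters only.
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
# Hypothetical: expose %2 as an environment variable instead of rewriting.
RewriteRule ^ - [E=PARKED_DOMAIN:%2]
```

Only %2 is used by the rewrite rules; the other groups exist so that the optional prefix, trailing dot, and port can be matched and discarded.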

Note that the "file exists" and "directory-exists" tests must look into the parked domain's subfolder itself, so this filepath is "constructed" using DocumentRoot + parked-domain-name + requested-filepath. Also, because they are highly-inefficient (very slow, resource-intensive) functions, these 'exists' checks are done last -- All other RewriteConds must match first before we call the OS filesystem to go check the disk.

Be careful with this code; No adjustments to variables, patterns, anchoring, RewriteCond order, or flags should be necessary. Everything here was done "on purpose" for correctness, efficiency, and robustness.

Jim

adrianTNT




msg:4030176
 5:05 pm on Nov 23, 2009 (gmt 0)

@jdMorgan: I hope you can help a bit more.
That code doesn't seem to detect existing files and folders. The folder somedomain.com/sample exists, and the existing file somedomain.com/sample.html is not detected either; every request seems to match the last rule and go to my dynamic page (/domains/?domain=%2&page=$1).

I think it has to do with the %{DOCUMENT_ROOT} check. Should/could I replace that with something else to find an existing file/folder? I guess it should check for ^/domains/%domain%/%path% but I don't know how to write it.

jdMorgan




msg:4030203
 5:38 pm on Nov 23, 2009 (gmt 0)

Temporarily change that last RewriteRule line to a redirect for testing only and then check the "folderpath=" variable in your browser's address bar:

RewriteRule ^(.*)$ http://parking.com/domains/?domain=%2&page=$1&folderpath=%{DOCUMENT_ROOT}/%2/$1 [QSA,R=302,L]

Compare that "folderpath=" value to the actual filepath on the server. Usually when there is a problem with this construct, there is something missing or something extra in the tested filesystem path. You'll have to find out what the problem with that path is in order to adjust the -f and -d RewriteCond paths.

Jim

adrianTNT




msg:4030234
 6:15 pm on Nov 23, 2009 (gmt 0)

When typing in address bar somedomain.com/sample.html
It said that $folderpath is:

/var/www/html/somedomain.com/sample.html

I have to check in my server files but I think actual file on server would be

/var/www/html/PARKING.COM/domains/somedomain.com/sample.html

Is that what should be tested for existence in the htaccess rule?

jdMorgan




msg:4030267
 7:00 pm on Nov 23, 2009 (gmt 0)

In that case, put "/parking.com" into the two RewriteConds' paths between "%{DOCUMENT_ROOT}" and "/%2".

That is, if what you say is correct, then %{DOCUMENT_ROOT}/%2/$1 should be %{DOCUMENT_ROOT}/parking.com/%2/$1

I encourage you to experiment, as it's likely going to be a lot faster than waiting for a reply here.

Jim

adrianTNT




msg:4030339
 8:08 pm on Nov 23, 2009 (gmt 0)

Yes, normally it goes faster if I experiment myself, but I am really, really bad at these htaccess conditions.

I checked, I have the www root for parking.com directly into /var/www/html/ so all other domains are in /var/www/html/domains/somedomain.com , this means that condition would be changed to
RewriteCond %{DOCUMENT_ROOT}/domains/%2/%1
That seems to be correct but now I am back where I started.
[somedomain.com...] opens fine, [somedomain.com...] goes to [somedomain.com...]
With this code:

Options +FollowSymLinks -Indexes -MultiViews
RewriteEngine on
#
# If the requested parked domain URL-path resolves to an existing file
# or directory, rewrite the request to the parked domain's subfolder
RewriteCond $1 !^domains/
RewriteCond %{HTTP_HOST} !parking\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
RewriteCond %{DOCUMENT_ROOT}/domains/%2/%1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/domains/%2/%1 -d
RewriteRule ^(.*)$ /domains/%2/$1 [L]
#
# Else if the requested parked domain URL-path does not resolve to an existing file or
# directory, rewrite the request to the index script in the parked domain subfolder
RewriteCond $1 !^domains/
RewriteCond %{HTTP_HOST} !parking\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ /domains/?domain=%2&page=$1 [QSA,L]
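
One detail in the listing above may be worth a second look. In a RewriteCond test string, %1 is the first group of the last matched RewriteCond pattern, which here is the optional (www\.) prefix, while the requested path captured by the RewriteRule pattern is $1. As written, the two "exists" tests would therefore check the parked domain's folder itself rather than the requested file or directory. A minimal sketch of the first rule with $1 in those tests, assuming the /var/www/html/domains/<domain>/ layout described in this post:

```apache
RewriteEngine on
RewriteCond $1 !^domains/
RewriteCond %{HTTP_HOST} !parking\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
# %2 = parked domain (from the hostname condition above)
# $1 = requested URL-path (from the RewriteRule pattern below)
RewriteCond %{DOCUMENT_ROOT}/domains/%2/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/domains/%2/$1 -d
RewriteRule ^(.*)$ /domains/%2/$1 [L]
```

If that reading is right, the %1 version would match for nearly every request to a parked domain, which may be part of why the behaviour looked unchanged.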

jdMorgan




msg:4030356
 8:31 pm on Nov 23, 2009 (gmt 0)

Hmmmm... As you can see, both of these rules are internal rewrites, and they do not affect the client -- that is, they cannot change the browser address bar. So some other 'agent' is doing that.

I assume that this URL is supposed to be rewritten to /domains/?domain=somedomain.com&page=sample, since "sample" has no extension and therefore cannot exist as a 'real' file. So again, a good test would be to temporarily change the last rule back to an external redirect, and then see if you can 'catch' the server issuing two or more redirects -- either redirecting first to parking.com/domains/?domain=somedomain.com&page=sample and then to parking.com/domains/somedomain/sample/, or the other way around...

You will need to use a server headers checker to see this though, because the redirection will likely be so fast that you'll otherwise only see the last URL in your address bar. I use the Live HTTP Headers add-on for Firefox/Mozilla, although there are several other good ones.

Having seen what order the redirects take place in, it may be easier to figure out what agent is doing these redirects. Among the possibilities are mod_alias, mod_dir, mod_negotiation, mod_speling, or other mod_rewrite rules in your server configuration files, this .htaccess file, or another .htaccess file in /domains or /domains/somedomain.com. It's also possible that your script itself is doing the redirect.

Anyway, since this .htaccess code does not generate external redirects, the problem isn't confined to this code.

Jim

jdMorgan




msg:4030360
 8:37 pm on Nov 23, 2009 (gmt 0)

Oh, and if you're on Apache 2.x, you should try adding

DirectorySlash Off

at the top of this code.
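
For reference, DirectorySlash is a mod_dir directive (Apache 2.0.51 and later). When it is On, the default, mod_dir answers a request for a directory that lacks the trailing slash with an external redirect to the slashed URL. A sketch of where the line would go in the code above; the <IfModule> wrapper is an optional safeguard added here, not part of the original suggestion:

```apache
# Stop mod_dir from redirecting /dir to /dir/ on its own
<IfModule mod_dir.c>
    DirectorySlash Off
</IfModule>

Options +FollowSymLinks -Indexes -MultiViews
RewriteEngine on
# ... rewrite rules follow as before ...
```

With the automatic redirect switched off, a slashless request for a directory is no longer corrected, so it is safer to keep Indexes disabled, as it is here.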

Jim

adrianTNT




msg:4030373
 8:50 pm on Nov 23, 2009 (gmt 0)

I forgot to mention that before adding DirectorySlash Off, the root of a parked domain ( [somedomain.com...] ) returned:

Forbidden
You don't have permission to access /domains/somedomain.com/ on this server

Then after adding DirectorySlash Off, it shows this forbidden error even for somedomain.com/sample instead of doing that duplication in the address bar.

Forbidden
You don't have permission to access /domains/somedomain/sample on this server.

Adding a trailing slash in address bar opens correct content (existing folder).

Any other tips? Sorry to waste your time, but for me it might take 20-30 hours of experiments to fix this myself.

jdMorgan




msg:4030382
 9:19 pm on Nov 23, 2009 (gmt 0)

As an experiment only, change the last RewriteRule line to

RewriteRule ^([^/]*(/[^/]+)*)/?$ /domains/?domain=%2&page=$1/ [QSA,L]

Jim

adrianTNT




msg:4030399
 9:38 pm on Nov 23, 2009 (gmt 0)

With that line replaced at end it shows same doubled domain in URL:
somesite.com/sample > redirects to [somesite.com...]

And with slash at end it works ok:
somesite.com/sample/ > shows correct content (existent index in that folder)

And the same forbidden error if I enter somesite.com in the address bar.

Maybe there is a way to just force site.com/sample to site.com/sample/

adrianTNT




msg:4030512
 12:41 am on Nov 24, 2009 (gmt 0)

I tested 6-7 more hours :O and best solution I found was to add this in front of my initial code:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ [%{HTTP_HOST}...] [R=301,L]

That would force a redirect to a URL with a trailing slash if the address appears to be a directory but does not end in a slash.
Would this solution cause any problems? With Google maybe?
It does what I want but I don't know if it is correct.

jdMorgan




msg:4030517
 1:03 am on Nov 24, 2009 (gmt 0)

See my note above about ordering 'file exists' checks...

This would be much more efficient:

RewriteCond %{REQUEST_URI} !(\.[a-z0-9]{1,5}|/)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]

because it avoids going out to check the disk *twice more* for every HTTP request to your server... It only checks if the trailing slash is missing and there is no 'filetype' on the URL.

But there's a problem with this at a much higher level, because you are rewriting URLs. Specifically, *no* URL in the filespace tested by %{REQUEST_FILENAME} will ever 'exist' unless it belongs to your parking.com site, and is hosted in the 'root' directory... Any of the pages in your parked domains will fail this test, even if they do exist, because at the point where these 'exists' checks are made, the URL-path has not yet been rewritten to the parked domain's folder.

So, you have four choices:
1) Do this check only after re-mapping to the parked domain filespace (and, unfortunately, it appears that this did not work).
2) Check using the -U flag instead of -d and -f, to see if the URL will eventually resolve to a file after all rewriting is completed. This is *extremely* inefficient, even compared to the very-inefficient -f and -d checks, because it essentially causes mod_rewrite to 'call' your server again, watch itself translate the URL to a filepath, and then check to see if that filepath leads to an existing file or directory.
3) Forgo checking for 'exists' completely, and simply add a trailing slash to all extensionless URLs requested from your server.
4) Call your team and call your host, and find out what strange configuration or script is causing this problem. This is much better than just putting a band-aid on the problem and forgetting it. The problem that is causing this trouble may re-surface in another form later, and cause you even more grief. A normal server should not do what you are seeing here.
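
Option 3 above can be sketched as follows. This is only an illustration: it treats any URL-path that ends in neither a slash nor a dot followed by one to five letters or digits as "extensionless", which is an assumption about how the site's URLs are named. It would need to come before the internal rewrites so that the redirect is built from the public URL.

```apache
RewriteEngine on
# No disk checks: redirect every URL that has no filetype and no trailing slash
RewriteCond %{REQUEST_URI} !(\.[a-z0-9]{1,5}|/)$ [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]
```

The trade-off is that a real file without an extension (for example a file named README) would also be redirected to a slashed URL, so this only suits sites where every extensionless URL is meant to be a directory or a script-handled page.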

Jim

TheMadScientist




msg:4030882
 2:44 pm on Nov 24, 2009 (gmt 0)

Sorry to jump in so late in the discussion, but personally I'd go the other way and strip the trailing / from all requests. Maybe this will be a more practical and easier option for some future reader who is not this far into adding them.

I'm not sure about jdMorgan's thoughts on the matter, but personally, I just strip the / and /index.ext if present, because it's such an easy check / rewrite to make happen when compared to what you seem to be going through. I haven't dug all the way into the thread to know exactly where the two of you are in the discussion, but thought I'd throw the idea in for those who may read this thread in the future.

Stripping the slashes and index.ext also eliminates the possibility of issues with duplicate content at /dir/ and /dir/index.ext and can be accomplished with a single rule.

jdMorgan




msg:4030892
 3:06 pm on Nov 24, 2009 (gmt 0)

Yeah, this thread discusses a deeper problem, though -- The mysterious invocation of an external redirect by an unknown agency when the trailing slash is missing...

As the code here comprises internal rewrites only, the source of that external redirect is currently unknown.

Jim

TheMadScientist




msg:4030906
 3:21 pm on Nov 24, 2009 (gmt 0)

I read through the thread a bit more, and my first thought is:

Are the files being accessed dynamic and is there a way they could be initiating the redirect, rather than it being done by mod_rewrite or some mysterious server configuration? I sometimes use PHP rather than mod_rewrite for redirects because I can eliminate rules which are generally unnecessary from my .htaccess and once in a while I forget I did, which usually gives me a headache and throws me for a loop for a bit...

adrianTNT




msg:4031300
 3:39 am on Nov 25, 2009 (gmt 0)

Thanks for your replies.

I checked the PHP scripts to make sure nothing else is redirecting and producing that double URL. The only thing that could do that would be /domains/index.php; I removed this file from the server, and without forcing the trailing slash in htaccess the double URL still appeared, so it must be something in my htaccess rules that causes it.

@jdMorgan: I am adding the trailing slash with the last rules you recommended. Either I didn't really understand the problems you mentioned ("at a much higher level...") or I did understand them but they are not happening. All the files inside my main domain (e.g. parking.com) and the files inside parked domains seem to open nicely, so I guess I found my solution.
My only worry would be if anything I did looked fishy to Google, but I hope a 301 redirect would be OK with G :)

So what I use now is adding this in front of my initial code and seems to work:

RewriteCond %{REQUEST_URI} !(\.[a-z0-9]{1,5}|/)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ [%{HTTP_HOST}...] [R=301,L]

TheMadScientist




msg:4031702
 6:39 pm on Nov 25, 2009 (gmt 0)

Have you tried testing rule-by-rule yet?

IOW: Either comment out all rules and uncomment them one at a time until it breaks, OR comment out one rule at a time until it stops breaking. This should help you determine where the disconnect is happening in the file. It basically has to be either the rule you just toggled or something above it affecting that rule, so if it's not the rule itself you can keep moving that ruleset up the file one ruleset at a time until it stops breaking; then you know which ruleset is negatively affecting the one causing the break.

I hope this makes a bit of sense, but an example might be better:

RewriteRule #1 - Does Not Cause the Faulty Rewrite

RewriteRule #2 - Does Not Cause the Faulty Rewrite

RewriteRule #3 - Does Not Cause the Faulty Rewrite

RewriteRule #4 - Does Not Cause the Faulty Rewrite

RewriteRule #5 - Causes the Faulty Rewrite

# RewriteRule #6 - Does Not Cause the Faulty Rewrite

# RewriteRule #7 - Does Not Cause the Faulty Rewrite

# RewriteRule #8 - Does Not Cause the Faulty Rewrite

You now know the issue is either with Rule 5 itself (most likely) or with Rule 1 thru 4... If it's not Rule 5 itself, move it above Rule 4... If it doesn't stop 'breaking' move Rule 5 above Rule 3 and so on, until you find the rule or combination of rules causing the error.

If you cannot find the error this way, it must be in the server configuration... I've had an issue with detecting paths on one host when a PHP file is involved, which forces me to remove the start anchor from rules I want to affect the PHP files; it has to do with the parsing of PHP as CGI. So it may be something small that needs to be edited in your ruleset to get it to work correctly. I would definitely check the server headers to see which kind of redirect it is, because hosts are notorious for using 302 undefined redirects. If it is a 302, you may be able to narrow the issue to the server configuration by simply making sure all external redirects in your file are properly defined as 301. If they are, you know the error is most likely not in your file.

You might also try removing all rules from the file and seeing if the redirect happens... If it does, then it's obviously not your .htaccess.

Anyway, just some thoughts I was having about an interesting issue.

adrianTNT




msg:4031713
 6:52 pm on Nov 25, 2009 (gmt 0)

Hello. I already tested that. The double URL is caused by the last rule in these lines; the last line matches everything I had.

RewriteCond %{HTTP_HOST} !parking.com
RewriteCond $1 !^domains

RewriteCond %{HTTP_HOST} ^(www.)?([^.]*)\.(.*)$ [NC]
RewriteRule ^(.*)$ /domains/%2.%3/$1 [L,QSA]

It might be related to the ^domains right above, I am not sure.

TheMadScientist




msg:4031730
 7:14 pm on Nov 25, 2009 (gmt 0)

My guess is it has something to do with rule order, the trailing / and the variables used for comparison...

I'm thinking this through on-the-fly, so go with the idea more than exactly what I say if it's not exactly technically correct, but:

When you Redirect to the / at the beginning of the file it's not happening, so my guess is when you remove that line the Rewrite is happening prior to the Redirect to / and the Redirect is based on a variable which is updated 'in real time', which would cause the external redirect to happen after the Rewrite and Redirect to the Rewrite location if the directories all run off the same .htaccess.

Here's a bit easier way of saying it:

Request to no-trailing/slash
Rewrite to /domains/example.com/no-trailing/slash
Redirect to trailing / kicks in based on a variable updated to the path of the Rewrite...

End Result: /domains/example.com/no-trailing/slash/

If this is the case, I would probably add the no-trailing-slash redirect to the canonicalization ruleset and make sure all rewrites happen after the canonicalization.
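
The ordering described above could look like the sketch below: the external trailing-slash redirect runs first, while the URL is still the public one, and the internal rewrite comes after it. It assumes the layout reported earlier in the thread (parked sites under %{DOCUMENT_ROOT}/domains/<domain>/) and reuses the hostname pattern from jdMorgan's code; it has not been tested on the poster's server, and the fallback rewrite to the index script is left out for brevity.

```apache
Options +FollowSymLinks -Indexes -MultiViews
RewriteEngine on

# 1) Canonicalization (external): add the missing slash when the request
#    names an existing directory inside the parked domain's folder
RewriteCond $1 !^domains/
RewriteCond %{HTTP_HOST} !parking\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
RewriteCond %{DOCUMENT_ROOT}/domains/%2/$1 -d
RewriteRule ^(.*[^/])$ http://%{HTTP_HOST}/$1/ [R=301,L]

# 2) Internal rewrite: map the (now canonical) URL into the parked folder
RewriteCond $1 !^domains/
RewriteCond %{HTTP_HOST} !parking\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+(\.[a-z]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ /domains/%2/$1 [L]
```

Because the redirect target is built from %{HTTP_HOST} and the original $1, the /domains/<domain>/ prefix never reaches the browser.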

g1smd




msg:4031947
 12:13 am on Nov 26, 2009 (gmt 0)

Is $1 in the second condition still defined by the time parsing reaches that line?

RewriteCond $1 !^domains

jdMorgan




msg:4031955
 12:39 am on Nov 26, 2009 (gmt 0)

The $n variables set by RewriteRule are 'global' in the RewriteCond scope, since RewriteConds themselves only update %n variables.
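
A short sketch of what this means in practice. mod_rewrite matches the RewriteRule pattern first, then evaluates the RewriteConds, and only then applies the substitution, so $1 is already populated when the conditions run. The hostname pattern is simplified here for illustration.

```apache
RewriteEngine on
# $1 comes from the RewriteRule pattern below, which is matched
# before any of the conditions are evaluated
RewriteCond $1 !^domains
# %2 comes from this condition's own pattern (host without "www." or port)
RewriteCond %{HTTP_HOST} ^(www\.)?([^:]+) [NC]
RewriteRule ^(.*)$ /domains/%2/$1 [L]
```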

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved