Apache Web Server Forum

    
mod_rewrite issue where directory or file actually exists
FaceOnMars




msg:4537268
 9:18 pm on Jan 18, 2013 (gmt 0)

I'm not sure if this is possible, but I'm currently in the midst of restructuring my site architecture and am running into many walls. I currently have real sub-directories set up for each state, with a real static index file in each sub-directory:

/California/index.shtml
/Kansas/index.shtml
etc.

However, it's now necessary for me to implement pagination for each state sub-directory, and I have opted to use mod_rewrite and scripting so that URLs will now look like:

/California/
/California/2/
/California/3/
etc.

The mod_rewrite code (placed in a .htaccess file directly under the document root), which works fine when I replace "California" with a non-existent directory "Testing", is as follows:

RewriteCond %{REQUEST_URI} !states_camps_root.cgi
RewriteRule ^California/([A-Za-z0-9-]+)$ California/$1/ [R=301,L]
RewriteRule ^California$ California/ [R=301,L]
RewriteRule ^California/(.*)\/$ /states/states_root.cgi?State=CA&Page=$1
RewriteRule ^California/?$ /states/states_root.cgi?State=CA
RewriteCond %{REQUEST_URI} !California/$
RewriteRule ^California(.*)/$ /California/? [R=301,L]

I then placed the following code in an .htaccess file directly inside /California:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.shtml\ HTTP/
RewriteRule ^(.*)index\.shtml$ /California/$1 [R=301,L]

I still get the static real file "index.shtml" being served. I've tried removing this file and simply get a directory index. I've also tried changing the DirectoryIndex setting in Apache:

DirectoryIndex test.html

... but still no luck.

The reason for the 301 inside each state's directory is that Google has all of my URLs indexed as /California/index.shtml, etc. (for all states), and I'm a bit concerned about getting dinged for duplicate content.

I'm sorry if this is an obvious no no, but I'm a bit under the weather and can't connect the dots. Any advice is much appreciated!

 

g1smd




msg:4537269
 9:25 pm on Jan 18, 2013 (gmt 0)

First thing is to change the code in the following ways:

Every rule needs the [L] flag.

Every redirect should include the protocol and hostname in the rule target.

Rules that rewrite (rather than redirect) must not include the protocol and hostname in the rule target, just the path and file.

List all of the rules that redirect before the rules that rewrite.

Put all of the rules in the root htaccess, modifying the RegEx pattern to include the full path when you move a rule from the htaccess file in a folder, i.e. a pattern like ^filename becomes ^folder/filename

Slashes in patterns do not need to be escaped.

List the redirects in order from "most specific" to "most general".

Add a blank line after every RewriteRule to make the code more readable.

Once we've got those out of the way, we can look deeper.
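
A minimal sketch of how the checklist above might look when laid out in a root .htaccess file. The folder name, the page names and /script.cgi are placeholders invented for illustration, not taken from the thread; only the ordering and the shape of the targets are the point.

```apache
# Root .htaccess sketch. "folder", "old-page.html", "new-page" and
# /script.cgi are hypothetical names used only for illustration.
RewriteEngine On

# External redirects come first, most specific before most general.
# Redirect targets carry the protocol and hostname.
RewriteRule ^folder/old-page\.html$ http://www.example.com/folder/new-page [R=301,L]

RewriteRule ^folder$ http://www.example.com/folder/ [R=301,L]

# Internal rewrites come last. Their targets are a local path only.
RewriteRule ^folder/$ /script.cgi?section=folder [L]
```

Each rule ends in [L], the patterns include the folder name because the file sits in the document root, and a blank line separates each rule from the next.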

FaceOnMars




msg:4537279
 9:55 pm on Jan 18, 2013 (gmt 0)

Thank you g1, one of the items you've listed apparently fixed the issue ... just need to figure out which one :-)

Although I'm curious: when you said "Put all of the rules in the root htaccess", is this necessary, or were you simply suggesting it in an effort to help debug?

I was under the impression that it helps to mitigate resource utilization to place .htaccess files as deep as possible within the directory structure.

Regardless, thank you for your help!

g1smd




msg:4537282
 10:07 pm on Jan 18, 2013 (gmt 0)

Rules in deep folders run after rules in the root.

It's possible that a root rule matches a request that should not do so.

I find it less complicated to have all the rules in the root.

There's major problems if a deep htaccess external redirect runs after a root htaccess internal rewrite.

Make sure you comment your code for when you look at it a long time in the future.

Let's see the new code, as there's a couple of minor issues to finish off.

FaceOnMars




msg:4537295
 11:18 pm on Jan 18, 2013 (gmt 0)

As you've alluded to, I believe there might have been a conflict between the root .htaccess rules and the (redirect) rules contained within the sub-dir .htaccess ... since when I pulled the first line out of the root .htaccess and placed it in its own .htaccess file under the sub-dir, I was served the static file /California/index.shtml with no redirect - nor a rewrite to the script.

Yes, I agree commenting is your friend! (although not sure what the comment character is for .htaccess?)

Here's the code I came up with (root .htaccess file). Not sure if it's 100% up to speed, but it seems to work as intended.

RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]

RewriteCond %{REQUEST_URI} !/states/states_root.cgi

RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]

RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]

RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]

BTW, I deleted the last two lines of the original code ... since I realized I didn't need it for this particular application (I was recycling it from another section which it helped with canonicalization)

g1smd




msg:4537301
 11:49 pm on Jan 18, 2013 (gmt 0)

Use example.com to suppress URL auto-linking in this forum.

Escape the literal period in the first rule and ^start anchor the pattern.
Add a RewriteCond to prevent an infinite loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml[^ ]*\ HTTP/
RewriteRule ^California/index\.shtml http://www.example.com/California/ [R=301,L]


Use the NC flag and remove A-Z from the pattern.
Escape the literal period. This redirect adds a slash.
RewriteCond %{REQUEST_URI} !/states/states_root\.cgi
RewriteRule ^California/([a-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,NC,L]

The RewriteCond isn't actually needed as the [a-z0-9-] rule pattern doesn't allow for any slashes or periods so it can never match "states/states_root\.cgi" anyway.
For pages you should redirect to remove the slash, not add a slash if you want to stick to the HTTP specs.

This redirect adds a slash for the folder URL.
RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

This rewrite allows trailing slash to be present or not be present.
Only one should work. This prevents Duplicate Content issues.
RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]
Your pattern should be ^California/$ here.
Since there is no "capture", $2 will always be empty. You'll need to fix this rule.

This rule is also broken, needs fixing. $2 will always be empty as currently coded.
RewriteRule ^California/(.*)$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]
Perhaps replace /(.*)$ with /([^/]+)/([^/]+)/$ or perhaps even /([^/]+)/([^/.]+)$ if you decide to use URLs that do NOT end with a trailing slash for pages.

If you can, you should go with all lower-case URLs on the new site. It will make the code a lot simpler.

Comments begin with #
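
As a rough sketch of the two-capture fix suggested above: the thread never says which URL segment carries the page and which carries the specialty, so the layout /California/<page>/<specialty>/ and the mapping of the first group to Page and the second to specialty_name are assumptions made purely for illustration.

```apache
# Comments start with # and sit on their own line.
# Assumed URL layout (not confirmed in the thread):
#   /California/<page>/<specialty>/

# $1 is the first parenthesised group in the pattern, $2 the second.
# With only one group (or none) in the pattern, $2 expands to nothing.
RewriteRule ^California/([^/]+)/([^/]+)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]
```

If only pagination is needed, a single group and a target without specialty_name would be the simpler choice.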

lucy24




msg:4537308
 12:19 am on Jan 19, 2013 (gmt 0)

Hm, now that's interesting. It can be example dot absolutely anything-- but it has to be example dot something.

RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]

RewriteCond %{REQUEST_URI} !/states/states_root.cgi
RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]

RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]

RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]


I made one tiny change to the group of rules you posted. Other than adding .com, I mean. Do you see why?

FaceOnMars




msg:4537309
 12:22 am on Jan 19, 2013 (gmt 0)

Thanks again g1!

For some reason your first conditional redirect causes an internal server error (RewriteCond: bad flag delimiters):

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml[^ ]*\ HTTP/
RewriteRule ^California/index\.shtml http://www.example.com/California/ [R=301,L]

Sorry about my sloppy coding on $2, I've recycled this from another section which makes use of it.

As far as the trailing slashes, unfortunately it's a bit of a site-wide policy for consistency. This is not to say we can't drop the slash, but would like to do it all at once after enough time has passed with some other 301's which have been implemented.

g1smd




msg:4537310
 12:37 am on Jan 19, 2013 (gmt 0)

I can't immediately see what the issue is. Make sure there's a space between / and [ in the rule.

FaceOnMars




msg:4537311
 12:51 am on Jan 19, 2013 (gmt 0)

lucy, I compared your code to mine (line by line) and couldn't find a change.

FaceOnMars




msg:4537313
 1:03 am on Jan 19, 2013 (gmt 0)

g1, there's not a space in the rule ... also, when I comment out the condition the error goes away

lucy24




msg:4537361
 7:17 am on Jan 19, 2013 (gmt 0)

"lucy, I compared your code to mine (line by line) and couldn't find a change"

Hint: Rotate your head 90 degrees ;)

btw, in case you couldn't tell: both g1 and I habitually go through the Forums by opening a string of new posts in adjoining tabs and then reading them one by one. Therefore you will sometimes get posts that seem to either ignore or contradict the preceding six posts. In this case, only one-- but a pretty substantive one.

"For some reason your first conditional redirect causes an internal server error (RewriteCond: bad flag delimiters)"

I found the problem in MAMP. It's not the trailing space after the flag-- server doesn't seem to care about that. And "bad flag delimiter" is a little bit of a red herring.

Further hint: "space" is the operative word. You're going to kick yourself. (Not going to say any more, because it's so useful to find it yourself :))

FaceOnMars




msg:4537461
 3:42 pm on Jan 19, 2013 (gmt 0)

lucy, in addition to being sick with a bad headache, I've got a herniated disc in my neck ... so can only rotate to the left, hopefully it doesn't require 90 to the right :-) Anyhow, I've posted our code - mine is on top & yours is below. The only thing I can see is an extra newline character on my last line - is that it?

RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]
RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]

RewriteCond %{REQUEST_URI} !/states/states_root.cgi
RewriteCond %{REQUEST_URI} !/states/states_root.cgi

RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]
RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]

RewriteRule ^California$ http://www.example.com/California/ [R=301,L]
RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]
RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]

RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]
RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]

Regarding the "bad flag delimiter error", I did change the conditional as follows, which seems to avert the error ... although I'm not exactly sure if it accomplishes the same task:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml([^\ ]*)\ HTTP/

lucy24




msg:4537513
 9:46 pm on Jan 19, 2013 (gmt 0)

"I did change the conditional as follows which seems to avert the error"

You got it :)

In mod_rewrite, the space itself has meaning. "First piece of rule", space, "second piece", space, and so on. So if you need to use a space as a literal character-- most often in conditions involving THE_REQUEST, like here-- you have to escape it. When you look at your code you'll notice that you did this automatically in the other two places in the same line, but left it out of the grouping brackets. It's an easy oversight, since most things in brackets don't need to be escaped. (Other exceptions are literal brackets, hyphens and carets.) But mod_rewrite syntax overrides RegEx syntax, so the space counts as punctuation. And this in turn means that mod_rewrite thinks it's meeting an orphaned flag-closing bracket without the preceding opening bracket plus flag of some kind. That's where the error message comes from.
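
To make the difference concrete, here are the two conditions from this thread side by side. The failing one is commented out so that the fragment still parses.

```apache
# Fails with "RewriteCond: bad flag delimiters". The unescaped space inside
# [^ ] ends the pattern argument early, so the remainder "]*\ HTTP/" is read
# as the flags argument, and it does not start with an opening bracket.
# RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml[^ ]*\ HTTP/

# Parses. Every literal space is escaped, including the one in the brackets.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml[^\ ]*\ HTTP/
RewriteRule ^California/index\.shtml http://www.example.com/California/ [R=301,L]
```

The capturing parentheses in the version that averted the error are not what fixed it; the escaped space inside the brackets is.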

I think mod_rewrite-- and possibly everything else in Apache-- simply ignores trailing spaces. That's why you can't use \ (escaped literal space) as the very last character in a line. I tried it once. Ouch.

"The only thing I can see is an extra newline character on my last line"

Huh, that's funny, the significant change was that I deleted one blank line. But that's because you already had blank lines before each rule. More often, people have to be urged to add them. Separate the rules, but keep each condition(s)-plus-rule package together.

FaceOnMars




msg:4537752
 8:08 pm on Jan 20, 2013 (gmt 0)

thanks lucy, until I brushed up on the docs, I actually didn't realize THE_REQUEST equated to the full kit and kaboodle vs. REQUEST_URI ... so that makes sense about spaces. Wouldn't it be easier to just use REQUEST_URI if we're only concerned with the URL?

Now I see what you mean about looking sideways :-)

g1smd




msg:4537788
 9:29 pm on Jan 20, 2013 (gmt 0)

REQUEST_URI is updated to point to a different internal resource after an internal rewrite.

THE_REQUEST contains the original "GET /thisthing HTTP/1.1" HTTP request line and is not changed as rewrites are processed.

You need to look at THE_REQUEST to be sure that you're looking at what was requested by the browser, rather than as a result of a previous internal rewrite.

1. Request example.com/index.php (non-canonical URL).
2. Redirect to www.example.com/ (canonical URL).
3. Internally rewrite to /index.php (internal path to fetch the content).

If you test REQUEST_URI instead of THE_REQUEST in (2) then when step (3) is executed, the reparse of the htaccess file will re-match the redirecting rule (htaccess is reparsed until no more rules match the current request), invoke it and expose the previously rewritten internal path back out on to the web as a new URL. Requesting example.com/index.php will redirect to www.example.com/ and then requesting www.example.com/ will internally rewrite to index.php and then immediately redirect to www.example.com/index.php. The browser requests www.example.com/index.php and around you go again. You've now got an infinite loop.
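
A minimal sketch of steps (2) and (3) for a root .htaccess file, assuming index.php lives in the document root. It only illustrates why the condition tests THE_REQUEST; it is not a complete canonicalization setup.

```apache
RewriteEngine On

# Step 2: redirect only when the client itself asked for /index.php.
# THE_REQUEST holds the original request line, so this condition stays
# false when index.php is reached through the internal rewrite below.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php[^\ ]*\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# Step 3: serve the canonical root URL from index.php internally.
RewriteRule ^$ /index.php [L]
```

A request for / is rewritten to /index.php, the file is reparsed, and the redirect is skipped because the original request line was "GET / HTTP/1.1". Testing REQUEST_URI in the condition instead would match the rewritten path and start the loop described above.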

Apologies for the typo yesterday. The [^ ]* should have been [^\ ]* as Lucy pointed out.
It was well after midnight here.
