mod_rewrite issue where directory or file actually exists

     

FaceOnMars

9:18 pm on Jan 18, 2013 (gmt 0)



I'm not sure if this is possible, but I'm currently in the midst of trying to restructure my site architecture and am running into many walls. I currently have real sub-directories set up for each state, with a real static index file in each sub-directory:

/California/index.shtml
/Kansas/index.shtml
etc.

However, it's now necessary for me to implement pagination for each state sub-directory, and I have opted to use mod_rewrite and scripting so that URLs will now look like:

/California/
/California/2/
/California/3/
etc.

The mod_rewrite code (placed in a .htaccess file directly under the document root), which works fine when I remove "California" and replace it with a non-existent directory "Testing", is as follows:

RewriteCond %{REQUEST_URI} !states_camps_root.cgi
RewriteRule ^California/([A-Za-z0-9-]+)$ California/$1/ [R=301,L]
RewriteRule ^California$ California/ [R=301,L]
RewriteRule ^California/(.*)\/$ /states/states_root.cgi?State=CA&Page=$1
RewriteRule ^California/?$ /states/states_root.cgi?State=CA
RewriteCond %{REQUEST_URI} !California/$
RewriteRule ^California(.*)/$ /California/? [R=301,L]

I then placed the following code in an .htaccess file directly inside /California:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.shtml\ HTTP/
RewriteRule ^(.*)index\.shtml$ /California/$1 [R=301,L]

I still get the static real file "index.shtml" being served. I've tried removing this file and simply get a directory index. I've also tried changing the DirectoryIndex setting in Apache:

DirectoryIndex test.html

... but still no luck.

The reason for the 301 inside each state's directory is that Google has all of my URLs indexed as /California/index.shtml, etc. (for all states), and I'm a bit concerned about getting dinged for duplicate content.

I'm sorry if this is an obvious no no, but I'm a bit under the weather and can't connect the dots. Any advice is much appreciated!

g1smd

9:25 pm on Jan 18, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



First thing is to change the code in the following ways:

Every rule needs the [L] flag.

Every redirect should include the protocol and hostname in the rule target.

Rules that rewrite (rather than redirect) must not include the protocol and hostname in the rule target, just the path and file.

List all of the rules that redirect before the rules that rewrite.

Put all of the rules in the root htaccess, modifying the RegEx pattern to include the full path when you move a rule from the htaccess file in a folder, i.e. a pattern like ^filename becomes ^folder/filename


Slashes in patterns do not need to be escaped.

List the redirects in order from "most specific" to "most general".

Add a blank line after every RewriteRule to make the code more readable.

Once we've got those out of the way, we can look deeper.
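
As a rough sketch of how a rule set laid out along the lines of the checklist above might look. The names "folder" and /scripts/show.cgi and the query parameters are placeholders chosen for illustration, not anything from the site in question:

```apache
# Sketch only: "folder" and /scripts/show.cgi are placeholder names.
RewriteEngine On

# --- External redirects first, most specific to most general ---
# Redirect targets include protocol and hostname.

# Add the trailing slash to a one-level sub-path.
RewriteRule ^folder/([a-z0-9-]+)$ http://www.example.com/folder/$1/ [R=301,NC,L]

# Add the trailing slash to the folder URL itself.
RewriteRule ^folder$ http://www.example.com/folder/ [R=301,L]

# --- Internal rewrites last ---
# Rewrite targets are a local path only, no protocol or hostname.

RewriteRule ^folder/$ /scripts/show.cgi?section=folder [L]

RewriteRule ^folder/([a-z0-9-]+)/$ /scripts/show.cgi?section=folder&page=$1 [NC,L]
```

Because everything lives in the root htaccess, each pattern starts with the folder name and has no leading slash.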

FaceOnMars

9:55 pm on Jan 18, 2013 (gmt 0)



Thank you g1, one of the items you've listed apparently fixed the issue ... just need to figure out which one :-)

I'm curious, though: when you said "Put all of the rules in the root htaccess", is this necessary, or were you simply suggesting it in an effort to help debug?

I was under the impression that it helps to mitigate resource utilization to place .htaccess files as deep as possible within the directory structure.

Regardless, thank you for your help!

g1smd

10:07 pm on Jan 18, 2013 (gmt 0)




Rules in deep folders run after rules in the root.

It's possible that a root rule matches a request that should not do so.

I find it less complicated to have all the rules in the root.

There are major problems if a deep htaccess external redirect runs after a root htaccess internal rewrite.

Make sure you comment your code for when you look at it a long time in the future.

Let's see the new code, as there are a couple of minor issues to finish off.

FaceOnMars

11:18 pm on Jan 18, 2013 (gmt 0)



As you've alluded to, I believe there might have been a conflict between the root .htaccess rules and the (redirect) rules contained within the sub-dir .htaccess. When I pulled the first line out of the root .htaccess and placed it in its own .htaccess file under the sub-dir, I was served the static file /California/index.shtml with no redirect and no rewrite to the script.

Yes, I agree commenting is your friend! (although not sure what the comment character is for .htaccess?)

Here's the code I came up with (root .htaccess file). Not sure if it's 100% up to speed, but it seems to work as intended.

RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]

RewriteCond %{REQUEST_URI} !/states/states_root.cgi

RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]

RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]

RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]

BTW, I deleted the last two lines of the original code ... since I realized I didn't need it for this particular application (I was recycling it from another section which it helped with canonicalization)

g1smd

11:49 pm on Jan 18, 2013 (gmt 0)




Use example.com to suppress URL auto-linking in this forum.

Escape the literal period in the first rule and ^start anchor the pattern.
Add a RewriteCond to prevent an infinite loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml[^ ]*\ HTTP/
RewriteRule ^California/index\.shtml http://www.example.com/California/ [R=301,L]


Use the NC flag and remove A-Z from the pattern.
Escape the literal period. This redirect adds a slash.
RewriteCond %{REQUEST_URI} !/states/states_root\.cgi
RewriteRule ^California/([a-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,NC,L]

The RewriteCond isn't actually needed, as the [a-z0-9-] rule pattern doesn't allow for any slashes or periods, so it can never match "states/states_root\.cgi" anyway.
For pages you should redirect to remove the slash, not add a slash if you want to stick to the HTTP specs.

This redirect adds a slash for the folder URL.
RewriteRule ^California$ http://www.example.com/California/ [R=301,L]


This rewrite allows trailing slash to be present or not be present.
Only one should work. This prevents Duplicate Content issues.
RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]

Your pattern should be ^California/$ here.
Since there is no "capture", $2 will always be empty. You'll need to fix this rule.

This rule is also broken, needs fixing. $2 will always be empty as currently coded.
RewriteRule ^California/(.*)$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]

Perhaps replace /(.*)$ with /([^/]+)/([^/]+)/$ or perhaps even /([^/]+)/([^/.]+)$ if you decide to use URLs that do NOT end with a trailing slash for pages.
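
One way the two rewrites might end up once the unused $2 is dropped, as a sketch only. It assumes the script needs just the State and Page parameters from the original post and that page numbers are digits only. Both are assumptions, not something confirmed in this thread:

```apache
# Folder URL: trailing slash required, no capture needed.
RewriteRule ^California/$ /states/states_root.cgi?State=CA [L]

# Paginated URL such as /California/2/ where $1 is the page number.
# Assumption: pages are numeric, so the class is [0-9]+ rather than (.*).
RewriteRule ^California/([0-9]+)/$ /states/states_root.cgi?State=CA&Page=$1 [L]
```

Each rule has exactly as many captures as the target uses, so no backreference is left empty.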

If you can, you should go with all lower-case URLs on the new site. It will make the code a lot simpler.

Comments begin with #
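
A small illustration of commenting, using one of the rules already posted. The one thing to watch is that the Apache documentation says comments may not share a line with a directive, so each comment goes on its own line:

```apache
# Add the trailing slash to the folder URL.
RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

# Avoid putting the comment after the directive on the same line;
# text there is not treated as a comment. Example of what NOT to do:
# RewriteRule ^California$ http://www.example.com/California/ [R=301,L] # add slash
```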

lucy24

12:19 am on Jan 19, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Hm, now that's interesting. It can be example dot absolutely anything-- but it has to be example dot something.

RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]

RewriteCond %{REQUEST_URI} !/states/states_root.cgi
RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]

RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]

RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]


I made one tiny change to the group of rules you posted. Other than adding .com, I mean. Do you see why?

FaceOnMars

12:22 am on Jan 19, 2013 (gmt 0)



Thanks again g1!

For some reason your first conditional redirect causes an internal server error (RewriteCond: bad flag delimiters):

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml[^ ]*\ HTTP/
RewriteRule ^California/index\.shtml http://www.example.com/California/ [R=301,L]

Sorry about my sloppy coding on $2, I've recycled this from another section which makes use of it.

As far as the trailing slashes go, unfortunately it's a bit of a site-wide policy for consistency. This is not to say we can't drop the slash, but we would like to do it all at once, after enough time has passed with some other 301s which have been implemented.

g1smd

12:37 am on Jan 19, 2013 (gmt 0)




I can't immediately see what the issue is. Make sure there's a space between / and [ in the rule.

FaceOnMars

12:51 am on Jan 19, 2013 (gmt 0)



lucy, I compared your code to mine (line by line) and couldn't find a change.

FaceOnMars

1:03 am on Jan 19, 2013 (gmt 0)



g1, there's not a space in the rule ... also, when I comment out the condition the error goes away

lucy24

7:17 am on Jan 19, 2013 (gmt 0)




lucy, I compared your code to mine (line by line) and couldn't find a change

Hint: Rotate your head 90 degrees ;)

btw, in case you couldn't tell: both g1 and I habitually go through the Forums by opening a string of new posts in adjoining tabs and then reading them one by one. Therefore you will sometimes get posts that seem to either ignore or contradict the preceding six posts. In this case, only one-- but a pretty substantive one.

For some reason your first conditional redirect causes an internal server error (RewriteCond: bad flag delimiters)

I found the problem in MAMP. It's not the trailing space after the flag-- server doesn't seem to care about that. And "bad flag delimiter" is a little bit of a red herring.

Further hint: "space" is the operative word. You're going to kick yourself. (Not going to say any more, because it's so useful to find it yourself :))

FaceOnMars

3:42 pm on Jan 19, 2013 (gmt 0)



lucy, in addition to being sick with a bad headache, I've got a herniated disc in my neck ... so I can only rotate to the left, hopefully it doesn't require 90 to the right :-) Anyhow, I've posted our code: mine is on top and yours is below. The only thing I can see is an extra newline character on my last line. Is that it?

RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]
RewriteRule California/index.shtml http://www.example.com/California/ [R=301,L]

RewriteCond %{REQUEST_URI} !/states/states_root.cgi
RewriteCond %{REQUEST_URI} !/states/states_root.cgi

RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]
RewriteRule ^California/([A-Za-z0-9-]+)$ http://www.example.com/California/$1/ [R=301,L]

RewriteRule ^California$ http://www.example.com/California/ [R=301,L]
RewriteRule ^California$ http://www.example.com/California/ [R=301,L]

RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]
RewriteRule ^California/?$ /states/states_root.cgi?State=CA&specialty_name=$2 [L]

RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]
RewriteRule ^California/(.*)/$ /states/states_root.cgi?State=CA&Page=$1&specialty_name=$2 [L]

Regarding the "bad flag delimiters" error, I did change the conditional as follows, which seems to avert the error ... although I'm not exactly sure if it accomplishes the same task:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml([^\ ]*)\ HTTP/

lucy24

9:46 pm on Jan 19, 2013 (gmt 0)




I did change the conditional as follows which seems to avert the error

You got it :)

In mod_rewrite, the space itself has meaning. "First piece of rule", space, "second piece", space, and so on. So if you need to use a space as a literal character-- most often in conditions involving THE_REQUEST, like here-- you have to escape it. When you look at your code you'll notice that you did this automatically in the other two places in the same line, but left it out of the grouping brackets. It's an easy oversight, since most things in brackets don't need to be escaped. (Other exceptions are literal brackets, hyphens and carets.) But mod_rewrite syntax overrides RegEx syntax, so the space counts as punctuation. And this in turn means that mod_rewrite thinks it's meeting an orphaned flag-closing bracket without the preceding opening bracket plus flag of some kind. That's where the error message comes from.

I think mod_rewrite-- and possibly everything else in Apache-- simply ignores trailing spaces. That's why you can't use \ (escaped literal space) as the very last character in a line. I tried it once. Ouch.

The only thing I can see is an extra newline character on my last line

Huh, that's funny, the significant change was that I deleted one blank line. But that's because you already had blank lines before each rule. More often, people have to be urged to add them. Separate the rules, but keep each condition(s)-plus-rule package together.
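
For reference, the corrected condition-plus-rule package might look like the sketch below. The first form is the one arrived at in this thread. The commented-out alternative using \s and \S is an untested suggestion that relies on Apache's PCRE-style patterns and avoids literal spaces altogether:

```apache
# Every literal space in the pattern is escaped, including the one
# inside the character class: [^\ ]* rather than [^ ]*
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /California/index\.shtml[^\ ]*\ HTTP/
RewriteRule ^California/index\.shtml http://www.example.com/California/ [R=301,L]

# Possible alternative (assumption, not tested in this thread):
# \s matches whitespace and \S non-whitespace, so nothing needs escaping.
# RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/California/index\.shtml\S*\sHTTP/
```

Note there is no blank line between the RewriteCond and the RewriteRule it belongs to.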

FaceOnMars

8:08 pm on Jan 20, 2013 (gmt 0)



thanks lucy, until I brushed up on the docs, I actually didn't realize THE_REQUEST equated to the full kit and caboodle vs. REQUEST_URI ... so that makes sense about spaces. Wouldn't it be easier to just use REQUEST_URI if we're only concerned with the URL?

Now I see what you mean about looking sideways :-)

g1smd

9:29 pm on Jan 20, 2013 (gmt 0)




REQUEST_URI is updated to point to a different internal resource after an internal rewrite.

THE_REQUEST contains the original "GET /thisthing HTTP/1.1" HTTP request and is not changed as rewrites are processed.

You need to look at THE_REQUEST to be sure that you're looking at what was requested by the browser, rather than at the result of a previous internal rewrite.

1. Request example.com/index.php (non-canonical URL).
2. Redirect to www.example.com/ (canonical URL).
3. Internally rewrite to /index.php (internal path to fetch the content).

If you test REQUEST_URI instead of THE_REQUEST in (2), then when step (3) is executed, the reparse of the htaccess file will re-match the redirecting rule (htaccess is reparsed until no more rules match the current request), invoke it and expose the previously rewritten internal path back out on to the web as a new URL. Requesting example.com/index.php will redirect to www.example.com/ and then requesting www.example.com/ will internally rewrite to index.php and then immediately redirect to www.example.com/index.php. The browser requests www.example.com/index.php and around you go again. You've now got an infinite loop.
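
The three steps might be sketched like this for a root htaccess. It covers only the index.php case described above and is a simplified illustration, not a complete canonicalization set (a request for example.com/ without the www is not handled here):

```apache
RewriteEngine On

# Step 2: redirect only when the client itself asked for /index.php.
# THE_REQUEST still holds the original request line after a rewrite,
# so the internal rewrite below does not re-trigger this redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php[^\ ]*\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# Step 3: internally rewrite the canonical root URL to the script.
RewriteRule ^$ /index.php [L]
```

When the rules are reparsed after step 3, the rule pattern matches index.php again, but THE_REQUEST is still "GET / HTTP/1.1", so the condition fails and no redirect is issued.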

Apologies for the typo yesterday. The [^ ]* should have been [^\ ]* as Lucy pointed out.
It was well after midnight here.
 
