

Seems Simple ReWrite/Redirect But Not Working


Poor_Knight

4:36 pm on Aug 11, 2009 (gmt 0)

10+ Year Member



Hello again Webmaster World,

I don't seem to be having any success with rewrites or redirects when they involve directories.

I want to match "http://domain.com/dir/" with any number of "dir/" levels, or none at all. So far the result has always been a 404 page. I want to take the last "dir" in the request and serve up a page for it. In the sample below I am just trying to get a match and see whether I can redirect the request, but to no avail.

RewriteCond %{THE_REQUEST} !-d
RewriteCond %{THE_REQUEST} !-f
RewriteCond %{THE_REQUEST} ^([^/]*/)*\ HTTP/
RewriteRule ^(index\.php)?$ [msn.com?...] [R,L]

Any advice?

jdMorgan

7:17 pm on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"THE_REQUEST" is a line that looks like this:
GET /requested-url-path?query-string HTTP/1.1

just as it appears in your raw server access logs.
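To make that concrete, here is a minimal Python sketch that splits a THE_REQUEST-style line into its parts. It assumes the simple "METHOD URL-path[?query] PROTOCOL" shape shown above; `RequestLine` and `parse_the_request` are illustrative names, not anything Apache provides.

```python
import re
from typing import NamedTuple, Optional


class RequestLine(NamedTuple):
    method: str
    path: str
    query: Optional[str]  # None when the request has no "?query" part
    protocol: str


# Assumed shape: METHOD SP URL-path[?query] SP HTTP/x.y
_REQUEST_LINE = re.compile(r"^(\S+) ([^?\s]*)(?:\?(\S*))? (HTTP/\d\.\d)$")


def parse_the_request(line: str) -> RequestLine:
    """Split a THE_REQUEST-style line into method, path, query and protocol."""
    m = _REQUEST_LINE.match(line)
    if m is None:
        raise ValueError(f"not a request line: {line!r}")
    method, path, query, protocol = m.groups()
    return RequestLine(method, path, query, protocol)


req = parse_the_request("GET /requested-url-path?query-string HTTP/1.1")
print(req.path)   # /requested-url-path
print(req.query)  # query-string
```

The point is that %{THE_REQUEST} holds the whole line, method and protocol included, not just `req.path`.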

So of course that string does not refer to an existing file or directory, because it is not a valid file path of any kind. The -d and -f tests are only meaningful with %{REQUEST_FILENAME} or with a string constructed from %{DOCUMENT_ROOT} plus a URL-path part. Either way, it must be a file path in the server's filespace, not a URL.
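A rough Python analogue of what -d and -f do: join a document root and a URL-path into a filesystem path, then ask the filesystem. This is a sketch under simplifying assumptions (no aliases, no percent-decoding, only a basic guard against ".." escaping the root); `document_root` stands in for %{DOCUMENT_ROOT} and `dir_and_file_tests` is a hypothetical helper.

```python
import os
import tempfile
from typing import Tuple


def dir_and_file_tests(document_root: str, url_path: str) -> Tuple[bool, bool]:
    """Mimic -d / -f on DOCUMENT_ROOT + URL-path; returns (is_dir, is_file)."""
    root = os.path.realpath(document_root)
    filepath = os.path.normpath(os.path.join(root, url_path.lstrip("/")))
    # Treat anything that escapes the document root (e.g. via "..") as absent.
    if os.path.commonpath([root, filepath]) != root:
        return False, False
    return os.path.isdir(filepath), os.path.isfile(filepath)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "dir"))
        print(dir_and_file_tests(root, "/dir/"))               # (True, False)
        # A whole request line is never a real path, so both tests fail.
        print(dir_and_file_tests(root, "GET /dir/ HTTP/1.1"))  # (False, False)
```

Because a request line never names a real file or directory, "!-d" and "!-f" against %{THE_REQUEST} are always true and filter nothing.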

It's not clear from your post exactly how you 'serve up a page for it', but here's an example:


RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^/]+/)*([^/]+)/))?$ path-to-your-dir-serving-script?last-dir=$3 [L]

This takes the last directory level (if any) from the requested URL-path, puts it into the query-string parameter "last-dir", and then calls the script at "/path-to-your-dir-serving-script".
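The capture logic of that rule can be checked outside Apache. The sketch below uses Python's re module, which behaves like PCRE for a pattern this simple, and assumes per-directory (.htaccess) context, where the URL-path seen by RewriteRule has no leading slash. It uses the pattern with its parentheses balanced, ^(([^/]+/)*([^/]+)/)?$, omits the !-d/!-f conditions, and `rewrite_target` is a hypothetical name.

```python
import re
from typing import Optional

# Balanced form of the pattern: group 3 ($3) captures the last directory level.
LAST_DIR = re.compile(r"^(([^/]+/)*([^/]+)/)?$")
TARGET = "path-to-your-dir-serving-script?last-dir="


def rewrite_target(url_path: str) -> Optional[str]:
    """Return the substitution the rule would produce, or None if it does not match."""
    m = LAST_DIR.match(url_path)
    if m is None:
        return None  # e.g. "all/this/stuff" (no trailing slash) is not rewritten
    # mod_rewrite substitutes an empty string for an unmatched group.
    return TARGET + (m.group(3) or "")


for path in ["all/this/stuff/", "stuff/", "", "all/this/stuff"]:
    print(repr(path), "->", rewrite_target(path))
```

Note that "all/this/stuff/" and "stuff/" produce the same target, which is exactly the duplicate-content risk described next.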

Be aware that serving the same page for "any number of directory levels" creates duplicate content and leaves your site vulnerable to ranking problems, whether natural or maliciously created. Be sure to check the entire client-requested URL in your script and validate it against your database. Any requested URL that does not correspond to a valid database entry should get a 404-Not Found response. In other words, you cannot simply ignore the 'extra' directory levels in URLs requested from your site; you have to check them to be sure they are valid.
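A minimal sketch of that check, assuming the script receives the full requested path rather than only the last directory, and that a dict named `PAGES` stands in for your database. Both names and the page data are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the database: full URL-path -> page body.
PAGES: Dict[str, str] = {
    "stuff/": "Stuff index",
    "all/": "All index",
}


def serve(full_path: str) -> Tuple[int, str]:
    """Serve a page only when the *entire* requested path is a known entry."""
    body = PAGES.get(full_path)
    if body is None:
        # Extra or unknown directory levels: never fall back to the last one.
        return 404, "Not Found"
    return 200, body


print(serve("stuff/"))           # (200, 'Stuff index')
print(serve("all/this/stuff/"))  # (404, 'Not Found')
```

Looking up the whole path, instead of only its last segment, is what keeps /all/this/stuff/ from silently duplicating /stuff/.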

Jim

Poor_Knight

8:49 pm on Aug 12, 2009 (gmt 0)

10+ Year Member



Thank you Jim. I ask and learn as usual. I understand your point about duplicate content. My initial method would have been vulnerable to that. /stuff/ would have served up the same page as /all/this/stuff/. I'm adjusting to examine the entire path and returning 404s where appropriate. I think it was laziness on my part :(

Anyway, your response did the trick for me. I did get a 500 error when I used it as is, and the server log gave this error:
RewriteRule: cannot compile regular expression '^(([^/]+/)*([^/]+)/))?$'
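The error can be reproduced with a PCRE-style engine. Counting the parentheses in the quoted pattern, which is identical to the one in the rule above, gives three "(" but four ")", so the pattern is unbalanced as written. The Python sketch below (re standing in for Apache's PCRE; `compiles` is a hypothetical helper) checks the pattern as posted and a version with the extra ")" removed.

```python
import re

AS_POSTED = r"^(([^/]+/)*([^/]+)/))?$"  # one ")" too many
BALANCED = r"^(([^/]+/)*([^/]+)/)?$"    # extra ")" removed


def compiles(pattern: str) -> bool:
    """Return True if the regular expression compiles."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(AS_POSTED))  # False
print(compiles(BALANCED))   # True
```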

I modified to this:
RewriteRule ^((([^/]+)/)*)?$ path-to-your-dir-serving-script?last-dir=$1 [L]

I am passing the entire path to the script whenever the request is neither an existing file (!-f) nor an existing directory (!-d). It seems to be working OK.
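What the modified pattern captures can be checked the same way. In ^((([^/]+)/)*)?$, $1 is the entire slash-terminated path (empty for the root), which is what the script then has to validate; the trailing "?" is redundant, because the "*" inside already allows an empty match. A sketch with Python's re as a stand-in for PCRE; `whole_path` is a hypothetical name.

```python
import re
from typing import Optional

WHOLE_PATH = re.compile(r"^((([^/]+)/)*)?$")


def whole_path(url_path: str) -> Optional[str]:
    """Return what $1 would hold, or None when the pattern does not match."""
    m = WHOLE_PATH.match(url_path)
    if m is None:
        return None
    return m.group(1) or ""  # treat an unmatched group as "", as mod_rewrite does


print(whole_path("all/this/stuff/"))  # all/this/stuff/
print(whole_path("all/this/stuff"))   # None: no trailing slash, so no match
```

The slashless case not matching is consistent with the 404 described next.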

I did get a 404 page when there was no trailing slash so I've added another rule to include the trailing slash if !-f and !-d.

The intent here has been to mimic the directory index page dynamically, but now I wonder whether this is appropriate: /dir/ will serve up content as if it were /dir/index.html.

Thanks again!

Mike

jdMorgan

11:06 pm on Aug 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> RewriteRule: cannot compile regular expression '^(([^/]+/)*([^/]+)/))?$'

That is very odd, since the expression as posted is quite valid. Perhaps a stray character, or a character encoded with a non-standard encoding in your copy? I suppose if you've opted to pass the entire path, it doesn't matter, as long as...

> I did get a 404 page when there was no trailing slash so I've added another rule to include the trailing slash if !-f and !-d.

Just be sure that this is an external 301 redirect from the slashless URL to the slashed URL. If you do not inform the client with a 301, then again you get duplicate content.
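The decision such a redirect rule has to make can be sketched as follows, assuming the same !-f/!-d conditions Mike describes and leaving real directories to Apache's mod_dir, whose DirectorySlash setting already issues its own 301. This is a hypothetical Python helper, not mod_rewrite itself; the key point is that the result is an external 301 with a new Location, not an internal rewrite.

```python
import os
import tempfile
from typing import Optional, Tuple


def slash_redirect(document_root: str, url_path: str,
                   query: str = "") -> Optional[Tuple[int, str]]:
    """Return (301, location) when a trailing slash should be added, else None."""
    if url_path.endswith("/"):
        return None  # already the canonical, slashed form
    filepath = os.path.join(document_root, url_path.lstrip("/"))
    if os.path.isfile(filepath) or os.path.isdir(filepath):
        return None  # real files are served as-is; mod_dir handles real directories
    location = url_path + "/"
    if query:
        location += "?" + query  # carry the query string over to the new URL
    return 301, location


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(slash_redirect(root, "/stuff"))   # (301, '/stuff/')
        print(slash_redirect(root, "/stuff/"))  # None
```

Sending the client to the slashed URL means only one form of each address ever gets indexed.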

I presume you've seen the several concurrent threads we've had running on adding the trailing slash.

Jim