Rewrite rule to ignore certain paths

     
9:42 am on Dec 3, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 25, 2003
posts:94
votes: 0


I want to take a URL like below

http://www.myserver.com/_some_random_text/optional_subdirectory/page.html

and convert it via mod_rewrite to

http://www.myserver.com/optional_subdirectory/page.html

such that the directory in the path that begins with an underscore is dropped, but direct requests to

http://www.myserver.com/optional_subdirectory/page.html

are still parsed correctly.
Here is what I have, but it doesn't seem to work.

RewriteCond %{REQUEST_URI} ^(.*)/_([^/]+)/(.*)$
RewriteRule ^(.*)/_([^/]+)/(.*)$ $1/$3

I also tried

RewriteCond %{REQUEST_URI} ^(.*)(/_[^/]+)/(.*)$
RewriteRule ^(.*)(/_[^/]+)/(.*)$ $1/$3

I simply get the expected 404 page.
Any suggestions?
Thanks
Wing
5:41 pm on Dec 3, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


winglian,

Welcome to WebmasterWorld [webmasterworld.com]!

Where do you intend to place the rewrites, in httpd.conf, or in .htaccess?
Do you have other RewriteRules working already?

Jim

6:18 pm on Dec 3, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 25, 2003
posts:94
votes: 0


I have access to both a user.conf and .htaccess. I haven't tried any other basic rewrites.
I have tried it in both places, but to no avail. I did a little more research, and it seems that it is not matching because I am not using a non-greedy regex. I believe I need to match the first parameter with a lookahead for the "/_", but I am not exactly sure how to do that.

Here is my new regex that works fine in python and matches perfectly


^(.*?)(/_(.*?))(/.*?)$

However, I get the following error in the Apache error log:

RewriteCond: cannot compile regular expression '^(.*?)(/_(.*?))(/.*?)$'
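
The compile error suggests that this Apache build's regex library does not support lazy quantifiers such as `*?` (an assumption; the thread does not say which library is in use). As a rough sketch, the Python snippet below compares the lazy pattern from the post with a hypothetical rewrite, `PORTABLE`, that uses only greedy quantifiers and a negated character class. The two agree on paths that contain a single underscore-prefixed directory; Python's `re` module is used only for illustration and is not the engine Apache uses.

```python
import re

# Lazy form from the post; Python's re module accepts *? quantifiers.
LAZY = re.compile(r"^(.*?)(/_(.*?))(/.*?)$")

# Hypothetical rewrite (not from the thread): greedy quantifiers only,
# with [^/]* keeping the underscore segment from crossing a slash.
PORTABLE = re.compile(r"^(.*)(/_([^/]*))(/.*)$")


def split_path(pattern, path):
    """Return the captured groups for path, or None if there is no match."""
    match = pattern.match(path)
    return match.groups() if match else None


if __name__ == "__main__":
    for path in ("/_some_random_text/optional_subdirectory/page.html",
                 "/optional_subdirectory/page.html"):
        print(path)
        print("  lazy:    ", split_path(LAZY, path))
        print("  portable:", split_path(PORTABLE, path))
```

If a path contained several underscore-prefixed directories, the two patterns could split it differently, since the greedy `(.*)` prefers the last `/_` and the lazy `(.*?)` prefers the first.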

Thanks
Wing
6:48 pm on Dec 3, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Wing,

You only need to worry about 'greedy' vs. 'non-greedy' when you are trying to match parts of the URL that look identical to the pattern, but you want to put, say, one of the identical parts into the first back-reference, and two into the second... Or something similar to that. In that case, you need to be careful with using ".*" to match substrings, because ".*" is greedy, and will match as much of the string as possible, possibly "eating up" more than you expect. In the example at hand, if you used it for the first back-reference, it might match *two* of the identical substrings, leaving only one (or none) to match in the second back-reference. Since this is sort of a secondary topic, I'll leave it at that.
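
To make the greedy behavior concrete, here is a small Python sketch (the path "a/b/c" and the function names are made up for illustration). It splits the same path at a slash twice: once with a greedy ".*" in the first group, and once with "[^/]*", which cannot cross a slash.

```python
import re


def greedy_split(path):
    """First group uses '.*', which grabs as much as it can before a slash."""
    return re.match(r"^(.*)/(.*)$", path).groups()


def bounded_split(path):
    """First group uses '[^/]*', which stops at the first slash."""
    return re.match(r"^([^/]*)/(.*)$", path).groups()


if __name__ == "__main__":
    print(greedy_split("a/b/c"))   # ('a/b', 'c')
    print(bounded_split("a/b/c"))  # ('a', 'b/c')
```

The greedy version puts two of the three segments into the first group, which is the "eating up" effect described above.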

It looks to me like you had a problem with defining the 'borders' between the optional and non-optional parts of the requested URL -- matching the 'required' versus the 'optional' slashes, in particular.

Try this in .htaccess:


Options +FollowSymLinks
RewriteEngine on
RewriteRule ^([^/]+)(/_[^/]+)?/(.+)$ /$1/$3 [L]

You should not need to use a RewriteCond, unless there are considerations outside the scope of the problem you describe, such as only wanting to do this rewrite to *specific* subdirectories.

To help get it working, you may want to use an external redirect at first, so you can see the result in your browser address bar:


RewriteRule ^([^/]+)(/_[^/]+)?/(.+)$ http://www.example.com/$1/$3 [R=301,L]

The above code has not been tested, but should get you close.

I'd recommend trying very simple rewrites first, and getting those working before diving into complex regular-expressions rewrites. Something like:


Options +FollowSymLinks
RewriteEngine on
RewriteRule ^silly\.html$ /index.html [L]

For this code, any request for the non-existent file 'silly.html' should return your home page.

Jim

9:38 pm on Dec 3, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 25, 2003
posts:94
votes: 0


Thanks, it seems to work well now with a little tweaking.
Here is what I finally ended up with in my .htaccess file:

RewriteEngine On
RewriteRule ^(.+/)?(_[^/]*)/(.*)$ $3 [L]

It turns out that $1 is usually null.
However another problem arises. I am playing with it in a user directory for now on my laptop so it is located under

/Users/username/Sites/site1/test.php

so calling the url

http://localhost/~username/site1/_test/test.php

should point to

/Users/username/Sites/site1/test.php

but I get a 404 Not Found, and the error_log shows

File does not exist: /Library/WebServer/Documents/Users/username/Sites/site1/test.php

however, if i use

RewriteEngine On
RewriteRule ^(.+/)?(_[^/]*)/(.*)$ $1 [L]

where i replace the $3 with $1, i get

File does not exist: /Library/WebServer/Documents/Users/username/Sites/site1/

indicating that the rule is being applied from within the /Users/username/Sites/site1/ directory, so the REQUEST_URI does not have an actual prefix, hence the null $1. I can use

RewriteEngine On
RewriteRule ^(.+/)?(_[^/]*)/(.*)$ http://localhost/~username/site1/$3 [L]

and it works fine, but I am trying to avoid redirects for the sake of SEO.
Thus, it seems to have stripped out the "_test" properly, but it points to the wrong DocumentRoot. Is there something wrong with my httpd.conf? My .htaccess file is in the /Users/username/Sites/site1/ directory, so shouldn't it stay within the same directory?

I noticed in my /etc/httpd/httpd.conf that i have the following lines


<Directory "/Library/WebServer/Documents">

and a little further down i have
<IfModule mod_userdir.c>
UserDir Sites
</IfModule>

Is there conflict here? If so, is there a resolution that I can put into the .htaccess file without messing with my httpd.conf?

Thanks Again
Wing

10:07 pm on Dec 3, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Wing,

I'm not sure, but your setup for UserDir [httpd.apache.org] doesn't look right... It needs a directive with the keyword "enabled" in there, too.

You also might try putting the whole path (from Apache root down) in there.

For yet another layer of complexity, see the RewriteBase [httpd.apache.org] directive in mod_rewrite.

<added>
From message #4 above:

I'd recommend trying very simple rewrites first, and getting those working before diving into complex regular-expressions rewrites.
Use a simple filename-only rewrite to get your UserDir and RewriteBase set up properly before attempting to rewrite directories - it will save you a lot of time.
</added>

Jim

10:35 pm on Dec 3, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 25, 2003
posts:94
votes: 0


The httpd.conf is the default for Mac OS X (Panther). As far as I can tell, UserDir is enabled by default with this directive (otherwise I would not be able to access the ~username subdirectory).

A little further down is


Include /private/etc/httpd/users/*.conf

and in the /private/etc/httpd/users/username.conf file is

<Directory "/Users/username/Sites/">
Options Indexes MultiViews +FollowSymLinks
AllowOverride All
Order allow,deny
Allow from all
</Directory>

which all seems pretty straightforward.
In regards to sticking with a simple rewrite, I used

Options +FollowSymLinks
RewriteEngine On
RewriteBase /Users/username/Sites/site1
RewriteRule ^now\.php$ index.php [L]

and returned a 404. The error_log for [localhost...] was

File does not exist: /Library/WebServer/Documents/Users/username/Sites/site/index.php

indicating that even the simple rewrite worked (it transformed now.php to index.php), but the result was still resolved under the "main DocumentRoot" despite defining the RewriteBase.
Also, changing it to

RewriteBase /Users/username/Sites/site1/anotherrandomsubfortesting

returned a similar error

File does not exist: /Library/WebServer/Documents/Users/username/Sites/site/anotherrandomsubfortesting/index.php

indicating that the rewritebase was being used, but not from the top level

Thanks Again
Wing

11:00 pm on Dec 3, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 25, 2003
posts:94
votes: 0


Okay. I hope we can close this topic. What I needed was


RewriteBase /~username/site1

which gives the base as a URL-path (the http base), rather than

RewriteBase /Users/username/Sites/site1

which is the filesystem path (where the docbase is).
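
Why the URL-path form works can be sketched with a simplified Python model of a relative per-directory rewrite. This is an illustration under assumptions, not mod_rewrite's actual code: the function names `rewritten_url_path` and `url_to_file` are hypothetical, and the UserDir mapping is reduced to the /~user/ to /Users/user/Sites/ case seen in this thread.

```python
DOCROOT = "/Library/WebServer/Documents"
SITE_DIR = "/Users/username/Sites/site1/"


def rewritten_url_path(substitution, htaccess_dir, document_root, rewrite_base=None):
    """Simplified model: turn a relative substitution into a URL-path."""
    if rewrite_base is not None:
        # RewriteBase supplies the URL prefix for this directory.
        return rewrite_base.rstrip("/") + "/" + substitution
    # Without RewriteBase the filesystem directory is prepended; the
    # DocumentRoot is stripped only when the directory lies beneath it.
    full = htaccess_dir.rstrip("/") + "/" + substitution
    root = document_root.rstrip("/")
    if full.startswith(root + "/"):
        return full[len(root):]
    return full


def url_to_file(url_path, document_root, home="/Users", user_dir="Sites"):
    """Simplified URL-to-file mapping with a UserDir-style /~user/ prefix."""
    if url_path.startswith("/~"):
        user, _, rest = url_path[2:].partition("/")
        return f"{home}/{user}/{user_dir}/{rest}"
    return document_root.rstrip("/") + url_path


if __name__ == "__main__":
    for base in (None, "/Users/username/Sites/site1", "/~username/site1"):
        url = rewritten_url_path("test.php", SITE_DIR, DOCROOT, base)
        print(base, "->", url, "->", url_to_file(url, DOCROOT))
```

In this model, both the missing RewriteBase and the filesystem-style RewriteBase produce a URL-path beginning with /Users/..., which is then looked up under the main DocumentRoot, as in the error_log lines quoted earlier. Only the /~username/site1 form goes back through the user-directory mapping.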

Thanks for all the help!

Wing Lian

P.S.
Just in case anybody wants the final code, here it is:


Options +FollowSymLinks
RewriteEngine On
RewriteBase /~username/site1
RewriteRule ^(.+/)?(_[^/]*)/(.*)$ $1$3 [L]
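
For readers who want to check what the final pattern captures, here is a minimal Python sketch. It assumes the rule sees the path relative to the .htaccess directory (no leading slash), and it uses Python's `re` module as a stand-in for Apache's regex engine; `strip_underscore_dir` is a hypothetical helper name.

```python
import re

# Same pattern as the final RewriteRule above.
RULE = re.compile(r"^(.+/)?(_[^/]*)/(.*)$")


def strip_underscore_dir(rel_path):
    """Mimic the $1$3 substitution; return the path unchanged on no match."""
    match = RULE.match(rel_path)
    if match is None:
        return rel_path
    # Group 1 is optional, so it is None when the underscore directory
    # is the first segment (the "$1 is usually null" case).
    return (match.group(1) or "") + match.group(3)


if __name__ == "__main__":
    for path in ("_test/test.php", "sub/_x/page.html", "sub/page.html"):
        print(path, "->", strip_underscore_dir(path))
```

Paths without an underscore-prefixed directory do not match at all, which is why direct requests pass through untouched.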
11:21 pm on Dec 3, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Wing,

Glad you got it working!

Jim