.htaccess redirect to remove middle directory with unknown value?

         

Sgt_Kickaxe

6:27 pm on Jul 8, 2011 (gmt 0)



Hi,

I'm trying to remove a directory with random names (letters, numbers and dashes) from all URLs. Here is an example; there are hundreds of these to change.

example.com/stuff/asdf-123/123456
needs to become
example.com/stuff/123456

The value 'asdf-123' above is different for every page and I need to remove it as a directory. The value 123456 is also different for every page but it needs to stay.

My best effort so far:
^stuff/(.*)/(.*)$ http://www.example.com/stuff/$2 [R=301,L]

which obviously doesn't work. Suggestions?

g1smd

6:53 pm on Jul 8, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The first (.*) pattern effectively says "put all of the requested URL in here". So what's going to go in the second (.*) then?

The regex engine has to back off and retry, one character at a time, until the rest of the pattern fits, and it does this for every URL request hitting your server. With more than one .* in a pattern, the number of trial matches grows quickly. This is slow and inefficient. The .* pattern is greedy, promiscuous and ambiguous. Avoid it unless it is immediately followed by the $ end anchor.

So use a different pattern to capture "everything to the next slash", something like
[^/]+/
perhaps?

You'll need three of those. Capture the first and the last so they can be used as the $1 and $2 backreferences in the target. Don't capture the middle one, so it can't be re-used in the rule's target.

Your existing rule redirects requests for the old URL to a new URL.

The pages of your site need to link out to that NEW URL.

You'll also need some sort of rewrite to match the new URL requests up with the correct server internal location that is going to deliver the content.
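
To see what the two kinds of pattern actually capture, here is a minimal sketch in Python. Python's re module stands in for Apache's regex engine (an assumption; the two should behave the same for patterns this simple), and the deeper path stuff/a/b/c/123456 is a made-up example, not a URL from the site.

```python
import re

# Path as mod_rewrite sees it in .htaccess (no leading slash).
path = "stuff/asdf-123/123456"
# Hypothetical deeper path, used only to expose the ambiguity of (.*).
deep_path = "stuff/a/b/c/123456"

# The original attempt: two greedy catch-all groups.
greedy = re.compile(r"^stuff/(.*)/(.*)$")
print(greedy.match(path).groups())       # ('asdf-123', '123456')
print(greedy.match(deep_path).groups())  # ('a/b/c', '123456')

# "Everything up to the next slash": three segments,
# with only the first and the last captured.
segmented = re.compile(r"^([^/]+)/[^/]+/([^/.]+)$")
first, last = segmented.match(path).groups()
print(first + "/" + last)                # stuff/123456
print(segmented.match(deep_path))        # None: exactly three segments required
```

The greedy version accepts any number of slashes inside the first group, which is why it is ambiguous. The segmented version matches only the three-level shape it was written for.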

Sgt_Kickaxe

10:38 pm on Jul 8, 2011 (gmt 0)



Thanks g1smd. That "to the next slash" suggestion got me closer, but I'm still missing something. The second (.*) was a mistake; I rarely need to use .htaccess and it's not my strong suit.

I have the site working properly without htaccess, everything links to where it should and all traces of the /asdf-123/ are removed. Google however still links to the pages with /asdf-123/ in them and all are getting the 404 page. That's what I'm trying to take care of.

I'll post an answer if/when I come up with one, but any other suggestions are welcome. This section of the site is low on my totem pole of importance. It has been a great learning experience, except that so far I'm mostly learning what DOESN'T work :)

g1smd

10:52 pm on Jul 8, 2011 (gmt 0)




You have a request containing: "stuff/something/somethingelse"

[^/]+/
matches to the next slash.

[^/.]+$
matches an extensionless filename.

If you need to capture something for re-use, enclose it in ( ) and refer to it as $1, $2 etc. If you don't need to re-use it, don't enclose it in ( ) or don't re-use the $n backreference.
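
As a rough illustration of capturing only the part that gets re-used, the sketch below imitates a redirect rule in Python. The function name redirect_target is hypothetical, Python's re module is standing in for Apache's regex engine, and the target host is the example.com placeholder used earlier in the thread.

```python
import re

# Middle segment is matched but NOT captured; only the last segment is.
RULE = re.compile(r"^stuff/[^/]+/([^/.]+)$")


def redirect_target(path):
    """Return the redirect URL for an old-style path, or None if no match."""
    match = RULE.match(path)
    if match is None:
        return None
    # group(1) plays the role of $1 in a RewriteRule target.
    return "http://www.example.com/stuff/" + match.group(1)


print(redirect_target("stuff/something/somethingelse"))
# http://www.example.com/stuff/somethingelse

# The new URL has only two segments, so it is not redirected again.
print(redirect_target("stuff/somethingelse"))
# None
```

Because the pattern insists on three segments, the redirected URL cannot match it a second time, so this rule on its own cannot loop.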

Sgt_Kickaxe

12:22 am on Jul 9, 2011 (gmt 0)



This works
RewriteRule ^stuff/[^/]+/(.*)$ http://www.example.com/stuff/$1 [R=301,L]

I can't get [^/.]+ to replace (.*); this doesn't work:
RewriteRule ^stuff/[^/]+/([^/.]+)$ http://www.example.com/stuff/$1 [R=301,L]

What am I doing wrong with that last bit?

g1smd

12:30 am on Jul 9, 2011 (gmt 0)




The key word was "extensionless". Your examples used extensionless URLs.

Are your URLs extensionless?

If they are not extensionless, you need a different pattern, one that does handle extensions.
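
The difference an extension makes can be checked with a short Python sketch. Python's re module stands in for Apache's regex engine here, the .html path is a hypothetical example, and the second pattern is just one possible way to allow extensions, not necessarily the one you would want.

```python
import re

# Final segment may not contain a dot: extensionless names only.
extensionless = re.compile(r"^stuff/[^/]+/([^/.]+)$")
# Final segment may contain dots: names with or without an extension.
any_filename = re.compile(r"^stuff/[^/]+/([^/]+)$")


def last_segment(pattern, path):
    """Return the captured final segment, or None when the path does not match."""
    match = pattern.match(path)
    return match.group(1) if match else None


for path in ("stuff/asdf-123/123456", "stuff/asdf-123/123456.html"):
    print(path, last_segment(extensionless, path), last_segment(any_filename, path))
```

The extensionless pattern rejects the .html request outright, so a rule built on it never fires for that URL, while the looser pattern captures the file name with its extension.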

Sgt_Kickaxe

4:35 am on Jul 9, 2011 (gmt 0)



They don't have extensions.

example.com/stuff/123456
and
example.com/stuff/widgets

I remembered that I redirect requests for .html and .php to the extensionless versions and that was conflicting with this redirect. Problem solved, thanks for the help.