

But . . . it *doesn't* start with these characters.

         

rocknbil

5:10 pm on Oct 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm sure I'm headed for another facepalm moment. :-) We are setting up a WordPress site and have incoming URIs like this . . .

2008/08/12/some-topic-here/
2009/10/14/another-topic-here/
911/2009/08/06/blah-de-blah/

which need redirecting. Easy enough, I thought . . .

RewriteRule ^2008|2009|2010|911/\d{2,4}/\d{2}/\d{0,2}/*.* /some-category/ [R=301,NC,L]

It may seem excessive, but what comes after the dates varies greatly, from / to no / to long indeterminate strings. There are about 400 of them, and this one line captures them all.


However, our direct links to images . . .

/wp-content/uploads/2010/08/10/123456.jpg

are getting 301'ed, and it's this one line that's doing it. They shouldn't be, right? ^ = beginning of pattern . . .

I've got it temporarily fixed with

RewriteCond %{REQUEST_URI} !wp-content
RewriteRule ^2008|2009|2010|911/\d{2,4}/\d{2}/\d{0,2}/*.* /some-category/ [R=301,NC,L]

But I shouldn't need it. Sure I'm missing something stupid . . .

g1smd

5:38 pm on Oct 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The *.* notation is incorrect

It should be .* only. In fact, if you aren't capturing it in a backreference the .* can be completely omitted.
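A quick way to check this point is to try the patterns in Python's `re` module, used here as a stand-in for the regex engine behind mod_rewrite (an assumption; the two should behave the same for constructs this simple). A bare `*.*` is not a valid regex at all, and inside the rule it actually parses as `/*` followed by `.*`. Because nothing is captured and the pattern is not end-anchored, removing the trailing `.*` should not change which URLs match:

```python
import re

# A leading "*" has nothing to repeat, so "*.*" by itself is rejected.
try:
    re.compile(r"*.*")
    bare_is_valid = True
except re.error:
    bare_is_valid = False

# The thread's original pattern, and the same pattern with only the final
# ".*" removed. "/*" (zero or more slashes) is left in place in both.
WITH_TAIL = r"^2008|2009|2010|911/\d{2,4}/\d{2}/\d{0,2}/*.*"
WITHOUT_TAIL = r"^2008|2009|2010|911/\d{2,4}/\d{2}/\d{0,2}/*"

urls = [
    "2008/08/12/some-topic-here/",
    "911/2009/08/06/blah-de-blah",
    "wp-content/uploads/2010/08/10/123456.jpg",
]

# With no capture group and no "$", ".*" can always match the empty string,
# so it never decides whether re.search() succeeds.
same = all(
    bool(re.search(WITH_TAIL, u)) == bool(re.search(WITHOUT_TAIL, u)) for u in urls
)
print(bare_is_valid, same)  # False True
```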

You need parentheses around the entirety of the ORed items, otherwise the month and day part of the pattern is only ever paired with the 911/ value. Additionally, without parentheses, only 2008 is start anchored.

At the moment the pattern decodes as "begins with 2008" OR "contains 2009" OR "contains 2010" OR "contains 911, slash, two to four digits, slash, two digits, slash, zero to two digits, zero or more slashes, zero or more characters".

That's not at all what you wanted.
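The precedence problem can be reproduced with Python's `re` as a stand-in for mod_rewrite. The sketch assumes the rule lives in .htaccess, where mod_rewrite strips the leading slash before matching (in server config the slash stays, but the "contains 2010" branch matches either way), and maps the `NC` flag to `re.IGNORECASE`:

```python
import re

# Original pattern: "|" has the lowest precedence, so "^" anchors only "2008"
# and the date segments belong only to the "911" alternative.
ORIGINAL = r"^2008|2009|2010|911/\d{2,4}/\d{2}/\d{0,2}/*.*"
# Grouped: every prefix is start-anchored and followed by the date segments.
GROUPED = r"^(2008|2009|2010|911)/\d{2,4}/\d{2}/\d{0,2}/*.*"

# Assumption: .htaccess context, so the leading slash is already stripped.
image = "wp-content/uploads/2010/08/10/123456.jpg"
post = "2009/10/14/another-topic-here/"

hit = re.search(ORIGINAL, image, re.IGNORECASE)  # NC flag ~ re.IGNORECASE
print(hit.group(0) if hit else None)  # 2010  (the bare "contains 2010" branch)
print(re.search(GROUPED, image, re.IGNORECASE))  # None
print(bool(re.search(GROUPED, post, re.IGNORECASE)))  # True
```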

jdMorgan

6:05 pm on Oct 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So,

RewriteRule ^(2008|2009|2010|911)/\d{2,4}/\d{2}/\d{1,2}/([^/]+/)+ /some-category/ [R=301,NC,L]

should do it.

By using a "one or more characters not slash, followed by a slash, and all one or more times" subpattern instead of ".*", you at least tell the matching engine that it need attempt tail-matching only at slash boundaries, rather than on a character-by-character basis. The result should be at least 14 times faster, and even better if the tail after "/some-category" is non-blank.

Assuming you don't really want to match "911/2009/08//blah-de-blah/" (note the blank day and therefore two consecutive slashes), that sub-pattern quantifier should be {1,2} as shown, and not {0,2}.

Jim
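A sketch of the `{1,2}` point, again with Python's `re` standing in for mod_rewrite and testing only the pattern part of the rule. `LOOSE_DAY` is a hypothetical copy with the old `{0,2}` quantifier put back for comparison:

```python
import re

# Pattern part of jdMorgan's rule: the third number must be one or two digits.
SUGGESTED = r"^(2008|2009|2010|911)/\d{2,4}/\d{2}/\d{1,2}/([^/]+/)+"
# Hypothetical comparison copy with the original {0,2} quantifier.
LOOSE_DAY = r"^(2008|2009|2010|911)/\d{2,4}/\d{2}/\d{0,2}/([^/]+/)+"

good = "911/2009/08/06/blah-de-blah/"
blank_day = "911/2009/08//blah-de-blah/"  # empty third number gives "//"

for name, pat in [("{1,2}", SUGGESTED), ("{0,2}", LOOSE_DAY)]:
    print(name, bool(re.search(pat, good)), bool(re.search(pat, blank_day)))
# {1,2} True False
# {0,2} True True
```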

g1smd

6:13 pm on Oct 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...and if it really is supposed to be an optional value then /(\d{1,2}/)* will allow for no value there and take out one of the slashes at the same time.

The devil is in the details.

Working out "exactly" what you want is the difficult bit: "begins with" vs. "contains" vs. "ends with", and "zero or more times" vs. "one or more times", are just two of the many decisions to be made.
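One hypothetical way to combine the two suggestions: take jdMorgan's pattern and swap its `/\d{1,2}/` part for g1smd's `/(\d{1,2}/)*`. The name `OPTIONAL_DAY` and the sample URLs are illustrative only, and Python's `re` stands in for mod_rewrite's engine:

```python
import re

# Hypothetical combination: the third number is now an optional segment.
OPTIONAL_DAY = r"^(2008|2009|2010|911)/\d{2,4}/\d{2}/(\d{1,2}/)*([^/]+/)+"

tests = {
    "2008/08/12/some-topic-here/": True,               # three numbers
    "911/2009/08/06/blah-de-blah/": True,              # four numbers
    "911/2009/08//blah-de-blah/": False,               # blank slot, double slash
    "wp-content/uploads/2010/08/10/123456.jpg": False, # not start-anchored match
}
for url, expected in tests.items():
    assert bool(re.search(OPTIONAL_DAY, url)) == expected, url
print("all as expected")
```

Note that `*` also accepts several one- or two-digit segments in a row; `(\d{1,2}/)?` would limit it to at most one.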

rocknbil

7:48 pm on Oct 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All taken . . . but

"The *.* notation is incorrect"


What about this?

911/2009/08/06/blah-de-blah/
911/2009/08/06/blah-de-blah

The first asterisk is for zero or more slashes.

"begins with 2008" OR "contains..."


There it is, and my "moment." Doh.

^(2008|2009|2010|911)

Been staring at it too long. There are over 5600 URI redirects, and this one rule handles around 400 of them. So far the rewrites are still under 100 lines.



Will apply the other updates as well. Of course I will try it verbatim, but I think the problem with this

^(2008|2009|2010|911)/\d{2,4}/\d{2}/\d{1,2}/

is that the first URL form (2008/08/12/some-topic-here/) has three slash-delimited sets of digits followed by characters, while the second (911/2009/08/06/blah-de-blah/) has four.

The reason I didn't use ([^/]+/) is that there are indeed more potential slashes in the strings after the initial numbers. But this could do it: /(\d{2}/)* since there will always be two digits if they are present.

Thank you gents.
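Both of rocknbil's objections can be checked the same way (Python's `re` as a stand-in for mod_rewrite, pattern parts only). jdMorgan's pattern requires a fourth number and a trailing slash, so it misses both the three-number form and a URL without the final `/`, while the `/(\d{2}/)*` idea, written here with no tail at all, accepts both:

```python
import re

# Pattern part of jdMorgan's rule: needs a fourth number and a trailing "/".
SUGGESTED = r"^(2008|2009|2010|911)/\d{2,4}/\d{2}/\d{1,2}/([^/]+/)+"
# rocknbil's idea: an optional run of two-digit segments, and no tail at all.
TWO_DIGIT_OPTIONAL = r"^(2008|2009|2010|911)/\d{2,4}/\d{2}/(\d{2}/)*"

three_numbers = "2008/08/12/some-topic-here/"
no_trailing_slash = "911/2009/08/06/blah-de-blah"

print(bool(re.search(SUGGESTED, three_numbers)))               # False
print(bool(re.search(SUGGESTED, no_trailing_slash)))           # False
print(bool(re.search(TWO_DIGIT_OPTIONAL, three_numbers)))      # True
print(bool(re.search(TWO_DIGIT_OPTIONAL, no_trailing_slash)))  # True
```

The trade-off is that, with no tail constraint, `TWO_DIGIT_OPTIONAL` also accepts the double-slash `911/2009/08//blah-de-blah/` case that the `{1,2}` quantifier was meant to exclude.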

g1smd

11:03 pm on Oct 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"The first asterisk is for zero or more slashes."

Use a question mark instead.

Using /* would allow a URL ending ///////////////// to be valid.
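The `/?` versus `/*` difference only shows up when nothing after the quantifier can soak up extra slashes; in the original rule the trailing `.*` would absorb them either way. The sketch below therefore uses hypothetical end-anchored patterns (the `$` is an assumption, not from the thread), with Python's `re` standing in for mod_rewrite:

```python
import re

# Hypothetical end-anchored patterns; only the trailing-slash quantifier differs.
STAR = r"^911/\d{2,4}/\d{2}/\d{1,2}/[^/]+/*$"   # "/*": zero or more slashes
QMARK = r"^911/\d{2,4}/\d{2}/\d{1,2}/[^/]+/?$"  # "/?": zero or one slash

base = "911/2009/08/06/blah-de-blah"

# Map number of trailing slashes -> (matches STAR, matches QMARK).
results = {
    n: (bool(re.search(STAR, base + "/" * n)), bool(re.search(QMARK, base + "/" * n)))
    for n in (0, 1, 6)
}
print(results)  # {0: (True, True), 1: (True, True), 6: (True, False)}
```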