

RewriteRule not capturing full stops


   
11:18 am on Apr 13, 2012 (gmt 0)

5+ Year Member



Hi guys,

I have a RewriteRule that I thought was working well, but I have discovered many 404s being thrown up whenever the filename contains a full stop in the name.

The current code is as follows

RewriteRule ^Libraries/Emulation/NES/(([^/]+/)*)([^/.]+\.(html?|zip))$

Its purpose is to capture any request for an html or zip file, however many folders deep it sits. It works until it encounters a filename that has a full stop in it.

Can anyone suggest where I have gone wrong?
2:06 am on Apr 14, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You didn't go wrong anywhere. The Rule does exactly what you told it to do:

(([^/]+/)*)

is correct for "there may be more directories here". You may want to make the inner group non-capturing, but it isn't essential:

((?:[^/]+/)*)

The second piece

([^/.]+\.(html?|zip))$

means "now capture any leftover text that contains neither a directory slash nor a full stop, then get your full stop, and wind up with 'htm', 'html' or 'zip'."
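A quick way to see what that piece accepts is to try it on a few names. This is only a sketch: Python's re module stands in for the PCRE engine that mod_rewrite uses (these simple patterns should behave the same way in both), and the filenames are invented for illustration.

```python
import re

# The second piece of the rule, anchored at both ends so it can be tested alone.
TAIL = re.compile(r"^([^/.]+\.(html?|zip))$")

# Hypothetical filenames, not taken from the original site.
names = ["mario.zip", "index.htm", "index.html", "mario.v1.zip", "readme.txt"]

accepted = [n for n in names if TAIL.match(n)]
rejected = [n for n in names if not TAIL.match(n)]

print(accepted)  # ['mario.zip', 'index.htm', 'index.html']
print(rejected)  # ['mario.v1.zip', 'readme.txt']
```

The name with two full stops is rejected because [^/.]+ cannot step over the first full stop, so what follows it has to be the extension itself.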

Ordinarily this too is exactly the right way to do things. But it breaks when there are full stops before the one setting off the final extension. (I assume you've got things like blahblah.xtn.zip with duplicate extensions.) You will have to change the final capture to either

([^/]+?\.(html?|zip))

or

(([^/.]+\.)+(html?|zip))

If the filename can only contain one extra full stop, it may run a little faster as

([^/.]+\.([^/.]+\.)?(html?|zip))
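The three variants can be compared side by side. A sketch only, again using Python's re module as a stand-in for mod_rewrite's PCRE; the request paths are made up, each pattern has the closing $ anchor added, and the directory part uses the non-capturing form shown earlier.

```python
import re

# Directory part of the rule, with the inner group made non-capturing.
PREFIX = r"^Libraries/Emulation/NES/((?:[^/]+/)*)"

# The three suggested final captures, each with a closing anchor appended.
CANDIDATES = {
    "lazy": r"([^/]+?\.(html?|zip))$",
    "repeated": r"(([^/.]+\.)+(html?|zip))$",
    "one_extra": r"([^/.]+\.([^/.]+\.)?(html?|zip))$",
}

# Hypothetical request paths with zero, one and two extra full stops.
PATHS = [
    "Libraries/Emulation/NES/Games/mario.zip",
    "Libraries/Emulation/NES/Games/mario.v1.zip",
    "Libraries/Emulation/NES/Games/mario.v1.2.zip",
]

def compare():
    """Return, per variant, whether each path in PATHS matches."""
    table = {}
    for label, tail in CANDIDATES.items():
        rx = re.compile(PREFIX + tail)
        table[label] = [rx.match(p) is not None for p in PATHS]
    return table

for label, row in compare().items():
    print(label, row)
# lazy [True, True, True]
# repeated [True, True, True]
# one_extra [True, True, False]
```

Only the last variant is limited to a single extra full stop, which is the trade-off it makes for doing less work.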

Finally, if your potential multiply-stopped filename is always in the form

blahblah.html.zip

--or in some other specific form-- you can further fine-tune the Rule... but I'll stop here, because I'm getting tired :)
11:23 am on Apr 16, 2012 (gmt 0)

5+ Year Member



Thank you lucy for this. I think I follow the logic but don't fully understand how this one works

([^/]+?\.(html?|zip))
3:06 pm on Apr 16, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



"Capture non-slashes, stopping as soon as possible. Grab the first full stop you meet, and then capture the following 'html' or 'zip'."

There should be a closing anchor after this and all other expressions, but I forgot.

The question mark after a * or + changes a Regular Expression from its regular "greedy" mode to "stingy" mode, more commonly called "lazy" or "non-greedy". (No, I do not know why RegEx terminology all has to do with food.) It's generally a last-resort option when things are too complicated to write an absolutely perfect capture. The idea is to save a few nanoseconds of backtracking. So if it turns out that your ".html" or ".zip" is followed by another full stop plus extension, the closing anchor fails at that point, and because [^/] also matches full stops, the RegEx counts the first ".xtn" as part of its capture and continues gobbling.
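The difference between the two modes, and the effect of the closing anchor, shows up on the doubled-extension form mentioned earlier. A minimal sketch, with Python's re module assumed to behave like mod_rewrite's PCRE for these patterns:

```python
import re

name = "blahblah.html.zip"  # the doubled-extension form from earlier in the thread

# Lazy, no closing anchor: stops at the first full stop followed by html or zip.
lazy_unanchored = re.match(r"([^/]+?\.(html?|zip))", name).group(1)

# Greedy, no closing anchor: grabs everything, then backtracks to the last full stop.
greedy_unanchored = re.match(r"([^/]+\.(html?|zip))", name).group(1)

# Lazy with the closing anchor: forced to keep going until the end of the name.
lazy_anchored = re.match(r"([^/]+?\.(html?|zip))$", name).group(1)

print(lazy_unanchored)    # blahblah.html
print(greedy_unanchored)  # blahblah.html.zip
print(lazy_anchored)      # blahblah.html.zip
```

With the anchor in place both modes capture the same text; they only differ in how they get there.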
10:11 am on Apr 17, 2012 (gmt 0)

5+ Year Member



Thank you Lucy, I follow most of this but sometimes I really do think it is a black art reserved for those that have a hut in the forest.
1:10 pm on Apr 23, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The alternative begins
RewriteRule ^Libraries/Emulation/NES/(([^/]+/)*)(([^/.]+\.)+)(zip|html)$

but the value passed in $3 will have a trailing period attached.
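The group numbering can be checked with a short test. This is a sketch using Python's re module in place of mod_rewrite's PCRE, with an invented request path; group(n) here corresponds to $n in the rule's substitution.

```python
import re

RULE = re.compile(
    r"^Libraries/Emulation/NES/(([^/]+/)*)(([^/.]+\.)+)(zip|html)$"
)

# Hypothetical request path, for illustration only.
m = RULE.match("Libraries/Emulation/NES/Games/mario.v1.zip")

print(m.group(1))  # Games/
print(m.group(3))  # mario.v1.
print(m.group(5))  # zip
```

Because $3 already ends in a full stop, joining $3 and $5 directly rebuilds the filename.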
 
