Msg#: 4440415 posted 2:06 am on Apr 14, 2012 (gmt 0)
You didn't go wrong anywhere. The Rule does exactly what you told it to do:
is correct for "there may be more directories here". You may want to set a non-capture flag on the inner part, but it isn't essential:
The second piece
means "now capture any leftover text that contains neither a directory slash nor a full stop, then get your full stop, and wind up with 'html' or 'zip'."
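To make the two pieces concrete, here is a minimal sketch in Python, whose `re` syntax matches what mod_rewrite uses for these constructs. The pattern is a hypothetical stand-in, not the expression from the original Rule, and `split_path` is just an illustrative helper name.

```python
import re

# Hypothetical pattern, assumed for illustration only.
# ((?:[^/]+/)*)  zero or more directories; the inner group is non-capturing
# ([^/.]+)       leftover text with neither a slash nor a full stop
# \.(html|zip)   the full stop, then the extension
PATTERN = re.compile(r"^((?:[^/]+/)*)([^/.]+)\.(html|zip)$")

def split_path(path):
    """Return (directories, name, extension), or None if there is no match."""
    m = PATTERN.match(path)
    return m.groups() if m else None

print(split_path("dir/sub/page.html"))
print(split_path("page.zip"))
```

Because the inner directory group is written `(?:...)`, it does not take up a backreference number of its own; only the three plain parentheses do.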
Ordinarily this too is exactly the right way to do things. But it breaks when there are full stops before the one setting off the final extension. (I assume you've got things like blahblah.xtn.zip with duplicate extensions.) You will have to change the final capture to either
If the filename can only contain one extra full stop, it may run a little faster as
Finally, if your potential multiply-stopped filename is always in the form
--or in some other specific form-- you can further fine-tune the Rule... but I'll stop here, because I'm getting tired :)
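A rough sketch of the problem and two possible ways round it, again in Python. These three patterns are assumptions chosen for illustration; they are not necessarily the expressions the post had in mind.

```python
import re

# All three patterns are assumed variants, not taken from the original Rule.
STRICT = re.compile(r"^([^/.]+)\.(html|zip)$")   # no full stops allowed in the name
LOOSE = re.compile(r"^([^/]+)\.(html|zip)$")     # any number of full stops allowed
ONE_EXTRA = re.compile(r"^([^/.]+(?:\.[^/.]+)?)\.(html|zip)$")  # at most one extra full stop

def capture(pattern, filename):
    """Return the captured (name, extension) pair, or None if there is no match."""
    m = pattern.match(filename)
    return m.groups() if m else None

for pattern in (STRICT, LOOSE, ONE_EXTRA):
    print(pattern.pattern, capture(pattern, "blahblah.xtn.zip"))
```

The strict form refuses `blahblah.xtn.zip` outright, while the other two capture `blahblah.xtn` and `zip`. The one-extra form also rejects names with two or more additional full stops, which may or may not be what you want.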
"Capture non-slashes, stopping as soon as possible. Grab the first full stop you meet, and then capture the following 'html' or 'zip'."
There should be a closing anchor after this and all other expressions, but I forgot.
The question mark after a * or + changes a Regular Expression from its regular "greedy" mode to "stingy" mode, more often called "lazy" or "non-greedy". (No, I do not know why RegEx terminology all has to do with food.) It's generally a last-resort option when things are too complicated to write an absolutely perfect capture. The idea is to save a few nanoseconds of backtracking. So if it turns out that your ".html" or ".zip" is followed by another full stop plus extension, the [^/], which allows full stops, permits the RegEx to count the first ".xtn" as part of its capture and continue gobbling until it reaches the final extension, provided the closing anchor is in place.
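The greedy/lazy difference, and the effect of the closing anchor, can be seen in a small Python sketch. The patterns are illustrative assumptions; `re.match` pins each one to the start of the string, standing in for an opening anchor.

```python
import re

# Illustrative patterns only; they differ in the "?" and the closing "$".
GREEDY = re.compile(r"([^/]+)\.(html|zip)")
LAZY = re.compile(r"([^/]+?)\.(html|zip)")
LAZY_ANCHORED = re.compile(r"([^/]+?)\.(html|zip)$")

def capture(pattern, filename):
    """Return the captured (name, extension) pair, or None if there is no match."""
    m = pattern.match(filename)
    return m.groups() if m else None

name = "blahblah.html.zip"
print(capture(GREEDY, name))         # grabs everything, then backs up to the last usable full stop
print(capture(LAZY, name))           # stops at the first full stop followed by html or zip
print(capture(LAZY_ANCHORED, name))  # the anchor forces it to carry on to the end
```

Without the closing anchor, the lazy form settles for `blahblah` plus `html` and ignores the trailing `.zip`. With the anchor, it ends up at the same place as the greedy form.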