
Apache Web Server Forum

    
RewriteRule not capturing full stops
hottrout




msg:4440417
 11:18 am on Apr 13, 2012 (gmt 0)

Hi guys,

I have a RewriteRule that I thought was working well, but I have discovered many 404s being thrown up whenever the filename contains a full stop before the extension.

The current code is as follows:

RewriteRule ^Libraries/Emulation/NES/(([^/]+/)*)([^/.]+\.(html?|zip))$

Its purpose is to capture any request for an html or zip file, however many folders deep. It works until it encounters a filename with a second full stop in it.
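To see where the 404s come from, the pattern can be exercised outside Apache. This is a minimal sketch using Python's `re` module, which handles every construct in this pattern the same way Apache's PCRE does. The sample paths are hypothetical and are written as mod_rewrite sees them in an .htaccess file, with no leading slash (an assumption; in server config the path would start with `/`).

```python
import re

# Pattern from the post; the RewriteRule's substitution is not shown in the thread.
ORIGINAL = re.compile(r"^Libraries/Emulation/NES/(([^/]+/)*)([^/.]+\.(html?|zip))$")

# Hypothetical request paths, without a leading slash (.htaccess context assumed).
PATHS = [
    "Libraries/Emulation/NES/games/mario.zip",
    "Libraries/Emulation/NES/a/b/c/readme.html",
    "Libraries/Emulation/NES/games/mario.v1.zip",  # second full stop in the name
]

for path in PATHS:
    m = ORIGINAL.search(path)
    # $3 holds the filename when the rule matches; otherwise the rule is skipped.
    print(path, "->", m.group(3) if m else "no match")
```

The third path is not matched: `[^/.]+` stops at "mario", and the text after that full stop ("v1") is neither html nor zip, so the rule never fires.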

Can anyone suggest where I have gone wrong?

 

lucy24




msg:4440715
 2:06 am on Apr 14, 2012 (gmt 0)

You didn't go wrong anywhere. The Rule does exactly what you told it to do:

(([^/]+/)*)

is correct for "there may be more directories here". You may want to make the inner group non-capturing, but it isn't essential:

((?:[^/]+/)*)
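One side effect of the non-capturing version is worth checking: it changes the group numbers. The sketch below (Python `re`, hypothetical sample path) prints the groups for both forms. With the inner group capturing, it takes a number of its own and, because `*` repeats it, only remembers the last directory, so the filename sits in $3; with `(?:...)` the filename moves to $2. Any backreferences in the substitution (not shown in the thread) would need renumbering to match.

```python
import re

# Inner group capturing: "*" repeats it, so it keeps only the last directory.
capturing = re.compile(r"^Libraries/Emulation/NES/(([^/]+/)*)([^/.]+\.(html?|zip))$")
# Inner group made non-capturing with "(?:...)".
non_capturing = re.compile(r"^Libraries/Emulation/NES/((?:[^/]+/)*)([^/.]+\.(html?|zip))$")

path = "Libraries/Emulation/NES/a/b/readme.html"  # hypothetical path

print(capturing.search(path).groups())      # ('a/b/', 'b/', 'readme.html', 'html')
print(non_capturing.search(path).groups())  # ('a/b/', 'readme.html', 'html')
```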

The second piece

([^/.]+\.(html?|zip))$

means "now capture any leftover text that contains neither a directory slash nor a full stop, then get your full stop, and wind up with 'html' or 'zip'."

Ordinarily this too is exactly the right way to do things. But it breaks when there are full stops before the one setting off the final extension. (I assume you've got things like blahblah.xtn.zip with duplicate extensions.) You will have to change the final capture to either

([^/]+?\.(html?|zip))

or

(([^/.]+\.)+(html?|zip))
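Both replacement captures can be compared on a few hypothetical filenames. This is a sketch in Python's `re`, with each capture dropped into the original rule and a closing `$` anchor added (the suggestions above omit it). The helper `filename` is illustrative only; it returns $3, the captured filename, or `None` when the rule would not fire.

```python
import re

PREFIX = r"^Libraries/Emulation/NES/(([^/]+/)*)"
lazy = re.compile(PREFIX + r"([^/]+?\.(html?|zip))$")         # first suggestion
repeated = re.compile(PREFIX + r"(([^/.]+\.)+(html?|zip))$")  # second suggestion


def filename(pattern, name):
    """Hypothetical helper: return $3 for a file under games/, or None."""
    m = pattern.search("Libraries/Emulation/NES/games/" + name)
    return m.group(3) if m else None


for name in ["mario.zip", "mario.v1.zip", "blahblah.html.zip", "mario..zip", "notes.txt"]:
    print(name, filename(lazy, name), filename(repeated, name))
```

Both accept names with extra full stops. They differ on edge cases such as `mario..zip` (two stops in a row): the lazy `[^/]+?` accepts it, while `([^/.]+\.)+` needs at least one non-stop character before every full stop and rejects it.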

If the filename can only contain one extra full stop, it may run a little faster as

([^/.]+\.([^/.]+\.)?(html?|zip))
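The one-extra-stop version can be checked the same way. In this Python `re` sketch (hypothetical filenames), the optional `([^/.]+\.)?` admits zero or one extra full stop, so a name with two extra stops no longer matches.

```python
import re

one_extra = re.compile(
    r"^Libraries/Emulation/NES/(([^/]+/)*)([^/.]+\.([^/.]+\.)?(html?|zip))$"
)

for name in ["mario.zip", "mario.v1.zip", "mario.v1.beta.zip"]:
    m = one_extra.search("Libraries/Emulation/NES/games/" + name)  # hypothetical path
    print(name, "->", m.group(3) if m else "no match")
```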

Finally, if your potential multiply-stopped filename is always in the form

blahblah.html.zip

(or in some other specific form) you can fine-tune the Rule further... but I'll stop here, because I'm getting tired :)

hottrout




msg:4441403
 11:23 am on Apr 16, 2012 (gmt 0)

Thank you, Lucy, for this. I think I follow the logic, but I don't fully understand how this one works:

([^/]+?\.(html?|zip))

lucy24




msg:4441507
 3:06 pm on Apr 16, 2012 (gmt 0)

"Capture non-slashes, stopping as soon as possible. Grab the first full stop that is followed by 'html' or 'zip', and then capture that 'html' or 'zip'."

There should be a closing anchor ($) after this and all the other expressions, but I forgot.

The question mark after a * or + switches a Regular Expression from its usual "greedy" mode to "stingy" mode (more often called "lazy" or "non-greedy"). (No, I do not know why RegEx terminology all has to do with food.) It's generally a last-resort option when things are too complicated to write an absolutely perfect capture; the idea is to save a few nanoseconds of backtracking. Because [^/] does not exclude full stops, if your ".html" or ".zip" turns out to be followed by another full stop plus extension, the closing anchor makes the RegEx count the first ".xtn" as part of its capture and carry on gobbling.
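The greedy/stingy difference, and why the closing anchor matters, shows up on a hypothetical name like `a.zip.html`, where ".zip" is followed by another extension. A Python `re` sketch of just the filename capture:

```python
import re

name = "a.zip.html"  # hypothetical filename with ".zip" followed by another extension

greedy = re.compile(r"([^/]+\.(html?|zip))")              # [^/]+ grabs everything, then backs off
stingy_unanchored = re.compile(r"([^/]+?\.(html?|zip))")  # stops at the first ".zip"/".html"
stingy_anchored = re.compile(r"([^/]+?\.(html?|zip))$")   # "$" forces it to keep going

print(greedy.search(name).group(1))             # a.zip.html
print(stingy_unanchored.search(name).group(1))  # a.zip
print(stingy_anchored.search(name).group(1))    # a.zip.html
```

Without the anchor, the stingy version settles for `a.zip`; with it, the match only succeeds once the capture has swallowed `.zip` and reached the final `.html`.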

hottrout




msg:4441878
 10:11 am on Apr 17, 2012 (gmt 0)

Thank you, Lucy. I follow most of this, but sometimes I really do think it is a black art reserved for those who have a hut in the forest.

g1smd




msg:4444224
 1:10 pm on Apr 23, 2012 (gmt 0)

The alternative begins:
RewriteRule ^Libraries/Emulation/NES/(([^/]+/)*)(([^/.]+\.)+)(zip|html)$
but the value captured in $3 will have a trailing period attached.
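The trailing period is easy to see by printing the groups. A Python `re` sketch of the alternative pattern (its substitution is not given in the thread, and the paths are hypothetical):

```python
import re

alt = re.compile(r"^Libraries/Emulation/NES/(([^/]+/)*)(([^/.]+\.)+)(zip|html)$")

m = alt.search("Libraries/Emulation/NES/games/mario.v1.zip")  # hypothetical path
print(m.group(1))               # games/
print(m.group(3))               # mario.v1.   (note the trailing period)
print(m.group(5))               # zip
print(m.group(3) + m.group(5))  # mario.v1.zip
```

So $3$5, with no extra full stop between them, would rebuild the full filename. Note also that `(zip|html)`, unlike the original `html?`, does not match `.htm` files.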

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved