
Home / Forums Index / Code, Content, and Presentation / Apache Web Server

Apache Web Server Forum

    
Fun with FilesMatch
lucy24




msg:4554126
 11:14 pm on Mar 12, 2013 (gmt 0)

First the good news: This is me asking, so you know the identical question has not been asked 84 times in the past week.

Then the bad news: This is me asking, so I am not likely to step in with an answer, saving everyone else the trouble.

And the further bad news: This is a "why" question, not a "how-to" question.

Background: In an earlier thread [webmasterworld.com] I was asking about using FilesMatch envelopes to contain all RewriteRules pertaining to image files. I ended up doing this, and overall it seems to work.

More background: Lately I've been vexed with referer-less requests for isolated image files. Probably has to do with the New ::cough-cough:: Improved ::cough-cough:: Image Search. But unless there's a specific UA involved, I can't be sure.

I ended up making this rule:

RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ! {short list of authorized search engines here}
RewriteCond %{REQUEST_METHOD} !HEAD
RewriteRule ^(ebooks/\w+/images|rats/images|paintings/\w+/blowups) http://www.example.com/boilerplate/sorry.html [R=301,L]


The redirect of course only works if it was a direct request for the image. If it's an <img src= from a browser that doesn't send a referer, all that happens is that the image is not shown.

The problem and the question: This rule will not work as shown, with the opening anchor. Adding a leading / to the pattern has no effect. I wouldn't expect it to, since we're in htaccess, but I had to double-check. Omit the opening anchor and all is well. (Detour here as I edit two other rules that I hadn't realized weren't working :()

Further experiment shows that inside this FilesMatch envelope, mod_rewrite -- and presumably other mods, though I haven't checked -- has lost its aliasing info. The opening anchor will work if it's used with the full filepath:

RewriteRule ^/home/my-user-name/example.com/rest-of-pattern-here

(The beginning is the format used by my error logs, so that's what I tried. Note leading slash.)
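One possible way around the anchor problem, offered as a sketch and not as a tested fix: move the path test into a RewriteCond on %{REQUEST_URI}, which holds the requested URL-path with its leading slash, so the anchor no longer depends on which string the envelope hands to the RewriteRule pattern. The FilesMatch pattern here is an assumption about what the envelope looks like, and the user-agent condition is left out because the list of search engines is elided above. The mod_rewrite documentation describes RewriteRules inside Files sections as unsupported, so treat the whole thing as an experiment.

```apache
# Sketch only: the envelope pattern is assumed, not quoted from the original rule.
<FilesMatch "\.(jpe?g|gif|png)$">
RewriteEngine on

RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{REQUEST_METHOD} !HEAD
# Anchor against the URL-path (note the leading slash) instead of
# against the path that the RewriteRule pattern sees inside the envelope.
RewriteCond %{REQUEST_URI} ^/(ebooks/\w+/images|rats/images|paintings/\w+/blowups)
RewriteRule . http://www.example.com/boilerplate/sorry.html [R=301,L]
</FilesMatch>
```

The RewriteRule pattern is reduced to a single dot because the real test now lives in the condition.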

Can someone explain this? Preferably in words of two syllables.



Elsewhere...

I started out with two questions, but was able to work out the second answer myself. I post it here because I am utterly certain I will sooner or later forget the answer. And sooner or later I or someone else will have the same question again.

Second question (with answer):

The above quoted rule, with or without anchor and path stuff, will not work at all outside the FilesMatch envelope. Much experimentation

:: boilerplate about why everyone should have a test site and it's definitely worth the $10 for an extra domain name ::

reveals that:

IF
you have a Files or FilesMatch envelope
AND
this envelope contains the "RewriteEngine on" directive (whether or not there are actual RewriteRules)
THEN
RewriteRules outside the envelope for files that match the envelope will not work
UNLESS
the envelope also contains the "RewriteOptions inherit" directive
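Laid out as a file, the combination described above would look something like this. It is a minimal sketch: the outer rule and the envelope pattern are placeholders chosen for illustration, not the rules from the live site.

```apache
RewriteEngine on

# Outer rule (placeholder). Per the experiment above, it is skipped for
# requests that match the envelope unless the envelope inherits.
RewriteRule ^oldimages/ http://www.example.com/boilerplate/sorry.html [R=301,L]

<FilesMatch "\.(jpe?g|gif|png)$">
RewriteEngine on
# Without this line, the outer rule above stops applying to image files.
RewriteOptions inherit
</FilesMatch>
```

The documentation for RewriteOptions says inherited rules are applied after the rules of the inheriting scope. Assuming that holds for envelopes as well, any rules placed inside the FilesMatch would run before the outer ones.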

Yup, the self-same RewriteOptions directive that we were talking about the week before last [webmasterworld.com]. So it doesn't only apply to nested htaccess files. It also applies to Files(Match) envelopes within the same htaccess.

Urk.

 

g1smd




msg:4554149
 1:37 am on Mar 13, 2013 (gmt 0)

Somewhere in a previous rule the full internal filepath is being exposed.

I've had this happen and it takes forever to track it down.

lucy24




msg:4554159
 3:16 am on Mar 13, 2013 (gmt 0)

The issue applies to everything inside the envelope.

Say I put an exact copy of the domain-name-canonicalization rule inside the envelope-- something you ordinarily wouldn't need to do, since image requests come from pages, and therefore already have the correct form. The pattern is the ultra-generic unanchored (.*), the kind that makes people foam at the mouth in all other circumstances. If I then request an image file with the wrong domain name-- after disabling the lines that deal with referer-less requests, duh-- I end up with:

http://www.example.com//home/my-user-name/example.com/the-rest-of-the-file-name

Note double slash, because the leading slash in "/home" was captured.

I think there has to be something about the envelope itself that's doing it. Normally you wouldn't notice, since Files(Match) envelopes almost by definition look at the end of the filename. And generally the envelope is for things like access and authorization that don't pertain to the requests. As noted elsewhere, I've already learned that in spite of the name, Files doesn't only apply to physical files. It also applies to requests.

I also think that the envelope has two effects. It makes things run faster for non-image requests, since the server doesn't even have to stop and read the rule. But I suspect it makes things slower for image requests, exactly as if they had to pass through two full htaccess files. Execution of any RewriteRules is put on hold until the request has passed through the inner envelope.

I tested that part like this:

# Outer rule
RewriteEngine on
RewriteRule dunnykin http://www.example.com/hoosegow/test-two.html [R=301,L]

# Rule inside the envelope
<FilesMatch "backgrounds\.html">
RewriteEngine on
RewriteRule .+ http://www.example.com/hoosegow/test-one.html [R=301,L]
</FilesMatch>

Position of the envelope-- before or after the other rules-- has no effect. Apache docs say so by implication: Files(Match) rules execute after everything else. I tested it anyway. The "RewriteEngine on" directive has to be present in both places. Docs don't say so; I learned it by experiment.

Request for:
dunnykin/some-file-name
redirect to:
hoosegow/test-two.html

Request for:
hoosegow/backgrounds.html
redirect to:
hoosegow/test-one.html

But watch!

Request for:
dunnykin/backgrounds.html
redirect to:
hoosegow/test-one.html

Although the envelope executes after the non-envelope rules, the result of the earlier rule is discarded. Note in particular that these rules are external redirects, not internal rewrites. So the request has not yet been allowed to leave the building. Rules inside the envelope are applied as if the non-envelope rules had never existed. This happens even if the outer rule-- the one that executed first, according to Apache-- ends in a flat-out [F].

Since this is a test site, I have set html and php files to instant expiration. This saves having to clear the browser cache and reload continuously.

Clearly there is more to FilesMatch than meets the eye. I've read the page on Configuration Sections [httpd.apache.org] backward, upside-down and sideways and simply can't find any discussion of the pattern

<Directory>
blahblah
<Files>
other blahblah
</Files>
</Directory>

as it applies to mod_rewrite.

lucy24




msg:4555624
 11:30 am on Mar 16, 2013 (gmt 0)

Postscript:

Here's yet another detail I didn't notice earlier. Thanks to one of those heart-stopping moments when you're doing routine testing and you think your entire htaccess has stopped working...

Coincidentally this is the same rule I started out with. The pattern, unanchored, is

ebooks/\w+/images/

further constrained to image files (jpe?g|gif|png).

Outside the <Files> envelope, this would not only work for
/ebooks/perez/images/cover.jpg
(real file)

but for
/ebooks/perez/images/etcetera.jpg
/nonsense/ebooks/perez/images/cover.jpg
/ebooks/gibberish/images/cover.jpg
/ebooks/perez/images/others/cover.jpg
(nonexistent files and/or directories-- so long as they fit the regex)
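For reference, here is a sketch of how the outside-the-envelope version of this rule might be written. How the image-file constraint is expressed is an assumption (here it is folded into the pattern), and the conditions are trimmed down to the referer test.

```apache
RewriteEngine on

RewriteCond %{HTTP_REFERER} ^-?$
# Unanchored at the front, so any path containing ebooks/<word>/images/
# and ending in an image extension matches, whether or not it exists on disk.
RewriteRule ebooks/\w+/images/.*\.(jpe?g|gif|png)$ http://www.example.com/boilerplate/sorry.html [R=301,L]
```

All five sample paths above fit this pattern, including the one with the extra "others" directory, because .* is free to cross slashes.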

Inside the envelope the rule will still work for made-up filenames-- but not for paths involving nonexistent directories. This seems paradoxical since the envelope is called <FilesMatch> ... but it makes sense if you think of mod_rewrite continuing its search all the way to the document's home directory. It has to make sure that the current rule will not be superseded by subsequent RewriteRules. So it doesn't matter whether the file does exist, only that it can exist.

This, in turn, means that unwanted visitors asking for nonexistent files may end up with a 404 where they would otherwise have been whacked with an unconditional 403. Hm. With some robots, this may actually help. "You can't have it!" is a challenge. "It ain't here" leaves nothing more to say.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved