A day in the life of a request

lucy24

3:33 am on Oct 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've decided that I would understand this stuff a lot better if I had a clearer sense of what happens along the way, from your computer to the page your browser shows you. Apache is fine on the hair-splitting details, but not so great when it comes to putting things into words of two syllables.

At the beginning, you ask your browser to give you such-and-such URL.

The domain name goes the rounds of DNSs until it meets one that says "Sure, I know him, he lives at 12.34.56.78". Or not, in which case the browser puts up a rude message suggesting that you can't type, or the cat stepped on your router switch.

The request now dashes off to 12.34.56.78 and knocks on the door, where it waits for an answer. In the two browsers that I tried (using a URL that I knew would time out), it waits 75 seconds; I couldn't find a place to change it. If there's no answer after 75 seconds, the browser puts up another message, this time with rude comments along the lines of "I haven't got all day, you know". I don't know what happens if the browser's time limit is longer than the time limit at the far end. (Apache's default is 300 seconds.) Rude message from Apache instead?

OK... now what happens?

If you're looking for a particular file, does the config file tell you upfront if it doesn't exist, or do you have to go in and look? Along the way, do you have to obey any htaccess instructions you happen to meet, whether or not your requested file is where it's supposed to be?

If the Directory Slash Redirect is enabled, do you pick up the slash immediately, or do you have to look for yourself and confirm that your destination is a directory? (Apache docs seem to imply that this is already known, though they don't say by whom. All I know for sure is that it doesn't come through as a 301 the way a server-level www redirect does.)

If the request is for a directory, does the config file say "Go on in, you're looking for a file called index.htm"? Or does it give you a list of possibilities? (In the case of my server, the request must have picked up this information before it reaches the top-level htaccess, because the rules only work for \.html)

What if there isn't a named Index file? Does the config file say "Here's your auto-generated Index, you'll need it"? Or do you go in, look around, and then go all the way back and report that there's nobody there, but along the way you passed an htaccess file that said Options +Indexes, so I'll take an auto-index now if it isn't too much trouble? Or do you have to keep running back and forth until you've found out whether there's a FallbackResource (mod_dir), and only then get to ask if mod_autoindex even exists?

Now, about those htaccess files.

Does the request have to go all the way up to its destination on the off chance that there might be an htaccess file containing SetEnvIf instructions, and only then go back to the beginning to pick up Rewrites, and then look for Options and so on? Or does the server tell you what's where?

If you meet a RewriteRule that ends in [F], do you not pass go, not collect $200, and proceed directly to the 403 page, or are there further detours before you are allowed to drop dead?
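For reference, a minimal sketch of the kind of rule in question. The .bak pattern is only a made-up example, not something from a real config. Per the mod_rewrite documentation, [F] implies [L], so rule processing stops at that point and the 403 is returned.

```apache
RewriteEngine On

# Hypothetical pattern: refuse any request for a .bak file.
# "-" means "no substitution"; [F] sends 403 Forbidden and implies [L],
# so no later rule in this rule set is applied to the request.
RewriteRule \.bak$ - [F]
```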

If you get redirected within the same named domain, how far back do you go? Do you knock on the 12.34.56.78 door all over again? Or do you get kicked back to the server? to the lobby of the userspace (assuming shared hosting)? to the top level of the domain?

What if you get rewritten? Where do you go?

And, finally, what about all those core-level Allows and Denys? Do those happen up front, or only after you've jumped through the whole htaccess alphabet? Or both before and after, to allow for mod_setenvif activity?

tangor

10:22 am on Oct 7, 2011 (gmt 0)

Lucy! You make my head hurt! g1smd! Please rescue this damsel!

Actually, these are interesting questions, some of which you already know the answers to (probably the same ones I know), but they go way beyond anything I've ever required since I KISS everything. If it doesn't work I NUKE it (No Use Keepin' Errors).

g1smd

6:16 pm on Oct 7, 2011 (gmt 0)

The HTTP request hits the server and consists of many lines of information. See Live HTTP Headers for examples of what your browser sends. It includes the requested path and domain as well as details of a number of things that the browser would prefer to see in a reply.

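The request is processed by one Apache module at a time, in the reverse of the order in which the modules are loaded in the Apache configuration. (That describes Apache 1.3; from 2.0 onward each module hooks into named request phases and declares its own ordering, so load order matters far less.)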

Some modules are always present, and others are optional and might not be installed on your server.
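Because an optional module may be missing, directives that belong to one are often wrapped in a guard. A minimal sketch, with mod_rewrite picked purely as an example:

```apache
# The enclosed directives are only read if mod_rewrite is loaded.
# Without the guard, a missing module turns these lines into a
# configuration error (a 500 when they sit in an .htaccess file).
<IfModule mod_rewrite.c>
    RewriteEngine On
</IfModule>
```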

Mod_access will process the request and look in the main config for settings it can use, then look in each htaccess file in turn from the root, to lower levels.

Mod_auth will process the request and look in the main config for settings it can use, then look in each htaccess file in turn from the root, to lower levels.

Mod_alias will process the request and look in the main config for settings it can use, then look in each htaccess file in turn from the root, to lower levels.

Mod_rewrite will process the request and look in the main config for settings it can use, then look in each htaccess file in turn from the root, to lower levels.

The next Apache module listed in the configuration will then process the request.

This happens in turn until all configuration settings have been processed by all active modules.

The result of each process might be to deny the request and return an error, or modify the request and then pass it on to another module for further processing.
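As a rough illustration of the "deny" outcome, here is a sketch in Apache 2.2-style syntax that also involves mod_setenvif. The "BadBot" string and the block_me variable are invented for the example. mod_setenvif runs before the access check, so the variable is already set by the time Deny is evaluated.

```apache
# Hypothetical: tag any request whose User-Agent contains "BadBot".
SetEnvIfNoCase User-Agent "BadBot" block_me

# Allow directives are evaluated first, then Deny directives.
# A request that matches both is denied with a 403.
Order Allow,Deny
Allow from all
Deny from env=block_me
```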

The DirectorySlash directive redirects to add a trailing slash when example.com/foo does not exist as a page but does exist as a folder.

The DirectoryIndex directive maps the URL "/" to the file "index.html". It's essentially a special type of rewrite.
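A minimal sketch of the two mod_dir directives just described. The file names are only examples; your host's list may differ.

```apache
# Redirect /foo to /foo/ when /foo maps to a directory.
# On is the default, so this line just makes the behaviour explicit.
DirectorySlash On

# For a directory request, try these names in order and serve
# the first one that exists.
DirectoryIndex index.html index.htm
```

So the server does get "a list of possibilities" rather than a single name, and it works through them left to right.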

There are many more directives that have an effect on each and every request.

Finally, the point will be reached when there are no more directives to process at any level of the configuration. The server now sends back the HTTP headers and then the content for the HTML page.

There are diagrams deep in the Apache documentation explaining this stuff, but it's quite dry reading.

lucy24

6:17 am on Oct 8, 2011 (gmt 0)

Aha! I had pictured it exactly backward, then. So the request waits on the doorstep while the modules go running back and forth like short-order cooks.

The advantage to making a supplementary local .htaccess file is that when you type $ by mistake for #, only requests for that directory get a 500. The Error Log was remarkably polite about it, all things considered.

g1smd

6:34 am on Oct 8, 2011 (gmt 0)

The disadvantage of using multiple .htaccess files is that it is very easy to inadvertently introduce unwanted redirection chains for certain requests. This is a problem that might kill your site indexing without you really knowing why, unless you just happened to test the site with that particular URL and look very carefully at the Live HTTP Headers output.

lucy24

8:50 am on Oct 8, 2011 (gmt 0)

The one I don't understand is the conventional hotlink rewrite. The new name still has an image-type extension, which is what triggers the initial rewrite, so why doesn't it get rewritten again and again and again until the server puts its foot down?

g1smd

6:00 pm on Oct 8, 2011 (gmt 0)

You'll have to supply a code fragment to demonstrate what you mean...

lucy24

7:55 pm on Oct 8, 2011 (gmt 0)

This kind of thing:

RewriteCond %{HTTP_REFERER} ! {various exemptions here} 
RewriteRule \.(jpe?g|gif|png)$ pictures/hotlink.png [NC,L]


(with pause for a D'oh! moment as I realize I could save time and space by using pipes instead of [OR] for most of the exemptions).
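A sketch of what the piped version might look like. The host names are placeholders, not the real exemptions, which stay elided above.

```apache
RewriteEngine On

# Let empty referers through (typed-in URLs, many robots).
RewriteCond %{HTTP_REFERER} !^$

# One condition with alternation instead of a line per exempt site.
# example.com and example.net are placeholder hosts.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?(example\.com|example\.net)/ [NC]

RewriteRule \.(jpe?g|gif|png)$ pictures/hotlink.png [NC,L]
```

Negated conditions like these are joined by the default AND; the rule fires only when the referer matches none of the exempt patterns.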

It would be understandable if the rewrite silently changed the referer-- either to nothing or to my own domain-- but if logs can be believed, it doesn't. In fact, far as I can tell, the logs don't give any hint at all that a rewrite has taken place. I have often caught myself asking why that so-and-so walked blandly in with a 200, before remembering that they did nothing of the sort.

Samizdata

8:13 pm on Oct 8, 2011 (gmt 0)

The one I don't understand is the conventional hotlink rewrite. The new name still has an image-type extension, which is what triggers the initial rewrite, so why doesn't it get rewritten again and again and again until the server puts its foot down?

Presumably there will be an exception made for the "substitute" filename:

# Blank referrer exception
RewriteCond %{HTTP_REFERER} !^$
# Your domain exception
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
# Substitute filename exception
RewriteCond %{REQUEST_URI} !^/hotlink\.
# Everyone else
RewriteRule \.(gif|jpe?g|png)$ http://www.example.com/hotlink.$1 [NC,L]

In this simplified example any file at root level named "hotlink" - regardless of the file extension - is exempt from triggering the subsequent rewrite rule (as it needs to be to avoid the infinite loop of despair).

Files named "hotlink.gif", "hotlink.jpg", or "hotlink.png" would be substituted for image requests that include a referrer from an outside domain - blank referrers will get through, but hotlink defence is referrer-based and therefore not perfect.

There are commonly other conditions making exemptions for desirable referrers - possibly your other domains or image search engines.
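For the single-image setup mentioned earlier in the thread, the same exemption might look like the sketch below. The /pictures/hotlink.png path and example.com are assumptions for illustration. As far as I know, in .htaccess context mod_rewrite also drops a rewrite whose result is identical to the URL it started from, which may be why a rule without the exemption still settles after one pass. Spelling the exemption out avoids relying on that.

```apache
RewriteEngine On

# Blank referrer and own-domain exceptions, as above.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]

# Skip the substitute image itself so the rule cannot match its own output.
RewriteCond %{REQUEST_URI} !^/pictures/hotlink\.png$

# Everyone else gets the one stand-in image, whatever format was requested.
RewriteRule \.(jpe?g|gif|png)$ /pictures/hotlink.png [NC,L]
```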

But as suggested, a code snippet might be easier to address.

...

lucy24

9:45 pm on Oct 8, 2011 (gmt 0)

Presumably there will be an exception made for the "substitute" filename

That's the thing: There isn't. It never occurred to me. The code started with cut-and-paste (from someone who is using a more-or-less identical function with the same host). I've added referer exemptions but nothing else. And they blithely come through as 200, not as a mountain of redirects, so it obviously is working as intended.

Which is why I rarely go through a day without the "noidea" smiley ;)

or image search engines

Funny you should say that. Search engines as such don't need exemptions* because they're covered under null referers. But I did have to put in exemptions for the various forms of google translate-- the person really is viewing the page, it's just filtered through google-- and "imgres" which behaves similarly.

But as suggested, a code snippet might be easier to address.

You were probably typing while I was pasting :)

Edit:
Do you really use three different hotlink images to match the originally requested format? I've got a single 16-color (really just three) png that stands in for everything. Gif would work too, but the design would be pretty catastrophic for a jpg.


* Although sometimes they need out-and-out barriers, because I have learned from experience that images from two specific directories are used for nothing but hotlink fodder.

Samizdata

1:20 pm on Oct 9, 2011 (gmt 0)

Do you really use three different hotlink images to match the originally requested format?

I actually used 12 different formats (including audio and video) on one multimedia site.

In all cases hotlinkers got an advertisement instead of the file they were leeching.

It seemed reasonable to me.

...