<LINK REL="SHORTCUT ICON" HREF="http://www.example.com/customname.ico">
-- to my home page to kick-start at least the MSN-related gang into taking ONE specific favicon file, and ideally only once. (My original /favicon.ico is in my root directory, too.) Alas, I'm still beset with the likes of the snippets below -- and those aren't all of the requests for the exact same file from the exact same hosts thus far today. Some browsers even get a favicon for every single file opened, and with message boards and archived posts, some of my daily visitors retrieve scores and scores of the exact same favicon every single day, day after day after day!
Oh, and the worst offenders, like the multiple waproxy hosts, never get anything BUT favicons. Huh-wha?
Sure, a boatload of 4k favicons isn't going to break the bank -- the additional lines in access and rewrite logs practically total more than the files do -- but I'm tired of the pestilence. Short of figuring out how to remove the requests from my logs and restarting the server and all that, is there some way to deal with these besides 403'ing the file requests or, going to the extreme, 403'ing the Hosts/IPs?
TIA for any/all thoughts.
-----
waproxya29.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSN 9.0;MSN 9.1; MSNbMSNI; MSNmen-us; MSNcIA)
06/27 12:19:03 /favicon.ico 403 -
06/27 12:19:04 /favicon.ico 403 -
06/27 12:19:04 /favicon.ico 403 -
06/27 12:19:05 /favicon.ico 403 -
06/27 12:19:05 /favicon.ico 403 -
06/27 12:19:06 /favicon.ico 403 -
06/27 12:43:40 /favicon.ico 403 -
06/27 12:43:41 /favicon.ico 403 -
06/27 12:43:41 /favicon.ico 403 -
06/27 12:43:42 /favicon.ico 403 -
06/27 12:43:42 /favicon.ico 403 -
06/27 12:43:43 /favicon.ico 403 -
06/27 12:45:43 /favicon.ico 403 -
06/27 12:45:43 /favicon.ico 403 -
06/27 12:45:44 /favicon.ico 403 -
06/27 12:45:45 /favicon.ico 403 -
06/27 12:45:45 /favicon.ico 403 -
06/27 12:45:46 /favicon.ico 403 -
06/27 12:47:04 /favicon.ico 403 -
06/27 12:47:05 /favicon.ico 403 -
06/27 12:47:05 /favicon.ico 403 -
06/27 12:47:06 /favicon.ico 403 -
06/27 12:47:06 /favicon.ico 403 -
06/27 12:47:07 /favicon.ico 403 -
06/27 12:48:36 /favicon.ico 403 -
06/27 12:48:37 /favicon.ico 403 -
06/27 12:48:38 /favicon.ico 403 -
06/27 12:48:38 /favicon.ico 403 -
06/27 12:48:39 /favicon.ico 403 -
06/27 12:48:40 /favicon.ico 403 -
-----
24-117-xx-xx.cpe.cableone.net
Mozilla/4.0 (compatible; GoogleToolbar 4.0.629.4924-big; Windows XP 5.1; MSIE 7.0.5346.5)
06/27 11:11:58 /favicon.ico 403 -
06/27 11:20:17 /favicon.ico 403 -
06/27 11:20:45 /favicon.ico 403 -
06/27 11:22:05 /favicon.ico 403 -
06/27 11:22:24 /favicon.ico 403 -
06/27 11:32:30 /favicon.ico 403 -
06/27 12:00:25 /favicon.ico 403 -
06/27 12:59:28 /favicon.ico 403 -
06/27 13:07:06 /favicon.ico 403 -
06/27 13:53:26 /favicon.ico 403 -
06/27 13:55:11 /favicon.ico 403 -
06/27 14:19:55 /favicon.ico 403 -
You can use any of the on-line header checkers to look at the Cache-Control and Expires headers; I prefer to use the Live HTTP Headers extension for Firefox myself, available from mozilla.org.
Jim
I'm not 100% sure what I'm about to reply is what you're asking about, so with that caveat, here you go --
Here are the results of a wget -S (which some info I found said "prints out the headers returned from the server"):
HTTP request sent, awaiting response... 200 OK
2 Date: Tue, 27 Jun 2006 23:14:11 GMT
3 Server: Apache/1.3.22
4 Last-Modified: Mon, 26 Jun 2006 21:21:20 GMT
5 ETag: "408310-3a01-44a04fd0"
6 Accept-Ranges: bytes
7 Content-Length: 14849
8 Connection: close
9 Content-Type: text/html
(Astoundingly, I negated my .htaccess for a mere six seconds to run wget against the site because wget is forever 403'd, and in that eye-blink's time, I got nailed by "IrssiUrlLog" pointed right at a significant, and significantly URL-rich page. DogGONEit!... Okay. Onward.)
I also have the following in .htaccess, even though I'd have to re-read all of my notes to remember exactly what everything's doing. That said, the reason for the quickie .jpg Expires coupled with the "no-cache" main index.html is that every cached copy of the index tries to call a .jpg trap and, being off the server, the request gets 302'd as hijacked. (Aside: The site is 99.9% GIFs.) This way, the cached file must call home. I think... Regardless, the combo immediately cut out the 'false' hijackings.
## 1 hour = 3600 seconds; 4 hours = 14400
## 1 day = 86400; 3 days = 259200
## 7 days = 604800; 30 days = 2592000
ExpiresActive On
ExpiresByType image/gif A259200
ExpiresByType image/jpg A3600
ExpiresByType text/html A259200
ExpiresByType image/ico A259200
##
<Files index.html>
Header append Cache-Control "no-cache, must-revalidate"
</Files>
<Files 403.html>
Header append Cache-Control "no-cache, must-revalidate"
</Files>
Apres your reply, I added that "image/ico" entry and unblocked one of the less damaging favicon-suckers. Now I figure I'll wait and see if anything changes. Might be tomorrow before I have enough data.
So-o-o what do you think, please? Are the above Headers and Expires and Files containers and such laughingly awry or maybe sorta-kinda okay?
(crosses fingers)
I'd suggest:
<Files *>
Header set Cache-Control "no-cache, must-revalidate"
</Files>
Also, I note that you have your .jpg image files set to expire every hour, and pages set to expire after three days (A259200). You may have special requirements, but this seems backwards to me... Usually, named images almost never change, and pages are updated much more frequently (or could be). So I generally recommend setting images at two weeks or longer, and pages at one day or shorter -- but it all depends on how 'timely' your pages need to be, and whether you change images without changing the filename.
Also generally, I recommend setting the expiry time at one-half the average update interval. That is, if you update a page daily, set its expiry at 12 hours maximum. That way, if someone loads and caches the page *just before* you update it, they will have a chance to see the new page 12 hours later -- which is at least possibly 'today'.
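As a rough illustration of those two rules of thumb, here is a minimal sketch. The intervals are assumptions picked to match the examples above (images at two weeks, a page updated daily at 12 hours), not values taken from your site, and it assumes mod_expires is loaded, as your existing ExpiresActive line implies:

```apache
## Sketch only -- the intervals are illustrative assumptions
## 12 hours = 43200 seconds; 14 days = 1209600 seconds
ExpiresActive On
## Named images that rarely change: two weeks from the time of access
ExpiresByType image/gif A1209600
ExpiresByType image/jpeg A1209600
## Pages updated about once a day: half the update interval
ExpiresByType text/html A43200
```

Note that the type here is written image/jpeg; that is the MIME type Apache normally assigns to .jpg files, so check what your server actually sends before relying on any ExpiresByType line.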
Also, note that the Cache-Control 'protocol' is a mess, and that "no-cache, must-revalidate" doesn't really mean the client shouldn't cache the file. What that whole line does mean is that the client can cache the resource, but should check with the server every time the resource is requested, to be sure it hasn't been updated in the meantime (you will see the resulting 304-Not Modified responses in your logs). On top of that, some clients use Expires over Cache-Control, and others use Cache-Control over Expires. But one thing is certain: best results are usually had if you specify *something* workable for both headers, rather than letting the client decide.
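Applied to the favicon itself, "something workable for both headers" might look like the sketch below. It assumes mod_expires and mod_headers are both available (your existing ExpiresActive and Header lines suggest they are), and the two-week lifetime is an arbitrary choice for illustration. Matching on the filename rather than the MIME type sidesteps the question of which type your server assigns to .ico files:

```apache
## Sketch only -- the two-week lifetime is an assumption, adjust to taste
## 14 days = 1209600 seconds
<Files favicon.ico>
ExpiresActive On
## Expires header: two weeks from the time of access
ExpiresDefault A1209600
## Matching Cache-Control header for clients that prefer it
Header set Cache-Control "max-age=1209600"
</Files>
```

Whether the MSN proxies will honor either header is another matter, so I'd watch the logs for a day or two before drawing conclusions.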
References:
AOL Caching Info [webmaster.info.aol.com] (simple tutorial)
Caching Tutorial for Web Authors and Webmasters [mnot.net] (much more detail)
Jim
> Astoundingly, I negated my .htaccess for a mere six seconds to run wget against the site because wget is forever 403'd, and in that eye-blink's time, I got nailed by "IrssiUrlLog" pointed right at a significant, and significantly URL-rich page. DogGONEit!... Okay. Onward.
Replace 192.168.1.2 with your own IP address:
# Block Wget (unless I'm using it)
RewriteCond %{HTTP_USER_AGENT} wget [NC]
RewriteCond %{REMOTE_ADDR} !^192\.168\.1\.2$
RewriteCond %{REQUEST_URI} !^/403\.html$
RewriteRule .* - [F]