Forum Moderators: DixonJones


Cache Settings Make Confusing Stats


l008comm

8:37 am on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



I've always done zero caching on my site. The main reason is that the content is dynamic. There are other reasons too.
But recently, I was thinking about it and realized that even if I have plenty of spare bandwidth and server CPU, there's still no reason not to cache the basic static files.

In the attached image below, the days above the grey line are essentially un-cached.
Below the grey line, I have it caching images and sound files for 1 week, and CSS & JavaScript files for 1 day. No caching of HTML. I only have a few small audio files, and nearly all of my images are small .svg files.
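For context, this is the sort of policy Apache's mod_expires expresses. A rough sketch only; the MIME types here are assumptions, so adjust them to whatever your server actually serves:

```apache
# Hypothetical sketch of the cache policy described above:
# images/sounds cached 1 week, CSS/JS 1 day, HTML left uncached.
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/svg+xml          "access plus 1 week"
    ExpiresByType image/png              "access plus 1 week"
    ExpiresByType audio/mpeg             "access plus 1 week"
    ExpiresByType text/css               "access plus 1 day"
    ExpiresByType application/javascript "access plus 1 day"
    # No ExpiresByType for text/html, so HTML stays uncached
</IfModule>
```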

I anxiously awaited the stats to see how things changed. Well, as expected, my HTTP requests are way down, close to a 20-25% drop. Not bad. But here's where it gets strange: NO CHANGE in daily bandwidth! That doesn't even seem possible. Technically bandwidth has gone up a hair, but statistically it's basically the same as pre-cache.

23% of total HTTP requests is a LOT. My images may be small, but they are images, not blank files. And my 1-day cached files, the CSS and JavaScript, are probably the biggest files on the whole site. I host my own jQuery library file, but even besides that, those files would still be the largest.

These stats are generated with Awstats.
Visits - honestly not entirely sure how this is calculated; possibly unique IPs over a given time frame
Pages - HTML page loads
Hits - all HTTP requests
Bandwidth - bandwidth transferred as calculated from the Apache access logs, which means this should only be file size, not HTTP headers
All of the average values are numbers I calculated myself.

(Looks like you can't embed images here so you'll have to click to see the spreadsheet)
[i.imgur.com ]

I've been pondering this for 10 days now, and these numbers still don't make any sense to me. Do they make sense to anyone else?

engine

8:45 am on Sep 1, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Can you get data on what is using the bandwidth? For example, humans, good bots, bad bots?
Some ISPs cache sites, so I wonder if there's something going on there.

l008comm

8:49 am on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



I can't get much more detail on the bandwidth use, not in any way that will help here. But the thing is, even if I have lots of bots (and I probably do), I've still definitely reduced my HTTP requests by ~150,000 per day. 150,000 times even my smallest file would still work out to ~60 MB in savings, and plenty of the cached files are larger than that. So I guess what I'm saying is that I don't see how ISP-side caching or heavy bot use could give me these results?

keyplyr

8:54 am on Sep 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



These stats are generated with Awstats
I'm not a fan.

Can't you get stats directly from your server? Or host?

l008comm

8:56 am on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



I run awstats on my own server.

keyplyr

9:04 am on Sep 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is it JavaScript in the mark-up of your pages?

l008comm

9:07 am on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm not sure what you're asking. Are you asking whether my JavaScript could explain my odd results, or are you asking how my Awstats works?

keyplyr

9:10 am on Sep 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, your Awstats report gets its data from JS code on each of your page mark-ups, isn't that correct?

l008comm

9:13 am on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



No, Awstats is a log analyzer; it parses my Apache access logs every night.

l008comm

9:15 am on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



ALSO, I was wrong, I DO have a bit more info on bandwidth. I can view the whole month's bandwidth broken down by filetype. This image is from July, not August, so you can see the numbers unaffected by being half cached. The PHP files are my HTML pages, and they alone are only ~26% of bandwidth and are not cached. This just doesn't add up at all.

[i.imgur.com ]
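That per-filetype breakdown can also be reproduced straight from the raw logs instead of trusting Awstats. Assuming the default Apache combined LogFormat (request path in field 7, byte count in field 10; adjust the field numbers if your format differs), an awk pass can sum bytes per extension:

```shell
#!/bin/sh
# Sum bytes served per file extension from a combined-format access log.
# Assumes path is field 7 and the response byte count is field 10.
awk '{
    n = split($7, parts, ".")
    ext = (n > 1) ? parts[n] : "(none)"
    sub(/[?#].*/, "", ext)               # strip query strings off the extension
    bytes[ext] += ($10 == "-") ? 0 : $10 # "-" means zero bytes logged
}
END { for (e in bytes) printf "%s %d\n", e, bytes[e] }' access.log |
sort -k2 -rn
```

The `access.log` path is a placeholder for wherever your Apache logs live.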

keyplyr

9:17 am on Sep 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm aware of what Awstats is. No need to be defensive; I'm trying to help determine why you're seeing discrepancies in your bandwidth reports.

Isn't that the point of your "Cache Settings Make Confusing Stats" question?

But as long as you're getting data directly from the server and not from reporting software, it should be valid.

l008comm

9:21 am on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Not sure how "No Awstats is a log analyzer, it parses my apache access logs every night" is "being defensive". And if you already know what Awstats is, why are you asking me about it?

keyplyr

9:23 am on Sep 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Have a good night.

phranque

5:26 pm on Sep 1, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I would look at the before and after for human versus bot bandwidth usage.

lucy24

5:53 pm on Sep 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



NO CHANGE in daily bandwidth!

No reason there would be a noticeable change, if you're only caching static files such as images and css.

-- Changes will only become visible when humans return for their second visit after the change. (On the first post-change visit, they don't know that caching has changed.) Even then, unless you had previously set the site explicitly to cache nothing ever, human browsers will do some caching on their own initiative.

-- The vast majority of robots don't request non-page files in the first place. The ones that do--mainly search engines--use their own rules about how often to re-request material. (It's the same thing as putting information in a meta or sitemap about change frequency of pages: the search engine will follow its own judgement, not yours.) Search your raw logs for 304 responses. Those are search engines verifying that suchandsuch content hasn't changed since last time.
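Pulling those 304s out of a combined-format log is a quick check. The status code is the 9th whitespace-separated field and the user agent is the third quoted field, so something along these lines (log path assumed):

```shell
#!/bin/sh
# Count 304 (Not Modified) responses per user agent in a combined log.
awk '$9 == 304' access.log |     # keep only the 304 responses
awk -F'"' '{ print $6 }' |       # user-agent is the 3rd quoted field
sort | uniq -c | sort -rn | head # most frequent revalidators first
```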

The only scenario where I would expect a noticeable change in bandwidth is if many pages shared the same enormous image, and this image file's caching was previously set to none-at-all so human visitors had to request it over and over.

But that's your total bandwidth, measured globally over the course of a day, week or month. It will still make a difference for individual human users as pages load up faster.

l008comm

8:24 pm on Sep 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



But clearly humans are returning for their second visit. I have cut my total HTTP requests by 150,000 per day!
Even if bots are loading HTML (PHP) files directly and are therefore not affected by cache settings at all... I still reduced my daily HTTP request load by 150,000, so clearly LOTS of humans are loading the page several times, or navigating the site a bit. What you're suggesting would make sense if I had lots more bot traffic than I do, AND, most importantly, HTTP requests had also not changed.

phranque

2:28 am on Sep 2, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I believe you are making my case for a segmented analysis of the server access log files.
Simply filtering the 304 responses as suggested by Lucy24 might be enlightening.

You might learn more by analyzing the raw log files instead of relying on awstats.
You can dump them into a spreadsheet, but I often use Unix command-line tools such as grep, split, sort, uniq...
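To make that segmentation concrete, one rough first cut is to split requests on user-agent strings that self-identify as bots and sum bytes for each half. This is only a sketch, with the obvious caveat that many bots fake browser user agents; it assumes the combined LogFormat with bytes in field 10 and an `access.log` path:

```shell
#!/bin/sh
# Rough human-vs-bot bandwidth split from a combined-format access log.
# Only catches bots that self-identify in their user-agent string.
awk '{
    bytes = ($10 == "-") ? 0 : $10
    if (tolower($0) ~ /bot|crawl|spider|slurp/) bot += bytes
    else human += bytes
}
END { printf "bot: %d bytes\nhuman: %d bytes\n", bot, human }' access.log
```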

l008comm

8:06 am on Sep 2, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm not sure specifically what you're suggesting. There's no easy way to determine which traffic is a bot and which is not. And the full month's worth of access logs is 4.5 GB, so it's sure not going to fit into a spreadsheet. I'm very confused by your suggestion at this point.

phranque

12:52 am on Sep 3, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I certainly wasn't suggesting it would be easy, but I have done this type of analysis many times using various tools.
4.5 GB/month means your average daily log file is ~150 MB, which is well within typical spreadsheet data model size limits.
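If a whole month at once is unwieldy, the log can be carved into per-day files first. With the default timestamp format the date is at the start of field 4, so (log path assumed) one pass of awk does it:

```shell
#!/bin/sh
# Split a month of combined-format logs into one file per day, keyed on
# the [dd/Mon/yyyy:...] timestamp in field 4.
# Produces files like access-01_Sep_2017.log, access-02_Sep_2017.log, ...
awk '{
    day = substr($4, 2, 11)   # e.g. 01/Sep/2017
    gsub("/", "_", day)       # make it filename-safe
    print > ("access-" day ".log")
}' access.log
```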

phranque

1:20 am on Sep 3, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



which traffic is a bot and which is not

We have a forum dedicated to the study of bots...
Search Engine Spider and User Agent Identification [webmasterworld.com]