
Cache Settings Make Confusing Stats

8:37 am on Sep 1, 2017 (gmt 0)
Junior Member (joined Nov 22, 2004, posts: 142)


I've always done zero caching on my site, mainly because the content is dynamic. There are other reasons too.
But recently I was thinking about it and realized that even with plenty of spare bandwidth and server CPU, there's still no reason not to cache the basic static files.

In the attached image below, the days above the grey line are essentially un-cached.
Below the grey line, I have it caching images and sound files for 1 week, and CSS & JavaScript files for 1 day. No caching of HTML. I only have a few small audio files, and nearly all of my images are small .svg files.
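For reference, the setup I described works out to something like this in Apache's mod_expires (a sketch, not my exact directives; adjust the MIME types to match your own site):

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # Images and audio: cache for 1 week
    ExpiresByType image/svg+xml "access plus 1 week"
    ExpiresByType audio/mpeg    "access plus 1 week"
    # CSS and JavaScript: cache for 1 day
    ExpiresByType text/css        "access plus 1 day"
    ExpiresByType text/javascript "access plus 1 day"
    # HTML (PHP output): no caching
    ExpiresByType text/html "access plus 0 seconds"
</IfModule>
```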

I anxiously awaited the stats to see how things changed. Well, as expected, my HTTP requests are way down, close to a 20-25% drop. Not bad. But here's where it gets strange: NO CHANGE in daily bandwidth! That doesn't even seem possible. Technically bandwidth has gone up a hair, but statistically it's basically the same as pre-cache.

23% of total HTTP requests is a LOT. My images may be small, but they are images, not blank files. Also, my 1-day caches, the CSS and JavaScript files, are probably the biggest files on the whole site. I host my own jQuery library file, but even aside from that, those files would still be the largest.

These stats are generated with Awstats.
Visits - Honestly not entirely sure how this is calculated; possibly unique IPs over a given time frame
Pages - HTML page loads
Hits - All HTTP requests
Bandwidth - Bandwidth transferred, as calculated from the Apache access logs, which means this should only be file size, not HTTP headers
All of the average values are numbers I calculated myself.
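Incidentally, the bandwidth figure can be cross-checked straight from the access log: in Apache's combined format the response size is field 10 (a sketch using made-up log entries, not my real traffic):

```shell
# Build a tiny sample log in combined format (hypothetical entries).
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [01/Sep/2017:08:00:00 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
1.2.3.4 - - [01/Sep/2017:08:00:01 +0000] "GET /style.css HTTP/1.1" 200 20480 "-" "Mozilla/5.0"
1.2.3.4 - - [01/Sep/2017:08:05:00 +0000] "GET /style.css HTTP/1.1" 304 - "-" "Mozilla/5.0"
EOF

# Sum the bytes column; "-" (e.g. on 304s) counts as zero.
awk '{ if ($10 != "-") total += $10 } END { print total+0 }' /tmp/sample_access.log
# prints 25600
```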

(Looks like you can't embed images here so you'll have to click to see the spreadsheet)
[i.imgur.com ]

I've been pondering this for 10 days now, and these numbers do not make any sense to me. Do they make sense to anyone else?
8:45 am on Sept 1, 2017 (gmt 0)
engine, Administrator from GB (joined May 9, 2000, posts: 26233)


Can you get data on what is using the bandwidth? For example, humans, good bots, bad bots?
Some ISPs cache sites, so I wonder if there's something going on there.
8:49 am on Sept 1, 2017 (gmt 0)
Junior Member


I can't get much more detail on the bandwidth use, not in any way that will help here. But the thing is, even if I have lots of bots (and I probably do), I've still definitely reduced my HTTP requests by ~150,000 per day. Multiply 150,000 by even my smallest files and it would still work out to ~60 MB of savings, and plenty of the cached files are larger than that. So what I'm saying is, I don't see how ISP-side caching or heavy bot use could give me these results.
8:54 am on Sept 1, 2017 (gmt 0)
keyplyr, Senior Member from US (joined Sept 26, 2001, posts: 12913)


These stats are generated with Awstats
I'm not a fan.

Can't you get stats directly from your server? Or host?
8:56 am on Sept 1, 2017 (gmt 0)
Junior Member


I run awstats on my own server.
9:04 am on Sept 1, 2017 (gmt 0)
keyplyr, Senior Member from US


Is it JavaScript in the mark-up of your pages?
9:07 am on Sept 1, 2017 (gmt 0)
Junior Member


I'm not sure what you're asking. Are you asking about my JavaScript in connection with my odd results, or are you asking about how my Awstats works?
9:10 am on Sept 1, 2017 (gmt 0)
keyplyr, Senior Member from US


Yes, your Awstats report gets its data from JS code in the mark-up of each of your pages, isn't that correct?
9:13 am on Sept 1, 2017 (gmt 0)
Junior Member


No, Awstats is a log analyzer; it parses my Apache access logs every night.
9:15 am on Sept 1, 2017 (gmt 0)
Junior Member


ALSO, I was wrong, I DO have a bit more info on bandwidth. I can view the whole month's bandwidth divided up by file type. This image is from July, not August, so we can see the numbers unaffected by being half-cached. The PHP files are my HTML files, and they alone are only ~26% of bandwidth and are not cached. This just doesn't add up at all.

[i.imgur.com ]
9:17 am on Sept 1, 2017 (gmt 0)
keyplyr, Senior Member from US


I'm aware of what Awstats is. No need to be defensive; I'm trying to help determine why you're seeing discrepancies in your bandwidth reports.

Isn't that why you're asking about "Cache Settings Make Confusing Stats"?

But as long as you're getting data directly from the server and not from reporting software, it should be valid.
9:21 am on Sept 1, 2017 (gmt 0)
Junior Member


Not sure how "No, Awstats is a log analyzer, it parses my apache access logs every night" is "being defensive", and also, if you already know what Awstats is, why are you asking me about it?
9:23 am on Sept 1, 2017 (gmt 0)
keyplyr, Senior Member from US


Have a good night.
5:26 pm on Sept 1, 2017 (gmt 0)
phranque, Administrator (joined Aug 10, 2004, posts: 11770)


I would look at the before and after for human versus bot bandwidth usage.
5:53 pm on Sept 1, 2017 (gmt 0)
lucy24, Senior Member from US (joined Apr 9, 2011, posts: 15752)


NO CHANGE in daily bandwidth!

No reason there would be a noticeable change, if you're only caching static files such as images and css.

-- Changes will only become visible when humans return for their second visit after the change. (On the first post-change visit, they don't know that caching has changed.) Even then, unless you had previously set the site explicitly to cache nothing ever, human browsers will do some caching on their own initiative.

-- The vast majority of robots don't request non-page files in the first place. The ones that do--mainly search engines--use their own rules about how often to re-request material. (It's the same thing as putting information in a meta or sitemap about change frequency of pages: the search engine will follow its own judgement, not yours.) Search your raw logs for 304 responses. Those are search engines verifying that suchandsuch content hasn't changed since last time.
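Counting those is a quick check; in Apache's combined log format the status code is field 9 (log path and entries below are made up for illustration):

```shell
# Sample combined-format log lines (hypothetical entries).
cat > /tmp/304_demo.log <<'EOF'
66.249.66.1 - - [01/Sep/2017:09:00:00 +0000] "GET /logo.svg HTTP/1.1" 304 - "-" "Googlebot/2.1"
5.6.7.8 - - [01/Sep/2017:09:00:05 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
66.249.66.1 - - [01/Sep/2017:09:01:00 +0000] "GET /style.css HTTP/1.1" 304 - "-" "Googlebot/2.1"
EOF

# Count 304 (Not Modified) responses by matching the status field.
awk '$9 == 304 { n++ } END { print n+0 }' /tmp/304_demo.log
# prints 2
```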

The only scenario where I would expect a noticeable change in bandwidth is if many pages shared the same enormous image, and this image file's caching was previously set to none-at-all so human visitors had to request it over and over.

But that's your total bandwidth, measured globally over the course of a day, week or month. It will still make a difference for individual human users as pages load up faster.
8:24 pm on Sept 1, 2017 (gmt 0)
Junior Member


But clearly humans ARE returning for their second visit. I have cut my total HTTP requests by 150,000 per day!
Even if bots are loading HTML (PHP) files directly and therefore aren't affected by cache settings at all, I still reduced my daily HTTP request load by 150,000, so clearly LOTS of humans are loading the page several times or navigating the site a bit. What you're suggesting would make sense if I had lots more bot traffic than I do, AND, most importantly, if HTTP requests had also not changed.
2:28 am on Sept 2, 2017 (gmt 0)
phranque, Administrator


I believe you are making my case for a segmented analysis of the server access log files.
Simply filtering the 304 responses as suggested by Lucy24 might be enlightening.

You might learn more by analyzing the raw log files instead of relying on awstats.
You can dump them into a spreadsheet but I often use Unix command line tools such as grep, split, sort, uniq...
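As a concrete example of that kind of pipeline, this ranks clients by hit count (the sample log and IPs below are made up):

```shell
# Hypothetical combined-format log entries.
cat > /tmp/segment_demo.log <<'EOF'
66.249.66.1 - - [01/Sep/2017:10:00:00 +0000] "GET /a.php HTTP/1.1" 200 1000 "-" "Googlebot/2.1"
66.249.66.1 - - [01/Sep/2017:10:00:01 +0000] "GET /b.php HTTP/1.1" 200 1000 "-" "Googlebot/2.1"
9.9.9.9 - - [01/Sep/2017:10:00:02 +0000] "GET /a.php HTTP/1.1" 200 1000 "-" "Mozilla/5.0"
EOF

# Hits per client IP, busiest first (here 66.249.66.1 with 2 hits).
awk '{ print $1 }' /tmp/segment_demo.log | sort | uniq -c | sort -rn
```

The same pattern works for segmenting by URL (field 7), status (field 9), or user agent.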
8:06 am on Sept 2, 2017 (gmt 0)
Junior Member


I'm not sure specifically what you're suggesting. There's no easy way to determine which traffic is a bot and which is not. And a full month's worth of access logs is 4.5 GB, so they're certainly not going to fit into a spreadsheet. I'm very confused by your suggestion at this point.
12:52 am on Sept 3, 2017 (gmt 0)
phranque, Administrator


I certainly wasn't suggesting it would be easy, but I have done this type of analysis many times using various tools.
4.5 GB/month means your average daily log file is ~150 MB, which is well within typical spreadsheet data model size limits.
1:20 am on Sept 3, 2017 (gmt 0)
phranque, Administrator


which traffic is a bot and which is not

we have a forum dedicated to the study of bots...
Search Engine Spider and User Agent Identification [webmasterworld.com]
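As a rough first cut before any careful user-agent study, you can split bandwidth on crawler keywords in the UA string. It's crude, since plenty of bots lie about their UA, but it gives a before/after baseline (sample entries below are hypothetical):

```shell
# Hypothetical combined-format log entries.
cat > /tmp/ua_demo.log <<'EOF'
66.249.66.1 - - [02/Sep/2017:10:00:00 +0000] "GET /a.php HTTP/1.1" 200 4000 "-" "Googlebot/2.1"
9.9.9.9 - - [02/Sep/2017:10:00:01 +0000] "GET /a.php HTTP/1.1" 200 4000 "-" "Mozilla/5.0"
8.8.4.4 - - [02/Sep/2017:10:00:02 +0000] "GET /b.css HTTP/1.1" 200 2000 "-" "bingbot/2.0"
EOF

# Sum transferred bytes for lines whose UA mentions a crawler keyword...
grep -Ei 'bot|crawl|spider' /tmp/ua_demo.log |
    awk '{ s += $10 } END { print "bot bytes:", s+0 }'
# ...and separately for everything else.
grep -Eiv 'bot|crawl|spider' /tmp/ua_demo.log |
    awk '{ s += $10 } END { print "human bytes:", s+0 }'
# prints: bot bytes: 6000 / human bytes: 4000
```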