
Website Analytics - Tracking and Logging Forum

Why Your Server Log Files are Complete Fiction
War on Web Sites Claims Analytics Casualty
incrediBILL

Msg#: 3699288 posted 10:17 pm on Jul 15, 2008 (gmt 0)

If your online business relies on analyzing your web server logs, you're already in trouble.

Many online services pretend to be human by masquerading as browsers so they can access your server unnoticed and you can't stop them; as a side effect, they get added to the visitor counts in your server log analytics.

Just recently there was a massive outcry about AVG [webmasterworld.com] engaging in this deceptive practice, which has now almost stopped, but many others continue unabated.

Some of the companies skewing your analytics include Picscout [webmasterworld.com], Cyveillance [webmasterworld.com], Munax [webmasterworld.com], WebSense [webmasterworld.com], and many more too numerous to mention here, an ever-lengthening list that grows daily.

It's not just companies doing this: scrapers, spam harvesters, spambots, botnets and all sorts of other illicit operations do the same, all trying to avoid being stopped.

To put it mildly, your server log analytics are complete and utter fiction.

That means clients who think these server-side stats are actually meaningful are probably making life miserable for their web designers, SEOs and marketing staff, because to those clients it looks obvious the team is doing a bad job given the low or decreasing conversion rates.

Is there a solution?

Short of writing very complicated software that can detect and filter out all these numerous sources of deceptive activity, switch to javascript-based analytics such as Google Analytics. The downside of javascript analytics is that some people now block javascript for security reasons, and ad blockers stop javascript-based tracking systems, but it's a small percentage. The upside is that most automated tools don't execute javascript, so at least you're not counting fake hits, or not many anyway.

So your options are slightly under-counting with javascript based solutions or massively over-counting with raw log file analysis.

Sadly, there is no accurate analytics solution at this time.

You can get close, but no cigar.

The best advice at this time is to use both javascript-based and raw log file based analytics, knowing the truth lies somewhere in the middle, closer to the low count than the high.

[edited by: incrediBILL at 10:22 pm (utc) on July 15, 2008]

 

Gomvents

Msg#: 3699288 posted 11:43 pm on Jul 15, 2008 (gmt 0)

what % of normal people do you think are blocking javascript? Curious because we have e-commerce sites that rely on it...

tedster

Msg#: 3699288 posted 11:44 pm on Jul 15, 2008 (gmt 0)

Right on - the situation today is a total mess.

In one case, I got so desperate to measure real human traffic that I threw out every IP address that had more than 50 page views in a day, then ran the analytics over the revised logs. Only at that point did I start to see patterns that were more actionable, but I know my heavy-handed tactics introduced their own distortions.
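
A minimal sketch of that kind of cutoff in Python, assuming the standard combined log format with the client IP in the first field and the date in the usual [day/month/year:...] form; "access.log" and the 50-hit threshold are just placeholders:

# Drop every line from any IP that logged more than THRESHOLD hits in a single day,
# then write the survivors to a new file to run the analytics over.
import re
from collections import Counter

THRESHOLD = 50                      # arbitrary cutoff, per the post above
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4})')   # client IP, date

counts = Counter()
with open("access.log") as f:       # placeholder filename
    for line in f:
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1, 2)] += 1      # key = (ip, date)

with open("access.log") as f, open("access.filtered.log", "w") as out:
    for line in f:
        m = LINE_RE.match(line)
        if m and counts[m.group(1, 2)] <= THRESHOLD:
            out.write(line)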

g1smd

Msg#: 3699288 posted 11:52 pm on Jul 15, 2008 (gmt 0)

I only use server logs to see which bots are trying to hit a site, and to see which search engines are visiting and then monitor how much content they take.

incrediBILL

Msg#: 3699288 posted 12:26 am on Jul 16, 2008 (gmt 0)

what % of normal people do you think are blocking javascript? Curious because we have e-commerce sites that rely on it...

I think that depends on the type of site you have and the type of visitors you attract.

If you have a site that attracts a lot of highly technical types, it drastically increases the odds that visitors have ad blockers enabled and/or javascript disabled.

However, if you're selling something that attracts the masses, like shoes or clothing, visitors are less inclined to have javascript disabled, though they may still have ad blockers enabled.

The upside to running an ecommerce site is that people looking for your products are more inclined to allow javascript so your menus and such operate properly.

A couple of years ago I thought the percentage of people running with javascript disabled was much higher, but back then I knew less about the other crawling sources than I do now, so I've revised my earlier estimates.

The average with javascript disabled is below 5%, probably closer to 1-2% for most sites.

signor_john



 
Msg#: 3699288 posted 1:52 am on Jul 16, 2008 (gmt 0)

How many of you have used more than one JavaScript analytics service at the same time?

I've been using Google Analytics and added the Quantcast script to my pages last weekend. It will be interesting to see how the UV and PV numbers compare after a month or so.

goodroi

Msg#: 3699288 posted 2:23 am on Jul 16, 2008 (gmt 0)

This could be a golden opportunity for an SEO consultant. Get hired by client and then switch them from underreporting javascript to overreporting server logs. Then claim credit for the bump in usage numbers.

As for people wondering about using javascript solutions, it is not a simple situation. I posted some data below that was gathered from the same site using Google Analytics and Quantcast.

Google Analytics vs. Quantcast

                            Google Analytics    Quantcast
Visits                      34,644              35,231
Absolute Unique Visitors    30,165              29,360
Pageviews                   165,603             160,728
Average Pageviews           4.78                4.56

What is weird is that GA gives me credit for more page views but Quantcast lets me have a higher visitor count. I never noticed this before because I only check one stat on a daily basis - site revenue.

cfx211

Msg#: 3699288 posted 2:32 am on Jul 16, 2008 (gmt 0)

When have web stats ever been accurate? Sadly this is nothing new. Trying to discern humans from bots has been a problem from the start of web analytics.

Thanks for bringing the latest difficulty to light. This is just another affirmation that letting go of absolute numbers several years back was the right decision.

incrediBILL

Msg#: 3699288 posted 2:33 am on Jul 16, 2008 (gmt 0)

Get hired by client and then switch them from underreporting javascript to overreporting server logs. Then claim credit for the bump in usage numbers.

Then run and hide when the client starts screaming about his very low conversion rates.

incrediBILL

Msg#: 3699288 posted 2:39 am on Jul 16, 2008 (gmt 0)

Trying to discern humans from bots has been a problem from the start of web analytics

Not the bots that play by the rules and properly identify themselves like the major search engines.

Humans used to be the dominant species on the web, as opposed to a few meandering bots, but that has completely changed; some small web sites are now dominated by more bots than humans.

It's the proliferation of bots that hide under the radar, an ever-escalating trend even among the corporate variety, that's quite troublesome.

Week after week the posts keep coming from new webmasters wondering why their stats claim they have all these visitors yet they have no sales, affiliate or AdSense revenue.

Someone has to break the bad news to them that those aren't humans on their site.

docbird

Msg#: 3699288 posted 2:46 am on Jul 16, 2008 (gmt 0)

what of awstats - does it make a somewhat decent attempt at sorting the humans from the chaff?
That had been my impression.

AjiNIMC

Msg#: 3699288 posted 3:05 am on Jul 16, 2008 (gmt 0)

I have worked on products like Google Analytics in the past. You are damn correct, it can be close but no cigar.

I think I have used Sawmill, AWStats and a lot more looking for a better log analyzer, but there was always a big gap.

My only suggestion is that if you are not using your log files, stop writing to them to save server resources. What I don't like about the analytics packages, and what I liked about raw files, is the raw data. Once you have raw data you can play with it a lot, like building a click path per IP and then an average path for certain keywords, etc.
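
As an illustration of that, a rough Python sketch of rebuilding a click path per IP from a raw combined-format log (the field positions and the "access.log" filename are assumptions, and it trusts the file to already be in time order):

# Rebuild a crude click path per IP from a raw access log, skipping page assets.
import re
from collections import defaultdict

LINE_RE = re.compile(r'^(\S+) .*?"(?:GET|POST) (\S+)')   # client IP, requested URL

paths = defaultdict(list)
with open("access.log") as f:                 # placeholder filename
    for line in f:
        m = LINE_RE.match(line)
        if m:
            ip, url = m.groups()
            if not url.endswith((".gif", ".jpg", ".png", ".css", ".js")):
                paths[ip].append(url)         # keep pages, drop assets

for ip, urls in list(paths.items())[:10]:     # print a few sample paths
    print(ip, " -> ".join(urls))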

Aji

pageoneresults

Msg#: 3699288 posted 3:06 am on Jul 16, 2008 (gmt 0)

Excellent topic incrediBILL and it couldn't have come at a better time.

The best advice at this time is to use both javascript-based and raw log file based analytics, knowing the truth lies somewhere in the middle, closer to the low count than the high.

Great advice too! GA does a fairly decent job for small to medium sites. Add to that a bit of log file analysis, maybe a custom application to track specific logfile activity, and you are one step ahead of the Joneses.

I'm looking forward to more discussion on this as I'm getting ready to Ban 75% of the Planet on 2009-01-01 for one site as an experiment for the year. You'll have to knock and wait for someone to answer the door if you are not on the Allowed List. ;)

By the way, I think 2009 will be an eye opener for many, at which time these topics will be Front Page every day. ;)

what of awstats - does it make a somewhat decent attempt at sorting the humans from the chaff?

I hope you have a locked down version. < Is there such a thing?

incrediBILL

Msg#: 3699288 posted 3:18 am on Jul 16, 2008 (gmt 0)

what of awstats

robots.pm (the robots list for awstats) is just a big long list of known bots, and it seems they have some mechanisms to detect additional details, but I'm not so sure it would be very good at detecting the types I'm talking about here, the ones that don't want to be detected in the first place.

None of the bots in their list would ever skew the stats of humans vs. bots.

If you merely filter out everything that isn't MSIE/FF/OPERA you have a good start, and you can do that with a whitelist rather than that big list of bots, though their list does provide links to the owners' sites, which is nice.

However, the bots I'm talking about always claim to be MSIE/FF/OPERA, so you also need the IP ranges of hosting data centers and such to filter out all of the other automated noise.
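
A rough sketch of that two-step pass in Python; the browser tokens and the data center CIDRs below are invented placeholders rather than a real blocklist, and "access.log" is assumed:

# Keep only hits whose user-agent claims to be a browser AND whose IP is not
# inside a known hosting/data center range. Sketch only; the ranges are examples.
import ipaddress
import re

BROWSER_TOKENS = ("MSIE", "Firefox", "Opera")              # crude whitelist
DATACENTER_RANGES = [ipaddress.ip_network(c) for c in
                     ("192.0.2.0/24", "198.51.100.0/24")]  # placeholder CIDRs

LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')   # client IP ... user-agent (last quoted field)

def looks_human(line):
    m = LINE_RE.match(line)
    if not m:
        return False
    ip, ua = m.groups()
    if not any(token in ua for token in BROWSER_TOKENS):
        return False                      # not even claiming to be a browser
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in DATACENTER_RANGES)

with open("access.log") as f:             # placeholder filename
    kept = [line for line in f if looks_human(line)]
print(len(kept), "hits survive the whitelist pass")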

Once you've done that, then you have to filter out rogue activity which isn't always so obvious, things that aren't even stored in the log files.

For instance, a scraper using AOL will hop from IP to IP on a timer and the only way to really tell it's the same scraper is if that scraper is accepting your cookie which many do these days.

You don't find cookie data in log files.

I could go on and on, but there's quite a bit of information that you don't see in a post-mortem analysis simply because the data retention would be astronomical and so would the time to process it all.

carguy84

Msg#: 3699288 posted 3:25 am on Jul 16, 2008 (gmt 0)

THE SKY IS FALLING THE SKY IS FALLING...

So what's the solution?

pageoneresults

Msg#: 3699288 posted 3:29 am on Jul 16, 2008 (gmt 0)

So what's the solution?

Take a proactive approach and do as much as you can to lock things down. Let's talk more about banning specific IP ranges that we know to be the major abusers of our local networks. I'm almost certain that all of us here in the US are getting pounded by the same damn bots and it sure would be nice to just press a button and be done with them. I know, I know, some of you already do that. Hey, I want about 10 of those buttons, where can I get one?

For Windows?
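
The closest thing to that button I can sketch, hedged heavily: a few lines of Python that turn a list of CIDR ranges into Apache 2.2 Deny directives you can paste into .htaccess (the ranges shown are documentation placeholders, not real abusers):

# Turn a list of CIDR ranges into an Apache 2.2 blacklist snippet.
RANGES = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]   # example ranges only

with open("htaccess_ban_snippet.txt", "w") as out:
    out.write("Order Allow,Deny\n")
    out.write("Allow from all\n")
    for cidr in RANGES:
        out.write("Deny from %s\n" % cidr)   # mod_authz_host accepts CIDR here
print("wrote", len(RANGES), "Deny lines")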

maximillianos

Msg#: 3699288 posted 3:53 am on Jul 16, 2008 (gmt 0)

I pretty much gauge my traffic on Google's Adsense impressions lately. They are pretty consistent. Even when my visitors are bouncing all over the board due to bots, my Adsense stats remain my one constant North star... ;-)

Clark

Msg#: 3699288 posted 6:01 am on Jul 16, 2008 (gmt 0)

Is there a way to tie in apache's logfile writes to a dynamically inserted piece of javascript when you have a dedicated server?

Tastatura

Msg#: 3699288 posted 6:47 am on Jul 16, 2008 (gmt 0)

For instance, a scraper using AOL will hop from IP to IP on a timer and the only way to really tell it's the same scraper is if that scraper is accepting your cookie which many do these days.

Sort of true IMO... scraper (app) could accept the cookie and immediately delete it (automatically). Then on the next page visit the webmaster would not know for sure if it is the same person/scraper or a "regular" AOL user.

zett

Msg#: 3699288 posted 7:42 am on Jul 16, 2008 (gmt 0)

I'm using a server log analyzer from a Ukrainian software maker; this does a very good job at filtering out most illegitimate traffic, as well as hotlinkers. It's still higher than the Adsense count (as Adsense is not present on all the pages), but pretty much in line and consistent.

I agree on the notion that the rawness of the raw data is what makes log files so attractive to analyze.

incrediBILL

Msg#: 3699288 posted 7:45 am on Jul 16, 2008 (gmt 0)

So what's the solution?

I gave you my best solution, Google Analytics.

Somewhere between GA and AWSTATS lies the truth.

scraper (app) could accept the cookie and immediately delete it

Many scrapers tend to keep the cookie so sites like WebmasterWorld that require cookies won't punt them to the curb.

pageoneresults

Msg#: 3699288 posted 8:00 am on Jul 16, 2008 (gmt 0)

I agree on the notion that the rawness of the raw data is what makes log files so attractive to analyze.

Ain't that the truth!? I always thought of raw logfiles as stuff that the programmers looked at every now and then. Or my Server Admins were screaming at me because they were too damn large. That all changed a while back. I scour those puppies now. I have an app to do it for me and it does it in real time. Keeps us on our toes too.

GA has actually helped some of my smaller clients to uncover some unusual stuff. We typically don't watch things on a daily basis like many do. Nah, that is boring, it really is. Once a site reaches certain levels, you tend to look more at overall trends on a weekly and/or monthly basis and then dig for the pearls when time permits. Small budgets, minimal time, it has to be spread out elsewhere. Mining analytics is a tedious task and requires a certain skill that I have yet to really grab hold of, I'm learning though.

For example, I have one client who has a publication. It just happens to utilize a three letter word that is very popular in another industry. Someone ran a little experiment against their domain and hotlinked to the publication with related terms and they were sure to use a few of the exact terms from the publication title. Want to see your traffic go nuts for a little while? That term ranked number one for an entire month. It actually increased sales of the pub too just by default, go figure. Either way, in looking under the microscope at the effects, they were not good and not worth the added revenue, that one was blocked permanently. Within 30 days or so, things were back in order. I watched that term vanish from the list.

Out of all the analytics packages out there, GA can't be beat. You pretty much sign your soul over to the devil and in return you get something that many pay a pretty penny for. Now, Omniture stats are a sight to behold. Especially when there are high volume numbers. Try analyzing the logfiles of a site that large. Anyone here do that? I mean, are you micro-managing more than a million pages?

Sadly, there is no accurate analytics solution at this time.

I wasn't real happy to read that. Are we absolutely sure of that? With all this technology we don't have a solution that is "almost" accurate? ;)

incrediBILL

Msg#: 3699288 posted 8:33 am on Jul 16, 2008 (gmt 0)

Almost accurate isn't accurate.

tangor

Msg#: 3699288 posted 10:00 am on Jul 16, 2008 (gmt 0)

I'm looking at less than 1,000 pages monthly getting hit more than a million times a month. Log files are important... as well as filtering out all the bots and scrapers... and it looks like I get about 200 uniques a month. Which is okay... means more than I thought as NEWBIES to the site, which is a very narrow niche.

But I would like to narrow that metric down to REAL USERS if I could.

And I can't. Not yet.

infp

Msg#: 3699288 posted 11:51 am on Jul 16, 2008 (gmt 0)

Oh, no. Heaven is not falling.

If the internet was infested with bot traffic as the OP claims, then there would be no sites with 5 hits per day. :-)

Receptional Andy



 
Msg#: 3699288 posted 11:56 am on Jul 16, 2008 (gmt 0)

If the internet was infested with bot traffic as the OP claims, then there would be no sites with 5 hits per day. :-)

Even bots have to have a way to find a site before spidering it. "Build it and they will come" doesn't even work for scrapers, I'm afraid ;)

Miamacs

Msg#: 3699288 posted 11:59 am on Jul 16, 2008 (gmt 0)

Somewhere between GA and AWSTATS lies the truth.

bingo. my fave combination.

...

there's one major upside to server logs though:
reporting all requests and traffic for all files

... for you can't put urchin code into an image,
and you won't notice access to areas you didn't use js tracking on.

hotlinking / download trends, media files, security issues... unless you serve every single request through jump pages, you won't know what takes up MOST of your data traffic for some sites. And even then you might not notice accesses or break-in attempts to sections you thought to be safe.

So I'd rather say that your agent log is complete fiction...

...traffic reports are not.
ugh... could they be any more real
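
As a rough illustration, a short Python sketch that totals the bytes field per file extension straight from a combined-format log (field layout and "access.log" are assumptions; lines whose byte count is "-" are simply skipped):

# Total bytes served per file extension, straight from the raw access log --
# the js tracker never sees image or media requests, the log sees everything.
import os
import re
from collections import Counter

LINE_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+)[^"]*" \d{3} (\d+)')   # path, status, bytes

bytes_by_ext = Counter()
with open("access.log") as f:              # placeholder filename
    for line in f:
        m = LINE_RE.search(line)
        if m:
            path, size = m.groups()
            ext = os.path.splitext(path.split("?")[0])[1] or "(none)"
            bytes_by_ext[ext] += int(size)

for ext, total in bytes_by_ext.most_common(15):
    print("%-8s %15d bytes" % (ext, total))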

Edge

Msg#: 3699288 posted 12:17 pm on Jul 16, 2008 (gmt 0)

Almost accurate isn't accurate.

"Good enough" is good enough.

Folks, bots and humans are going to visit you; just look for the spikes in traffic to chase robots. Work on sorting out or blocking the robots, but don't spend all of your days on it. Focus on attracting humans; for most of us the money is in humans, not in blocking robots.

[edited by: Edge at 12:18 pm (utc) on July 16, 2008]

Scarecrow

Msg#: 3699288 posted 1:55 pm on Jul 16, 2008 (gmt 0)

How many of these rogue bots and scrapers are grabbing the images from the page? Would a 1x1 transparent GIF on the page give you a reasonable count of real "eyeballs"?
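
For what it's worth, a rough sketch of the counting side in Python, assuming every page references a hypothetical /pixel.gif beacon and the log is in the standard combined format with the client IP first:

# Count requests for the 1x1 beacon image and the distinct IPs behind them.
PIXEL = "/pixel.gif"        # hypothetical beacon referenced from every page

hits = 0
ips = set()
with open("access.log") as f:               # placeholder filename
    for line in f:
        if PIXEL in line:
            hits += 1
            ips.add(line.split(" ", 1)[0])  # client IP is the first field
print(hits, "pixel requests from", len(ips), "distinct IPs")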

A lot of people are getting smart and disabling Javascript. I use three browsers:
1) K-Meleon with JS and Java disabled
2) Firefox with prefetch disabled. JS is enabled, but there's no Flash installed. Some sites that I visit often are bookmarked only in Firefox, because I already know and trust those sites, and know that JS is required for some functions on those sites.
3) Explorer with JS and Flash.

I do my daily "news" surfing with K-Meleon. If I come across something that needs JS and I have to see it, I paste the URL into Firefox.

If I must see some Youtube or news video (maybe a couple times a week) I reload the page in Explorer.

I don't update my browsers; my versions are fairly old. I've never used anti-virus, and never had any infection problems on my Windows XP. Once a week I religiously back up recent files using xcopy in a command window, and copy them to another computer or flash stick. I also export the registry to a backup file, and also set a new restore point on the XP. All cookies on all browsers are killed automatically several times a day, with a routine that was added to a different program that I have to use several times a day anyhow.

ralent

Msg#: 3699288 posted 2:13 pm on Jul 16, 2008 (gmt 0)

I think the vast majority of web users have no idea what javascript is or how to disable it. W3C publishes javascript stats, and even among their visitors the number who browsed with javascript disabled has dropped quite a bit over the last couple of years.

[edited by: ralent at 2:23 pm (utc) on July 16, 2008]
