Web Server Logs, Unable to Distinguish Human Activity From Robots and Spiders, Are Providing Inaccurate Data for Web Operators, Advertisers
They put out a media advisory press release for that?
[biz.yahoo.com...]
Most robots harvest only textual content and do not download images or execute JavaScript code. By using JavaScript and an image request, HitBox Enterprise filters robots with a very high success rate.
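For anyone who wants to try the same idea against their own logs, here is a rough sketch in Python. It is not HitBox's actual method, just the general principle from the quote: treat a client as a probable human only if it also fetched a tracking image that a script on the page requests. The `/beacon.gif` path, the `split_humans_and_robots` name, and the `(client, path)` input shape are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical URL of a tracking image that page JavaScript requests.
BEACON_PATH = "/beacon.gif"


def split_humans_and_robots(requests, beacon_path=BEACON_PATH):
    """Split clients into probable humans and probable robots.

    requests: iterable of (client_id, path) tuples, e.g. parsed from a log.
    A client counts as a probable human only if it fetched the beacon image.
    """
    paths_by_client = defaultdict(set)
    for client_id, path in requests:
        paths_by_client[client_id].add(path)

    humans = {c for c, paths in paths_by_client.items() if beacon_path in paths}
    robots = set(paths_by_client) - humans
    return humans, robots


# Made-up sample data: the first client loads the beacon, the second does not.
sample = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/beacon.gif"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.2", "/about.html"),
]
humans, robots = split_humans_and_robots(sample)
print("probable humans:", sorted(humans))
print("probable robots:", sorted(robots))
```

This will misfile humans who browse with images or JavaScript turned off, and it will miss any robot that does fetch images, so the split is an estimate rather than a hard count.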
Thanks for the article, I've been asked to provide evidence to refute the traffic claims of a consultant/developer that's running a web project for a state agency (a massive pork barrel). I'll add this to the file.
What's even more bizarre is how often a company counts bot traffic that it created itself as human visitors. Someone in the IT department signs up for one of those annoying monitoring services that randomly hit your site every 10 minutes. A week later, someone in the marketing department logs into the WebTrends report and sees the increase in traffic. The next thing you know, he/she is firing off emails (with every important executive listed in the CC:) announcing the great success the marketing department has achieved.
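One way to catch that kind of self-inflicted traffic before it reaches a report is to look for clients whose hits arrive at suspiciously regular intervals. The sketch below is only an illustration: the `looks_like_monitor` name, the thresholds (`min_hits`, `max_jitter`), and the sample timestamps are all assumptions, not anything taken from WebTrends or a real monitoring service.

```python
from statistics import mean, pstdev


def looks_like_monitor(timestamps, min_hits=6, max_jitter=0.1):
    """Guess whether one client's hits come from a periodic monitor.

    timestamps: request times in seconds for a single client.
    Returns True when the gaps between hits are nearly constant, i.e.
    their standard deviation is at most max_jitter times their mean.
    """
    if len(timestamps) < min_hits:
        return False  # too few hits to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return False  # all hits at the same instant; not a periodic pattern
    return pstdev(gaps) / average_gap <= max_jitter


# Made-up examples: one hit every 600 seconds versus irregular browsing.
monitor_hits = [i * 600 for i in range(12)]
human_hits = [0, 12, 45, 400, 410, 3000, 9000]

print(looks_like_monitor(monitor_hits))
print(looks_like_monitor(human_hits))
```

A service that randomizes its timing would need a looser `max_jitter`, and a known monitor is usually easier to exclude by its IP address or user agent if you have it.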
I have even run into situations where the agency a company was using intentionally created a monitoring account and then reported it to the client as traffic.
See my comment above about the pork barrel project. Man, the b-s I'm digging through. In this case, one of the commissioners happened to know just enough to smell something was rotten.
What I got 2 days ago was 99% from infected servers! :)
The spider/human ratio depends directly on the demand for the keywords the site uses. Niche-market sites are easy to position and will generally get more spider traffic than anything else. If your client is in a competitive market and spends enough to get good positions, spider traffic, while welcome, will be under 5%.
Since even the best SEs find just a portion of the pages thrown into the Internet sea, it is difficult to put any global numbers on it. It also depends on which SE spiders your site. marvin from NL crawls the whole shebang every week and gets me .001 visitors a year. :)
One thing is for sure: too many Web sites get no visits from spiders and only a few visits from their owners.
I'm really nailing the group I'm helping to audit... The site has an information request form, I'm asking for the mailing list it has generated, names/address/zipcode. I'll then suggest that this be verified by cross-matching to the postage expense roster from their general ledger.