| 1:53 pm on Oct 15, 2007 (gmt 0)|
Yes. You have to turn on (or make and turn on) spider filters.
This assumes you are using server logs as your raw data. If you are getting data through page scripting, most spiders will not be in those kinds of logs.
| 10:13 am on Oct 17, 2007 (gmt 0)|
I've also realised that there are other agents that 'scrape' our content and republish it on their own sites. We actually don't mind this happening, but it would be better to filter them out too.
Problem is, though, how do we determine which ones these are? Would it be possible to filter out data from IP addresses that show odd behaviour (e.g. very high page views per visit ratios)?
| 10:45 am on Oct 17, 2007 (gmt 0)|
One more thing...
I've just noticed that in September 34% of visits (as reported in the Site Design > Browsers & Systems > Platforms screen) came from spiders.
Would this explain why the average page views per visit is 21 in a 31-second (average) timeframe?
Either that, or our users are very keen on our content and read pages very quickly.
I'm also wondering what all this spider traffic is costing us, given that we regularly exceed our page views quota.
| 1:30 am on Oct 18, 2007 (gmt 0)|
Yes, a high page views per visit ratio is a really good way to identify IP addresses that need to be filtered out. But if you make a WebTrends filter for those IPs, you'll still be charged for the act of filtering, because WT is, in a way, still processing those lines (by applying the filters).
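To find the candidates in the first place, a minimal sketch along these lines (assuming a combined-format access log where the client IP is the first field; the threshold is an illustrative placeholder, not a recommendation) can tally hits per IP so the busiest addresses stand out:

```python
import re
from collections import Counter

# The leading field of a combined-format access log line is the client IP.
IP_RE = re.compile(r"^(\S+) ")

def hits_per_ip(log_lines):
    """Count log lines per leading IP address."""
    counts = Counter()
    for line in log_lines:
        m = IP_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

def likely_spiders(log_lines, threshold=100):
    """Return (ip, hits) pairs above the threshold, busiest first.
    The default threshold is a made-up example; tune it to your traffic."""
    counts = hits_per_ip(log_lines)
    return [(ip, n) for ip, n in counts.most_common() if n > threshold]
```

Anything this flags is only a candidate; eyeball the User Agent strings for those IPs before deciding they're spiders.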
If you can write a script that removes the lines for those IP addresses and for the spider strings in the User Agent field, you can save yourself a lot of page view quota. There are a lot of ways to write such a script, and you can also use something like Log Parser. The lines have to be removed before WT processes the logs. You may find that once you've identified the main culprits you won't have to keep modifying your script, at least not more than once or twice a year.
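The preprocessing step could look something like this sketch (the IP set and UA substrings are placeholder examples, not a real filter list; seed them from your own findings):

```python
# Hypothetical preprocessor: drop spider lines before WebTrends sees the log.
# SPIDER_IPS and SPIDER_UA_STRINGS below are example values only.

SPIDER_IPS = {"66.249.66.1"}
SPIDER_UA_STRINGS = ("Googlebot", "Slurp", "msnbot")

def keep_line(line):
    """Return True if the log line should survive filtering."""
    ip = line.split(" ", 1)[0]          # first field of a combined-format line
    if ip in SPIDER_IPS:
        return False
    # Crude but effective: match spider strings anywhere in the line,
    # which covers the quoted User Agent field.
    return not any(s in line for s in SPIDER_UA_STRINGS)

def filter_log(src_path, dst_path):
    """Write only non-spider lines from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if keep_line(line):
                dst.write(line)
```

Point WT at the filtered output file instead of the raw log, and none of the removed lines count against your quota.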
WebTrends ships a file called browsers.ini with a section containing the strings and IPs used by its existing spider filters, so that's a good place to start for values to filter. But you'll definitely find others by looking at the Visitors table. The best way to use the Visitors table is in a profile set up with IP/User Agent as its sessionizing method, because the table will then display the IP/UA instead of the cookie value.
(I'm assuming you're using WT software and can actually preprocess the logs)
I'm pretty sure this accounts for the strange pattern you're seeing, and you'll probably be pretty happy with how many page views you save. I've seen it be as much as 50% for mid-size sites.
| 3:20 pm on Oct 18, 2007 (gmt 0)|
cgrantski, thank you very much for your feedback. I can now go back to IT and suggest what you have mentioned.
I've just been to a presentation for WT's new Marketing Lab 2 suite and must say that both the Visitor Intelligence and Score modules look impressive. But we gotta get our core stats looking half decent before doing anything remotely more sophisticated.
| 4:37 pm on Oct 18, 2007 (gmt 0)|
Yes the new products are really a big jump up, very cool.