I've also realised that there are other agents that 'scrape' our content and publish it on their other sites. We actually don't mind this happening, but it would be better to filter them out of our stats too.
The problem, though, is how we determine which ones these are. Would it be possible to filter out data from IP addresses that show odd behaviour (e.g. a very high page-views-per-visit ratio)?
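One way to act on the "odd behaviour" idea is a simple threshold on page views per visit. The sketch below assumes you already have per-visitor totals keyed by an "IP|User-Agent" string. The key format, the 15-pages-per-visit cutoff and the 3-visit minimum are all hypothetical values for illustration, not anything WebTrends provides:

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds: tune them against your own traffic.
MAX_PAGES_PER_VISIT = 15.0
MIN_VISITS = 3  # ignore visitors seen too rarely to judge reliably


def flag_suspicious(stats: Dict[str, Tuple[int, int]],
                    max_ratio: float = MAX_PAGES_PER_VISIT,
                    min_visits: int = MIN_VISITS) -> List[str]:
    """Return visitor keys whose page-views-per-visit ratio exceeds max_ratio.

    stats maps a visitor key (assumed format "IP|User-Agent") to a
    (visits, page_views) tuple.
    """
    flagged = []
    for key, (visits, page_views) in stats.items():
        if visits < min_visits:
            continue  # too little data to call it a robot
        if page_views / visits > max_ratio:
            flagged.append(key)
    return sorted(flagged)
```

A flagged key is only a candidate. It is worth eyeballing the raw log lines before adding it to any filter, since a heavy human user can also trip a ratio threshold.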
I've just noticed that in September 34% of visits (as reported in the Site Design > Browsers & Systems > Platforms screen) came from spiders.
Would this explain why the average is 21 page views per visit, with an average visit length of only 31 seconds?
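As a rough back-of-the-envelope check, treating both averages as if they described one typical visit (only an approximation, since the averages mix spider and human visits):

```latex
\frac{31\ \text{s}}{21\ \text{page views}} \approx 1.5\ \text{s per page view}
```

A page every second and a half is far faster than anyone reads, which points towards automated traffic rather than unusually quick readers.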
Either that, or our users are very keen on our content and read pages very quickly.
I'm also wondering whether all this spider traffic is costing us, given that we regularly exceed our page-view quota.
If you can write a script that removes the log lines coming from those IP addresses, along with the lines carrying those strings in the User Agent field, you can save yourself a lot of page-view quota. There are many ways to write such a script, and you can also use a tool such as Log Parser to do it. The lines have to be removed before WT processes the logs. Once you've identified the main culprits, you may find you won't have to keep modifying your script, at least not more than once or twice a year.
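As one possible shape for that preprocessing step, here is a minimal Python sketch. It assumes the logs are in Apache "combined" format (client IP as the first field, User-Agent as the last double-quoted field); IIS/W3C logs would need a different parser. The blocklist entries are hypothetical placeholders for whatever IPs and UA strings you end up identifying:

```python
import re
import sys
from typing import Iterable, Iterator

# Hypothetical blocklists: replace with the IPs and UA substrings you identify.
BLOCKED_IPS = {"192.0.2.7", "198.51.100.23"}
BLOCKED_UA_SUBSTRINGS = ("SomeScraper", "ExampleBot")

# Assumes Apache "combined" format: the client IP is the first field and the
# User-Agent is the last double-quoted field on the line.
COMBINED_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def is_blocked(line: str) -> bool:
    """True if the line comes from a blocked IP or a blocked User-Agent."""
    match = COMBINED_RE.match(line)
    if match is None:
        return False  # leave unparseable lines for WebTrends to deal with
    ip, user_agent = match.group(1), match.group(2).lower()
    if ip in BLOCKED_IPS:
        return True
    return any(s.lower() in user_agent for s in BLOCKED_UA_SUBSTRINGS)


def filter_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that should be passed on to WebTrends."""
    for line in lines:
        if not is_blocked(line):
            yield line


def main(src_path: str, dst_path: str) -> None:
    # latin-1 round-trips any byte unchanged; newline="" keeps original line endings.
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dst.writelines(filter_log(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_log.py access.log access.filtered.log
    main(sys.argv[1], sys.argv[2])
```

Unparseable lines are deliberately passed through rather than dropped, so a format mismatch fails safe (nothing filtered) instead of silently discarding real traffic.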
WebTrends has a file called browsers.ini with a section listing the strings and IPs used by its existing spider filters, so that's a good place to start for values to filter. But you'll definitely find others by looking at the Visitors table. The best way to use the Visitors table is in a profile set up with IP/User Agent as its sessionizing method, because the table will then display the IP/UA pair instead of the cookie value.
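To illustrate what IP/User Agent sessionizing does, here is a small sketch that groups hits into visits per IP/UA pair and counts visits and page views, roughly the shape of data the Visitors table shows in such a profile. The 30-minute inactivity timeout is an assumption (a common convention), not necessarily what your WebTrends profile uses, and the "IP|User-Agent" key format is hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Assumption: a visit ends after 30 minutes of inactivity. Check the
# timeout configured in your own WebTrends profile.
VISIT_TIMEOUT = timedelta(minutes=30)

Hit = Tuple[str, str, datetime]  # (client IP, User-Agent, request time)


def visitor_table(hits: Iterable[Hit],
                  timeout: timedelta = VISIT_TIMEOUT) -> Dict[str, Tuple[int, int]]:
    """Group page-view hits into visits keyed by IP/User-Agent.

    Returns {"IP|User-Agent": (visits, page_views)}.
    """
    by_visitor: Dict[str, List[datetime]] = defaultdict(list)
    for ip, ua, ts in hits:
        by_visitor[f"{ip}|{ua}"].append(ts)

    table = {}
    for key, times in by_visitor.items():
        times.sort()
        visits = 1
        for prev, cur in zip(times, times[1:]):
            if cur - prev > timeout:
                visits += 1  # gap longer than the timeout starts a new visit
        table[key] = (visits, len(times))
    return table
```

Sorting the resulting table by page views, or feeding it to a ratio check, quickly surfaces the IP/UA pairs worth adding to a filter.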
(I'm assuming you're using WT software and can actually preprocess the logs)
I'm pretty sure this accounts for the strange pattern you're seeing, and you'll probably be pleased with how many page views you save. I've seen savings of as much as 50% on mid-size sites.
I've just been to a presentation of WT's new Marketing Lab 2 suite, and I must say that both the Visitor Intelligence and Score modules look impressive. But we gotta get our core stats looking half-decent before doing anything remotely more sophisticated.