
Forum Moderators: DixonJones & mademetop


Does Web Trends exclude spiders by default?



8:49 am on Oct 15, 2007 (gmt 0)

10+ Year Member

In September we got an average of 21 page views per visit during an average visit of 30 seconds.

Either our users have the ability to read whole pages in just a few seconds, or something is wrong with our stats...

I suspect that spider hits are being counted as pageviews too. Is this a possibility?


1:53 pm on Oct 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Yes. You have to turn on (or create and turn on) spider filters.

This assumes you are using server logs as your raw data. If you are collecting data through page tagging (JavaScript), most spiders will not appear in that data, since they generally don't execute scripts.


10:13 am on Oct 17, 2007 (gmt 0)

10+ Year Member


I've also realised that there are other agents that 'scrape' our content and publish it on their own sites. We actually don't mind this happening, but it would be better to filter them out too.

Problem is, though, how are we going to determine which ones these are? Would it be possible to filter out data from IP addresses that show odd behaviour (e.g. very high pageviews-per-visit ratios)?


10:45 am on Oct 17, 2007 (gmt 0)

10+ Year Member

One more thing...

I've just noticed that in September 34% of visits (as reported in the Site Design > Browsers & Systems > Platforms screen) came from spiders.

Would this explain why the average page views per visit is 21 in a 31 second (average) timeframe?

Either that, or our users are very keen on our content and read pages very quickly.

I'm also wondering what all this spider traffic is costing us, given that we regularly exceed our page views quota.


1:30 am on Oct 18, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Yes, a high pageviews-per-visit ratio is a really good way to identify IP addresses that need to be filtered out. But if you make a WebTrends filter for those IPs, you'll still be charged for the act of filtering, because WT is still, in a way, processing those lines (by applying filters to them).
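To show what I mean about spotting candidates by hit volume, here's a minimal sketch that counts log lines per client IP and flags the heavy hitters. It assumes Apache "combined" log format (IP is the first field); the sample IPs and the threshold are hypothetical, and you'd tune the threshold to your own traffic.

```python
from collections import Counter

# Hypothetical sample lines in Apache "combined" log format.
LOG_LINES = [
    '66.249.66.1 - - [15/Oct/2007:08:49:00 +0000] "GET /a.html HTTP/1.1" 200 123 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [15/Oct/2007:08:50:00 +0000] "GET /b.html HTTP/1.1" 200 456 "-" "Mozilla/4.0"',
    '66.249.66.1 - - [15/Oct/2007:08:51:00 +0000] "GET /c.html HTTP/1.1" 200 789 "-" "Googlebot/2.1"',
]

def hits_per_ip(lines):
    """Count log lines per client IP (the first whitespace-delimited field)."""
    return Counter(line.split()[0] for line in lines)

def suspicious_ips(lines, threshold):
    """Return IPs whose hit count meets or exceeds the threshold."""
    return {ip for ip, n in hits_per_ip(lines).items() if n >= threshold}
```

In practice you'd run this over a full day's log and eyeball the top of the list before deciding anything is actually a spider.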

If you can write a script that removes the lines matching those IP addresses and the strings in the User Agent field, you can save yourself a lot of pageview quota. There are a lot of ways to write such a script, and you can also use something like Log Parser to do it. The lines have to be removed before WT processes the logs. You may find that once you've identified the main culprits you won't have to keep modifying your script, at least not more than once or twice a year.
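A preprocessing script along those lines could be as simple as the sketch below: drop any line whose IP is on a blocklist or whose text contains a known spider User-Agent substring, and write the survivors to a new file for WT to process. The blocklist values and file paths here are placeholders; real values would come from your own log review (and from WT's existing filters).

```python
# Hypothetical blocklists -- fill these from your own Visitors-table
# review and from WebTrends' existing spider filter strings.
BLOCKED_IPS = {"66.249.66.1"}
BLOCKED_UA_STRINGS = ("Googlebot", "Slurp", "msnbot")

def keep_line(line):
    """True if the line should survive preprocessing (i.e. not a known spider)."""
    ip = line.split(" ", 1)[0]
    if ip in BLOCKED_IPS:
        return False
    # Substring match against the whole line covers the quoted UA field.
    return not any(s in line for s in BLOCKED_UA_STRINGS)

def filter_log(in_path, out_path):
    """Copy in_path to out_path, dropping spider lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(line for line in src if keep_line(line))
```

You'd point WT at the filtered output instead of the raw log, so the dropped lines never count against your quota.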

WebTrends has a file called browsers.ini with a section that contains the strings and IPs in its existing spider filters, so that's a good place to start for values to filter. But you'll definitely find others by looking at the Visitors table. The best way to use the Visitors table is in a profile set up to use IP/User Agent as its sessionizing method, because the table will then display the IP/UA pair (instead of the cookie value).
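When reviewing the logs to find those extra candidates, it helps to tally the distinct User-Agent strings so you can compare them against the known spider list. A minimal sketch, assuming combined log format where the UA is the last quoted field (the sample lines are hypothetical):

```python
import re
from collections import Counter

def user_agent(line):
    """Extract the last quoted field of a combined-format log line (the UA)."""
    quoted = re.findall(r'"([^"]*)"', line)
    return quoted[-1] if quoted else ""

def ua_counts(lines):
    """Tally how many hits each distinct User-Agent string produced."""
    return Counter(user_agent(line) for line in lines)

SAMPLE = [
    '66.249.66.1 - - [x] "GET /a.html HTTP/1.1" 200 123 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [x] "GET /b.html HTTP/1.1" 200 456 "-" "Mozilla/4.0"',
    '66.249.66.1 - - [x] "GET /c.html HTTP/1.1" 200 789 "-" "Googlebot/2.1"',
]
```

Anything near the top of that tally that isn't a mainstream browser is worth a closer look.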

(I'm assuming you're using WT software and can actually preprocess the logs)

I am pretty sure this is accounting for the strange pattern you're seeing, and you'll probably be pretty happy with how many page views you save. I've seen it be as much as 50% for mid-size sites.


3:20 pm on Oct 18, 2007 (gmt 0)

10+ Year Member

cgrantski, thank you very much for your feedback. I can now go back to IT and suggest what you have mentioned.

I've just been to a presentation for WT's new Marketing Lab 2 suite and must say that both the Visitor Intelligence and Score modules look impressive. But we gotta get our core stats looking half decent before doing anything remotely more sophisticated.


4:37 pm on Oct 18, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Yes, the new products are a really big jump up. Very cool.
