
List of spiders - how large is their share of activity on a website?

         

Oliver Henniges

12:23 pm on May 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have begun to extract my own data on visitor actions on my website. Currently I'd like to filter out SE spiders from "human" visitors. I don't need exact data, just the most frequent bunch.

Do you know of any concise list of the most important bot user-agents that I can easily copy into a PHP array?

The reason is: I found that only 3% of our visitors in total seem to put anything into their shopping cart. I find this ratio extremely low, because I think our usability is reasonably comfortable overall, and some of our best landing pages convert at more than 6% orders per visit (with 20% putting something into the cart). So I believe this low overall ratio of 3% is due to the fact that quite a lot of spiders frequently enter the main page.

It is generally said that an orders-per-visit ratio of 1-3% is normal. Do you know of other benchmarks after filtering out spiders? Is there a general difference between websites according to spider activity? (I could imagine that older websites, or "good" ones, or frequently changing ones are spidered more often.)
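
To make it concrete, this is the sort of thing I have in mind - a minimal sketch in which the entries are only placeholders for whatever the definitive list turns out to be, not a verified set:

<?php
// Sketch only: placeholder entries, not a verified or complete list.
$spiderAgents = array(
    'Googlebot',
    'msnbot',
    'Yahoo! Slurp',
    'Teoma',
);

function isKnownSpider($userAgent, $spiderAgents)
{
    foreach ($spiderAgents as $agent) {
        // stripos: case-insensitive substring match
        if (stripos($userAgent, $agent) !== false) {
            return true;
        }
    }
    return false;
}
?>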

Receptional Andy

12:34 pm on May 8, 2008 (gmt 0)



A simple but reasonably effective approach for 'well-behaved' spiders would be to flag anything whose UA matches text like 'bot' (which would catch Google and MSN, amongst others), 'crawl', 'spider' or 'http'. Depending on your site, however, you might have large-scale spider activity not caught by the above.
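
In PHP that check could be as simple as the sketch below (the pattern is just a starting point - tune it against your own logs):

<?php
// Most 'well-behaved' spiders identify themselves with 'bot', 'crawl'
// or 'spider' in the UA, or include a URL to an info page ('http').
function looksLikeSpider($userAgent)
{
    return (bool) preg_match('/bot|crawl|spider|http/i', $userAgent);
}

// looksLikeSpider('Googlebot/2.1 (+http://www.google.com/bot.html)') -> true
// looksLikeSpider('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)') -> false
?>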

Oliver Henniges

3:32 pm on May 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have logged all user agents in a table over the past two months, so I might simply walk through that data by hand and see what looks like a spider.

But I doubt I am the first one faced with the problem of getting more accurate data in this respect, so I thought someone might help me avoid reinventing the wheel.

I'm a bit stubborn: my own analysis is getting more and more refined, and I have come to the conclusion that genuine human beings account for only 10-20% (reduced *to*, not *by*!) of all traffic reported by my hoster's stats. One thing that is easy to detect is official spiders' user agents. Another is the anonymous scraper spiders, hiding behind some ordinary Mozilla entry but coming back again and again at intervals longer than half an hour, which means every visit is registered as a distinct visitor.
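
For what it's worth, here is a rough sketch of that repeat-visitor check (the hit-array layout simply mirrors my own log table, so treat the field names as assumptions):

<?php
// Sketch: count "distinct visits" per IP/UA pair, where a new visit
// starts whenever the gap since the last hit exceeds half an hour.
// $hits is an array of array('ip' => ..., 'ua' => ..., 'time' => unix
// timestamp), sorted by time.
function countDistinctVisits($hits, $gapSeconds = 1800)
{
    $lastSeen = array();
    $visits   = array();
    foreach ($hits as $hit) {
        $key = $hit['ip'] . '|' . $hit['ua'];
        if (!isset($lastSeen[$key]) || $hit['time'] - $lastSeen[$key] > $gapSeconds) {
            $visits[$key] = isset($visits[$key]) ? $visits[$key] + 1 : 1;
        }
        $lastSeen[$key] = $hit['time'];
    }
    // An IP/UA pair hiding behind a plain Mozilla string but racking up
    // dozens of such "visits" per day is very likely a scraper.
    return $visits;
}
?>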

Traffic definitely isn't what it seems to be, but obviously no one wants to hear that...

JacobPM

5:27 pm on May 13, 2008 (gmt 0)

10+ Year Member



^^I've been noticing that too, and that's just from the CTRs of some Google AdWords campaigns I'm managing.

Why do bots seem to outweigh the human aspect of choice? (i.e. people think site A is better, not B, but because B is more bot-friendly, B comes up more often?)