Forum Moderators: DixonJones
Amongst visitors from the UK, Canada and Australia, 42-49% of the requests on our server are for GIF files, 26-30% are for JPEGs and 16-18% are for HTML pages.
From the USA, the figures are 34% GIFs, 17% JPEGs and 34% HTML, suggesting that something in the US is loading our pages but not images.
Would I be correct in assuming that search engines, all or most of which use US-identified IP addresses, are visiting our site and loading only HTML pages, not images, and so skewing our figures for US visitors?
If I wanted an as-true-as-you-can-get figure for where our actual human visitors are coming from, instead of excluding the spiders from our stats, would another option be looking at the IP addresses of those who load GIFs and JPEGs?
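To make the idea concrete, here is a rough sketch of counting image requests per IP address from a server log. It assumes an Apache-style common/combined log format, and the sample lines, addresses and the function name `image_loading_ips` are made up for illustration. Treat the result as an estimate only: some bots do fetch images, and some real visitors browse with images off or come through a shared proxy.

```python
import re
from collections import Counter

# Assumption: Apache-style common/combined log format.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)[^"]*" (\d{3})'
)
IMAGE_EXT = (".gif", ".jpg", ".jpeg")


def image_loading_ips(lines):
    """Return a Counter mapping IP address -> number of image requests."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like a request
        ip, path, _status = m.groups()
        path = path.split("?", 1)[0].lower()  # drop any query string
        if path.endswith(IMAGE_EXT):
            hits[ip] += 1
    return hits


# Made-up sample lines using documentation-only IP addresses.
sample = [
    '203.0.113.5 - - [10/Oct/2005:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '203.0.113.5 - - [10/Oct/2005:13:55:37 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '198.51.100.7 - - [10/Oct/2005:13:56:01 +0000] "GET /index.html HTTP/1.0" 200 2326',
]

likely_humans = image_loading_ips(sample)
print(sorted(likely_humans))
```

In the sample, only the first address requested an image, so it is the only one counted. The resulting list of addresses could then be fed to whatever country lookup you already use for your stats.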
There are some who already know the answers but you really need to see the test results to comprehend what is happening.
Trying to sort out bots is nearly impossible. Most exploit seekers will disguise themselves as a current browser and OS type. I noticed that a particularly large bot farm also has a few bots that don't announce themselves; I always suspected those were the more advanced bots checking for cloaking and SEO exploits.
Start sampling some of the IP addresses. When you start finding "browsers" viewing your site from the Rackspace & Level3 colo facilities, you'll realize that a lot of the odd traffic is just bot activity.
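As a sketch of that sampling step, the snippet below picks a few distinct addresses from a visitor list and checks them against a list of hosting-provider networks. The networks shown are placeholder documentation ranges, not real Rackspace or Level3 allocations; you would have to fill in actual ranges from WHOIS lookups yourself. The names `HOSTING_NETWORKS`, `is_hosting_ip` and `sample_ips` are hypothetical.

```python
import ipaddress
import random

# Placeholder ranges (RFC 5737 documentation blocks), NOT real
# Rackspace or Level3 allocations. Replace with ranges you have verified.
HOSTING_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_hosting_ip(ip):
    """True if the address falls inside one of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in HOSTING_NETWORKS)


def sample_ips(ips, k, seed=0):
    """Pick up to k distinct addresses; the seed makes the pick repeatable."""
    unique = sorted(set(ips))
    rng = random.Random(seed)
    return rng.sample(unique, min(k, len(unique)))


# Made-up visitor list for illustration.
visitors = ["192.0.2.10", "203.0.113.5", "198.51.100.7", "203.0.113.5"]

for ip in sample_ips(visitors, 3):
    label = "hosting range" if is_hosting_ip(ip) else "unclassified"
    print(ip, label)
```

A "browser" user agent coming from an address flagged this way is worth a closer look, though a match alone does not prove it is a bot.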
You can create "honeypot" robots.txt entries and links in some of your pages. Bots will follow anything on the page, and bad bots will even follow what they are politely asked to ignore, whereas people will only click on visible links.
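A minimal sketch of the honeypot idea follows, assuming an Apache-style log and a hypothetical trap path `/trap/` that is disallowed in robots.txt and reachable only through a link people cannot see. The path, the sample log lines and the function name `trap_visitors` are all made up for illustration.

```python
import re

TRAP_PATH = "/trap/"  # hypothetical path; never link to it visibly

# robots.txt entry politely asking all crawlers to stay out of the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Assumption: the client IP is the first field and the request line is quoted.
REQUEST = re.compile(r'^(\S+) .*?"(?:GET|HEAD) (\S+)')


def trap_visitors(lines, trap=TRAP_PATH):
    """Return the set of IPs that requested anything under the trap path."""
    caught = set()
    for line in lines:
        m = REQUEST.match(line)
        if m and m.group(2).startswith(trap):
            caught.add(m.group(1))
    return caught


# Made-up log lines using documentation-only IP addresses.
sample_log = [
    '203.0.113.5 - - [10/Oct/2005:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Oct/2005:13:56:01 +0000] "GET /trap/page.html HTTP/1.0" 200 10',
]

print(ROBOTS_TXT)
print(trap_visitors(sample_log))
```

Well-behaved crawlers should obey the Disallow line, and people never see the link, so any address that turns up in the result is most likely a bot that ignores robots.txt.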