
PHP Server Side Scripting Forum

    
Spiders messing with my stats
PumpkinHead




msg:1265110
 3:55 pm on Nov 4, 2005 (gmt 0)

Hi all,

I'm tracking the number of times a page has been viewed by my visitors, using the following code:

<?php
$ip = $_SERVER["REMOTE_ADDR"];
$write_rec = 'Y'; // default: count the view
if ($ip == '66.249.66.242')
{ $write_rec = 'N'; }

if ($write_rec != 'N')
{ /* ** Write record to database ** */ }
?>

The '66.249.66.242' IP is one of Google's spiders. Obviously this is a bad way of doing things because I would need the IP of every spider to successfully differentiate between a human visitor and a spider.

What's the best way of doing this?

 

chirp




msg:1265111
 4:06 pm on Nov 4, 2005 (gmt 0)

Something like this?

if(!preg_match("/Googlebot/", $_SERVER['HTTP_USER_AGENT'])) { 
# not Googlebot
}

;)

PumpkinHead




msg:1265112
 4:18 pm on Nov 4, 2005 (gmt 0)

Hi,

Thanks for the reply. What if it's the Yahoo bot, MSN, etc.? I'm then back in a similar situation, or is this the best I can do?

jatar_k




msg:1265113
 4:30 pm on Nov 4, 2005 (gmt 0)

you could have an array of bot names then compare the user agent to the array using in_array()
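
One caveat: in_array() compares whole values, so it only matches when the entire user agent string equals a list entry. A minimal sketch of a substring check instead, assuming a hypothetical helper name is_listed_bot() and a short, illustrative bot list (not a complete inventory of crawlers):

```php
<?php
// Hypothetical helper: true if the user agent contains any listed bot name.
function is_listed_bot(string $userAgent, array $botNames): bool
{
    foreach ($botNames as $name) {
        // stripos() does a case-insensitive substring search.
        if (stripos($userAgent, $name) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative list only; extend it with the bots you actually see.
$botNames = ['Googlebot', 'Slurp', 'msnbot'];
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (!is_listed_bot($userAgent, $botNames)) {
    // Not a listed bot: write the page view record here.
}
```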

PumpkinHead




msg:1265114
 4:40 pm on Nov 4, 2005 (gmt 0)

Would it be safe to check the HTTP_USER_AGENT and not write a record if any of the following are found:

crawl, bot, slurp, spider, seek, collect, track

I've just been looking through a list of HTTP_USER_AGENT values and I think these may work, though I realise this isn't going to be perfect.
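
A rough sketch of how that keyword check might look, using the keywords listed above. The helper name looks_like_bot() is made up for illustration, and the match is a heuristic: a browser whose user agent happens to contain one of these words would be skipped too.

```php
<?php
// Keywords taken from the post above; matching is a heuristic, not exact.
$keywords = ['crawl', 'bot', 'slurp', 'spider', 'seek', 'collect', 'track'];

// Hypothetical helper: builds one case-insensitive pattern from the keywords.
function looks_like_bot(string $userAgent, array $keywords): bool
{
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/'); // escape any regex metacharacters
    }, $keywords);
    $pattern = '/' . implode('|', $quoted) . '/i';
    return preg_match($pattern, $userAgent) === 1;
}

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (!looks_like_bot($userAgent, $keywords)) {
    // No keyword found: write the page view record here.
}
```

An empty or missing user agent matches none of the keywords, so such hits would still be counted unless they are handled separately.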

jatar_k




msg:1265115
 4:44 pm on Nov 4, 2005 (gmt 0)

could work, seems roughly ok

using user agent is never an exact science

though this makes me think, why are you doing this? Is it to be displayed on the page, or is it some kind of tracking? If it's tracking, you would be much better off using a stats package and your raw logs.

PumpkinHead




msg:1265116
 5:18 pm on Nov 4, 2005 (gmt 0)

Hi,

It is for tracking but only for my reference. My site is built from a database, so for example I have the following in the database:

widget0001 ¦ widgetinformation ¦ view count
widget0002 ¦ widgetinformation ¦ view count
...
widget9323 ¦ widgetinformation ¦ view count

I have a stats package but I'd like this count just so that I know which is the most popular page when I'm reading my raw data.

jatar_k




msg:1265117
 5:30 pm on Nov 4, 2005 (gmt 0)

you could do some baseline work

insert for each hit and then see what needs to be filtered once you have enough base data

use a table that stores

ip
pagename
user agent

then you can select counts from MySQL and start excluding the bots that you find in there. You could also maintain a bot list to load into your array for comparison, and tighten up your pageview counting over time.
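
To make the baseline idea concrete, here is a small sketch that counts views per page while skipping rows from a bot list. It is only an illustration: the $hits array stands in for rows read from the table, the keys ip, pagename and user_agent are assumed column names based on the list above, and count_human_views() is a hypothetical helper. In practice the same filtering could be done in the SQL query itself.

```php
<?php
// Hypothetical helper: counts views per page, skipping rows whose user agent
// contains any entry of $botList. $hits stands in for rows read from the table.
function count_human_views(array $hits, array $botList): array
{
    $counts = [];
    foreach ($hits as $hit) {
        foreach ($botList as $bot) {
            if (stripos($hit['user_agent'], $bot) !== false) {
                continue 2; // bot hit: skip to the next logged row
            }
        }
        $counts[$hit['pagename']] = ($counts[$hit['pagename']] ?? 0) + 1;
    }
    arsort($counts); // most viewed page first
    return $counts;
}

// Made-up sample rows, for illustration only.
$hits = [
    ['ip' => '192.0.2.1', 'pagename' => 'widget0001', 'user_agent' => 'Mozilla/5.0 Firefox/1.5'],
    ['ip' => '192.0.2.2', 'pagename' => 'widget0001', 'user_agent' => 'Googlebot/2.1'],
    ['ip' => '192.0.2.3', 'pagename' => 'widget0002', 'user_agent' => 'Opera/8.5'],
    ['ip' => '192.0.2.4', 'pagename' => 'widget0001', 'user_agent' => 'Mozilla/4.0 (compatible; MSIE 6.0)'],
];
$views = count_human_views($hits, ['Googlebot']);
```

As new bots show up in the logged user agents, they can be added to the list and the counts recomputed from the same stored rows.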

AcsCh




msg:1265118
 9:59 am on Nov 5, 2005 (gmt 0)

Why not filter positive? There are only a few browsers out there, but many bots. So go for

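if (preg_match('/Opera|MSIE|Firefox|Safari/i', $_SERVER['HTTP_USER_AGENT'])) { // .. add other browsers as needed
    // LogIt: known browser, so record the view
} else {
    // NoLog: anything else is treated as a bot
}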

AcsCh




msg:1265119
 10:03 am on Nov 5, 2005 (gmt 0)

I have a stats package but I'd like this count just so that I know which is the most popular page when I'm reading my raw data.

Anyway, I suggest not aiming for perfection where stats are concerned. Even ignoring the bots, the stats will probably be pretty accurate for your above question.
