
PHP Server Side Scripting Forum

    
Spiders messing with my stats
PumpkinHead

10+ Year Member



 
Msg#: 10671 posted 3:55 pm on Nov 4, 2005 (gmt 0)

Hi all,

I'm tracking the number of times a page has been viewed by my visitors, using the following code:

<?php
$ip = $_SERVER["REMOTE_ADDR"];
$write_rec = 'Y'; // default: count the view

// Skip the one known spider address
if ($ip == '66.249.66.242') {
    $write_rec = 'N';
}

if ($write_rec != 'N') {
    // ** Write record to database **
}
?>

The '66.249.66.242' IP is one of Google's spiders. Obviously this is a bad way of doing things because I would need the IP of every spider to successfully differentiate between a human visitor and a spider.

What's the best way of doing this?

 

chirp

10+ Year Member



 
Msg#: 10671 posted 4:06 pm on Nov 4, 2005 (gmt 0)

Something like this?

if(!preg_match("/Googlebot/", $_SERVER['HTTP_USER_AGENT'])) { 
# not Googlebot
}

;)

PumpkinHead

10+ Year Member



 
Msg#: 10671 posted 4:18 pm on Nov 4, 2005 (gmt 0)

Hi,

Thanks for the reply. What if it's the Yahoo bot, MSN, etc.? Am I then back in a similar situation, or is this the best I can do?

jatar_k

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 10671 posted 4:30 pm on Nov 4, 2005 (gmt 0)

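You could have an array of bot names, then check the user agent against each entry. (in_array() on its own only matches whole strings, so it will not find a bot name inside a longer user agent.)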
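
A minimal sketch of comparing the user agent against an array of bot names. The helper name is_known_bot() and the three-name list are illustrative assumptions, not something from this thread. It loops with stripos(), a case-insensitive substring search, because a user agent string normally contains more than just the bot's name.

```php
<?php
// Hypothetical helper: true if any listed bot name occurs in the user agent.
function is_known_bot(string $userAgent, array $botNames): bool
{
    foreach ($botNames as $name) {
        // stripos() returns false when the name is not found anywhere.
        if (stripos($userAgent, $name) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative list only; extend it with whatever shows up in your logs.
$botNames = ['Googlebot', 'Slurp', 'msnbot'];

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (!is_known_bot($userAgent, $botNames)) {
    // ** Write record to database **
}
```

The list can live in a file or a table and be loaded into $botNames, so new spiders can be added without touching the check itself.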

PumpkinHead

10+ Year Member



 
Msg#: 10671 posted 4:40 pm on Nov 4, 2005 (gmt 0)

Would it be safe to check the HTTP_USER_AGENT and not write a record if any of the following are found:

crawl, bot, slurp, spider, seek, collect, track

I've just been looking through a list of HTTP_USER_AGENT strings and I think these may work, though I realise this isn't going to be perfect.
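
One way to test for those keywords in a single pass is a case-insensitive regular expression. This is only a sketch: the function name looks_like_spider() is made up for illustration, and the keyword list is exactly the one proposed above.

```php
<?php
// Hypothetical helper built from the keyword list in the post above.
function looks_like_spider(string $userAgent): bool
{
    // The /i flag makes the match case-insensitive.
    return preg_match('/crawl|bot|slurp|spider|seek|collect|track/i', $userAgent) === 1;
}

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
// An empty user agent counts as a visitor here; that is a choice, not a rule.
if (!looks_like_spider($userAgent)) {
    // ** Write record to database **
}
```

Because these are plain substrings, a genuine browser whose user agent happens to contain one of them would be skipped as well, so it is worth checking the pattern against real log entries first.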

jatar_k

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 10671 posted 4:44 pm on Nov 4, 2005 (gmt 0)

Could work, seems roughly OK.

Using the user agent is never an exact science.

Though this makes me think: why are you doing this? Is it to be displayed on the page?

Or is it some kind of tracking? If it is tracking, you would be much better off using a stats package and your raw logs.

PumpkinHead

10+ Year Member



 
Msg#: 10671 posted 5:18 pm on Nov 4, 2005 (gmt 0)

Hi,

It is for tracking but only for my reference. My site is built from a database, so for example I have the following in the database:

widget0001 ¦ widgetinformation ¦ view count
widget0002 ¦ widgetinformation ¦ view count
...
widget9323 ¦ widgetinformation ¦ view count

I have a stats package but I'd like this count just so that I know which is the most popular page when I'm reading my raw data.

jatar_k

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 10671 posted 5:30 pm on Nov 4, 2005 (gmt 0)

You could do some baseline work.

Insert a row for each hit, then see what needs to be filtered once you have enough base data.

Use a table that stores:

ip
pagename
user agent

Then you can select counts from MySQL and start excluding the bots you find in there. You could also maintain a bot list to load into your array for comparison, and tighten up your pageview counting over time.
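
A rough sketch of that baseline approach: log every hit, then count per page while excluding user agents that contain a listed bot name. Several things here are assumptions for illustration only: the table name hits, the column names, the functions log_hit() and page_views(), and the use of an in-memory SQLite database through PDO in place of MySQL so the example runs on its own.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL.
// Table and column names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE hits (ip TEXT, pagename TEXT, user_agent TEXT)');

// Baseline: record every hit, bots included.
function log_hit(PDO $db, string $ip, string $page, string $userAgent): void
{
    $stmt = $db->prepare('INSERT INTO hits (ip, pagename, user_agent) VALUES (?, ?, ?)');
    $stmt->execute([$ip, $page, $userAgent]);
}

// Count views per page, skipping user agents that contain a listed bot name.
function page_views(PDO $db, array $botList): array
{
    $sql = 'SELECT pagename, COUNT(*) AS views FROM hits';
    if ($botList) {
        // One "NOT LIKE ?" clause per bot name, joined with AND.
        $clauses = array_fill(0, count($botList), 'user_agent NOT LIKE ?');
        $sql .= ' WHERE ' . implode(' AND ', $clauses);
    }
    $sql .= ' GROUP BY pagename ORDER BY views DESC';

    $stmt = $db->prepare($sql);
    $params = array_map(function ($name) {
        return '%' . $name . '%';
    }, $botList);
    $stmt->execute($params);

    // Returns an array of pagename => view count.
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

log_hit($db, '192.0.2.10', 'widget0001', 'Mozilla/5.0 Firefox/1.5');
log_hit($db, '192.0.2.11', 'widget0001', 'Opera/8.5');
log_hit($db, '66.249.66.242', 'widget0001', 'Googlebot/2.1');
log_hit($db, '192.0.2.12', 'widget0002', 'Mozilla/5.0 Firefox/1.5');

print_r(page_views($db, ['Googlebot', 'Slurp']));
```

Passing an empty bot list gives the raw, unfiltered counts, which makes it easy to see how much difference each addition to the list makes. Bot names containing % or _ would act as LIKE wildcards in this sketch.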

AcsCh

10+ Year Member



 
Msg#: 10671 posted 9:59 am on Nov 5, 2005 (gmt 0)

Why not filter positively? There are only a few browsers out there, but many bots. So go for:

// IE identifies itself as "MSIE"; add any other browsers you want to the pattern
if (preg_match('/Opera|MSIE|Firefox|Safari/i', $_SERVER['HTTP_USER_AGENT'])) {
    // LogIt
} else {
    // NoLog
}

AcsCh

10+ Year Member



 
Msg#: 10671 posted 10:03 am on Nov 5, 2005 (gmt 0)

PumpkinHead wrote: "I have a stats package but I'd like this count just so that I know which is the most popular page when I'm reading my raw data."

Anyway, I suggest not aiming for perfection where stats are concerned. Even if you do nothing about the bots, the counts will probably be accurate enough to answer your question above.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved