Forum Moderators: DixonJones

do spiders leave/send referrers?

writing a stats / referrers script in PHP

         

mincklerstraat

8:41 am on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do spiders leave referrers when they trawl a site? I don't really need to know where they are coming from, but I'd like to know whether, once they are on your site, they send referrers for the pages of your own site that they came from.

I'll be writing a stats / referrers script soon, and need to take spiders and other agents into account. Many stats scripts compare the referrer to the site's URI, and if it doesn't match, the script treats it as an external referrer and logs it: as 'bookmark' if it's blank or looks like a bookmark, or as the referrer itself if it's a URL. If spiders never send referrers, the referrer variable (in PHP, $_SERVER['HTTP_REFERER']) will either not be set or be ''. This could amount to LOADS of hits marked as 'bookmark,' which isn't really the point, is it? I'd like to write this stats script without using sessions, since sessions can bork up a spider's crawling.

Any insights on this? And if they don't send referrers, does anyone know whether, in PHP, $_SERVER['HTTP_REFERER'] is left unset or set to ''? Or is there some other fairly simple distinguishing characteristic to check so I can script this well?

Thanks!

jatar_k

6:47 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I don't believe that spiders leave referers.

$_SERVER['HTTP_REFERER'] could be either unset or blank; I believe it is set blank, though.

I would always test like so just to be safe (check isset() first, and compare with == rather than assigning with =):
if (!isset($_SERVER['HTTP_REFERER']) || $_SERVER['HTTP_REFERER'] == "")

As far as assuming that any blank referer is a bookmark, that is totally wrong, as you mentioned. You would be better off just saying "no referer" or not listing them at all.
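To make the "no referer" idea concrete, here is a minimal sketch of a classifier that never guesses "bookmark". The function name classify_referrer() and the labels 'none', 'internal', 'external' and 'unknown' are my own assumptions for illustration, not part of any existing stats script.

```php
<?php
// Hypothetical helper: classify a hit by its referrer without guessing "bookmark".
// $server is normally $_SERVER; $siteHost is your own host name (an assumption).
function classify_referrer(array $server, string $siteHost): string
{
    // Treat a missing header and an empty string the same way.
    $referrer = isset($server['HTTP_REFERER']) ? trim($server['HTTP_REFERER']) : '';
    if ($referrer === '') {
        return 'none';
    }

    // parse_url() gives null when there is no host and false for badly malformed URLs.
    $host = parse_url($referrer, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return 'unknown';
    }

    // Compare hosts case-insensitively; the same host means an on-site click.
    return strcasecmp($host, $siteHost) === 0 ? 'internal' : 'external';
}
```

You would call it as classify_referrer($_SERVER, 'www.example.com') and only list the 'external' hits as referrers.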

No sessions makes it tough, though you could write the referrer info to a db/file (along with other info) and then read it back when you need it. Sessions make tracking very simple, and it gets a lot more confusing without them.
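A rough sketch of the file approach, assuming a plain tab-separated log with one line per hit. The function name log_hit() and the field order are hypothetical; pick whatever layout suits the report you want to build later.

```php
<?php
// Hypothetical helper: append one tab-separated line per hit to a log file,
// so hits can be correlated later (for example by IP and user agent) without sessions.
function log_hit(string $logFile, array $server): bool
{
    $fields = [
        date('Y-m-d H:i:s'),
        $server['REMOTE_ADDR'] ?? '-',
        $server['REQUEST_URI'] ?? '-',
        $server['HTTP_REFERER'] ?? '-',     // '-' marks a header that was not sent
        $server['HTTP_USER_AGENT'] ?? '-',
    ];

    // Replace tabs and newlines so one hit always stays on one line.
    $fields = array_map(function ($value) {
        return str_replace(["\t", "\r", "\n"], ' ', (string) $value);
    }, $fields);

    // FILE_APPEND adds to the end of the file; LOCK_EX avoids interleaved writes.
    $line = implode("\t", $fields) . "\n";
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

Calling log_hit('/path/to/hits.log', $_SERVER) at the top of each page would be enough to start collecting data; the path is just a placeholder.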

jmccormac

7:29 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Spiders do not generally leave referrers in the logs. What you could do as part of your program is create a list/array of known spider signatures and grep the user-agent strings for these signatures. Google, for example, uses "Googlebot/2.1 (+http://www.googlebot.com/bot.html)". It may also be a good idea to correlate these signatures against IPs and/or hostnames to filter out spoofed spiders (basically some scumbag using a site trawler program to download your site).
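A rough sketch of the signature idea, assuming the signature is matched against $_SERVER['HTTP_USER_AGENT']. The list contents, the function names match_spider() and host_has_suffix(), and the '.googlebot.com' suffix are illustrative assumptions; build the real list from what shows up in your own logs.

```php
<?php
// Hypothetical signature list; extend it with the crawlers you actually see.
$spiderSignatures = ['Googlebot', 'Slurp', 'msnbot'];

// Return the first signature found in the User-Agent string, or null if none match.
function match_spider(string $userAgent, array $signatures): ?string
{
    foreach ($signatures as $signature) {
        // stripos() is case-insensitive and returns false when there is no match.
        if (stripos($userAgent, $signature) !== false) {
            return $signature;
        }
    }
    return null;
}

// Check that a host name ends with one of the expected domain suffixes.
// Suffixes should start with a dot so "googlebot.com.evil.example" does not pass.
function host_has_suffix(string $hostname, array $suffixes): bool
{
    $hostname = strtolower(rtrim($hostname, '.'));
    foreach ($suffixes as $suffix) {
        $suffix = strtolower($suffix);
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}
```

In a live script the host name would come from gethostbyaddr($_SERVER['REMOTE_ADDR']), and a forward lookup with gethostbyname() should give back the same IP before you trust it. DNS lookups are slow, so it is probably better to do this when processing the logs than on every page view.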

Regards...jmcc