How to check whether it's a spider or not via PHP?

         

irock

4:02 pm on Aug 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know there's a server variable that can tell me whether the user agent is a spider (at the very least I want to check for the most common legitimate spiders). Do you know the variable name?

Thanks!

jatar_k

6:46 pm on Aug 23, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Is it one of the ones in the $_SERVER array?
[ca.php.net...]

httpwebwitch

7:49 pm on Aug 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The property you're looking for is $HTTP_USER_AGENT

Here's some code that detects Googlebot and sends an e-mail to let you know you've been crawled.

<?php
// send an e-mail if google crawls this page
if (eregi("googlebot", $HTTP_USER_AGENT)) {
    // to test this script, change "googlebot" to "mozilla"
    if ($QUERY_STRING != "") {
        $url = "http://" . $SERVER_NAME . $PHP_SELF . '?' . $QUERY_STRING;
    } else {
        $url = "http://" . $SERVER_NAME . $PHP_SELF;
    }
    $today = date("F j, Y, g:i a");
    mail("you@yourdomain.com", "Googlebot detected on $SERVER_NAME", "$today - Google crawled $url");
}
?>

adamas

8:49 am on Sep 3, 2003 (gmt 0)

10+ Year Member



$HTTP_USER_AGENT may not be available, depending on your PHP settings (it only exists when register_globals is on).

Use $_SERVER['HTTP_USER_AGENT']

[edit - and the same for the other server variables!]
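
For anyone reading this later, here is a rough sketch of the Googlebot notifier rewritten along those lines. It assumes PHP 7 or newer, where register_globals and eregi() no longer exist, so it reads everything from $_SERVER and uses stripos() for the case-insensitive match. The helper names ua_contains() and requested_url() and the 'localhost' fallback are made up for illustration; the e-mail address is the placeholder from the post above.

```php
<?php
// Hypothetical helper: case-insensitive substring match on a user-agent string.
function ua_contains(string $userAgent, string $needle): bool
{
    return stripos($userAgent, $needle) !== false;
}

// Hypothetical helper: rebuild the requested URL from a $_SERVER-style array.
// The 'localhost' and '/' fallbacks are assumptions for when a key is missing.
function requested_url(array $server): string
{
    $url = 'http://' . ($server['SERVER_NAME'] ?? 'localhost') . ($server['PHP_SELF'] ?? '/');
    $query = $server['QUERY_STRING'] ?? '';
    return $query !== '' ? $url . '?' . $query : $url;
}

// The header can be absent (CLI, bare clients), so default to an empty string.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (ua_contains($userAgent, 'googlebot')) {
    $serverName = $_SERVER['SERVER_NAME'] ?? 'localhost';
    $today = date('F j, Y, g:i a');
    mail(
        'you@yourdomain.com',
        "Googlebot detected on $serverName",
        "$today - Google crawled " . requested_url($_SERVER)
    );
}
```

As in the original, the user-agent header is supplied by the client, so this only tells you what the visitor claims to be.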

ukgimp

8:52 am on Sep 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I found this yesterday. Our man Nick W wrote it, but I can't find the post, so:

/* Use this to start a session only if the UA is *not* a search engine,
to avoid duplicate content issues with URL propagation of SIDs */

$searchengines = array("Google", "Fast", "Slurp", "Ink", "ia_archiver", "Atomz", "Scooter");
$is_search_engine = 0;
foreach ($searchengines as $key => $val) {
    // strstr() is case-sensitive, so the tokens must match the UA's casing
    if (strstr("$HTTP_USER_AGENT", $val)) {
        $is_search_engine++;
    }
}

if ($is_search_engine == 0) { // not a search engine

    /* You can put anything in here that needs to be
    hidden from search engines */
    session_start();

} else { // is a search engine

    /* Put anything you want only for search engines in here */

}

Nick_W

8:57 am on Sep 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, that works fine if you only need to tell for fairly non-essential reasons (such as not showing a SID in the URL). You should probably change $HTTP_USER_AGENT to $_SERVER['HTTP_USER_AGENT'], though.
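
With that change made, the check might look something like the sketch below. It assumes PHP 7 or newer; the function name is_search_engine() is invented here to wrap the loop, and the token list is copied unchanged from the snippet above. The match stays case-sensitive, as strstr() was in the original.

```php
<?php
// Hypothetical wrapper around the loop from the post above.
// Returns true when the user-agent string contains any of the listed tokens.
function is_search_engine(string $userAgent): bool
{
    // Token list copied from the original snippet; adjust to taste.
    $searchEngines = ['Google', 'Fast', 'Slurp', 'Ink', 'ia_archiver', 'Atomz', 'Scooter'];

    foreach ($searchEngines as $token) {
        // strpos() is case-sensitive, matching the behaviour of strstr() above
        if (strpos($userAgent, $token) !== false) {
            return true;
        }
    }

    return false;
}
```

A call site would then be if (!is_search_engine($_SERVER['HTTP_USER_AGENT'] ?? '')) { session_start(); }, which treats a missing header as "not a search engine". Short tokens such as "Ink" and "Fast" can also match unrelated user agents, which is one more reason to keep this to non-essential uses.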

If it's for more critical reasons, that function is NOT good enough.

Nick