Right, I have a problem: it is hard to get a spider into my site. I will explain.
My site is 100% dynamic to the client's computer. It resizes all the text, tables, images and margins to fit the client's screen.
But this comes at a cost. My first page uses JavaScript to pick up the client's screen resolution, which is passed on through either a cookie or the URL.
Because spiders on the web don't seem to have a resolution, or even a screen, they get thrown onto an error page.
So I have written this bit of code:
<?php
//get browser type
$browser = $_SERVER['HTTP_USER_AGENT'];
$browser = strtolower($browser);
//Set vars
$counter = 0;
$size = 0;
$pos = "";
//Set up bot names in lower case
$bots[0] = strtolower("googlebot");//Google
$bots[1] = strtolower("msnbot");//Msn
$bots[2] = strtolower("ask jeeves");//Ask Jeeves
$bots[3] = strtolower("askjeeves");//Ask Jeeves
$bots[4] = strtolower("teoma");//Ask Jeeves
$bots[5] = strtolower("jeeves");//Ask Jeeves
$bots[6] = strtolower("yahoo");//Yahoo (just in case)
$bots[7] = strtolower("slurp");//Yahoo bot
$bots[8] = strtolower("dogpile");//Dogpile (just in case)
$bots[9] = strtolower("marvin");//Marvin bot
$bots[10] = strtolower("searchit");//Search IT
$bots[11] = strtolower("libwww");//Lib Perl
$bots[12] = strtolower("scrubby");//Scrubby..!
//Set array size var
$size = count($bots);
//Search for items
while($counter!= $size)
{
$pos = strpos($browser, $bots[$counter]);
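//strpos() returns 0 for a match at the start of the string and false for no match, so compare strictly
if ($pos !== false)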
{
if (getenv("HTTP_CLIENT_IP"))
{
$ip = getenv("HTTP_CLIENT_IP");
}
else if(getenv("HTTP_X_FORWARDED_FOR"))
{
$ip = getenv("HTTP_X_FORWARDED_FOR");
}
else if(getenv("REMOTE_ADDR"))
{
$ip = getenv("REMOTE_ADDR");
}
else
{
$ip = $_SERVER['REMOTE_ADDR'];
}
$browser= $_SERVER['HTTP_USER_AGENT'];
$remote = gethostbyaddr($ip).";".gethostbyname($ip);
$date = date("d/m/y");
$time = date("H;i;s");
$log = "\n^".$date.":".$time.":".$ip.":".$remote.":".$browser;
$file = fopen("../traffic/botlist.txt","a");
fwrite($file, $log);
fclose($file);
header("Location: http://example/main.php?width=1024&height=768");
die();
}
else
{
$counter++;
}
}
//Write to logs
if (getenv("HTTP_CLIENT_IP"))
{
$ip = getenv("HTTP_CLIENT_IP");
}
else if(getenv("HTTP_X_FORWARDED_FOR"))
{
$ip = getenv("HTTP_X_FORWARDED_FOR");
}
else if(getenv("REMOTE_ADDR"))
{
$ip = getenv("REMOTE_ADDR");
}
else
{
$ip = $_SERVER['REMOTE_ADDR'];
}
$browser= $_SERVER['HTTP_USER_AGENT'];
$remote = gethostbyaddr($ip).";".gethostbyname($ip);
$date = date("d/m/y");
$time = date("H;i;s");
$log = "\n^".$date.":".$time.":".$ip.":".$remote.":".$browser;
$file = fopen("../traffic/frontlog.txt","a");
fwrite($file, $log);
fclose($file);
?>
Sorry, that is a whole load of code, but I feel you need to see all of it. What it does is look at the info in $_SERVER["HTTP_USER_AGENT"] and see whether one of the names is within it. Well, in theory!
If it finds a name from the array, it gives the bot a resolution in the URL to fool the site into letting it in.
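The matching step described here can be sketched as a small function. This is only a sketch: find_bot() is a hypothetical name, and the sample user-agent string and short bot list are illustrative. The one detail worth noting is that strpos() returns 0 when the name sits at the very start of the user-agent string and false when it is absent, so the result has to be compared strictly against false.

```php
<?php
// Hypothetical helper (not part of the original script): returns the first
// bot name found in the user-agent string, or null when none matches.
function find_bot($userAgent, array $botNames)
{
    $userAgent = strtolower($userAgent);
    foreach ($botNames as $name) {
        // strpos() gives 0 for a match at position 0 and false for no match,
        // so a loose test such as "> 0" would miss names at the start.
        if (strpos($userAgent, strtolower($name)) !== false) {
            return $name;
        }
    }
    return null;
}

// Illustrative values only.
$bots  = array('googlebot', 'msnbot', 'slurp');
$match = find_bot('Googlebot/2.1 (+http://www.google.com/bot.html)', $bots);
```

A user agent that begins with the bot name, as in the sample above, is exactly the case where a non-strict comparison would report no match.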
Problem 1.
All the bots I have seen have only hit the front page.
Problem 2.
When a bot comes, my script hasn't written anything to the botlist.txt file.
I tested this code by adding a bot called firefox (my internet browser) and it logged me in botlist.txt.
So why doesn't it work?
Sorry, this is a lot of info, but I need help :(
[edited by: coopster at 3:30 am (utc) on April 15, 2005]
[edit reason] generalized url [/edit]
E.g. a home page with some content, then two links:
1. View with custom sizing.
2. View as static content.
This should solve the bot problems. You might also have people who want to view content their way, not in whatever format you have decided to give them, and they could use the second link too.
(If you have everything set the way you say, you should be able to enter a default of 800x600 for the "static" content.)
So when the web page comes to resize itself and there is no size information, it could check $_SERVER["HTTP_USER_AGENT"] for any bots. This way I can set a default in there, and the pages won't all need a ?width=800&height=600 on the URL.
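The fallback idea can be sketched roughly like this. It assumes the size arrives as width and height in the URL (as in the redirect above) or in cookies of the same names; the cookie names, the function names and the three-to-four digit limit are assumptions made for illustration.

```php
<?php
// Hypothetical helper: accept only a plain 3 or 4 digit number (100 to 9999).
function parse_dimension($value)
{
    if (is_string($value) && preg_match('/^[1-9][0-9]{2,3}$/', $value)) {
        return (int) $value;
    }
    return null;
}

// Hypothetical helper: look in the URL parameters first, then the cookie,
// and fall back to a default size when neither holds usable values.
function resolve_dimensions(array $get, array $cookie, $defaultWidth = 800, $defaultHeight = 600)
{
    foreach (array($get, $cookie) as $source) {
        $width  = isset($source['width']) ? parse_dimension($source['width']) : null;
        $height = isset($source['height']) ? parse_dimension($source['height']) : null;
        if ($width !== null && $height !== null) {
            return array($width, $height);
        }
    }
    return array($defaultWidth, $defaultHeight);
}

list($width, $height) = resolve_dimensions($_GET, $_COOKIE);
```

With a fallback like this, any visitor without size information, bot or not, gets the default instead of the error page, so the bot list would only be needed for logging.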
I don't think there is any need for mod_rewrite unless I can get it to do:
www.example.com/800/600/index.php
or something like that! Is that possible? I've never used it.
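As a rough illustration of what such a URL would have to be turned into, here is the mapping written in plain PHP rather than as a rewrite rule. The function name parse_sized_path() and the path layout /width/height/page.php are assumptions based on the example URL; how the request reaches a script that calls it (mod_rewrite or otherwise) is left open.

```php
<?php
// Hypothetical helper: split a path such as "/800/600/index.php" into its
// width, height and page parts; returns null for anything that does not fit.
function parse_sized_path($path)
{
    if (preg_match('#^/(\d{3,4})/(\d{3,4})/([A-Za-z0-9_\-]+\.php)$#', $path, $m)) {
        return array(
            'width'  => (int) $m[1],
            'height' => (int) $m[2],
            'page'   => $m[3],
        );
    }
    return null;
}

$parts = parse_sized_path('/800/600/index.php');
```

The page part is limited to simple file names so that a path cannot point outside the site's own scripts.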
Then when it goes on to the main site, it also generates what looks like a static page, but a script detects that it is a bot and gives it a default screen resolution of 1024 by 768.
This seems good for keeping out any bots I don't like, because I have to list them before they can come in.