Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 40 message thread spans 2 pages: [1] 2
the latest impenetrable disguise
lucy24




msg:4538509
 2:37 am on Jan 23, 2013 (gmt 0)

Some of youse may have seen this before. It's a new one on me.

83.23.202.99 - - [22/Jan/2013:12:30:44 -0800] "GET / HTTP/1.1" 200 2526 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:39"
83.23.202.99 - - [22/Jan/2013:12:30:44 -0800] "GET /wp-login.php?action=register HTTP/1.1" 403 928 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:40"
83.23.202.99 - - [22/Jan/2013:12:30:45 -0800] "GET /register.php HTTP/1.1" 403 928 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:41"
83.23.202.99 - - [22/Jan/2013:12:30:45 -0800] "GET /admin.php HTTP/1.1" 403 928 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:41"
<snip, snip for a total of 15 requests>
83.23.202.99 - - [22/Jan/2013:12:47:57 -0800] "GET /add HTTP/1.1" 404 912 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:47:52"
83.23.202.99 - - [22/Jan/2013:12:47:57 -0800] "GET /otwarty_admin/ HTTP/1.1" 404 912 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:47:53"


Nifty huh? Just tack your current server time onto the end of your UA string and nobody will ever be able to block you.

Unless, of course, they've already got an IP block on eastern European robots whose clock is four seconds slow. Or a UA block on anything ending in \d\d:\d\d:\d\d. Or a block on external requests for php files. Or...
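The "UA block on anything ending in \d\d:\d\d:\d\d" idea can be sketched in a few lines. This is a Python illustration only, not anyone's production filter; the function name `has_trailing_timestamp` is made up for the example, and the pattern is tightened to the full "YYYY-MM-DD HH:MM:SS" shape seen in the log lines above.

```python
import re

# Matches a user agent that ends in "YYYY-MM-DD HH:MM:SS",
# the shape of the suffix in the quoted log lines.
TRAILING_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")


def has_trailing_timestamp(user_agent):
    """Return True if the UA string ends with a date and time stamp."""
    return TRAILING_TIMESTAMP.search(user_agent) is not None


if __name__ == "__main__":
    stamped = ("Mozilla/5.0 (Windows NT 5.1; rv:16.0) "
               "Gecko/20100101 Firefox/16.0 2013-01-22 21:30:39")
    plain = "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0"
    print(has_trailing_timestamp(stamped))
    print(has_trailing_timestamp(plain))
```

In an Apache setup the same test would more likely live in a rewrite or access rule; the Python form is just easier to try out against a saved log.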

83.0.0.0/11 is Poland. Unless someone has evidence to the contrary, I'm going to assume it's all servers. I've previously met 83.9.something with the same owner, though each piece will only admit to /13 of the full range.

 

incrediBILL




msg:4538522
 3:25 am on Jan 23, 2013 (gmt 0)

Just tack your current server time onto the end of your UA string and nobody will ever be able to block you.


That depends on your blocking methodology.

I'd suspect their headers were flawed which would trap them regardless of the user agent. Did you collect those? Would be interesting to see.

If you have a good user agent filter, it should kick out any user agent that has all that extra stuff for evaluation.

Example, a prototype Firefox regex in PHP:
/^Mozilla\/(?P<mozversion>(?P<mozmajor>\d{1,2})\.(?P<mozminor>\d{1,2})) \((?P<platform>X11|Maemo|Macintosh|Windows NT \d{1,2}\.\d{1,2}|Android);(?:.*)\) Gecko\/\d{1,10}(\.\d{1,2}){0,3} Firefox\/(?P<foxversion>\d{1,2}(\.\d{1,2}){0,3})$/

If it doesn't pass the simple regex check above, it falls into a routine that examines all the extraneous "stuff" to see if it's something known. If unknown, quarantined until it can be evaluated.
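As a rough illustration of that pass-or-quarantine flow, here is the same Firefox pattern ported to Python (the named-group syntax carries over unchanged). The function name `check_firefox_ua` and the two result labels are assumptions made for this sketch; the real routine that inspects the extra "stuff" is not shown in the thread and is not reproduced here.

```python
import re

# The prototype Firefox pattern from the post, split across lines for readability.
FIREFOX_UA = re.compile(
    r"^Mozilla/(?P<mozversion>(?P<mozmajor>\d{1,2})\.(?P<mozminor>\d{1,2})) "
    r"\((?P<platform>X11|Maemo|Macintosh|Windows NT \d{1,2}\.\d{1,2}|Android);(?:.*)\) "
    r"Gecko/\d{1,10}(\.\d{1,2}){0,3} "
    r"Firefox/(?P<foxversion>\d{1,2}(\.\d{1,2}){0,3})$"
)


def check_firefox_ua(user_agent):
    """Return ("pass", fields) for a well-formed Firefox UA, else ("quarantine", None)."""
    match = FIREFOX_UA.match(user_agent)
    if match is None:
        # Anything with leftover text after the Firefox version lands here.
        return "quarantine", None
    return "pass", match.groupdict()


if __name__ == "__main__":
    clean = "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0"
    print(check_firefox_ua(clean))
    print(check_firefox_ua(clean + " 2013-01-22 21:30:39"))
```

Because the pattern is anchored at both ends, the timestamp suffix from the first post is enough to fail the check, which is the point being made.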

FWIW, the regex for Safari/Chrome/Firefox are almost all identical which makes MSIE the odd browser out in this process.

In case you're interested, the prototype MSIE regex I'm currently testing is:
/^Mozilla\/(?P<mozversion>(?P<mozmajor>\d{1,2})\.(?P<mozminor>\d{1,2})) \(compatible; MSIE (?P<version>(?P<major>\d{1,2})\.(?P<minor>\d{1,2})); Windows (?P<windows>(XP|CE|95|98; Win 9x 4\.90|98|NT (?P<winversion>(?P<winmajor>\d{1,2})\.(?P<winminor>\d{1,2}))))(?:;|\))(?P<fragment1>.*?)(?:\)|$)(?P<fragment2>.*$)/

I do version validations outside of the regex because the regex just gets way too complicated if you attempt to put all the rules in there, or you end up with a big pile of regex's to cover all the variances. Having one regex that validates the format and extracts all the other information works best for me.

lucy24




msg:4538530
 3:59 am on Jan 23, 2013 (gmt 0)

FWIW, the regex for Safari/Chrome/Firefox are almost all identical which makes MSIE the odd browser out in this process.

The one that really sticks out for me is Opera, because it doesn't start with Mozilla-something the way God intended human UAs to start ;)

What does this construction
?P<mozversion>
mean? (Keeping in mind that I don't speak php.)

Did you collect those?

Shared hosting, remember. Between the requests for nonexistent files and the requests for php files-- also nonexistent-- the only thing they managed to pick up was a couple copies of the front page, which is no skin off my nose.

keyplyr




msg:4538558
 4:52 am on Jan 23, 2013 (gmt 0)





Lots of nefarious agents coming from Poland and the Czech Republic.

incrediBILL




msg:4538583
 6:15 am on Jan 23, 2013 (gmt 0)

?P<mozversion>


That's how you name a capture group in the regex. Instead of using $1 or $2, you can reference it by name: with PHP's preg_match() it shows up in the matches array as $matches['mozversion']. There are ways of doing that in other regex engines as well.
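For anyone who doesn't speak PHP, the same construction exists in Python with identical syntax. A small sketch, using only the opening part of the Firefox pattern, shows a group read by position and by name:

```python
import re

# Only the "Mozilla/x.y " prefix of the Firefox pattern, with its three named groups.
PREFIX = re.compile(
    r"^Mozilla/(?P<mozversion>(?P<mozmajor>\d{1,2})\.(?P<mozminor>\d{1,2})) "
)

match = PREFIX.match(
    "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0"
)

by_number = match.group(1)           # positional access, like $1
by_name = match.group("mozversion")  # the same group, fetched by its name
all_named = match.groupdict()        # every named group as a dict

print(by_number, by_name, all_named)
```

The name changes nothing about what is matched; it only saves counting parentheses when the pattern grows.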

Shared hosting, remember.


That's not a reason to not log headers.

Save the following as "logheaders.php"
<?php

$ip = get_server('REMOTE_ADDR');

$fh = fopen("headers-". date('Ymd') . ".log","a");
fwrite($fh, "IP: $ip\n");
foreach (getallheaders() as $name => $value) {
fwrite($fh, "$name: $value\n");
}
fwrite($fh, "----\n\n");
fclose($fh);

?>


To test it just run logheaders.php directly and then look at the contents of the headers-YYYYMMDD.log file it created.

You can do logging per PHP file by adding 'include_once("logheaders.php");' at the beginning of your .php files, after the <?php of course, and it will create a file called "headers-YYYYMMDD.log" and log all the headers.

Or do all files in one shot using .htaccess

# Prepend the header logger to every PHP request
php_value auto_prepend_file "/var/www/vhosts/example.com/logheaders.php"


If you have static HTML files where you need headers logged, that can be done as well in .htaccess.

# Map all static pages to a PHP handler, add more if needed
AddType application/x-httpd-php .htm .html


Enjoy

not2easy




msg:4538590
 6:58 am on Jan 23, 2013 (gmt 0)

Wow! I can log headers! Thank you, this makes a difference!

# Now log all headers for static web pages
php_value auto_prepend_file "/var/www/vhosts/example.com/logheaders.php"

would be my preferred method because many of the sites I manage use both Wordpress and static html, some are static only, but with includes for page elements.
This part:
/var/www/vhosts/
can be altered to the actual path to where we put the logheaders.php file, right? - or will this be prepended to files in subdomains as well? Guess testing will tell, these hosting accounts are not all set up the same way.
incrediBILL




msg:4538603
 7:25 am on Jan 23, 2013 (gmt 0)

You can probably use the relative path to logheaders.php in your .htaccess file just fine, I happened to use an explicit path to the file so that's what I know works for sure

lucy24




msg:4538606
 7:59 am on Jan 23, 2013 (gmt 0)

To test it just run logheaders.php directly and then look at the contents of the headers-YYYYMMDD.log file it created.


:: business with Third Site ::

Fatal error: Call to undefined function get_server()

:: detour to php dot net followed by changing line to $_SERVER['REMOTE_ADDR'] ::

Fatal error: Call to undefined function getallheaders()

:: further detour ::

Oh, right. I've been bitten by this one before. It works on MAMP because it's got mod_php, but my real site doesn't (php 5.3, needs 5.4 to work with FastCGI alone, or Apache 2.4 which apparently does some different hanky-panky).

incrediBILL




msg:4538609
 8:30 am on Jan 23, 2013 (gmt 0)

Yup, getallheaders() only works under FastCGI as of PHP 5.4, while I was using mod_php 5.x where it worked fine.

Sorry about the missing function, I have it included by default everywhere and missed it when I did a quick copy/paste of code:

<?php

function get_server($var) {
return isset($_SERVER[$var]) ? $_SERVER[$var] : false;
}

$ip = get_server('REMOTE_ADDR');

$fh = fopen("headers-". date('Ymd') . ".log","a");
fwrite($fh, "IP: $ip\n");
foreach (getallheaders() as $name => $value) {
fwrite($fh, "$name: $value\n");
}
fwrite($fh, "----\n\n");
fclose($fh);

?>


I'll find some replacement code for the getallheaders() that'll work in FastCGI in 5.3, give me a little time and I'll post a function that replaces it when it isn't present.

incrediBILL




msg:4538611
 8:48 am on Jan 23, 2013 (gmt 0)

OK, here's a version that includes an optional getallheaders() function that should run just fine in FastCGI prior to PHP 5.4:

FYI, I added a time stamp for each log entry which wasn't in the original code ;)

<?php

function get_server($var) {
return isset($_SERVER[$var]) ? $_SERVER[$var] : false;
}

if (!function_exists('getallheaders'))
{
function getallheaders()
{
$headers = array(); // start with an array; keyed assignment on a string fails on newer PHP
foreach ($_SERVER as $name => $value)
{
if (substr($name, 0, 5) == 'HTTP_')
{
$headers[str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', substr($name, 5)))))] = $value;
}
}
return $headers;
}
}

$ip = get_server('REMOTE_ADDR');

$fh = fopen("headers-". date('Ymd') . ".log","a");
fwrite($fh, date('Y-m-d:') . date("H:i:s\n"));
fwrite($fh, "IP: $ip\n");
foreach (getallheaders() as $name => $value) {
fwrite($fh, "$name: $value\n");
echo "$name: $value<br>";
}
fwrite($fh, "----\n\n");
fclose($fh);

?>


It's easy to forget others aren't running the same environment and have different requirements for PHP which makes it a real PITA for mass market development considering all the various host configurations. Ugh.
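The replacement getallheaders() above works by turning CGI-style keys such as HTTP_ACCEPT_LANGUAGE back into header names. The same transformation, sketched in Python for anyone logging headers outside PHP; the function name `headers_from_environ` is invented for this example, and the input is assumed to be a plain dict shaped like a CGI or WSGI environment.

```python
def headers_from_environ(environ):
    """Rebuild request header names from CGI-style HTTP_* keys.

    HTTP_ACCEPT_LANGUAGE becomes Accept-Language, and so on.
    Keys without the HTTP_ prefix (REMOTE_ADDR, etc.) are skipped.
    """
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # Drop the prefix, split on underscores, capitalize each word.
            words = key[5:].split("_")
            name = "-".join(word.capitalize() for word in words)
            headers[name] = value
    return headers


if __name__ == "__main__":
    sample = {
        "REMOTE_ADDR": "83.23.202.99",
        "HTTP_USER_AGENT": "Mozilla/5.0",
        "HTTP_ACCEPT_LANGUAGE": "pl,en;q=0.5",
    }
    for name, value in headers_from_environ(sample).items():
        print(f"{name}: {value}")
```

One limitation shared with the PHP version: the original capitalization of a header is lost, and CGI passes Content-Type and Content-Length without the HTTP_ prefix, so a prefix-only scan will not pick those two up.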

lucy24




msg:4538619
 10:09 am on Jan 23, 2013 (gmt 0)

give me a little time

Gosh. I am not accustomed to "a little time" working out to eighteen minutes ;) I was off dealing with the smooth-readers' reports on Travels in North America by Bernhard, Duke of Saxe-Weimar Eisenach. (Really. 1827, two volumes. Don't let the title fool you though; he was just a younger son with time on his hands.)

Yup, that one works-- a little bit too well, as it takes over the display if I run it directly! That's in addition to, not instead of, writing to the file. Same behavior in MAMP and live.

... which is why everyone needs a google-plays-silly-buggers dot com* for online experimenting.

If it's included within some other file will it stay invisible? I think almost all my pages now have a php include. But they're near the end** of their respective files, not at the beginning. Does it matter?


* Not its real name.
** On account of "Have you ever seen a web page before? Really?" et cetera.

keyplyr




msg:4538621
 10:41 am on Jan 23, 2013 (gmt 0)


I think almost all my pages now have a php include.

Lucy, if the host that allows php includes (running PHP 5.2 or earlier) goes to cloud based file serving, in all probability they will upgrade to PHP 5.3 or greater and your php includes will likely break (no matter what type handler you use in htaccess.)

You might consider checking into this, and if needed, changing to virtual includes prior to this event. They are actually much faster and less of a server load.

Ask me how I know :)

not2easy




msg:4538647
 1:44 pm on Jan 23, 2013 (gmt 0)

OK, I see that I have two new "things to do". The servers use php 5.3 (at last check) and not via FastCGI. And I need to look into virtual includes. I have virtual includes for a few perl scripts so I just need to bring my reading up to date.

incrediBILL




msg:4538718
 6:40 pm on Jan 23, 2013 (gmt 0)

OOPS!
I left in a debugging ECHO statement while testing it on FastCGI, remove that and it'll stop that. It was late :)
<?php

function get_server($var) {
return isset($_SERVER[$var]) ? $_SERVER[$var] : false;
}

if (!function_exists('getallheaders'))
{
function getallheaders()
{
$headers = array(); // start with an array; keyed assignment on a string fails on newer PHP
foreach ($_SERVER as $name => $value)
{
if (substr($name, 0, 5) == 'HTTP_')
{
$headers[str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', substr($name, 5)))))] = $value;
}
}
return $headers;
}
}

$ip = get_server('REMOTE_ADDR');

$fh = fopen("headers-". date('Ymd') . ".log","a");
fwrite($fh, date('Y-m-d:') . date("H:i:s\n"));
fwrite($fh, "IP: $ip\n");
foreach (getallheaders() as $name => $value) {
fwrite($fh, "$name: $value\n");
}
fwrite($fh, "----\n\n");
fclose($fh);

?>

dstiles




msg:4538758
 8:43 pm on Jan 23, 2013 (gmt 0)

Lucy - I have a LOT of IPs blocked short-term in the 83.2n range but TP-net is a DSL range - the rDNS even includes the term "adsl".

I agree that range is bad but it occasionally throws up valid traffic.

blend27




msg:4538780
 10:54 pm on Jan 23, 2013 (gmt 0)

Similar to what incrediBILL posted but using ColdFusion(CFML) & Java Class for rdns lookup.

<cfsilent>
<cfscript>
function rdnsLookUp(address) {
var iaclass="";
var addr="";
iaclass=CreateObject("java", "java.net.InetAddress");
addr=iaclass.getByName(address);
return addr.getCanonicalHostName();
}
</cfscript>
<cfset x = GetHttpRequestData()>
<cfset rdnsTimeStart = now()>
<cfsavecontent variable="headers"><cfoutput>
#chr(13)##chr(13)#-----------------------------
ip: #cgi.REMOTE_ADDR#
remote host: #rdnsLookUp(cgi.REMOTE_ADDR)# (#DateDiff('s', rdnsTimeStart, now())#)
time: #now()#
http_content: #x.content#
method: #x.method#
protocol: #x.protocol#
<cfloop collection = "#x.headers#" item = "http_item">
#chr(13)##http_item#: #StructFind(x.headers, http_item)#</cfloop>
</cfoutput></cfsavecontent>
<cffile action="append" addnewline="yes"
output="#headers#"
file="#GetDirectoryFromPath(GetCurrentTemplatePath())#headers.txt">
</cfsilent>

lucy24




msg:4538800
 1:00 am on Jan 24, 2013 (gmt 0)

OOPS!
I left in a debugging ECHO statement while testing it on FastCGI


Make that my oops: If I'd looked closer before cutting-and-pasting-- exactly the way I am always telling people not to do-- I would have seen the "echo" line :(

With the original code, the logfile
fopen("headers-". date('Ymd') . ".log","a");
is created in the directory that contains the logheaders.php document. If it's made into an include called by different docs in different places, will there still be all one log file, or a separate one in each relevant directory? Or should I change it to an absolute path?
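A relative file name is normally resolved against the current working directory rather than the location of the included file, so the answer can depend on which page pulled in the include. One way to sidestep the question is to build an absolute path from a single fixed directory. A minimal Python sketch of that idea follows; the function name `log_path` and its `base_dir` parameter are assumptions for illustration. In PHP the analogous anchor would be `__DIR__` or `dirname(__FILE__)`.

```python
from datetime import date
from pathlib import Path


def log_path(base_dir, day=None):
    """Return an absolute path to headers-YYYYMMDD.log inside base_dir."""
    if day is None:
        day = date.today()
    # resolve() makes the directory absolute, so the result no longer
    # depends on whichever directory the caller happens to be running in.
    return Path(base_dir).resolve() / f"headers-{day:%Y%m%d}.log"


if __name__ == "__main__":
    print(log_path(".", date(2013, 1, 24)))
```

With every caller passing the same base directory, all entries end up in one file per day regardless of where the including page lives.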

:: preparing for more poring over php dot net ::



if the host that allows php includes (running PHP 5.2 or earlier) goes to cloud based file serving, in all probability they will upgrade to PHP 5.3 or greater and your php includes will likely break (no matter what type handler you use in htaccess.)

We're currently on
:: shuffling papers ::
php 5.3.6 FastCGI

There's a popup with options but this is the default. The others would seem to be older or slower.

All include files-- both html and php-- are called in the form

<!--#include virtual="/directory/filename.xtn" -->

and then the php ones have a secondary include that goes

include ($_SERVER['DOCUMENT_ROOT'] . "/directory/filename.php");

I don't have any AddHandler or AddType statements currently.

No, wait, I tell a lie. htaccess says

AddType text/html .html

I don't remember when or why I put that in. It seems tautological.

[edited by: lucy24 at 1:08 am (utc) on Jan 24, 2013]

incrediBILL




msg:4538801
 1:06 am on Jan 24, 2013 (gmt 0)

php includes will likely break


Actually, includes won't break, otherwise nothing ever written in PHP would work.

However, "php_value auto_prepend_file" might not work in FastCGI as I'm not sure they use that in .htaccess but a regular include() statement in the actual code that I mentioned will work fine.

keyplyr




msg:4538826
 2:54 am on Jan 24, 2013 (gmt 0)

That's not what I said Bill. Nothing to do with FastCGI, and it's not the PHP that breaks, but the cloud based schema that drops PHP globals and includes. Watched it at 3 hosts who recently went to a cloud. YMMV.

not2easy




msg:4538828
 2:57 am on Jan 24, 2013 (gmt 0)

The site I'm using to try this uses plain inline includes and they should work with this file the same as any other php or html .php include. I add menus to pages with something like this:
<?php
include("parts/menu.php");
?>

And that adds a menu to a sidebar div. I could add in the logheaders.php the same way to each page, should this be in the page headers or does it make any difference - or is it better to prepend it to every file via htaccess? It looks like that is what would happen using the line in htaccess.

Sorry if this sounds dense. Until I can see what is going on I am fumbling at how it works.

incrediBILL




msg:4538842
 5:18 am on Jan 24, 2013 (gmt 0)

but the cloud based schema that drops PHP globals and includes


Ah ha, gotcha.

Is the Apache environment getting set correctly?

Sounds like they'll need to fix this as it would break a lot of software.

wilderness




msg:4538867
 8:28 am on Jan 24, 2013 (gmt 0)

but the cloud based schema that drops PHP globals and includes


There must be some known vulnerabilities with these cloud changes.

I just had 198 successive requests, which I've never had previously (at least that quantity):

"POST /wp-login.php HTTP/1.1" 403 0 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.3) Gecko/20010801"

Is the Apache environment getting set correctly?

Sounds like they'll need to fix this as it would break a lot of software.


Bill,
I don't believe these hosts care what the consequences are. The only reason the changes are being made is because cloud-hosting is half-the-expense.
The host staff are so overwhelmed with transferring all their customers that they don't even know what day it is.

My host informed me that they were going to upgrade the servers after all the transfers were complete (late this week or early next).

keyplyr




msg:4538881
 9:35 am on Jan 24, 2013 (gmt 0)

Gdaddy (for example) has been moving all sites to cloud computing little by little, upgrading from PHP 5.2 to 5.4 (much more strict.) Several sites I was looking after took a fall. They all parsed HTML for PHP and used a ton of PHP includes as well as a couple older custom shopping carts I wrote using register_globals (I know, I know... a big security hole and I should have fixed them ages ago.)

When I talked to the admins, their categorical answer was they no longer support PHP includes on HTML pages or having register_globals ON in this new version due to security issues. Beyond that, they wouldn't say. Is it the strict version of PHP or merely the fact their file servers now reside in the cloud, or a combination of both? Dunno.

The very same has happened at other shared hosting services. I sometimes work as a gun for hire and I have repaired a lot of broken Order forms or Contact Us forms due to this PHP change with cloud serving.

The FYI is a reality. If your host is planning on migrating your files to cloud computing, it would be prudent to find out exactly what changes will occur. The problem is, Gdaddy told me not to worry that nothing would change - LOL. Spent 3 days fixing all those things that didn't change.

Bottom line - The internet is a dynamic of constant change. Keeping up is almost a full time job. Even worse if you're cheap like me and don't want to pay over $10 a month for hosting :)

not2easy




msg:4539020
 6:39 pm on Jan 24, 2013 (gmt 0)

This migration to Cloud should be its own thread. I had scripts stop working last Sept and in Nov on another host. The fix was to add a line to the directories' htaccess files:
RewriteBase /directoryname

I had over 400,000 pages offline with no errors, no notification until GWT showed thousands of new 404s. The changes are not always global, a host might migrate parts of a domain. When I contacted the host about the problems, they swore nothing had changed. Until I read this I had no clue of what might have caused it. I still have no way to be certain, but I have an IP showing in that account which is supposed to be the "Shared IP" for that account. It is not the same IP I see in the error log for example. I need to check some headers. I'm in the process of moving a domain from that host so maybe my time is better spent just moving it.

incrediBILL




msg:4539025
 7:02 pm on Jan 24, 2013 (gmt 0)

All this shared hosting / cloud hosting issues is exactly why I have my own dedicated servers. I control the environment so stuff like this is never an issue.

Back to the topic of "the latest impenetrable disguise" and I highly encourage a cloud hosting thread somewhere, probably the PHP forum would be best, as this sounds like some serious stuff that could really mess with sites, esp. bot blocking.

FWIW, as a side note, you can never rely on hosts not to mess with their servers. I used to be a host, and was the author of a major ecom package way back, so I know all about it unfortunately. For instance, if we (host) got a new control panel update (Plesk/CPanel/etc.) and rolled it out it could break stuff. Likewise, some hosts changed the way the security on the server worked and all the user / group permissions changed and BANG! thousands of ecom customers wouldn't run the next morning.

lucy24




msg:4539138
 2:51 am on Jan 25, 2013 (gmt 0)

All this shared hosting / cloud hosting issues is exactly why I have my own dedicated servers.

Yah, but this is your day job, right? I don't think anyone starts out with their own physical server from Day One. Except for the congenital computer geeks for whom building their own server in the garage is equivalent to "Hey! Let's put on a show in the barn!"

My host's mass mailings make periodic noises about cloud stuff but so far it all seems to be optional extras so I don't pay attention. They did just move a bunch of us to a new server and I'm still waiting for the logs to catch up. There's always one day missing, but never the same day. I had to go over to piwik to see if I'd missed anyone interesting. (Other than, say, the latest incarnation of someone leaving multiple tabs open*, so every time they crash and/or voluntarily restart, one of my pages loads up again. The latest is extra disconcerting because it happens to be a footnote anchor. piwik says so, though logs of course can't.)

Oh, and, ahem, for those who missed it the first time around: The "impenetrable disguise" part was meant to be satirical :P Like those plays where someone is wearing a weeny little black eyemask and not even their nearest and dearest can recognize them.


* With thanks to Leosghost for originally identifying this behavior.

incrediBILL




msg:4539145
 3:16 am on Jan 25, 2013 (gmt 0)

Yah, but this is your day job, right?


Not anymore, well, define job. I'm semi-retired and rent a server(s) pre-installed with a control panel so it's virtually no work for me to host all of my personal domains.
Literally the only thing I have to do is make sure the automatic updates are keeping the OS and control panel updated, that's it.

For the number of domains I have it's actually cheaper to have my own personal servers.

I used to host elsewhere, several elsewhere's for that matter, but I got annoyed because other clients on the servers were always causing problems, email constantly had issues, and when nothing else was going wrong the host would make changes that broke stuff. It was a nightmare.

It also makes it much easier to do bot blocking when you deploy server-wide solutions vs. going account to account.

not2easy




msg:4539163
 6:03 am on Jan 25, 2013 (gmt 0)


It also makes it much easier to do bot blocking when you deploy server-wide solutions vs. going account to account.


This discussion has sent me to look into reseller hosting, both for my own and for domains I manage. Exactly because it allows for more centralized controls. Many clients are small, personal (non commercial, non monetized) domains that can't seem to bother with details very well. There isn't a large amount of money at stake so I don't see a lot of viable options. I am too old to keep running after the same problems in dozens of different places. I figure it has to be better than this.

brotherhood of LAN




msg:4539219
 11:52 am on Jan 25, 2013 (gmt 0)

I have to assume you don't host Wordpress Bill, or perhaps there'd be more maintenance on your server :)

RE: includes messing up the script. I have to agree with Bill about getting a dedicated server, or at least a different host. I can only assume includes were disabled for performance issues or devs were leaving security holes when using user supplied variables to include stuff.

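Fact is you could just do $x = file_get_contents('script.php'); eval('?>' . $x); and it would still work as an include, but much more awkwardly. (The '?>' prefix is needed because eval() expects bare PHP code, while script.php starts with an opening <?php tag.)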

Thanks for sharing the regex... is there any thing in particular you'd recommend to check for within the server headers, and do/have you ban based on something within the headers? The only example of me looking at them is for X-Forwarded-For but I've been led to believe that examining all the fingerprints of headers makes us quite unique, like 1 in 100000 uniqueness, which obviously makes it a lot easier to ban agents.

incrediBILL




msg:4539384
 10:16 pm on Jan 25, 2013 (gmt 0)

any thing in particular you'd recommend to check for within the server headers, and do/have you ban based on something within the headers?


Browsers always send the same headers, over and over and over. If one is missing, like Accept-Language, it's a bot. There are some fake headers out there that identify some bots, but mostly if it's a real browser it always does the same thing.

Just beware that some hosts modify some Apache header fields so for instance HTTP_CONNECTION which is typically keep-alive is set to close, OOPS! I used to use that as a signal but thanks to some hosts it's not reliable.

Just like everything else, I whitelist headers. I know what browsers do so I whitelist that behavior and if it's something less than all the headers expected I know it's a bot. If it's something extra, it might be a bot but it has to be evaluated.
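The whitelisting rule described here (fewer headers than expected means bot, extra headers means "evaluate") can be sketched as follows. This is a Python illustration under stated assumptions: the `EXPECTED_HEADERS` set is a made-up example rather than any real browser profile, the function name `classify_headers` is invented, and Connection is left out because of the host-rewriting caveat mentioned above.

```python
# Hypothetical whitelist; a real one would be built per browser from observed traffic.
EXPECTED_HEADERS = frozenset(
    {"host", "user-agent", "accept", "accept-language", "accept-encoding"}
)


def classify_headers(headers, expected=EXPECTED_HEADERS):
    """Classify a request by comparing its header names to a whitelist.

    Returns "bot" if an expected header is missing, "evaluate" if
    unexpected headers are present, and "browser" on an exact match.
    """
    # Header names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in headers}
    if expected - present:
        return "bot"
    if present - expected:
        return "evaluate"
    return "browser"


if __name__ == "__main__":
    full = {
        "Host": "example.com",
        "User-Agent": "Mozilla/5.0",
        "Accept": "text/html",
        "Accept-Language": "en-US",
        "Accept-Encoding": "gzip",
    }
    no_language = {k: v for k, v in full.items() if k != "Accept-Language"}
    print(classify_headers(full))
    print(classify_headers(no_language))
    print(classify_headers({**full, "X-Forwarded-For": "10.0.0.1"}))
```

Checking for missing headers first mirrors the order in the post: a short header set is treated as decisive, while an unfamiliar extra only triggers a closer look.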

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved