Quick primer on identifying bot activity.
Ocean10000 - 7:52 pm on Mar 29, 2008 (gmt 0)
Check whether the User-Agent contains one of the following terms; if it does, flag the User-Agent as a possible mobile phone browser.
- "Windows CE"
- "Symbian OS"
Check whether the "x-wap-profile" header is present. This is another header that mobile phone browsers often send. If it is present, it is usually safe to flag the browser as a mobile browser of some type. The value of this header is usually a URL pointing to an XML file describing the supported features of the browser and phone.
Check whether the "Accept" header is present, and make a note of this for later. Most major web browsers send this header along with the request; it tells the web server which media types the browser can accept.
Check whether the browser identifies itself as one of "IE", "Opera", or "Firefox" but is missing the "Accept" header. There are two options one could take in this case:
- Show it a captcha page, where a human may continue on but a bot would get stuck. Mark the IP and User-Agent as having been given a captcha check, and note whether they answer it properly.
- Mark the IP and block it from accessing the site. Send a 403 status and no further content.
If you are checking all non-bot User-Agents excluding mobiles, it is advised to use the first option, just in case.
I have only listed the major browsers, but this check is usually safe for all known browsers except a few mobile browsers, which is why the mobile browser checks come first.
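For anyone who wants to see the checks above in code, here is a rough Python sketch. It is only one way to wire them together, not the original poster's implementation. The function names, the return labels, and the "MSIE" token used to spot IE are my own assumptions, and the mobile term list holds only the two terms quoted in this post.

```python
# Hypothetical helper names; the thread itself does not include any code.

# Only the two terms quoted in the post; extend as needed.
MOBILE_UA_TERMS = ("Windows CE", "Symbian OS")

# Assumption: IE identifies itself with "MSIE" in its User-Agent string.
DESKTOP_BROWSER_TOKENS = ("MSIE", "Opera", "Firefox")


def classify_request(headers):
    """Return 'mobile', 'suspect', or 'ok' for a mapping of request headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    user_agent = h.get("user-agent", "")

    # Mobile checks come first, because some mobile browsers omit "Accept".
    if any(term.lower() in user_agent.lower() for term in MOBILE_UA_TERMS):
        return "mobile"
    if "x-wap-profile" in h:
        return "mobile"

    # A User-Agent claiming to be a major desktop browser with no
    # "Accept" header is treated as a likely bot.
    claims_desktop = any(token in user_agent for token in DESKTOP_BROWSER_TOKENS)
    if claims_desktop and "accept" not in h:
        return "suspect"
    return "ok"


def action_for(classification, strict=False):
    """Map a classification to one of the two responses described above."""
    if classification != "suspect":
        return "serve"
    # strict=False is the cautious option (captcha); strict=True sends a 403.
    return "block_403" if strict else "captcha"
```

A request with a Firefox User-Agent and no "Accept" header would come back as 'suspect', and `action_for('suspect')` would pick the captcha route unless `strict=True` is passed.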
Thread source: http://www.webmasterworld.com/search_engine_spiders/3613998.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com