Host as IP


dstiles - 12:00 am on Feb 27, 2010 (gmt 0)


I've just noticed several entries in my "trap" logs where HTTP_HOST holds the site's IP address instead of its domain name. The requested page appears to be appended to the IP, but I'm not certain about that.

Every instance I've seen came from one of two IPs (one Italian, one UK) and was blocked: the requests had no valid headers, so they would have been blocked anyway. The UAs are common bad-bot ones:

Mozilla/4.0 (compatible; MSIE 7.0; WIndows NT 6.0)

Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.01 [en]

My problem is: how does this access get through in the first place? The (IIS) server itself returns a 400 before it ever gets near the page, yet my traps require a page to be accessed before the trap can work.

I've now added a specific trap for this and added environment logging to see what's happening.
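
For anyone wanting to do something similar, here is a rough sketch of that kind of check. It is in Python rather than whatever the IIS pages are written in, and it is not the actual trap code: the names host_is_ip and trap_record are made up for illustration, and it assumes the request environment is available as a CGI-style dictionary with keys such as HTTP_HOST and HTTP_USER_AGENT.

```python
import ipaddress


def host_is_ip(host_header: str) -> bool:
    """Return True if a Host header value is an IP literal, not a name."""
    host = host_header.strip()
    if not host:
        return False
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[2001:db8::1]:8080"
        end = host.find("]")
        if end == -1:
            return False
        host = host[1:end]
    elif host.count(":") == 1:
        # Name or IPv4 address followed by a port, e.g. "203.0.113.7:80"
        host = host.rsplit(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def trap_record(environ: dict) -> dict:
    """Collect the fields worth logging when the trap fires (hypothetical)."""
    host = environ.get("HTTP_HOST", "")
    return {
        "host": host,
        "host_is_ip": host_is_ip(host),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        "remote_addr": environ.get("REMOTE_ADDR", ""),
        "path": environ.get("PATH_INFO", ""),
    }
```

Calling trap_record on the request environment gives one dictionary per hit that can be written to the log, with host_is_ip flagging the requests that arrived by IP instead of by domain name.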


Thread source: http://www.webmasterworld.com/search_engine_spiders/4088160.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com