Forum Moderators: open
Connection: Close
is a valid header that can be returned by Firefox/Opera/IE. I usually only see it done when going though a proxy server or similar service. So I wouldn't block a client just on this alone, but you can use it as a tip off to do more checks.
I better header to check is the "Accept" Header which the major web browsers use, and again some proxy servers will remove it. But its easier to check for common proxy server headers. If the Accept Header is missing its more then likely a Bot or Client coming though a proxy server. FYI Mobile browsers are a crap shoot at including the Accept header.
I use the Accept Header method to remove a lot of spoofing bot traffic on my site. It keeps my bandwidth bill light and scrapers moving on to easier targets.
Ocean.
[edited by: Ocean10000 at 3:22 pm (utc) on Mar. 16, 2008]
I have a problem of spoofing IE7.
Here is the header:
Connection: Close
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: en-us
Host: www.mydomain.com
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322)
UA-CPU: x86
Look pretty right? The problem is, my robot trap will catch this header from lots of different IP on a short period of time, and strange thing is, no matter where the IP from, the Accept-Language is en-us, and the Accept is same means they must has same installed software, and User-Agent also same means they even have same patch!
To odd to not to think it's a faked IE7. Any ideas?
So I think I will block a ip if:
Connection = Close
and Accept = image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
and Accept-Language = en-us
and User-Agent = Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322)
How you think?
I do block based on some invalid accept and connection headers but typically I profile the visitor in real-time and block bot-like behavior such as:
- Fall into my spider trap
- Use invalid SGML/HTML in URIs
- Speed traps to stop accesses too fast for a human
(make sure you disable pre-fetch in htaccess first)
- Ask for robots.txt file
- Doesn't ask for CSS, .js, .gif and .jpg files
- so on and so forth, etc.
FWIW, some of the bots think they're clever and ask for all the files on the first page they access then take off screaming through the site and get immediately busted for speed.
Others are a bit smarter and go a bit slower and ask for hundreds of pages over days.
However, doing a test for humans at the control after certain odd behavior happens when the average number of page views is skewed has so far been the only clear cut way to block the stealth bots.
I don't use the squiggly captchas, I use a combination of real text (for handicapped accessibility) plus javascript tests for actual typing to thwart blow-thru techniques and so far it seems to keep the bots out quite nicely and humans rarely have an issue.
I think the final version is posted here but his host seems to be offline at the moment:
[modem-help.freeserve.co.uk...]
It was there a few days ago, should come back.
My question is, for example, if googlebox was in this trap and will it delete the page already in then database? Or they will keep the old version and try get new stall next time?
(I guess they will keep old one untill another http200, what's your idea, or is there any official declares?
Basically my security goes like this:
1. search engines get a free pass into the site, no further scrutiny
2. anything that's not MSIE/Firefox/Opera gets bounced, almost all other unwanted junk bounces here
3. anything claiming to be MSIE/Firefox/Opera get further scrutiny, such as speed traps, spider traps, bad headers, bad SGML/HTML, etc.