I haven't dealt with this problem myself, so I can be of little help. However, you may be able to construct some tools to help yourself with this problem.
As I understand it, there is one misbehaving client behind a proxy that it shares with other desirable visitors. The undesirable client always requests the same page.
If this is the case, here's what I would do:
Redirect all visitors requesting that page from that IP address to a simple script, written in Perl, for example.
In that script query and log the following information from your server variables:
TIME (to correlate with main logs)
HTTP_USER_AGENT
HTTP_FORWARDED
HTTP_X_FORWARDED_FROM
HTTP_X_FORWARDED_FOR
HTTP_VIA
CLIENT_IP
HTTP_FROM
HTTP_USER_AGENT_VIA
After logging that info, use the script either to output the requested page, or to send a redirect to the page under a new name.
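The logging step described above can be sketched roughly as follows. This is only an illustration in Python rather than Perl, and it assumes a CGI-style environment where the server exposes request headers as variables such as HTTP_VIA. The log file name `special.log` and the function names are hypothetical, and a given server may not set every variable in the list.

```python
import time

# Variables from the list above; a server may not set all of them.
FIELDS = [
    "HTTP_USER_AGENT", "HTTP_FORWARDED", "HTTP_X_FORWARDED_FROM",
    "HTTP_X_FORWARDED_FOR", "HTTP_VIA", "CLIENT_IP", "HTTP_FROM",
    "HTTP_USER_AGENT_VIA",
]

def build_record(environ, now=None):
    """Return one tab-separated record: timestamp first, then each field."""
    # The timestamp lets you correlate this record with the main access log.
    stamp = time.strftime("%d/%b/%Y:%H:%M:%S", time.localtime(now))
    # Missing or empty variables are written as "-", as in Apache logs.
    values = [environ.get(name) or "-" for name in FIELDS]
    return "\t".join([stamp] + values)

def log_request(environ, path="special.log"):
    """Append one record to the (hypothetical) special log file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(build_record(environ) + "\n")
```

In a real CGI script you would call `log_request(os.environ)` and then either print the requested page or send a redirect to the page under its new name.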
If you're not familiar with Perl scripts, have a look at this one as an example of how to grab server variables, write records to a file, and output a response from a script.
Once you have logged at least one hit from the unwelcome visitor, check the 'special' log to see if any of the variables listed above return any distinguishing information. Then, check to see if those variables are supported by mod_rewrite, and can be used to block that visitor. If so, then you can implement the block in mod_rewrite, and if not, you can re-design the script itself to do the blocking.
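If the blocking ends up in the script rather than in mod_rewrite, the decision could look something like the sketch below. The rule contents are placeholders: which variable and value actually single out the unwelcome visitor is exactly what the 'special' log is meant to reveal, so the names and values here are assumptions for illustration only.

```python
def should_block(environ, rules):
    """True when every variable in rules is present with exactly that value."""
    # An empty rule set never blocks, so a misconfiguration fails open.
    return bool(rules) and all(
        environ.get(name) == value for name, value in rules.items()
    )

def respond(environ, rules, page_body):
    """Return (status, body): refuse matching requests, serve everyone else."""
    if should_block(environ, rules):
        return "403 Forbidden", ""
    return "200 OK", page_body
```

For example, `respond(environ, {"HTTP_VIA": "1.0 some-proxy"}, page)` would refuse only requests whose Via value matches that hypothetical string and serve the page to all other visitors behind the same IP.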
I would like to investigate the use and 'hierarchy' (if any) of the above HTTP request header fields, but I haven't had the time. I also don't have a well-defined test case like you do. If the above information is useful to you, and you find out anything interesting about this subject, I'd love to hear back.
Obviously, discussing all of the topics surrounding this subject would take me all night. I suggest you post to the forums in order to get more input from other members, and also so that the discussion can benefit the other users.
I hope this helps!
Jim
---------------------
My latest response:
I suppose it could be someone behind the proxy with a screen set to refresh every 4 minutes, day and night. But I actually think it is more likely the PROXY ITSELF, since the requests always come from the same IP with the same "typical caching-proxy" signature, while other users come through at random.
Here are some lines from the middle of our access logs with the extraneous stuff removed. Note how prior to and at the start of this snippet it was accessing the page about once an hour. After the random access it goes into the 4 minute mode for the rest of the month: (IP has been changed to protect the innocent?)
0.0.0.0 - - [19/May/2003:17:22:54 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [19/May/2003:18:16:53 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [19/May/2003:19:18:36 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [19/May/2003:20:36:55 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [19/May/2003:22:23:21 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:00:45:57 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:03:56:45 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:13:56:39 -0700] "GET /brett/092902la.htm HTTP/1.0" 200 11773 "http://search.msn.com/results.asp?RS=CHECKED&FORM=MSNH&v=1&q=Las+Vegas+Lava+Hula" "Mozilla/4.0 (compatible;
0.0.0.0 - - [20/May/2003:13:56:40 -0700] "GET /brett/logo4a.gif HTTP/1.0" 304 - "/brett/092902la.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
0.0.0.0 - - [20/May/2003:19:45:54 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:19:54:15 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:19:58:16 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:02:17 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:06:18 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:10:19 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:14:20 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:18:21 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:22:22 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:26:23 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:30:24 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:34:25 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:38:26 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:42:27 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
0.0.0.0 - - [20/May/2003:20:50:29 -0700] "GET /showsh.htm HTTP/1.0" 200 31464 "-" "Mozilla/3.01 (compatible;)"
...
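To check the timing pattern across a whole month of logs rather than by eye, something like the following sketch could measure the gaps between successive requests for one page. It assumes the log lines follow the format shown in the excerpt above; the function name and the regular expression are my own, not part of any logging tool.

```python
import re
from datetime import datetime

# Captures the bracketed timestamp and the requested path of a GET line.
LINE = re.compile(r'\[([^\]]+)\] "GET (\S+) ')

def request_gaps(lines, page):
    """Return the gaps, in seconds, between successive requests for page."""
    times = []
    for line in lines:
        match = LINE.search(line)
        if match and match.group(2) == page:
            times.append(
                datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            )
    # Pair each request with the next one and take the difference.
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Run against the three entries stamped 19:45:54, 19:54:15 and 19:58:16 in the excerpt, this gives gaps of 501 and 241 seconds. Lines that do not match the expected format are skipped.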
I've even got the following in the header of the page in question so it is not supposed to be trying to cache at all:
<META HTTP-EQUIV="expires" CONTENT="0">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Cache-Control" CONTENT="no-cache">
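One thing that may be worth trying: caching proxies generally act on the real HTTP response headers and do not parse the HTML body, so META tags like these might never be seen by the proxy. Below is a rough sketch of sending the same three directives as actual headers from a CGI-style script. The function names are hypothetical, and whether this changes the proxy's behavior at all is untested.

```python
def no_cache_headers():
    """The same directives as the META tags, as real response headers."""
    return [
        ("Expires", "0"),
        ("Pragma", "no-cache"),
        ("Cache-Control", "no-cache"),
    ]

def cgi_response(body, content_type="text/html"):
    """Build CGI output: header lines, a blank line, then the page body."""
    lines = ["Content-Type: " + content_type]
    lines += [name + ": " + value for name, value in no_cache_headers()]
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

A script serving the page would write the result of `cgi_response(page_html)` to standard output in place of the static file.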
Does anyone else have a way of dealing with this?