The machine's IP address was 126.96.36.199, which is within Xerox's network. At first, access seemed to be normal - and possibly/probably that of a proxy server (there had originally been multiple IPs, which slowly dwindled down to that one IP with normal-looking, proxy-like traffic). Then, suddenly, the server tried grabbing every page and eventually ended up in a loop, requesting the same page over and over again.
The oddity - besides that Xerox would (for days) run a proxy server that was gulping down the same page at a pretty fast rate, over and over again - was that it got caught on the same page, going through the same section of our site, that someone else who was trying to mirror our site got stuck on. Perhaps that is only coincidence, but it is a weird one if so. There are no page errors, looping scripts, or looping links on our side - it just seems to be bad software on their end.
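For anyone wanting to spot this kind of loop in their own logs, here is a minimal sketch in Python. It assumes the access log has already been parsed into (ip, path, timestamp) tuples; the function name, the threshold, and the one-hour window are all illustrative choices, not anything taken from our actual setup.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_repeat_fetchers(records, threshold=100, window=timedelta(hours=1)):
    """Flag (ip, path) pairs fetched at least `threshold` times within `window`.

    records: iterable of (ip, path, datetime) tuples, in any order.
    Returns {(ip, path): largest hit count seen in any single window}.
    """
    by_key = defaultdict(list)
    for ip, path, ts in records:
        by_key[(ip, path)].append(ts)

    flagged = {}
    for key, times in by_key.items():
        times.sort()
        start = 0
        best = 0
        for end, ts in enumerate(times):
            # Advance the left edge until the span fits inside the window.
            while ts - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= threshold:
            flagged[key] = best
    return flagged
```

A client that re-requests one page hundreds of times an hour shows up immediately, while a busy but well-behaved proxy that spreads its requests across many pages does not.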
The next odd thing was their response...
The requests were the result of a bug in a web proxy server. The proxy administrators are now aware of the problem, but I'm not sure how well they will be able to prevent errant requests, until a fix arrives for the proxy server bug. The proxy serves thousands of Xerox client machines, and preventing errant requests might equal blocking your site from all Xerox hosts.
That makes no sense to me. Preventing their proxy server from re-retrieving the same page hundreds of times an hour might mean blocking our site? That doesn't sound like a fix - but more like "we don't care, we'll just block you - so suffer or be blocked".
And the other part of the response that doesn't make sense is: why not backlevel the code to the version that was working for the weeks before the issue arose?
If there is anyone who has seen similar issues with IP address 188.8.131.52 or others in Xerox's IP range (184.108.40.206 Subnet 255.255.0.0) and has any idea what's up or if their network admins are just this incompetent, please enlighten me with anything you can... this one makes me very curious.
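If you want to check whether other addresses in your logs fall inside the same range, a minimal sketch using Python's standard `ipaddress` module follows. The addresses in the usage example are private-range placeholders, not the real ones from this thread, and `in_network` is just a helper name made up for illustration.

```python
import ipaddress


def in_network(ip, network_address, netmask):
    """Return True if `ip` lies inside the network given by address + dotted netmask."""
    # strict=False accepts any address inside the range, not only the base address.
    net = ipaddress.ip_network(f"{network_address}/{netmask}", strict=False)
    return ipaddress.ip_address(ip) in net


# Placeholder addresses: a 255.255.0.0 mask covers everything sharing the first two octets.
print(in_network("10.20.30.40", "10.20.0.0", "255.255.0.0"))
print(in_network("10.21.0.1", "10.20.0.0", "255.255.0.0"))
```

Note that this only tests range membership. It says nothing about who actually operates the machine, which still needs a whois lookup or an answer from the network's owner.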
Actually, that's one of the better tech support responses to a problem I've seen. It actually makes sense, and there is nothing in there that sounds "made-up." Rather than seeing incompetence, it appears to me that you got the one support tech on earth that knows anything and is willing to communicate with you.
They've got a bug in their proxy, they've requested support from the vendor, and there isn't much they can do without shutting down their corporation until the bug fix arrives. It is likely they do not know when the bug was introduced, and maybe they can't find out, so why would they risk a roll back?
It's a nasty problem, but there's nothing they can do about it until the vendor provides a patch - much like the vulnerabilities in a certain company's servers that allowed major internet worms to spread for the past few years... :)
I've got nothing to do with this company, but it really does impress me as a particularly clear piece of work from their tech.
You might consider blocking their proxy, though.
Thanks for the response. The only issues I still see are as follows (and sorry I didn't make this clear in my post): (1) I provided them with enough info that they knew when it happened, and (2) their proxy server ceased retrieving any valid pages on our site - so I can't see how it would be useful to any of their "thousands" of users at all, unless it was sending their users old pages from caches the server couldn't update.
And then there was the one oddity. You see, we have an individual in New York who has been trying to mirror the New York section of our site for quite some time - and his bot gets stuck on Rochester (I believe - off the top of my head). Xerox's proxy server is stuck in the exact same place, acting exactly the same, and making the exact same requests up till that point. It seems to me more like this person (since their access from their Internet account is blocked) is now using Xerox's proxy server to attempt mirroring again.
Thus all the questions... hoping to confirm that the IP is indeed managed by Xerox and, as Mr. Farrar stated, is for Xerox employees, and that it wasn't something else, like a high-speed customer on a re-sold line trying to mirror our site. (i.e., if it's Xerox and a bad Xerox proxy server, then fine: once it's fixed, it'll be gladly unblocked - but if it's not, then I'd like to try to get this behaviour stopped.)
And yes, I did indeed block them. :-( With the size of their pipe, and given which of our sites they were accessing in this manner, I figured it was a good idea until they fix it.
Thanks again for the response, and any other info you or anyone else may have confirming the veracity of this being a Xerox server (no answer from their abuse, web or network email addresses, btw).
Since that IP range is listed as Direct Assignment, it's unlikely that they are re-selling any portion of it for public use.
It wouldn't surprise me if they had a problem tracking this down. I'm sure that, like most corporations, their priorities are to watch for incoming hack attempts and to monitor their employees' outgoing requests to keep an eye on them. They would be looking at incoming traffic, and at traffic headed out onto the 'net - traffic that is on the "inside" (corporate intranet side) of the caching proxy server. They have to monitor on the inside of a caching proxy, because otherwise they would never see repeat outgoing requests for resources which are already cached. So, their visibility of what the proxy itself is requesting from the 'net may be weak.
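To make that visibility point concrete, here is a toy model in Python. It is purely illustrative: `ToyCachingProxy`, its two logs, and the `refetch` method are made-up names, and no real proxy product is being modelled.

```python
class ToyCachingProxy:
    """Toy caching proxy that records what each monitoring point would see."""

    def __init__(self, fetch):
        self.fetch = fetch        # callable that retrieves a URL from the origin site
        self.cache = {}
        self.inside_log = []      # requests arriving from clients (intranet side)
        self.outside_log = []     # requests the proxy itself sends out to the 'net

    def get(self, client, url):
        """A client request: always seen inside, seen outside only on a cache miss."""
        self.inside_log.append((client, url))
        if url not in self.cache:
            self.outside_log.append(url)
            self.cache[url] = self.fetch(url)
        return self.cache[url]

    def refetch(self, url):
        """A proxy-initiated fetch (prefetch or a bug): never appears in inside_log."""
        self.outside_log.append(url)
        self.cache[url] = self.fetch(url)


proxy = ToyCachingProxy(lambda url: "body of " + url)
for client in ("pc-1", "pc-2", "pc-3"):
    proxy.get(client, "/page.html")
for _ in range(5):
    proxy.refetch("/page.html")

print(len(proxy.inside_log), len(proxy.outside_log))
```

In this model the inside monitor records three client requests, while the origin site sees six fetches, five of which no inside-only monitor would ever notice.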
Some proxies will "follow links" and prefetch pages and resources from a site, especially if that site is "popular" with their users. It may just be some weird bug in the code that makes it lose its mind with certain URLs, file sizes, e-tags, expiry times, or other conditions, and that your site happens to trigger the problem.
I certainly hope the proxy server vendor gets this cleared up soon - it sounds like a really nasty problem for the sites it accesses when it goes crazy.
I wonder if it might also be caused by an open proxy. I often see requests for URLs like [mysite.com...] in my logs. I suppose that might work in some cases. If so, this could indeed be your friend the mirror-man trying to back-door through Xerox.