Whatever it is, it's virulent, and cancerous. I first started seeing the fake URI=REF in early 2008 on one site with two message boards, on just one board. Since then it's 'expanded' to regular .html pages in almost every directory, and from just a few IPs in as many weeks to scores of IPs and hundreds of URI=REF hits every single day.
There are two reasons why I've not brought it up here:
1.) The less bot-runners know about how we spot them, the better. Plus the program, whatever it is (and I'm not sure it's just one), is more spam harvester/comment spammer/whatever than a search engine spider/bot per se.
2.) There's not one bot or bot-runner to block. The compromised IPs -- dynamic AND fixed -- are bot-running real and fake UAs (some just a few; some many scores) as indicated by Project Honey Pot (PHP) [projecthoneypot.org...] and Stop Forum Spam [stopforumspam.com...] records.
For example:
PHP Threat Level 42: [projecthoneypot.org...]
PHP Threat Level 48 (5 wks ago: 47): [projecthoneypot.org...]
Because of #1, I'm reluctant to go into any further detail about how it works on-site. Suffice it to say that whatever the malignancy, it's tricky. I use mod_rewrite to combat the patterns and .htaccess (& a firewall) to combat the worst sources -- China, Russia, Ukraine -- but yep, no one's immune.
If you're seeing it, good luck stopping it. I've had my fingers in the dike for years to no avail.
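For anyone who wants to check whether they're seeing it too, here is a minimal log-scanning sketch. It is not the on-site detection alluded to above, just a generic check. It assumes that "URI=REF" means the Referer header is the very URL being requested, that the server writes Apache combined-format logs, and that the site answers on a single hostname. The function names and the sample hostname are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" log format (assumed):
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def is_self_referral(line, host):
    """Return True if the logged Referer is the requested URL itself."""
    m = LOG_LINE.match(line)
    if not m:
        return False  # unparseable line: ignore rather than guess
    ref = urlsplit(m.group("referer"))
    if not ref.netloc:
        return False  # blank or "-" referer
    ref_path = ref.path or "/"
    if ref.query:
        ref_path += "?" + ref.query
    return ref.netloc.lower() == host.lower() and ref_path == m.group("path")


def self_referral_hits(lines, host):
    """Count self-referring hits per client IP."""
    hits = Counter()
    for line in lines:
        if is_self_referral(line, host):
            hits[LOG_LINE.match(line).group("ip")] += 1
    return hits
```

Usage would be something like `self_referral_hits(open("access.log"), "www.example.com")`. A real visitor can occasionally produce the same signature (a page that links to itself, for instance), so treat the counts as a lead to investigate, not as proof.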
On a pattern-related note:
The ever-egregious TalkTalk [google.com...] employs the same URI=REF pattern with robots.txt and .html files. Except for the robots.txt requests, all are blocked.
FWIW...
It gives me the willies to see so many compromised machines marauding around, seemingly unchecked, independent of the usual exploits. Wherefore art thou "security" companies? Or perhaps 'It' has a name by now?