What I do know is that WordPress 2.1.1 was hacked with a backdoor, and that the requests always come in pairs: a GET for a particular blog entry, followed by a HEAD request for the same page. I am currently blocking this user agent, but would like to know if anybody can give me more information about it. Could it be an actual RSS feed gatherer that includes my pages in its newsfeed? If so, it is going after the wrong pages, as I already provide atom.xml as a feed. Methinks they are scrapers and are all related, despite a slew of rotating IP addresses.
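One way to confirm the GET-then-HEAD pattern is to scan the access log for it. Below is a minimal Python sketch. It assumes Apache's "combined" log format and assumes each GET/HEAD pair comes from the same IP. The function name `get_then_head` and the sample log lines (which use documentation-only IP addresses) are hypothetical.

```python
import re
from collections import defaultdict

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def get_then_head(lines, ua_prefix="WordPress/"):
    """Return {ip: [paths]} for every HEAD that immediately follows a GET
    for the same path from the same IP, limited to matching user agents."""
    last_get = {}  # ip -> path of that IP's previous request, if it was a GET
    hits = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m or not m.group("ua").startswith(ua_prefix):
            continue  # unparseable line or a different user agent
        ip, method, path = m.group("ip"), m.group("method"), m.group("path")
        if method == "HEAD" and last_get.get(ip) == path:
            hits[ip].append(path)
        last_get[ip] = path if method == "GET" else None
    return dict(hits)

# Hypothetical sample lines (RFC 5737 documentation addresses).
sample = [
    '192.0.2.10 - - [26/Mar/2008:10:00:00 +0000] "GET /blog/entry-1.html HTTP/1.1" 200 5120 "-" "WordPress/2.1.1"',
    '192.0.2.10 - - [26/Mar/2008:10:00:01 +0000] "HEAD /blog/entry-1.html HTTP/1.1" 200 - "-" "WordPress/2.1.1"',
    '198.51.100.7 - - [26/Mar/2008:10:00:02 +0000] "GET /atom.xml HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
print(get_then_head(sample))  # {'192.0.2.10': ['/blog/entry-1.html']}
```

If the rotating IPs split a single GET/HEAD pair across different addresses, keying on the path alone instead of the IP would be the obvious variation.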
My Rule:
RewriteCond %{HTTP_USER_AGENT} ^WordPress/2\.1\.1$
RewriteRule .* - [F]
The list of offending IPs is already published on my blog, along with a trace and CIDR for each one. I will tell you that some well documented RBN hosts are involved, but that's all I dare say.
Thanks in advance, Wiz.
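Because the IPs rotate but were traced to CIDR blocks, it can help to test new log entries against those blocks rather than against single addresses. Here is a small Python sketch using the standard `ipaddress` module. The ranges in `BLOCKED_NETS` are hypothetical documentation-only networks (RFC 5737), not the offending networks from the post, and `in_blocked_range` is an illustrative name.

```python
import ipaddress

# Hypothetical block list: documentation-only ranges, NOT the real offenders.
BLOCKED_NETS = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/25")]

def in_blocked_range(ip):
    """True if the address falls inside any CIDR block in BLOCKED_NETS."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETS)

print(in_blocked_range("192.0.2.55"))      # True
print(in_blocked_range("198.51.100.200"))  # False: outside the /25 (.0 to .127)
```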
Known WordPress user-agent variants:
WordPress/1.5.2 PHP/4.4.1
WordPress/2.0.1
WordPress/2.0.2
WordPress/2.0.3
WordPress/2.0.4
WordPress/2.0.5
WordPress/2.0.6
WordPress/2.1
WordPress/2.1.1
WordPress/2.1.2
WordPress/2.1.3
WordPress/2.2
WordPress/2.2.1
WordPress/2.2.2
WordPress/2.3.2
WordPress/2.3.3
WordPress/x.x.x.x PHP/4.x.xx
WordPress/MU
WordPress/wordpress-mu-1.0
WordPress/wordpress-mu-1.2.1
WordPress/wordpress-mu-1.2.3-2.2.1
WordPress/wordpress-mu-1.2.5
Knowing all these exist, I'd probably change that rule to be more generic:
RewriteCond %{HTTP_USER_AGENT} ^WordPress/
RewriteRule .* - [F]
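To see how much wider the generic pattern is, the two RewriteCond regexes can be checked against some of the user agents listed above. This is a Python sketch; it assumes these simple anchored patterns behave the same in Python's `re` as in Apache's PCRE-based matching (both are case-sensitive here, as RewriteCond is without `[NC]`). The helper `blocked` is a hypothetical name.

```python
import re

EXACT = re.compile(r"^WordPress/2\.1\.1$")  # the original rule
GENERIC = re.compile(r"^WordPress/")        # the more generic rule

user_agents = [
    "WordPress/1.5.2 PHP/4.4.1",
    "WordPress/2.1.1",
    "WordPress/2.3.3",
    "WordPress/wordpress-mu-1.2.5",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
]

def blocked(pattern, agents):
    """Return the user agents that a RewriteCond-style regex would match."""
    return [ua for ua in agents if pattern.search(ua)]

print(blocked(EXACT, user_agents))    # ['WordPress/2.1.1']
print(blocked(GENERIC, user_agents))  # every WordPress/ variant, but not Googlebot
```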
Most of the activity looks like a scraper; however, some of the hits I logged actually came from the WordPress site itself, so it bears more investigation.
[edited by: incrediBILL at 11:01 pm (utc) on Mar. 26, 2008]
I'm definitely seeing two distinctly different behaviors for this user agent: one that screams scraper and another that appears to be genuine WordPress traffic.
However, how hard would it be to add a "HEAD" request after each "GET" request in a scraper script, just to pretend to be WordPress and throw suspicious webmasters off the scent?
It's quite trivial to do and would take about 5 minutes or less to accomplish.
The short answer is if you block "WordPress/" the problem goes away altogether.
BTW, have you searched for your headlines or content to see if they appear on any WordPress blogs? More importantly, use Google's translator and try searching for your headlines in Russian or some other language, as I hear translated scraping is all the rage for "free content" these days.
[edited by: incrediBILL at 2:18 am (utc) on Mar. 27, 2008]