Forum Moderators: Ocean10000

User Agent: "WordPress/2.1.1" coming from Russia and UK

Is this an RSS aggregator, or a scraper bot?

     
11:43 pm on Mar 25, 2008 (gmt 0)

Full Member

10+ Year Member

joined:May 5, 2003
posts: 319
votes: 0


All this month I have been blocking and tracking visits from Russia and a particular host in the UK, all with the exact user agent: "WordPress/2.1.1"

What I do know is that this version of WordPress was hacked with a backdoor, and that all requests alternate between a GET for a particular blog entry, followed by a HEAD request for the same page. I am currently blocking this user agent, but would like to know if anybody can give me more information about it. Could it be an actual RSS feed gatherer, for including my pages in their newsfeed? If so, they are going after the wrong pages, as I already provide atom.xml as a feed. Methinks they are scrapers and are all related, despite a slew of rotating IP addresses.

My Rule:

RewriteCond %{HTTP_USER_AGENT} ^WordPress/2\.1\.1$
RewriteRule .* - [F]

The list of offending IPs is already published on my blog, along with a trace and CIDR for each one. I will tell you that some well documented RBN hosts are involved, but that's all I dare say.

Thanks in advance, Wiz.

11:00 pm on Mar 26, 2008 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


I've seen a bunch of WordPress user agents and never knew what they were doing: link checking, aggregating RSS, sending trackback pings, being spoofed by a scraper, all of the above, or none of the above.

WordPress/1.5.2 PHP/4.4.1
WordPress/2.0.1
WordPress/2.0.2
WordPress/2.0.3
WordPress/2.0.4
WordPress/2.0.5
WordPress/2.0.6
WordPress/2.1
WordPress/2.1.1
WordPress/2.1.2
WordPress/2.1.3
WordPress/2.2
WordPress/2.2.1
WordPress/2.2.2
WordPress/2.3.2
WordPress/2.3.3
WordPress/x.x.x.x PHP/4.x.xx
WordPress/MU
WordPress/wordpress-mu-1.0
WordPress/wordpress-mu-1.2.1
WordPress/wordpress-mu-1.2.3-2.2.1
WordPress/wordpress-mu-1.2.5

Knowing all these exist, I'd probably change that rule to be more generic:

RewriteCond %{HTTP_USER_AGENT} ^WordPress/

Most of the activity looks like a scraper; however, some of the hits I logged actually came from the WordPress site itself, so it bears more investigation.
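To see what each pattern would catch before deploying it, here is a minimal Python sketch that runs the exact-match and generic patterns from the two RewriteCond lines against a few of the user agents listed above. The non-WordPress agent string and the helper name `blocked_by` are made up for illustration, and Python's `re` module is only standing in for Apache's regex engine, so treat the result as a rough check rather than a guarantee of mod_rewrite behavior.

```python
import re

# Patterns copied from the two RewriteCond lines in this thread.
EXACT = re.compile(r"^WordPress/2\.1\.1$")
GENERIC = re.compile(r"^WordPress/")

# A few of the user agents listed above, plus one unrelated agent for contrast.
agents = [
    "WordPress/2.1.1",
    "WordPress/2.3.3",
    "WordPress/MU",
    "WordPress/wordpress-mu-1.2.5",
    "Mozilla/5.0 (compatible; ExampleBrowser)",  # hypothetical non-WordPress agent
]


def blocked_by(pattern, user_agents):
    """Return the user agents that the pattern matches (and so would block)."""
    return [ua for ua in user_agents if pattern.search(ua)]


exact_hits = blocked_by(EXACT, agents)
generic_hits = blocked_by(GENERIC, agents)

print("exact rule blocks:  ", exact_hits)
print("generic rule blocks:", generic_hits)
```

The exact pattern only catches the single 2.1.1 string, while the prefix pattern catches every WordPress variant in the sample and leaves the unrelated agent alone.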

[edited by: incrediBILL at 11:01 pm (utc) on Mar. 26, 2008]

12:17 am on Mar 27, 2008 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


Here's an example from WordPress dot com itself that just happened today:

66.135.48.* "GET /somepage.html HTTP/1.1" 200 1777 "-" "WordPress/MU"
66.135.48.* "HEAD /somepage.html HTTP/1.1" 200 - "-" "WordPress/MU"

This is very consistent with the type of activity you reported.
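For anyone who wants to look for the same GET-then-HEAD pattern in their own logs, here is a rough Python sketch. It assumes log lines in the abbreviated format quoted above (client, request, status, size, referer, user agent); a real combined-format log has extra fields, so the regular expression would need adjusting. The function name `find_get_head_pairs` and its `agent_prefix` parameter are hypothetical.

```python
import re

# Matches the abbreviated log format quoted above:
#   client "METHOD path HTTP/x.y" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<client>\S+) "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def find_get_head_pairs(lines, agent_prefix="WordPress/"):
    """Return (client, path, agent) for each GET that is immediately followed
    by a HEAD for the same path from the same client and user agent."""
    pairs = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Unparseable line: reset so we never pair across it.
            previous = None
            continue
        entry = match.groupdict()
        if (
            previous is not None
            and previous["method"] == "GET"
            and entry["method"] == "HEAD"
            and all(entry[k] == previous[k] for k in ("client", "path", "agent"))
            and entry["agent"].startswith(agent_prefix)
        ):
            pairs.append((entry["client"], entry["path"], entry["agent"]))
        previous = entry
    return pairs


sample = [
    '66.135.48.* "GET /somepage.html HTTP/1.1" 200 1777 "-" "WordPress/MU"',
    '66.135.48.* "HEAD /somepage.html HTTP/1.1" 200 - "-" "WordPress/MU"',
]
pairs = find_get_head_pairs(sample)
print(pairs)
```

This only pairs adjacent lines, so on a busy server where other requests land between the GET and the HEAD it would miss pairs; grouping by client first would be one way around that.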

1:24 am on Mar 27, 2008 (gmt 0)

Full Member

10+ Year Member

joined:May 5, 2003
posts:319
votes: 0


So, it might just be a blog feed aggregator plug-in or such. But all of the hits I am logging are based in the former USSR, or come from servers owned by Inter*age (did I obscure that enough?).

Probably nothing much to worry about ;-)

2:13 am on Mar 27, 2008 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


It's possible that the WordPress plug-in, if that's in fact what this is in all cases, is hosted in Russia, or that it's hosted elsewhere and using a Russian proxy IP to access your site and throw you off track.

I'm definitely seeing two distinctly different behaviors for this user agent: one that screams scraper, and another that appears to be actual WordPress access.

However, how hard would it be to add a "HEAD" request following each "GET" request to a scraper script, just to pretend to be WordPress and throw suspicious webmasters off guard?

It's quite trivial to do and would take about 5 minutes or less to accomplish.

The short answer is if you block "WordPress/" the problem goes away altogether.

BTW, have you done a search for your headlines or content to see if it appears on any WordPress blogs? More importantly, use Google's translator and try searching for your headlines in Russian or some other language, as I hear translated scraping is all the rage for "free content" these days.

[edited by: incrediBILL at 2:18 am (utc) on Mar. 27, 2008]

2:32 am on Mar 27, 2008 (gmt 0)

Full Member

10+ Year Member

joined:May 5, 2003
posts:319
votes: 0


Thanks Bill. You Da Man!

As a byproduct of these "scrapes" I have added several new CIDRs to the Russian Blocklist and the Exploited Servers Blocklist (Can I say that?). So, some good has come from it.

Wiz

[edited by: Wizcrafts at 2:33 am (utc) on Mar. 27, 2008]