|Wikimpress - dumbest scraper project ever|
| 1:34 am on Mar 15, 2012 (gmt 0)|
I just started being scraped by a bot with this Agent string:
"Mozilla/5.0 (compatible; U; Linux i686 (x86_64); de-DE; <a href=http://wikimpress.org/>Wikimpress</a>) Wikimpress/1.0"
I checked out the wikimpress.org site, and this is what I found out. Wikimpress.org is a totally foolish, illegal content scraper project that, per the German site's description, plans to gather the whole WWW, and especially social media related content, into a single wiki.
They state it is a commercial project (+ they plan to show ads) and that all content will be released under a Creative Commons license. (Huh? Steal copyrighted content all over the world, and re-release it under CC?)
Among other things, they also state that
|Wikimpress is not related to Wikipedia. But we're using Wikipedia information under the CC-BY-SA license. |
They are also scraping Wikipedia, and I see from their "random page" functionality that Wikipedia is already well represented.
Blocked, Blocked, Blocked.
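For anyone who wants to do the same at the application level, here is a minimal sketch of a User-Agent check, assuming your request handler can read the raw header. The function name `is_blocked_agent` and the token list are my own illustration, not anything from the bot's documentation; the only fact taken from the thread is that the Agent string contains "Wikimpress".

```python
def is_blocked_agent(user_agent, blocked_tokens=("wikimpress",)):
    """Return True if the User-Agent header contains any blocked token.

    The match is a case-insensitive substring test, so it catches both
    the "Wikimpress/1.0" product token and the embedded link text.
    A missing header (None or empty) is treated as not blocked here;
    whether to reject empty agents is a separate policy decision.
    """
    ua = (user_agent or "").lower()
    return any(token in ua for token in blocked_tokens)
```

A handler would call this on each request and answer with a 403 when it returns True. Keep in mind that a scraper can change its Agent string at any time, so this only works as long as the bot identifies itself.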
| 2:38 am on Mar 15, 2012 (gmt 0)|
I see they have no category for copyright law.
| 9:11 am on Mar 15, 2012 (gmt 0)|
Did they come from 184.108.40.206?
220.127.116.11 - 18.104.22.168
| 9:34 am on Mar 15, 2012 (gmt 0)|
It points to ww7.netznutz.net, which might indicate that there are more IPs than the one I have seen so far.
NetzNutz GmbH (translation: Net Nuts) is the supposed company that owns this site, among a long list of other domains. Many of them lead to a for-sale page, while others carry various junk content.
I don't know if there is one useful site among them, but I did not see anything that was not taken from somewhere else.
Examples: a CD sales affiliate site (of the clone type), a CD wiki site with imported information about CDs, the Wikimpress site, a site with some aerial pictures of Germany, and similar stuff.
Basically just random content sites, used either to collect ad revenue or to advertise the domains for sale, without the owner having to create any content himself.
Similar to how the Wikimpress site plans to gather the pages from all our sites (the whole WWWW and social media. :) ) and then, per the Wikimpress site's description, put ads around it all along the way.
| 9:32 am on Mar 16, 2012 (gmt 0)|
I'm blocking the whole CIDR range.
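A rough sketch of what a CIDR block can look like in code, using Python's standard `ipaddress` module. The network below is a placeholder from the reserved documentation range (192.0.2.0/24), not the scraper's actual range, and the names `BLOCKED_NETWORKS`, `is_blocked_ip` and `range_to_cidrs` are made up for this example.

```python
import ipaddress

# Placeholder only: 192.0.2.0/24 is reserved for documentation.
# Substitute the range you actually see in your own logs.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_blocked_ip(remote_addr, networks=BLOCKED_NETWORKS):
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address; leave that case to other checks.
        return False
    return any(addr in net for net in networks)


def range_to_cidrs(first, last):
    """Convert a 'first - last' address range, as shown in whois
    output, into the list of CIDR networks that exactly covers it."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )
```

`range_to_cidrs` is useful when a whois lookup only gives you a start and end address: the resulting networks can go straight into a firewall rule or a server deny list, which is usually a better place for the block than application code.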
Another nasty content theft (scraping) and image theft / hotlinking site is www.thesearchengine.net - 22.214.171.124 which I caught in my hotlinking script.
The IP is cited in malware reports too: