
Search Engine Spider and User Agent Identification Forum

    
Wikimpress - dumbest scraper project ever
DeeCee



 
Msg#: 4429323 posted 1:34 am on Mar 15, 2012 (gmt 0)

I just started being scraped by a bot with this Agent string:

"Mozilla/5.0 (compatible; U; Linux i686 (x86_64); de-DE; <a href=http://wikimpress.org/>Wikimpress</a>) Wikimpress/1.0"

I checked out the wikimpress.org site. This is what I found out. Wikimpress.org is a totally foolish, illegal content-scraper project that, per the German site's description, plans to gather the whole WWW, and especially social-media content, into a single wiki.

They state it is a commercial project (and they plan to show ads) and that all content will be released under a Creative Commons license. (Huh? Steal copyrighted content from all over the world and re-release it under CC?)

Among other things, they also state that

Wikimpress is not related to Wikipedia. But we're using Wikipedia information under the CC-BY-SA license.


They are also scraping Wikipedia, and I can see from their "random page" feature that Wikipedia is already well represented.

Blocked, Blocked, Blocked.
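For anyone who wants to block this bot by user agent, here is a minimal sketch in Python, assuming requests are screened in application code. The helper name `is_wikimpress` and the token list are hypothetical, and matching the `Wikimpress` token case-insensitively instead of the full string is an assumption, chosen so that a version bump would still be caught.

```python
from typing import Optional

# Hypothetical helper (not from the thread): flags requests whose
# User-Agent contains the "Wikimpress" token quoted above.
# Case-insensitive substring matching is an assumption, chosen so that a
# later "Wikimpress/1.1" or a lowercase variant would still be caught.
BLOCKED_UA_TOKENS = ("wikimpress",)


def is_wikimpress(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    ua = (user_agent or "").lower()  # a missing header is treated as empty
    return any(token in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    seen = ('Mozilla/5.0 (compatible; U; Linux i686 (x86_64); de-DE; '
            '<a href=http://wikimpress.org/>Wikimpress</a>) Wikimpress/1.0')
    print(is_wikimpress(seen))  # True
    print(is_wikimpress("Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"))  # False
```

A user-agent check alone is easy to evade, since the bot can change its string at will, so it works best alongside an IP-based block.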

 

Marshall




 
Msg#: 4429323 posted 2:38 am on Mar 15, 2012 (gmt 0)

I see they have no category for copyright law.

Marshall

keyplyr




 
Msg#: 4429323 posted 9:11 am on Mar 15, 2012 (gmt 0)



Did they come from 188.138.104.220?

Plusserver, Germany
188.138.0.0 - 188.138.127.255
188.138.0.0/17
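To confirm that the first/last range and the /17 notation above describe the same block, and to test individual visitor IPs against it, Python's standard `ipaddress` module is enough. This is a minimal sketch: the helper name `in_blocked_range` is hypothetical, and the range is only the one quoted in this thread, not independently checked against WHOIS.

```python
import ipaddress

# Range quoted in this thread for Plusserver, Germany (not independently verified).
FIRST = ipaddress.ip_address("188.138.0.0")
LAST = ipaddress.ip_address("188.138.127.255")

# summarize_address_range() turns a first/last pair into the smallest list of
# CIDR blocks covering it; for this range that is a single network.
BLOCKED_NETS = list(ipaddress.summarize_address_range(FIRST, LAST))


def in_blocked_range(ip: str) -> bool:
    """Hypothetical helper: True if `ip` falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETS)


if __name__ == "__main__":
    print(BLOCKED_NETS)                         # [IPv4Network('188.138.0.0/17')]
    print(in_blocked_range("188.138.104.220"))  # True
    print(in_blocked_range("188.138.128.1"))    # False
```

Blocking the whole /17 covers 32,768 addresses, so it also catches every other host in that range, legitimate or not.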

DeeCee



 
Msg#: 4429323 posted 9:34 am on Mar 15, 2012 (gmt 0)

IP: 188.138.104.220
It points to ww7.netznutz.net, which suggests they may be using more than the one IP I have seen so far.

NetzNutz GmbH (translation: Net Nuts) appears to be the company that owns this site, along with a long list of other domains. Many of them lead to a for-sale page; others carry various junk content.

I don't know if there is one useful site among them; I did not see anything that wasn't taken from somewhere else.
Examples: a CD sales affiliate site (of the clone type), a CD wiki with imported information about CDs, the Wikimpress site, a site with some aerial pictures of Germany, and similar stuff.

Basically just random content dumps, meant either to collect ad revenue or to advertise the domains for sale, without the owner having to create any content himself.

This is similar to how the Wikimpress site plans to gather the pages from all our sites (the whole WWW and social media :) ) and then, per the site's own description, put ads around it all.

MxAngel



 
Msg#: 4429323 posted 9:32 am on Mar 16, 2012 (gmt 0)

I'm blocking the whole CIDR range.

Another nasty content-theft (scraping) and image-theft/hotlinking site is www.thesearchengine.net (188.138.118.19), which I caught with my hotlinking script.

The IP is cited in malware reports too:

[malc0de.com...]
[threatexpert.com...]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved