wikipedia and scrapers

Forum Moderators: phranque

is wiki a scraper?

soapystar

9:03 pm on Jun 12, 2006 (gmt 0)

Finding more and more pages on wiki that are simply a rehash of articles elsewhere on the net. So I'm now wondering... is wiki the world's biggest scraper site?

Mike12345

3:32 pm on Jun 13, 2006 (gmt 0)

I would say no

Wiki is made up of user contributed information, so what is to stop me contributing to ABC Articles and then also adding my work to Wikipedia?

Are you sure that the sites you are finding didn't scrape from Wikipedia? I know of a few sites that use information from Wikipedia, both with and without consent.

trillianjedi

3:49 pm on Jun 13, 2006 (gmt 0)

both with and without consent.

You do not need consent to re-publish Wikipedia information - it's public domain under the GFDL.

soapystar

3:55 pm on Jun 13, 2006 (gmt 0)

I am watching the pages for the travel sector... huge chunks are just a rehash from other websites. Sometimes credited, sometimes not. I had my own work scraped with no credit back to me. When I added the credit back to my own site, it was removed with a label of link spam in the edit history section.

hannamyluv

3:58 pm on Jun 13, 2006 (gmt 0)

You do not need consent to re-publish Wikipedia information

I think the point is that most people don't feel they need to get consent to republish ON Wikipedia. It is a problem. Well-meaning contributors will sometimes republish things wholesale from another person's site. Sure, you can go in and delete it, but I think that most publishers do not want to police Wikipedia's results on top of everything else.

trillianjedi

4:00 pm on Jun 13, 2006 (gmt 0)

I had my own work scraped with no credit back to me.

That's different - you should email or write to Wikipedia in the same way that you would ask any other website to c & d.

hannamyluv

4:02 pm on Jun 13, 2006 (gmt 0)

Or I could sue, which is also what you might do when another site copies your stuff. ;)

Wiki is a nice idea, but well-meaning or downright nefarious people can take, and are taking, advantage of it to the harm of other sites.

pmkpmk

4:04 pm on Jun 13, 2006 (gmt 0)

Advice to all who find their original content on Wikipedia:

- Edit the article
- Remove your original content
- Save the article along with a one-line explanation that it was scraped content
- Go into the Discussion of the article and explain where it was stolen from, that you are the author and that you don't consent
- Go into the History of the article and find out which Wikipedia user inserted your original content
- Go to that user's Discussion page and leave a comment that he/she/it stole content from you and that you object.

Wikipedia does not want or even tolerate stolen content.

soapystar

4:06 pm on Jun 13, 2006 (gmt 0)

Yes, it's an option, but the actual hassle of doing that is considerable (c&d and sue). In the end I had to leave a note in the history section that my work should not be used without a credit and prompted them to check the Wayback Machine for who had the information first. Then the information was left off Wiki rather than just crediting me for the article.

As for deleting your own work, what you find is that the next editor just comes along and reverts your edit.

[edited by: soapystar at 4:08 pm (utc) on June 13, 2006]

hannamyluv

4:07 pm on Jun 13, 2006 (gmt 0)

Advice to all who find their original content on Wikipedia

Like I said, that's a lot of policing and work to protect what is mine already.

Wikipedia does not want or even tolerate stolen content.

Wikipedia may not, but what's to stop a troll-like editor who has decided that an area is "their" domain? It is already going the slow road to death like DMOZ.

pmkpmk

4:16 pm on Jun 13, 2006 (gmt 0)

a lot of policing and work to protect what is mine already

Welcome to reality! The information age makes it easier, but it's not the cause for IP theft.

what's to stop a troll-like editor who has decided that an area is "their" domain

Everybody is a Wikipedia editor (one of the biggest differences to DMOZ). And every editor is the reviewer of every other editor (the second biggest difference). There is a clear escalation process to be taken in case of an "edit war". And as long as the content is by yourself, you are in the best position.

Yes, it is work to protect what is yours. But that is not Wikipedia's fault. In contrast, Wikipedia makes it very easy to protect it, compared to the gazillion scraper sites which you CAN'T control!

hannamyluv

4:24 pm on Jun 13, 2006 (gmt 0)

Everybody is a Wikipedia editor

Ahhh... yes. The same as anyone could sign up to be a DMOZ editor. Both are fantastic systems until the blush of novelty wears off. Then you are just left with people editing who are only doing so for their own benefit.

I understand that we have recourse, yada yada. What I am saying is that people are abusing the system and it will escalate until Wikipedia is just yet another internet would-have-been-nice-if-people-were-not-such-a$$es footnote.

trillianjedi

4:24 pm on Jun 13, 2006 (gmt 0)

In the end I had to leave a note in the history section that my work should not be used without a credit and prompted them to check the Wayback Machine for who had the information first.

That's a good way to deal with it. Good post btw, pmk, with the bullet-point directions on how to remove it.

WebPsychic

4:26 pm on Jun 13, 2006 (gmt 0)

I think the root "pedia" lends a false credibility to it. I know many teachers in my community who don't recommend Wikipedia to their students, and if it is recommended at all, it usually comes with a "caution". If you're looking for info on a somewhat generic subject (rock climbing), it might be OK to use alongside other online sources.

Before the John Seigenthaler scandal, I think many people didn't realize it was an open source resource. If it's a controversial topic, be prepared to turn on your "garbage alert".

Mike12345

9:11 am on Jun 14, 2006 (gmt 0)

TJ, yes you are quite right, the work is public domain.

However, consent is given in accordance with the conditions set out in the GFDL. If those conditions were not adhered to, would that not be a breach of consent?

If not, then surely the GFDL is useless?

trillianjedi

9:16 am on Jun 14, 2006 (gmt 0)

if those conditions were not adhered to, would that not be a breach of consent?

Sure. A breach of the "licence", technically.