Ralph_Slate - 2:06 am on Aug 30, 2012 (gmt 0)
WMT reports 75,756 links pointing to 18,350 of my pages.
The revision pages on Wikipedia are "noindex, nofollow", so those aren't inflating the count. The foreign language Wikipedia pages account for a number of the duplicates - the foreign language pages are not duplicates of each other, but they do crib from each other. In the one example I looked at, the US page on the topic was 3-4 times as long as the German page and contained different external links (my page was linked on both versions).
Each link to my site from Wikipedia is actually made up of two links - one to the deep page on my site, and the other to the homepage. That is how the Wikipedia template was created.
My site is a trusted reference site in its niche (similar to Internet Movie Database), and the Wikipedia editors used it to build out that niche on Wikipedia. Now the spammers who clone Wikipedia are using that content to build their own sites, and Google is picking up the links from those spam sites to my site.
I have 440 links on DMOZ (counted by doing a site: search on Google), and those links are not nofollowed, so when the spammers clone DMOZ, I get dozens of do-follow links from spam sites. DMOZ isn't nearly as much of a factor as Wikipedia though - Wikipedia is what the cloners are after, because it has so much text.
Wikipedia links are nofollowed, but I have to believe that when Google's Penguin algorithm looks at a link profile made up of 75% nofollow links coming from spam sites, it likely flags that - Google is not known for ignoring data; they are data-greedy. Even though nofollowed links are not used in building their link graph, they have never said they aren't using them to penalize, have they?
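For anyone who wants to put a number on the nofollow share of links from a given page, here is a rough Python sketch. It only illustrates the arithmetic; it says nothing about how Google actually treats these links. The class name `LinkCounter`, the function `nofollow_share`, the domain `example.com`, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts followed vs. nofollowed <a> links pointing at one domain."""

    def __init__(self, target_domain):
        super().__init__()
        self.target_domain = target_domain  # hypothetical domain to look for
        self.nofollow = 0
        self.followed = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        # Simple substring match; a real audit would parse the hostname.
        if self.target_domain not in href:
            return
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow += 1
        else:
            self.followed += 1


def nofollow_share(html, target_domain):
    """Return the fraction of links to target_domain that carry rel=nofollow."""
    counter = LinkCounter(target_domain)
    counter.feed(html)
    total = counter.nofollow + counter.followed
    return counter.nofollow / total if total else 0.0


# Made-up sample: three nofollowed links and one followed link.
sample = """
<a rel="nofollow" href="http://example.com/page/1">deep link</a>
<a rel="nofollow" href="http://example.com/">homepage</a>
<a rel="external nofollow" href="http://example.com/page/2">deep link</a>
<a href="http://example.com/page/3">followed link</a>
<a href="http://other.example.org/">unrelated</a>
"""

print(nofollow_share(sample, "example.com"))
```

Run over the saved HTML of a few cloned pages, something like this would show how lopsided the followed/nofollowed split is on those particular pages, which is about as far as anyone outside Google can measure.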