We've got a medium-size ecommerce site with a few million pages, and an affiliate for one of our competitors has an ongoing routine I can't quite figure out.
The affiliate buys a few .info domains per week, creates gibberish subdomains, and sets up bare-bones pages on those subdomains that link to our site. If you try to visit the domain or subdomain home page, you get a "404 Not Found" message. We find these gibberish subdomains linking to us by checking our latest links in Google Webmaster Tools (GWT), but by the time we find them the links to our site no longer exist on the pages. Each page is simply a list of manufacturer item numbers and short descriptions with the affiliate's links to our competitor. GWT is currently showing about 160 of this affiliate's domains pointing to our site.
How is Google even finding these pages, since there are no home pages or site navigation? Perhaps the affiliate deep-links to them from other sites? And why is the affiliate linking to our site? Is it just a temporary thing until the page is indexed so it appears legit, and then he replaces the links to our site with his affiliate links to our competitor?
If this activity doesn't hurt our standing with Google it's not much of a problem, but post-Penguin we certainly have reason to be paranoid about what could appear to be a sketchy backlink scheme. We're continually adding these domains to our disavow file, but it feels like an endless game of whack-a-mole.
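The disavow updates can at least be scripted so each new batch takes seconds. Below is a rough sketch, not a tested workflow: it takes linking URLs (for example, pasted from a GWT latest-links export), reduces each gibberish subdomain to its parent domain, and emits `domain:` lines that are not already in the disavow file. The function names are made up for illustration, and the "last two labels" rule is an assumption that holds for plain .info domains but not for suffixes like .co.uk.

```python
from urllib.parse import urlsplit


def registrable_domain(url):
    """Return the last two labels of the host, e.g. 'example.info'.

    Assumption: single-label TLDs such as .info. For suffixes like
    .co.uk this is wrong and a public-suffix list would be needed.
    """
    # urlsplit only fills in the host when the string has a scheme or "//".
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else ""


def disavow_lines(linking_urls, existing=()):
    """Build sorted, de-duplicated 'domain:' entries not already listed.

    `existing` is the current disavow file as an iterable of lines.
    """
    already = {line.strip().lower() for line in existing}
    domains = {registrable_domain(u) for u in linking_urls}
    domains.discard("")  # drop anything that did not parse to a host
    return [f"domain:{d}" for d in sorted(domains) if f"domain:{d}" not in already]
```

Disavowing at the domain level rather than per URL means a new gibberish subdomain on an already-listed domain needs no further action.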
I get these all the time too, along with the fake search engines. I don't think it's our job to scour the internet looking for these sites. We should not be penalized, as we have absolutely nothing to do with them.
My guess is that they are crawling your site for content and using it to create their own mini sites. They rank these sites in Google by running extensive backlink campaigns, then place affiliate links to the shops they are affiliated with. I think your links are there because their crawler does not remove them.
As for how Google discovers these sites, my guess is that they are promoting the pages with backlink campaigns and the like.
I suggest you check your site's access logs for intensive crawling activity, identify the bot, and hit it with the ban hammer.
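To make the log check concrete, here is a minimal sketch that counts requests per IP address and user agent and lists the heaviest clients. It assumes the server writes the common "combined" access log format; the function name and the threshold of 1000 requests are placeholders to adjust for your own traffic.

```python
import re
from collections import Counter

# Combined log format:
# IP identity user [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def heavy_clients(lines, threshold=1000):
    """Return ((ip, user_agent), hits) pairs with at least `threshold` hits.

    Results are ordered from most to fewest requests. Lines that do not
    match the assumed log format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return [(client, hits) for client, hits in counts.most_common() if hits >= threshold]
```

Passing an open log file as `lines` works, since the function only iterates over it. Legitimate crawlers such as Googlebot will show up near the top as well, so check who each client really is before banning anything.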