deadsea - 5:11 pm on Nov 21, 2012 (gmt 0)
In addition to spam sites that go up and down, and change their content constantly, there are also bugs in Googlebot.
First, Googlebot will take a piece of text that looks like a URL and treat it as if it were a link. This is particularly problematic on some search scraper sites. They sometimes show a truncated URL as the text of a link that points to a tracking URL Googlebot can't crawl. Something like <a href="http://track.scraper.example.com/?8373888383">www.mysite.example.com/main_widget_arti..</a> This is a mechanism through which Googlebot finds all these 404 "pages" on my site that were never meant to exist to begin with.
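To make the mechanism concrete, here is a rough Python sketch of how a crawler that scans visible text for URL-looking strings could pick up the truncated address from that example. The URL_LIKE pattern and the url_like_text helper are my own made-up names and a guess at the general idea, not Googlebot's actual logic.

```python
import re
from html.parser import HTMLParser

# Hypothetical rule for "text that looks like a URL". This is an assumption
# for illustration only, not how Googlebot really decides.
URL_LIKE = re.compile(r"(?:https?://)?(?:www\.)[\w.-]+/[\w./-]*")


class TextCollector(HTMLParser):
    """Collects real href targets and the visible text separately."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text.append(data)


def url_like_text(html):
    """Return (actual link targets, URL-looking strings found in the text)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    guesses = []
    for chunk in parser.text:
        for match in URL_LIKE.finditer(chunk):
            # Drop the trailing dots left over from the truncation.
            guesses.append(match.group(0).rstrip("."))
    return parser.hrefs, guesses


snippet = ('<a href="http://track.scraper.example.com/?8373888383">'
           'www.mysite.example.com/main_widget_arti..</a>')
hrefs, guesses = url_like_text(snippet)
print("real link:", hrefs)
print("guessed from text:", guesses)
```

The real link is the tracking URL, but the text scan comes up with a path on my site that never existed, which is exactly the kind of request that ends in a 404.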
I also recently found a case where a scraper site linked to my site as well as to other sites, and Googlebot seemed to mash the links together. The site linked to othersite.example.com/big_bad_widgets.html and to mysite.example.com. Wouldn't you know it, Googlebot started crawling mysite.example.com/big_bad_widgets.html on my site. That URL turned up as a 404 in my logs (from Googlebot) and also on the WMT dashboard. No idea what Googlebot would be doing in that case other than making a mistake.
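If you want to look for the same thing in your own logs, something along these lines should work. It is only a sketch: it assumes the common Apache/Nginx "combined" log format, the googlebot_404s name is made up, and the two sample lines are invented for the demo rather than taken from my logs. It also trusts the user-agent string, which anyone can fake.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust if your server logs differently.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses per path for requests claiming to be Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        if match.group("status") == "404" and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


# Invented sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [21/Nov/2012:17:11:00 +0000] '
    '"GET /big_bad_widgets.html HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [21/Nov/2012:17:12:00 +0000] '
    '"GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

for path, count in googlebot_404s(sample).most_common():
    print(count, path)
```

To run it on a real file, pass the open file object instead of the sample list, e.g. googlebot_404s(open("access.log")).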