Would it not make sense to add a penalty to that site even if they weren't linked? -- I mean the user experience, according to the numbers, would be horrible. Maybe there is a certain % of total links indexed, divided by WMT 404's, that would trigger a penalty?
They wouldn't need to "penalize", because the site would lose every bit of every ranking factor that passes through any inbound link to any of those pages *and* from all of those pages to the other pages still on the site, since 404s don't "pass weight" anywhere.
The loss of ranking factors passed would "effectively behave like a penalty", but they don't need to "punish" the site with the 404s beyond simply not passing the ranking factors it had through inbound links and not showing its 404 error pages in the index.
Here's another way to look at it: If I put up 500,000 links, all to different pages that didn't exist on *your site*, would that somehow cause a bad visitor experience for people who found a page from your site in Google and then visited it? Not at all.
My linking to 404 error pages on *your* site [or any site] would cause a bad visitor experience relating to *my* site, because I didn't maintain the links and make sure they ended with what the visitor thought they were getting. If *my* site were removed from the rankings, that "bad visitor experience" wouldn't happen. Conversely, removing the available pages [200 OK] on your site from the rankings wouldn't solve that or any other problem related to the 404 errors on your site at all.
What would cause a bad visitor experience on *your site* is if the links to the 404 pages were on *your site* rather than mine. In that case, as Google, removing *your site* makes sense, to make sure you send your visitors to a "good experience/destination".
Which leads to [rhetorically]: Why would Google say 404 errors are not a problem unless there are links to those pages on your site?
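If you want to see which side of that line your own 404s fall on, here is a rough Python sketch. It assumes you already have a list of (linking page, linked page, status code) records from a crawl or log export; the function name `split_broken_links`, the record layout and the example.com / example.org URLs are all made up for illustration and are not anything Google or WMT provides.

```python
from urllib.parse import urlparse


def split_broken_links(links, my_host):
    """Split links to 404 pages on my_host by where the link lives.

    links: iterable of (source_url, target_url, status) tuples
           (a hypothetical record layout, not a WMT export format).
    Returns (internal, external) lists of (source_url, target_url).
    """
    internal, external = [], []
    for source, target, status in links:
        if status != 404:
            continue  # the target page exists, nothing to report
        if urlparse(target).hostname != my_host:
            continue  # a 404 on someone else's site is not our 404
        if urlparse(source).hostname == my_host:
            internal.append((source, target))  # our own page links to our 404
        else:
            external.append((source, target))  # somebody else links to our 404
    return internal, external


# Made-up sample records.
links = [
    ("https://example.org/links.html", "https://example.com/never-existed", 404),
    ("https://example.com/blog/", "https://example.com/old-page", 404),
    ("https://example.com/blog/", "https://example.com/about", 200),
    ("https://example.com/blog/", "https://example.org/gone", 404),
]

internal, external = split_broken_links(links, "example.com")
print("fix these on your own site:", internal)
print("someone else's broken links:", external)
```

By the reasoning above, only the `internal` list is the one that reflects on your site; the `external` list is the other site's maintenance problem.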
Could Hummingbird be taking into consideration more information from WMT than the algo before it?
You're "going the wrong way" with the direction of information here -- The algo doesn't take information like 404 errors or pages containing links to your site and things along those lines *from* WMT, the algo provides the information *to* WMT, which is why WMT is often out of date in the info it shows webmasters about a site.