Forum Moderators: Robert Charlton & goodroi
Massive jumps in GSC legacy crawl errors - who sees this?
...It's as if it's performing an exhaustive and historical update of its link graphs.

Simon, that's probably a good description of what's happening. Google does this periodically. Conceivably, with a Penguin announcement in the works, they're trying to establish some sort of clean reference point. I've observed that such crawls often happen at times of big changes. This thread goes over a bunch of possibilities...
... it means that Google has a list of every URL they ever crawled and occasionally they look again, even years later. They do that kind of "historical crawling" on various cycles and I do see them doing that in recent days....
I'm interested to know if the sites seeing this are a purely random choice, or if they are sites Google has identified as being in contention for Penguin (either recovery, or hit, or simply testing).

Simon, I coincidentally was wondering the same thing about your site(s) and what profile you might be fitting. You mentioned that you'd been "under both Penguin and Panda, but been clean for years"... but since this is a time trip back to old data sets, perhaps with a comparison to your present status, I'd think they'd be happy to zero out all old transgressions whenever the data supports that.
If you're seeing a large number of 404s, somebody didn't do their job.
NickMNS wrote
I have trimmed millions of pages
You mention that Google would be happiest if all disavowed inlinks were now 404s or 410s. Do you mean the source of those links, or do you mean the destination pages on our site? Because, as per above, the pages containing those links are now 404/410 as the sites don't exist any more, but the destination pages on our site do still exist. I don't think that should be an issue, do you?

Simon, yes, posting as I did at roughly 5am my time, I didn't state it very well, but I don't think it should be an issue either. I doubt from what you've described that you have any reason to worry. I was talking about the source of those questionable links... and that, as a "bookkeeping" issue, Google would be happiest if many or all of the links that had been disavowed had also been removed. I assume a shorter list would require less computation as Google re-evaluates a site's inbound links.
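For anyone who wants to do that bookkeeping check themselves, here's a minimal sketch (mine, not from this thread) that reads a standard disavow file and reports which disavowed source URLs now come back 404/410, which still resolve, and which no longer exist at all. The "disavow.txt" filename is a placeholder; the comment and domain: handling follows Google's published disavow-file format.

    import urllib.request
    import urllib.error

    def check_disavowed(path="disavow.txt"):
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blank lines, # comments, and domain-level rules;
                # only full URL lines can be probed directly.
                if not line.startswith("http"):
                    continue
                try:
                    req = urllib.request.Request(line, method="HEAD")
                    with urllib.request.urlopen(req, timeout=10) as resp:
                        print(f"{resp.status}  still live: {line}")
                except urllib.error.HTTPError as e:
                    label = "gone" if e.code in (404, 410) else "error"
                    print(f"{e.code}  {label}: {line}")
                except urllib.error.URLError:
                    print(f"no response (site may no longer exist): {line}")

    if __name__ == "__main__":
        check_disavowed()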
My own speculations here: I'm thinking that the algorithm may be highly "recursive"... with the same or related processes repeated on the results of the previous operations, giving us results that are increasingly refined. There's likely a pause to check results at every step, so Google can gauge whether the algorithm is working as anticipated and decide what to do next. The period after a deep crawl would be a good time to pause and evaluate.
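To make that speculation concrete, here's a toy, runnable illustration of what "recursive with a pause at every step" could look like: one pass is reapplied to its own output, and after each pass we measure how much the result moved before deciding whether to continue. The scoring rule and page names are invented purely for illustration and have nothing to do with Google's actual algorithm.

    def refine(scores, passes=10, tolerance=1e-6):
        for step in range(passes):
            # Same operation applied to the previous pass's output.
            mean = sum(scores.values()) / len(scores)
            new_scores = {k: 0.5 * v + 0.5 * mean for k, v in scores.items()}
            # The "pause": evaluate how much the results changed.
            delta = max(abs(new_scores[k] - scores[k]) for k in scores)
            print(f"pass {step}: max change {delta:.2e}")
            scores = new_scores
            if delta < tolerance:  # results have stabilized
                break
        return scores

    print(refine({"pageA": 1.0, "pageB": 0.0, "pageC": 0.5}))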
HELP! MY SITE HAS 939 CRAWL ERRORS!1

And five more observations worth reading, with links to longer posts worth checking out too.
I see this kind of question several times a week; you're not alone - many websites have crawl errors.
1) 404 errors on invalid URLs do not harm your site's indexing or ranking in any way. It doesn't matter if there are 100 or 10 million, they won't harm your site's ranking. [googlewebmastercentral.blogspot.ch...]
2) In some cases, crawl errors may come from a legitimate structural issue within your website or CMS. How do you tell? Double-check the origin of the crawl error. If there's a broken link on your site, in your page's static HTML, then that's always worth fixing. (thanks +Martino Mosna)
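Here's a minimal sketch of that double-check for a single page: fetch one of your own pages, pull the anchors out of its static HTML, and flag any internal links that come back 404/410. The start URL is a placeholder, and a real audit would crawl the whole site and respect robots.txt.

    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    import urllib.request
    import urllib.error

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def check_page(page_url):
        html = urllib.request.urlopen(page_url, timeout=10).read().decode("utf-8", "replace")
        parser = LinkExtractor()
        parser.feed(html)
        site = urlparse(page_url).netloc
        for href in parser.links:
            url = urljoin(page_url, href)
            if urlparse(url).netloc != site:  # only check internal links
                continue
            try:
                req = urllib.request.Request(url, method="HEAD")
                urllib.request.urlopen(req, timeout=10)
            except urllib.error.HTTPError as e:
                print(f"{e.code}  broken link on {page_url}: {url}")

    check_page("https://www.example.com/")  # placeholder URL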