
Forum Moderators: Robert Charlton & goodroi


Massive jumps in GSC legacy crawl errors - who sees this?

     
4:14 pm on Sep 7, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


We've seen some massive jumps in legacy crawl errors in Search Console, i.e. pages appearing that haven't existed on the site for months/years and where the 'linked from' list is also only pages that haven't existed for months/years, both on-site and off-site. Crawl errors were sitting at around 1,000, but on the 3rd August, they jumped to 10,000. On 1st September, they jumped to 50,000.

I've never seen Googlebot go this crazy before. It's as if it's performing an exhaustive and historical update of its link graphs.

I know others have seen this too, e.g. @Jhurwith and @BushieTop and others have reported it on multiple forums. Have you seen this on your site? Would be good to try to see what we have in common, or whether this is just random. Our site is UK, ecommerce and under both Penguin and Panda, but been clean for years. Interesting to see the profile of other sites with these crawl errors.
5:40 pm on Sept 7, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 1025
votes: 88


If it was widespread it might signal something, but I'm not seeing it here. Have you made any changes recently that might have caused it?
5:50 pm on Sept 7, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@Wilburforce No. It's definitely unrelated to site changes. And others have reported this sudden jump too, all in the past month. The obvious suggestion is that these are test sites for Penguin (hence the need to exhaustively re-follow all links to the site), but I doubt it's that simple.
7:36 pm on Sept 7, 2016 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:June 26, 2004
posts:379
votes: 33


Yes, we have seen this too. I'm not worried since this is so widespread. I may be worrying more once the new update goes through...
8:35 pm on Sept 7, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@ecommerceprofit Could you please provide some more detail, e.g. date(s) when you saw the large jump in errors, error count before and after, etc?
9:03 pm on Sept 7, 2016 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:June 26, 2004
posts:379
votes: 33


7-22-16 - 19 errors
7-28-16 - 145 errors
8-10-16 - 276 errors
8-28-16 - 763 errors (huge jump starts)
8-31-16 - 2,364 errors (leveling out occurred here...straight line)
9-5-16 - 2,366 errors
10:17 pm on Sept 7, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3766
votes: 208


I've seen a recent jump toward the end of August, can't say it is massive like Simon H is reporting. On one site it shows about 1800+ 404s, but they aren't as old as some of the URLs at a different site where they are claiming "linked from" as sitemaps that have not existed for several years. Periodically they seem to be using antique cached sitemap versions or maybe following through all those scraped "directory" links.
12:49 am on Sept 8, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11780
votes: 742


Not seeing anything significant on my end. I would assume it relates to the number of incoming links pointing to outdated file paths. Google may have recently crawled a couple reserves for those broken links that in turn affected your report.
7:33 am on Sept 8, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@keyplyr I don't think so. As per original post, the 'links from' against each crawl error is a list of similarly outdated URLs. 95% of the 50,000 crawl errors have no current links to those pages. It's as if Google has just recrawled the web from 12 - 18 months ago.
7:47 am on Sept 8, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11780
votes: 742


Simon_H I understand what you're saying. However, links from 12 - 18 months ago aren't unreasonable for Google to re-evaluate.

Are your pages in the Wayback Machine?

But in Googleland the resource doesn't really need to exist by that report, hence the paradox. From my own experience, crawl errors in that report often have no source.

Sometimes I see errors for pages that have never existed, and not just typos or extra spacing. I attribute some of these to on-site visitors who use the address bar to search for terms while my URL is there.
8:58 am on Sept 8, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@keyplyr I appreciate that seeing crawl errors that old or inaccurate in GSC is totally normal; we see them all the time too and the numbers increase gradually over time. What isn't normal is seeing jumps of that magnitude (1,000 -> 10,000 and 10,000 -> 50,000) in such a short space of time. Others have seen similar jumps over the last month, so I think this is more than just business-as-usual for Google.

Yes, pages are in Wayback.
10:52 am on Sept 8, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12110
votes: 337


...It's as if it's performing an exhaustive and historical update of its link graphs.
Simon, that's probably a good description of what's happening. Google does this periodically. Conceivably, with a Penguin announcement in the works, they're trying to establish some sort of clean reference point. I've observed that such crawls often happen at times of big changes. This thread goes over a bunch of possibilities...

17 May 2013 - GWT Sudden Surge in Crawl Errors for Pages Removed 2 Years Ago?
http://www.webmasterworld.com/google/4575982.htm [webmasterworld.com]

As tedster noted...
... it means that Google has a list of every URL they ever crawled and occasionally they look again, even years later. They do that kind of "historical crawling" on various cycles and I do see them doing that in recent days....

This is confirmed by the 2006 interview with the Google Sitemaps Team, which I reference at the end of the thread, and which is worth reading.
11:16 am on Sept 8, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@Robert_Charlton - Fab! I hadn't seen that thread. Lots of people were seeing this happen in early-to-mid May 2013... and then Penguin 2 hit on May 22nd. Things change in 3 years, but if this is a repeat of 2013, it suggests Penguin will follow very soon.

It's a shame we didn't hear more from those who saw this in 2013. I'm interested to know if the sites seeing this are a purely random choice, or if they are sites Google has identified as being in contention for Penguin (either recovery, or hit, or simply testing).
12:32 pm on Sept 8, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12110
votes: 337


I'm interested to know if the sites seeing this are a purely random choice, or if they are sites Google has identified as being in contention for Penguin (either recovery, or hit, or simply testing).
Simon, I coincidentally was wondering the same thing about your site(s) and what profile you might be fitting. You mentioned that you'd been "under both Penguin and Panda, but been clean for years"... but since this is a time trip back to old data sets, perhaps with a comparison to your present status, I'd think they'd be happy to zero out all old transgressions whenever the data supports that.

I wonder whether you have a sense of your Penguin disavows being reexamined, or whatever they do in this kind of review. I also wonder what it suggests about how they're cleaning this up.

I'm assuming that Google would be happiest if all of your disavowed inlinks were now 404s or 410s, and that periodic respidering of those reconfirmed an improving status over time... thus cleaning up complicated bookkeeping issues for them. I think they feel it's their duty as a search engine to continue to make this kind of check of old dropped pages... and that you would prefer that it get less and less frequent over time.
2:04 pm on Sept 8, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:1925
votes: 498


One reason some sites may be noticing this more than others is that some sites may have more pages that have been deleted over the given time span. A site that has not deleted any of its pages will not see the 404s.

I am seeing a larger number of 404s for pages deleted up to 4 years ago, but I am also seeing 404s for pages deleted only 1 year ago. It is possible that the last crawl of the pages deleted 1 year ago happened 4 years ago, and the re-crawl is happening now, as with the other pages deleted 4 years ago.

I have a large static site that has evolved over the years; at my peak I had on the order of 35M pages, and I now have on the order of 7M. The site has never been fully indexed. So I have a large pool of potential 404s.
3:03 pm on Sept 8, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@Robert_Charlton Thanks! So, a background... Starting 5 years ago and finishing 3 years ago, an SEO company built us a spammy backlink profile. (I won't get into issues of fault.) Hit by Penguin in May 2013. John Mueller confirmed we have a link spam issue, as I cheekily asked him to look us up on his penalty server. No link-building since and we've actually achieved an excellent natural link profile as national media, newspapers, etc have written about our products because our brand is getting well known.

The SEO company that built the spammy backlink profile provided us with a monthly list of all links. So we disavowed every one at the domain level, although 80% of those domains have naturally expired anyway. We submit monthly disavow lists and I check Google's cache to see when specific links have been recrawled. We have virtually no manipulated links left; the few that haven't expired naturally have been cached since disavowing. So it looks very positive on paper: lots of genuinely natural high authority links and virtually no questionable links left.
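(For anyone wanting to run a similar check on their own disavow file, here is a minimal, hypothetical sketch of one way to see whether disavowed domains still resolve and still serve anything. The file name and the third-party requests package are assumptions; this is illustrative only, not Simon's actual process.)

# Hypothetical helper (not from the thread): check whether disavowed
# domains still resolve and still serve content.
# Assumes a GSC-style "disavow.txt" with "domain:example.com" lines
# and the third-party "requests" package.
import socket
import requests

def domains_from_disavow(path):
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("domain:"):
                yield line.split(":", 1)[1]

def check_domain(domain, timeout=10):
    try:
        socket.gethostbyname(domain)           # does the domain even resolve?
    except socket.gaierror:
        return "expired / no DNS"
    try:
        resp = requests.head(f"http://{domain}/", timeout=timeout,
                             allow_redirects=True)
        return f"still live ({resp.status_code})"
    except requests.RequestException:
        return "unreachable"

if __name__ == "__main__":
    for d in domains_from_disavow("disavow.txt"):
        print(d, "->", check_domain(d))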

You mention that Google would be happiest if all disavowed inlinks were now 404 or 410s. Do you mean the source of those links or do you mean the destination pages on our site? Because, as per above, the pages containing those links are now 404/410 as the sites don't exist any more, but the destination pages on our site do still exist. I don't think that should be an issue, do you?

@NickMNS Totally agree. We're an ecommerce site and so over the years, we have countless thousands of items that are discontinued. And it's all of those products (literally every one) that has come back as a crawl error. Whereas someone with a relatively static site wouldn't see as many crawl errors.
3:27 pm on Sept 8, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


BTW Jennifer Slegg asked Google's John and Andrey a similar question yesterday: [thesempost.com]

The question was more about whether a sudden increase in crawl means the site is in contention for a manual action or an algo update is likely. The answer was a carefully worded 'no', but I think we know this already as Google has said it multiple times previously.

I think what we're talking about here is subtly different, plus there's the whole causation vs correlation thing. Even though a sudden crawl increase doesn't mean an update is imminent, it's still very possible that certain updates will be preceded by a crawl increase. And when multiple sites see a sudden jump in crawl errors all around the same time, as is happening now and as happened in May 2013, that suggests to me that something abnormal is happening.
12:54 am on Sept 9, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3381
votes: 270


under both Penguin and Panda

Even if your site eventually escapes from Panda and/or Penguin, it will still have permanent scars from both of them. In other words, it will never be as successful as it would have been if it had never been penalized.
10:39 am on Sept 9, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


I asked John Mueller about this at the Hangout earlier. He said he'd seen chatter about this so he'd already checked with the engineers. And it's apparently nothing to do with Penguin and is completely normal.

I'm not convinced this is just coincidence. From a statistical point of view, having multiple sites flagging massive jumps in crawl errors around the same time (we're actually receiving warning emails from GSC it's so severe) is unlikely to be pure coincidence, plus the last time people saw this happen, Penguin 2 hit a week later.

@aristotle Is that your opinion or do you have evidence? Google have repeatedly said they don't hold grudges and many sites appear to have fully recovered from both Panda and Penguin.
1:07 pm on Sept 9, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3381
votes: 270


Simon_H --
Perhaps a better way to say it is that a penalized site is permanently weakened, even if it eventually escapes from the penalty. This is based on simple logic.

First of all, because a penalized site gets less traffic, it loses opportunities to attract backlinks that would help its future rankings. This reduces its traffic from what it would have been even after it escapes the penalty, meaning that it continues to lose opportunities to attract backlinks even then. The effect is propagated into the future, so that the site's rankings continue to suffer from the "after-effects" of the penalty forever.

As a related argument, take the case of an ecommerce site. Its reduced traffic during the period of a penalty means that it could miss out on acquiring some repeat customers that it would have acquired otherwise, permanently reducing its sales even after the penalty is lifted. In a similar way, a penalized forum misses out on opportunities to attract new permanent members, and a penalized blog doesn't pick up as many permanent followers as it would have.

There's also the question of "trust". Does a penalized site permanently lose trust in the eyes of google's algorithm? I don't know, but I can't rule out the possibility.
1:16 pm on Sept 9, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@aristotle Yes, totally agree with first 3 paragraphs - I see what you mean. Regarding last paragraph, I don't know either!
11:24 pm on Sept 9, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 25, 2005
posts:3566
votes: 28


If you run an ecommerce website (or any website) and a product (or page) becomes obsolete, you (or the webmaster at the time) should redirect or 410 the URL. People forget: time moves on, you move to a new server, there's new coding, etc., and the old URLs don't get taken care of. Whether they're from 1 year ago or 12, each and every URL this website has ever produced has to have a final destination, be it a 410 or a redirect. If you're seeing a large number of 404s, somebody didn't do their job.

The ole saying is Google never forgets. This is what I believe is happening here.
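(One illustrative way to act on that advice, not something posted in the thread: fetch each known legacy URL and record the status it actually returns, so URLs that never got a deliberate final destination stand out. The input file name is hypothetical and the sketch assumes the third-party requests package.)

# Hypothetical audit sketch: report the HTTP status of every legacy URL
# listed in "old-urls.txt" so stray temporary redirects, 500s or
# forgotten live pages stand out. Requires the third-party "requests" package.
import requests

HANDLED = {200, 301, 308, 404, 410}   # statuses treated here as deliberate outcomes

def audit(path):
    with open(path) as fh:
        for url in (line.strip() for line in fh if line.strip()):
            try:
                # no redirect-following: we want the status of the old URL itself
                resp = requests.head(url, allow_redirects=False, timeout=10)
                status = resp.status_code
            except requests.RequestException as exc:
                status = f"error: {exc}"
            flag = "" if status in HANDLED else "  <-- needs a final destination?"
            print(f"{status}\t{url}{flag}")

if __name__ == "__main__":
    audit("old-urls.txt")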
1:02 am on Sept 10, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:1925
votes: 498


@Simon_H Just to be clear, I have not seen a large spike in crawling. What I have noticed is that the errors reported are far more heavily weighted toward pages that were deleted 3+ years ago.

As mentioned above, I have a somewhat special situation where I have trimmed millions of pages, and my site has never been fully indexed. So as a result I see a large number of 404s reported on a daily basis. Over the past few months (6 to 12), these have mostly been pages deleted during the site revamp last February, or pages no-indexed before and in anticipation of the revamp.

Given the above I am able to observe this phenomenon of the sudden re-crawling of old and obsolete pages.

@bwnbwn
If you're seeing a large number of 404s, somebody didn't do their job.

It has been mentioned many times by Google (John Mu and co.) that if you delete a page you can, and often should, return a 404 for that URL. The large number of errors reported is simply Google checking whether the page is still gone, and does not negatively impact you in any way.
10:26 am on Sept 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@NickMNS Cheers. Yes, I think it's normal for sites that have a high page turnover to see a continual increase in crawl errors over time, and those crawl errors are often to pages that are many years old. We see that too. What we weren't expecting was all 50,000 pages to come back in one go!

Good last point. @BushieTop should take note, because he's also seen this jump in legacy crawl errors and I believe is thinking about 301ing them. As you say, there's nothing wrong with 404s (unless it's due to broken internal links) and 301ing them all can look very unnatural. If you have some decent inbound links to a 404 page and there's a very similar live page, then 301, but otherwise best to leave as 404.
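(For anyone wanting to apply that rule systematically, a minimal, hypothetical sketch of the idea: only 301 a dead URL when it has inbound links and a very similar live page exists; everything else stays a 404/410. The input lists would come from your own crawl and Search Console exports; the data below is made up and none of this is from the thread.)

# Hypothetical sketch of the 301-vs-404 rule described above.
# "dead_urls", "live_urls" and "inbound_links" are assumed to come from
# your own crawl / Search Console exports.
from difflib import get_close_matches

def build_redirect_map(dead_urls, live_urls, inbound_links, cutoff=0.8):
    """Return {dead_url: live_url} only for URLs worth 301-ing."""
    redirects = {}
    for dead in dead_urls:
        if not inbound_links.get(dead):          # no decent inbound links -> let it 404
            continue
        match = get_close_matches(dead, live_urls, n=1, cutoff=cutoff)
        if match:                                # a very similar live page exists
            redirects[dead] = match[0]
    return redirects

dead = ["/widgets/blue-widget-2013", "/old-category/thing"]
live = ["/widgets/blue-widget", "/widgets/red-widget"]
links = {"/widgets/blue-widget-2013": ["https://example.org/review"]}
print(build_redirect_map(dead, live, links))   # {'/widgets/blue-widget-2013': '/widgets/blue-widget'}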
11:19 am on Sept 10, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3381
votes: 270


NickMNS wrote
I have trimmed millions of pages

I doubt that makes a good impression on google's algorithm. At the very least it suggests poor planning, or a trial and error approach where you keep stumbling around hoping to eventually come up with something that works. Also, it's hard to see how pages that are created so rapidly could be of much quality.

On most of my sites I've never deleted a single page. It usually takes me at least a month to write one article, if you count the time spent on research, and I don't expend that much time and effort on something unless I intend for it to be a permanent part of the site.
3:33 pm on Sept 10, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts: 1925
votes: 498


@aristotle In principle I agree with you. But sometimes in life one makes mistakes, especially when you do not know what you are doing and you take advice from others who pretend to know what they're doing. Again, my site is unique: I provide stats about widgets. When I first launched the site I had one page for each widget type, thin content to be sure. I then aggregated the data and created one page for each widget group with more in-depth stats about the group. The pages were cut all at once (well, I did that twice). Clearly this clean-up was beneficial: since my last update in February my traffic has grown 3-fold, and continues to rise.
4:00 pm on Sept 10, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3381
votes: 270


NickMNS -- I'm glad that it's working out so well for your site now.

I just went on a rant because over the years I've seen people come here complaining about their rankings in google and then mention that they wrote 1000 articles for their site in one week, or some such.
5:51 am on Sept 20, 2016 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:Oct 24, 2003
posts: 600
votes: 4


I posted something similar in August. This just happened to coincide with the relaunch of my ten-year-old site, and I found it very odd that Google was reporting 404 errors for pages that haven't existed for years... The oldest were ten years old and predated the site I just replaced! I spent ten days catching hundreds and hundreds of pages and redirecting them. My htaccess file is 4,600 lines now... [webmasterworld.com...]
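(A side note for anyone facing a similarly huge redirect list: one common way to keep .htaccess small is to move the old-to-new pairs into an Apache RewriteMap file, which has to be declared in the server or vhost config rather than in .htaccess. The sketch below, with hypothetical file names, just generates such a map file from a CSV; the commented directives show roughly how it could be wired up.)

# Hypothetical sketch: turn an "old_path,new_path" CSV into a flat map file
# that a single RewriteRule can consult, instead of thousands of lines in
# .htaccess. File names are placeholders.
import csv

def write_rewrite_map(csv_path="redirects.csv", map_path="redirects.map"):
    with open(csv_path, newline="") as src, open(map_path, "w") as dst:
        for old, new in csv.reader(src):
            dst.write(f"{old.strip()} {new.strip()}\n")

write_rewrite_map()

# Then, in the server/vhost config (RewriteMap is not allowed in .htaccess),
# something along these lines:
#   RewriteMap legacy "txt:/path/to/redirects.map"
#   RewriteCond ${legacy:%{REQUEST_URI}} !=""
#   RewriteRule .* ${legacy:%{REQUEST_URI}} [R=301,L]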
9:59 am on Sept 20, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12110
votes: 337


You mention that Google would be happiest if all disavowed inlinks were now 404 or 410s. Do you mean the source of those links or do you mean the destination pages on our site? Because, as per above, the pages containing those links are now 404/410 as the sites don't exist any more, but the destination pages on our site do still exist. I don't think that should be an issue, do you?
Simon, yes, posting as I did at roughly 5am my time, I didn't state it very well, but I don't think it should be an issue either. I doubt from what you've described that you have any reasons to worry. I was talking about the source of those questionable links... and that, as a "bookkeeping" issue, Google would be happiest if many or all of the links that had been disavowed had also been removed. I assume that a shorter list would require less computation as Google re-evaluates a site's inbound links.

I'm assuming something like the following.

If...
a) the links pointing to you are still there and the destination pages on your site are still there, then they've got your disavow list, which is essentially a list of nofollows, that needs to be referenced as they compute your backlinks. They've also got to assess your disavow sincerity and to factor that in. I think they do that in part by evaluating subsequent improvements made on your site, which presumably is reflected by new, more trustworthy inbound linking.

If...
b) the links pointing to you are still there and the destination pages on your site are moved or gone, then in some particularly bad cases they need to decide whether you've actually repented about those (destination) pages and gotten rid of them, or whether they and you are playing whack-a-mole and that you're just shifting your urls to avoid the penalties.

But if...
c) the links were gone, then it's kind of analogous to their 404/410 crawl list: the longer the URLs on the disavow list keep returning crawl errors (ie, the source links are gone), the less frequently they'd feel they need to recrawl the disavowed links.

I think you nail what's probably going on right now, regarding what Jennifer Slegg is reporting, including your observation about causation/correlation, which had struck me too as being the actual situation.

It's likely that Google can't in fact actually predict what is going to happen until after they see what the recursive computations on various algos look like after the current deep crawl.

To relate this to some earlier thoughts I'd posted...

According to Google: Penguin 3.0 is continuing
Dec, 2014
https://www.webmasterworld.com/google/4719313.htm [webmasterworld.com]

My own speculations here: I'm thinking that the algorithm may be highly "recursive"... with the same or related processes repeated on the results of the previous operations, giving us results that are increasingly refined. There's likely a pause to check results at every step, so Google can gauge whether the algorithm is working as anticipated and decide what to do next.
After a deep crawl would be a good time to pause and evaluate.

I trust at 3am my time that this makes sense now. ;)

10:10 am on Sept 20, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12110
votes: 337


More about crawl errors... this, from an extensive Jan 11, 2013 Google+ post John Mueller made about how Google looks at these, may also be of interest. I'm quoting the first 2 of 7 observations John made, and I recommend the whole post...

John Mueller > Public
https://plus.google.com/+JohnMueller/posts/RMjFPCSs5fm [plus.google.com]

HELP! MY SITE HAS 939 CRAWL ERRORS!1

I see this kind of question several times a week; you're not alone - many websites have crawl errors.

1) 404 errors on invalid URLs do not harm your site's indexing or ranking in any way. It doesn't matter if there are 100 or 10 million, they won't harm your site's ranking. [googlewebmastercentral.blogspot.ch...]

2) In some cases, crawl errors may come from a legitimate structural issue within your website or CMS. How can you tell? Double-check the origin of the crawl error. If there's a broken link on your site, in your page's static HTML, then that's always worth fixing. (thanks +Martino Mosna)
And five more observations worth reading, with links to longer posts worth checking out too.
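(To make John's point #2 concrete, here is a rough, hypothetical sketch of the kind of check he describes: scan one of your own pages for internal links that now return 404, since those are the crawl errors actually worth fixing. The starting URL is a placeholder and the sketch assumes the third-party requests package.)

# Hypothetical sketch: find internal links on a page that now return 404.
# Needs the third-party "requests" package; everything else is stdlib.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import requests

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def broken_internal_links(page_url):
    html = requests.get(page_url, timeout=10).text
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(page_url).netloc
    broken = []
    for href in set(parser.links):
        target = urljoin(page_url, href)
        if urlparse(target).netloc != site:       # only check links to your own site
            continue
        status = requests.head(target, allow_redirects=True, timeout=10).status_code
        if status == 404:
            broken.append(target)
    return broken

print(broken_internal_links("https://www.example.com/some-page"))   # placeholder URL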
