Seconds after posting the original message that started this thread (at 2205 UTC yesterday), I went to the Crawl Errors section of WMT and found it completely redesigned from what I had been looking at only a few minutes before! One minute I was on the old system, the next minute the new.
The summary screen rotates the language with each refresh. This is either a silly bug or a <cynic> deliberate error to get people's attention and get people talking about the new features. </cynic>
Crawl errors
Introuvable 23
Non suivies 4
Accès refusé 3
Erreur du serveur 22
Soft 404 1
Autre 0
Crawl errors
Nicht gefunden 23
Nicht aufgerufen 4
Zugriff verweigert. 3
Serverfehler 22
Soft 404 1
Sonstiges 0
and finally to English, which is the system setting for WMT.
I was unable to add to this thread until now, as it was locked.
The new design looks great, and features a new button where you can "clear" errors from the list.
The main feature is that it now shows the number of errors graphed over time. However, the data doesn't seem to be correct, especially for the "URL Errors > Web > Server Error" graph.
On one site there were a large number of "Error 500" errors for the last year or more. After fixing those issues in January I have watched the numbers in the old WMT Crawl Errors report slowly decline to 4 as Google has recrawled the URLs. It seemed to me that the error would be removed from the report 6 weeks after the error was last found on the site.
The issue causing that problem was fixed in January and the site hasn't served a single 500 error since then. Just yesterday, WMT listed the final 4 URLs that it had last seen with errors back in January. Today, however, a large number of those errors have been relisted and the error count is back up to 45. Yesterday Google were happy the errors had long gone. Today, they are relisted. This is garbage.
The graph is especially misleading. For this one site, it shows 45 errors for today (and for each day going back in time, and a larger number at the beginning of the graph). I would take the data point for today to mean they actually FOUND 45 such errors on the site TODAY. It doesn't mean that at all. It means that as of today they have 45 URLs in their database that when LAST CRAWLED at some point in the past, days or weeks ago, returned that error at that time.
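To make the difference between those two readings concrete, here is a minimal sketch. The records and the function names are made up for illustration; this is not how WMT stores or computes anything, just the two ways of counting applied to the same hypothetical data.

```python
from datetime import date

# Hypothetical records: (URL, date of most recent crawl, status seen on that crawl)
last_crawl = [
    ("/a", date(2012, 1, 10), 500),
    ("/b", date(2012, 1, 12), 500),
    ("/c", date(2012, 3, 14), 200),
]

def found_on(records, day, status=500):
    """URLs that were actually crawled on `day` and returned `status`."""
    return [url for url, crawled, seen in records
            if crawled == day and seen == status]

def outstanding_as_of(records, day, status=500):
    """URLs whose most recent crawl, on or before `day`, returned `status`."""
    return [url for url, crawled, seen in records
            if crawled <= day and seen == status]

today = date(2012, 3, 14)
print(len(found_on(last_crawl, today)))           # what I would expect the graph to plot
print(len(outstanding_as_of(last_crawl, today)))  # what the graph appears to plot
```

With this sample data nothing was found today, yet two URLs are still counted as outstanding because their last crawl, back in January, returned a 500.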
Do I need to go through and "clear" each error, or will Google do that as they recrawl each one? It appears to me that the WMT data being used is at least several weeks old.
The "Not found" error report is correct for the couple of sites I have checked, showing the same data today as it did yesterday.
Make sure you click both the "Server Error" and "Not Found" boxes as there are separate graphs for each. Likewise for the three entries at the top of the page, as each of those leads to a separate graph.
Google still don't report 410 responses as 410. Everything is listed as 404.
< moderator note: see g1smd's post below - this original report was incorrect and 410 statuses are now reported separately >
The other issue I raised almost three years ago is still there. When you save a report, the filename format varies depending on the report. There's a mix of
sitename-datetime-reporttype.csv
sitename-reporttype-datetime.csv
reporttype-sitename-datetime.csv
and
reporttype-datetime-sitename.csv
which doesn't allow for an easy-to-understand sort order when files are listed. Can we just have
sitename-datetime-reporttype.csv
for all of the reports?
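Until that happens, the files can be renamed locally. This is a rough sketch only: it assumes you already know the site name and the set of report-type strings that appear in the filenames, and that whatever is left over is the date/time part. The function name, the example site and the report-type names are all invented for illustration and are not the real WMT names.

```python
import os

def normalise(filename, sitename, report_types):
    """Reorder a downloaded report name to sitename-datetime-reporttype.ext.

    Assumes `sitename` and exactly one entry of `report_types` occur in the
    name, joined by hyphens in some order, and that the remainder is the
    date/time stamp.
    """
    stem, ext = os.path.splitext(filename)
    if sitename not in stem:
        raise ValueError("site name not found in " + filename)
    report = next((r for r in report_types if r in stem), None)
    if report is None:
        raise ValueError("no known report type in " + filename)
    # Remove the two known parts; what remains is the stamp plus stray hyphens.
    rest = stem.replace(sitename, "", 1).replace(report, "", 1)
    stamp = rest.strip("-")
    return f"{sitename}-{stamp}-{report}{ext}"

# Hypothetical report-type names and filenames
types = ["crawlerrors", "toplinks"]
print(normalise("crawlerrors-example.com-20120315T2205.csv", "example.com", types))
print(normalise("toplinks-20120315T2205-example.com.csv", "example.com", types))
```

Passing the site name in explicitly avoids having to guess where a hyphenated domain ends and the date begins.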
[edited by: tedster at 12:47 am (utc) on Mar 16, 2012]
[edit reason] insert correction notice [/edit]