Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 66 message thread spans 3 pages.
Bugs in WMT.... Ooops

 10:05 pm on Mar 12, 2012 (gmt 0)

Suddenly I see:

Crawl errors
Ni bilo mogoče najti 23
Ni uspelo 4
Dostop zavrnjen 3
Napaka v strežniku 22
Programska napaka 404 1
Drugo 0




 3:29 am on Mar 13, 2012 (gmt 0)

That's some serious data corruption!


 3:32 am on Mar 13, 2012 (gmt 0)

I see it too.

Mine is in Spanish though.

When I click the actual links, it reverts back to English.


 5:06 am on Mar 13, 2012 (gmt 0)

Your system language isn't set to Slovenian, is it? Wouldn't work for me, because g### doesn't speak my system's first-choice language. Went over and checked in Safari before I remembered that they haven't even filled in Search yet.


 7:46 am on Mar 13, 2012 (gmt 0)

Seconds after posting the original message (at 2205 UTC yesterday) that started this thread I went to the Crawl Errors section of WMT and it has been completely redesigned from what I had been looking at only a few minutes before! One minute I was on the old system, the next minute the new.

The summary screen rotates the language with each refresh. This is either a silly bug or a <cynic> deliberate error to get people's attention and get people talking about the new features. </cynic>

Crawl errors
Introuvable 23
Non suivies 4
Accès refusé 3
Erreur du serveur 22
Soft 404 1
Autre 0

Crawl errors
Nicht gefunden 23
Nicht aufgerufen 4
Zugriff verweigert. 3
Serverfehler 22
Soft 404 1
Sonstiges 0

and finally to English, which is the system setting for WMT.

I was unable to add to this thread until now, as it was locked.

The new design looks great, and features a new button where you can "clear" errors from the list.

The main feature is that it now shows the number of errors graphed over time. However the data doesn't seem to be correct, especially for the "URL Errors > Web > Server Error" graph.

On one site there were a large number of "Error 500" errors for the last year or more. After fixing those issues in January I have watched the numbers in the old WMT Crawl Errors report slowly decline to 4 as Google has recrawled the URLs. It seemed to me that the error would be removed from the report 6 weeks after the error was last found on the site.

The issue causing that problem was fixed in January and the site hasn't served a single 500 error since then. Just yesterday, WMT listed the final 4 URLs that it had last seen with errors back in January. However, today there are now a large number of those errors relisted, the error count is back up to 45. Yesterday Google were happy the errors had long gone. Today, they are relisted. This is garbage.

The graph is especially misleading. For this one site, it shows 45 errors for today (and for each day going back in time, and a larger number at the beginning of the graph). I would take the data point for today to mean they actually FOUND 45 such errors on the site TODAY. It doesn't mean that at all. It means that as of today they have 45 URLs in their database that when LAST CRAWLED at some point in the past, days or weeks ago, returned that error at that time.
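
A minimal Python sketch of the two readings of that data point. The record layout, URLs, and dates below are invented for illustration and say nothing about how WMT actually stores crawl data; the point is only that "errors found today" and "URLs whose last crawl returned an error" are different counts.

```python
from datetime import date

# Hypothetical records: URL -> (date last crawled, HTTP status seen at that crawl).
last_crawl = {
    "/a": (date(2012, 1, 10), 500),  # not recrawled since the January fix
    "/b": (date(2012, 1, 12), 500),  # not recrawled since the January fix
    "/c": (date(2012, 3, 13), 200),  # recrawled today, now fine
    "/d": (date(2012, 3, 13), 200),  # recrawled today, now fine
}

def is_server_error(status):
    """True for any 5xx status code."""
    return 500 <= status < 600

def errors_found_on(day, records):
    """URLs crawled on `day` that returned a server error on that day."""
    return sorted(url for url, (crawled, status) in records.items()
                  if crawled == day and is_server_error(status))

def errors_on_record(day, records):
    """URLs whose most recent crawl, on or before `day`, returned a server error."""
    return sorted(url for url, (crawled, status) in records.items()
                  if crawled <= day and is_server_error(status))

today = date(2012, 3, 13)
found_today = errors_found_on(today, last_crawl)
on_record = errors_on_record(today, last_crawl)
print(len(found_today), len(on_record))
```

With this made-up data, nothing was found broken today, yet two URLs still sit on record as errors because they have not been recrawled since January. The graph appears to plot the second kind of count.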

Do I need to go through and "clear" each error, or will Google do that as they recrawl each one? It appears to me that the WMT data being used is at least several weeks old.

The "Not found" error report is correct for the couple of sites I have checked, showing the same data today as it did yesterday.

Make sure you click both the "Server Error" and "Not Found" boxes as there are separate graphs for each. Likewise for the three entries at the top of the page, as each of those leads to a separate graph.

Google still don't report 410 responses as 410. Everything is listed as 404.

< moderator note: see g1smd's post below - this original report
was incorrect and 410 statuses are now reported separately >

The other issue I raised almost three years ago is still there. When you save a report, the filename format varies depending on the report. There's a mix of sitename-datetime-reporttype.csv, sitename-reporttype-datetime.csv, reporttype-sitename-datetime.csv and reporttype-datetime-sitename.csv, which doesn't allow for an easy-to-understand sort order when files are listed. Can we just have sitename-datetime-reporttype.csv for all of the reports?

[edited by: tedster at 12:47 am (utc) on Mar 16, 2012]
[edit reason] insert correction notice [/edit]


 8:05 am on Mar 13, 2012 (gmt 0)

I see strange languages too, they have changed something recently [googlewebmastercentral.blogspot.com]


 8:20 am on Mar 13, 2012 (gmt 0)



They don't like me, though. I just get English, no matter what I do or where I go.


 8:32 am on Mar 13, 2012 (gmt 0)

Yes, as I originally posted this thread the design updated before my very eyes... but this thread was locked for the next 5 hours.


 8:32 am on Mar 13, 2012 (gmt 0)

I didn't see this particular corruption, but I did find something else around the same time you reported it: my site in a lot of different languages on Google.

It seems Google is running all websites through their translator and archiving them, perhaps in an effort to spot the "grab foreign content and translate it dirty" websites?

I don't know, but the timing suggests they may be related?


 8:37 am on Mar 13, 2012 (gmt 0)

Same here, Search queries is in English yet crawl errors is foreign! I think this sums up Google at the moment, Product Search is also full of bugs! Funnily enough we had our first Google Checkout review in over a YEAR this week despite having many sales via GC.

Also, different sites have different foreign languages! One is deffo German.


 9:51 am on Mar 13, 2012 (gmt 0)

Haha thought it was just me, because I was using my iPhone to tether. Loving the new WMT though :)


 11:18 am on Mar 13, 2012 (gmt 0)

Sometimes I see other domains, with weird domain names, up in the pull-down at top right, but when I click I don't get to their reports.


 12:44 pm on Mar 13, 2012 (gmt 0)

Mon GWT est en français, que je ne comprends pas. Ce n'est pas très utile.


 12:45 pm on Mar 13, 2012 (gmt 0)

I swear I wrote that in English. What's going on here?


 12:57 pm on Mar 13, 2012 (gmt 0)



 1:03 pm on Mar 13, 2012 (gmt 0)

Google still don't report 410 responses as 410. Everything is listed as 404.

I am seeing 404 and 410 responses.
The errors seem to be cumulative, which is completely misleading. E.g. if it found 20 errors today, then tomorrow recrawled 15 from yesterday and found 5 more, I would expect to see 25, but what we seem to have is 40.
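
The arithmetic in that example can be sketched in Python. The URL names are invented, and treating the reported figure as a plain sum of daily detections is only one guess at where 40 comes from, not a confirmed description of what WMT does.

```python
# Hypothetical URL sets matching the numbers in the example above.
day1_errors = {f"/page{i}" for i in range(20)}      # 20 errors found on day one
day2_recrawled = {f"/page{i}" for i in range(15)}   # 15 of those seen again on day two
day2_new = {f"/new{i}" for i in range(5)}           # 5 new errors found on day two

# Adding up every detection counts the 15 recrawled URLs twice.
naive_total = len(day1_errors) + len(day2_recrawled) + len(day2_new)

# Counting each distinct URL once gives the figure one would expect to see.
distinct_total = len(day1_errors | day2_recrawled | day2_new)

print(naive_total, distinct_total)
```

The gap between the two totals is exactly the 15 URLs that were seen on both days.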

But I cannot find "Blocked by Robots" any more. Does anybody know where it has gone?


 6:11 pm on Mar 13, 2012 (gmt 0)

Over in the other thread I was wondering if "access denied" is their new name for "blocked by robots.txt". Option B is that "access denied" means 403 and-- another option that just occurred to me-- "blocked by robots.txt" is the new "not followed".

Except that, wait, I have tons of roboted-out pages and they're simply not listed anywhere, although the "can't find" group rolled over from old format to new.


 9:23 pm on Mar 13, 2012 (gmt 0)

In Google's webmaster blog they mention that the "blocked by robots" list has been removed from the "crawl errors" section and will shortly re-appear in the "site configuration" section - because many of the URLs in the "blocked by robots" list aren't actually errors, the webmaster purposely blocked that access.

On one site I am now seeing 410 responses actually reported as 410 in WMT reports. Good stuff. That's been a long time coming.


 1:13 am on Mar 14, 2012 (gmt 0)

Holy krap!

Crawl errors
&#1604;&#1605; &#1610;&#1578;&#1605; &#1575;&#1604;&#1593;&#1579;&#1608;&#1585; &#1593;&#1604;&#1610;&#1607; 23
&#1578;&#1593;&#1584;&#1585; &#1578;&#1578;&#1576;&#1593;&#1607; 4
&#1578;&#1605; &#1585;&#1601;&#1590; &#1575;&#1604;&#1608;&#1589;&#1608;&#1604; 3
&#1582;&#1591;&#1571; &#1601;&#1610; &#1575;&#1604;&#1582;&#1575;&#1583;&#1605; 2218
Soft 404 1
&#1571;&#1582;&#1585;&#1609; 0

Ahh, WebmasterWorld doesn't do UTF-8.

Suffice to say the list is now in Arabic.


 5:18 am on Mar 14, 2012 (gmt 0)

... and Google Translate was stumped on "Soft 404" ?

I feel so totally cheated :sob: Mine's resolutely in English. Maybe it's because my system language is set to not-English?


 9:32 am on Mar 14, 2012 (gmt 0)

Mine is Latin!


 1:32 pm on Mar 14, 2012 (gmt 0)

Mine was in Spanish, but only parts of the page. It is getting to be more useless. Pages indexed for years are showing now as "Pages Indexed: 0", yet they are indexed and traffic continues. Useless non-information.


 1:38 pm on Mar 14, 2012 (gmt 0)

I have had spanish, German French and some Asian language.


 1:39 pm on Mar 14, 2012 (gmt 0)

Sorry for the poor typing. iPad problems.


 1:39 pm on Mar 14, 2012 (gmt 0)

The old style reports visible until a few days ago showed several errors for a site; errors that had been fixed many weeks ago but which apparently had not been recrawled to see the new status.

The new style reports show zero errors for this site (great!) but the historical graph drops to zero on Feb 22nd, several weeks earlier.

Why the discrepancy?


 7:26 pm on Mar 14, 2012 (gmt 0)

I have noticed for quite a while that they are showing me errors for pages that have not existed for well over a year (and had been properly removed); the label on a sitemap may read "Downloaded June 10th, 2010". About two weeks ago it had big red warnings that an important page was blocked by robots.txt, and when I clicked to look, the "page" that was blocked was "someimage.gif". Quite often recently I leave there just shaking my head and wondering why I even bother to look, but it helps to know how far off base they have been moving.


 10:48 pm on Mar 14, 2012 (gmt 0)

heh, just noticed part of mine is in Spanish.


 5:02 am on Mar 15, 2012 (gmt 0)

<hare-brained idea>
Maybe this language bug is really showing us some crossed wires in Google's infrastructure - crossed wires that have something to do with that mysterious "zombie traffic" phenomenon.

Bill Slawski (the patent guy) has been covering a series of Google patents over the past year that have to do with which regional data center Google might route any given query to. reference [seobythesea.com]

Each regional data center would have certain standard records, but coupled with other records that emphasized regional/local interests.

...this system will attempt to predict how likely it is that relevant information may be found at a particular producer node, and may or may not take into account particular topics or subject matter that may be relevant to the query. Remember, this is a "prediction" before the query is processed, so as much that can be done without actually finding all results to predict whether or not the query may have to be sent to more than one producer node, the better.

But what happens if this prediction goes haywire? I'm not saying I've got it nailed down, here - but I am catching a whiff of something or other.

</hare-brained idea>

[edited by: tedster at 1:15 am (utc) on Mar 16, 2012]


 5:21 am on Mar 15, 2012 (gmt 0)

I'm getting Turkish today. Yesterday was German and the day before was French.


 7:52 am on Mar 15, 2012 (gmt 0)

Keep hitting Reload or Refresh. It changes every time.

@Tedster I have no idea, but whatever it is, I'd guess that the problem is deep in the infrastructure with multiple causes, otherwise it would be fixed already.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved