
Google SEO News and Discussion Forum

WMT Crawl Errors: The Next Generation of Reports

 10:29 pm on Mar 12, 2012 (gmt 0)

new WMT - looks pretty interesting. Hopefully will get rid of some of those 404s from years ago:

Crawl errors is one of the most popular features in Webmaster Tools, and today we're rolling out some very significant enhancements that will make it even more useful.

We now detect and report many new types of errors. To help make sense of the new data, we've split the errors into two parts: site errors and URL errors.




 12:34 am on Mar 13, 2012 (gmt 0)

System: The following 2 messages were spliced on to this thread from: http://www.webmasterworld.com/google/4428367.htm [webmasterworld.com] by tedster - 11:00 pm on Mar 12, 2012 (EDT -5)

Just a few days ago in a neighboring thread I said idly
Look up the phrase "soft 404". Google hates 'em.

If anyone needs hard (haha) evidence, look at the brand-new Crawl Errors listing on the gwt dashboard. Wasn't there yesterday.

No more "not found" vs "blocked in robots.txt": now there are six count 'em six separate ways to offend the googlebot.

Not found nn
Not followed nn
Access denied nn
Server error nn
Soft 404 nn
Other nn
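For readers keeping score, the six buckets line up roughly with HTTP status codes and crawl conditions. Here is a minimal, purely illustrative sketch of one plausible mapping — Google doesn't publish the exact rules, so the status assignments below are assumptions pieced together from reports later in this thread:

```python
# Hypothetical classifier mapping a crawl result onto the six WMT buckets.
# The real bucketing logic is Google's and unpublished; this only
# illustrates the taxonomy being discussed.

def classify_crawl_error(status, body_looks_like_error=False,
                         redirect_error=False):
    """Return one of the six WMT crawl-error categories, or None if OK."""
    if redirect_error:
        return "Not followed"      # e.g. a redirect that couldn't be resolved
    if status == 401:              # reportedly the bulk of "Access denied"
        return "Access denied"
    if status in (404, 410):
        return "Not found"
    if 500 <= status < 600:
        return "Server error"
    if status == 200 and body_looks_like_error:
        return "Soft 404"          # 200 OK, but the page says "not found"
    if status >= 400:
        return "Other"             # e.g. 403s, per reports in this thread
    return None

print(classify_crawl_error(404))                              # Not found
print(classify_crawl_error(200, body_looks_like_error=True))  # Soft 404
```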


 3:28 am on Mar 13, 2012 (gmt 0)

I noticed that this afternoon too.

Now if they could let us dismiss the backlinks that are incorrectly linking to our sites and creating all these 404s then I would be happy.


 4:55 am on Mar 13, 2012 (gmt 0)

Yes, I wish they would explain what "fixed" means in the context of pages that don't exist, never did exist, never will exist, nobody ever said they existed...

Oh, wait. If "not followed" means obeying a "nofollow" directive-- as opposed to "We just didn't feel like going there"-- then what's "access denied"? Is it the old "blocked by robots.txt" or is it a cold-blooded 403? Or do 403s count as "server error"?

And now, if they could make up their minds about whether a 404 is or is not the same thing as a 410...


 5:36 am on Mar 13, 2012 (gmt 0)

Can somebody please explain to me why a GoDaddy domain forward to a static page (and not linked to in the site structure) on my main domain shows up as a 'Soft 404' in my WMT? What is the correct protocol for forwarding domains to avoid Soft 404s? Thanks!


 11:22 am on Mar 13, 2012 (gmt 0)

I was looking at the old interface yesterday and the Soft 404s were there already then. Though it does seem to be reporting a few more now.


 11:42 am on Mar 13, 2012 (gmt 0)

I like the new WMT.

As Lucy has already said, I don't like the lack of clarity as to which error belongs where. They appear a little confused.

Regarding 410, I *hope* they don't treat it the same as a 404. I always think of 404 as temporary, for catching coding errors. Assuming Google will continue to visit for some time. 410 I look at as more of a manual, deliberate message telling Google this is gone, go away and please don't come back.
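For what it's worth, that 404-vs-410 policy is cheap to implement server-side: keep a list of deliberately retired URLs and answer 410 for those, 404 for everything else unknown. A sketch — the `GONE_PAGES` and `LIVE_PAGES` sets are made up for illustration, standing in for whatever retirement list a real site keeps:

```python
# Sketch: 410 for pages removed on purpose, 404 for anything else that
# doesn't exist. GONE_PAGES is a hypothetical "retired on purpose" list.

GONE_PAGES = {"/old-promo.html", "/widgets-2005.html"}
LIVE_PAGES = {"/index.html", "/widgets.html"}

def status_for(path):
    if path in LIVE_PAGES:
        return 200
    if path in GONE_PAGES:
        return 410   # deliberate: "gone, please don't come back"
    return 404       # unknown URL: typo, stale link, malformed backlink

print(status_for("/widgets-2005.html"))  # 410
print(status_for("/typo.html"))          # 404
```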


 1:02 pm on Mar 13, 2012 (gmt 0)


Soft 404s have been in the report for some time.

"Fixed" is just a notation to you, not to Google. It removes the URLs from the report so you can more easily see the issues you haven't yet fixed. (If the issue still exists the next time Google crawls the URL, it returns to the report.)

Not followed means some kind of redirect error.

Access denied appears to primarily be URLs that returned a 401 status.

403s are listed in the "other" category.

Google treats 404s and 410s basically the same way.

Play Bach-

The soft 404 report includes URLs that redirect to a page that appears to be an error page. Is the redirect target a parked page?
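A rough way to audit your own pages for this before Google flags them: treat any URL that answers 200 OK but whose body reads like an error or parked page as a soft-404 candidate. The phrase list below is a guess for illustration, not Google's actual heuristic:

```python
# Rough soft-404 spotter: a URL that returns 200 but whose body looks
# like an error or parked page. ERROR_PHRASES is an assumed list, not
# anything Google has published.

ERROR_PHRASES = ("page not found", "no longer available",
                 "this domain is parked")

def looks_like_soft_404(status, body):
    if status != 200:
        return False          # a real 404/410 is not a *soft* 404
    text = body.lower()
    return any(phrase in text for phrase in ERROR_PHRASES)

print(looks_like_soft_404(200, "Sorry, page not found"))   # True
print(looks_like_soft_404(404, "Not found"))               # False
```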


 2:19 pm on Mar 13, 2012 (gmt 0)

Hi Vanessa!
I don't have any redirects on my site to the pages listed as soft 404s. Rather, these are incoming forwards from GoDaddy. I own a few domains that I bought years ago that are in my niche but I did not develop into websites. For example, "bluewidget.com" gets forwarded from GoDaddy to example.com/bluewidget.html. Nowhere on example.com do I link to bluewidget.html, so why does that show up as a soft 404 in my WMT for example.com? Thanks again!


 2:27 pm on Mar 13, 2012 (gmt 0)

vanessafox wrote:
Access denied appears to primarily be URLs that returned a 401 status.

403s are listed in the "other" category.

Hmm... I'm seeing 403s under "Access denied" and "Other". The ones under "Access denied" are internal, while the ones under "Other" are pseudo-external links. I say pseudo-external because we use an on-site script to redirect to off-site URLs for tracking purposes, so it's essentially a 303-to-403 (or 400 in some cases).

"Restricted by robots.txt" seems to be gone.



 3:16 pm on Mar 13, 2012 (gmt 0)


You're saying that in your example.com reports you're seeing example.com/bluewidget.html? Or bluewidget.com? I'm thinking it must be the first. And that URL returns a 200 status code directly?


Yeah, restricted by robots.txt is gone. I have a question into Google about Access Denied vs. Other. Will update my post when I hear back.

[edited by: tedster at 3:43 pm (utc) on Mar 13, 2012]


 3:47 pm on Mar 13, 2012 (gmt 0)

Hi Vanessa,
Yes, I'm seeing the page bluewidget.html as a soft 404 even though it's just a landing page for the GoDaddy forward, not linked to any other way.

> And that URL returns a 200 status code directly



 4:02 pm on Mar 13, 2012 (gmt 0)

I am *really* unhappy about not being able to download into a CSV file anymore. I made heavy use of that, especially when trying to do site audits.


 4:11 pm on Mar 13, 2012 (gmt 0)

You need to look at what's happening with the soft 404s. I had one... it was a funky URL (with another site's URL appended) that somehow landed on the correct page. The server should have returned a 404, but it returned a 200 instead. It's not in my sitemap...

I use a canonical tag, so Google's probably confused about what's happening... I added a 301 redirect to the URL to tell Google it's a bad link, but the page is found here.

I agree with others stating it's a bit confusing what "fixed" means. I took it to say to Google "I've looked at this one, it's fine the way it is... " meaning that I've either corrected the problem or the 404 is okay.

I noticed a lot of the "more information" links on Google return a 404... too funny.

Sally Stitts

 4:24 pm on Mar 13, 2012 (gmt 0)

Web - 380 errors - all 404s.
1. Consisting of mostly malformed URLs by others, which have been there forever.
2. Every page that I have ever made and removed, going back 7 years. They have dug up EVERYTHING, as if everything I ever did should still be on the web. No way. I trash old stuff, especially if Google doesn't like it. Do I have to tell them about stuff that no longer exists? Can't they figure that out for themselves? Especially, after 7 YEARS? Do I have to tell them specifically, "Make these links go away, the pages no longer exist"?

Mobile - 55 errors - all 404s.
All malformed URLs.
I DON'T HAVE a mobile site, or any other provisions for mobile. Ha.

So, "Much ado about nothing." Nothing I can change, anyway. Another whackadoo fire drill, from my perspective.


 6:11 pm on Mar 13, 2012 (gmt 0)

Vanessa posted a useful review of this issue (including explanations and a list of what's missing) at Search Engine Land [searchengineland.com...]

Besides all of the useful info she covers, I noticed she also said:
I have questions into Google asking about the removed functionality (particularly the confusing changes such as the Not Followed errors) and I'll update this story as I hear back.

That should make for a useful follow-up post.


 6:23 pm on Mar 13, 2012 (gmt 0)

From the article Vanessa wrote:

Previously, you could download up to 100,000 URLs with each type of error. Now, both the display and download are limited to 1,000. Google says "less is more" and "there was no realistic way to view all 100,000 errors - no way to sort, search, or mark your progress." Google is wrong.

Good observation. There is, in general, a kind of dumbing down happening with many of these changes. It's almost like WMT is turning into one of those eye candy reports that no one really cares about, rather than truly actionable information.

I look forward to something much more real from Google in the near future. In the meantime (and maybe in the long run) webmasters may want to use Bing's WMT if they aren't already. A lot of crawl information is about our site anyway, and not specifically about Google - no matter whose tool you use to get the data.


 6:28 pm on Mar 13, 2012 (gmt 0)

Every page that I have ever made and removed, going back 7 years. They have dug up EVERYTHING, as if everything I ever did should still be on the web.

That one doesn't seem to have changed conceptually. Yes, OK, you learned of the page's existence in 2007 when you found it on a sitemap. But if it disappeared from the sitemap in 2008, and nothing anywhere links to it, isn't that sort of a subtle hint that the page no longer exists and the failure to find it shouldn't be considered an error?

The unrelated "remove url" page has two modes: active removal requests (default), and every request you've ever made. That would be useful for errors.


 8:21 pm on Mar 13, 2012 (gmt 0)

If you are dependent upon the GWT crawl error files that you can no longer download, there are some paid tools that might at least tide you over. But on the downside - you have to pay for them. If the Recommended Tools item were pinned to the top of this forum, I'd recommend the one I use.


 9:28 pm on Mar 13, 2012 (gmt 0)

In Google's webmaster blog they mention that the "blocked by robots" list has been removed from the "crawl errors" section and will shortly re-appear in the "site configuration" section - because many of the URLs in the "blocked by robots" list aren't actually errors, the webmaster purposely blocked that access.

Parallel thread... [webmasterworld.com...]

Today I am also seeing 410 responses actually reported as 410. Previously everything was reported as 404.


 9:49 pm on Mar 13, 2012 (gmt 0)

1200 pages I removed, blocked in robots.txt and used the removal tool on are now showing up in GWT.

What burns my bacon is the 1000 little boxes to check to clear these WITH NO CHECK ALL BUTTON? Am I REALLY supposed to click on 1000 little boxes in order to clean up something that has no impact on my actual rankings? Pass, I have better things to do.

Like learn Spanish

No se encuentra 1,217
URL no seguidas 0
Acceso denegado 0
Error del servidor 0
Error 404 leve 0


 12:06 am on Mar 14, 2012 (gmt 0)

many of the URLs in the "blocked by robots" list aren't actually errors, the webmaster purposely blocked that access

One would like to think that none of them are errors and all of them were blocked on purpose ;)

Maybe g### could steal an idea from Bing. They've got an extra flag that means "Did you really mean to block this? We think it's an important page."


 1:01 am on Mar 14, 2012 (gmt 0)

I've tried several Chrome extensions now to work around the 1000 checkboxes problem.

Thing is, I can get it to check all the boxes, but Google doesn't actually pay attention to that. All the checkboxes need to have a click registered before they are actually included in the list of urls to mark as fixed. Grrr ...


 1:07 am on Mar 14, 2012 (gmt 0)

Sometimes the blockage is an error. Especially when the
Disallow: / from the dev server is migrated to the live server on site launch day (been there, done that - which is why I usually use .htpasswd on any dev or test site).
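That particular launch-day accident is also cheap to guard against with a pre-deploy sanity check using Python's stdlib robots.txt parser. A sketch — the example.com host and the rule strings are just for illustration:

```python
# Pre-launch check: refuse to deploy if the robots.txt about to go live
# would block the whole site (the classic dev-server "Disallow: /" leak).

from urllib.robotparser import RobotFileParser

def robots_allows_homepage(robots_txt, site="https://example.com"):
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", site + "/")

dev_rules = "User-agent: *\nDisallow: /"
print(robots_allows_homepage(dev_rules))  # False: this would block the site
```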

 3:50 am on Mar 14, 2012 (gmt 0)

Update: I did find a Chrome extension that allowed me to bulk select the checkboxes and tick them. Not sure if it's ok to post the name of the extension here so I won't, but I can recommend chasing it down. BIG time saver. And now I've got a pretty clean looking WMT.


 5:12 am on Mar 14, 2012 (gmt 0)

I noticed that this afternoon too.

Now if they could let us dismiss the backlinks that are incorrectly linking to our sites and creating all these 404s then I would be happy.

No Doubt!


 9:26 am on Mar 14, 2012 (gmt 0)

There's a new (minor) frustration with the Crawl Error reports. A site online for many years and which I rarely look at now shows there was 1 "URL error" from the start of the graph until some time last week. Nothing has been done to the site for months, but the error was cleared just a few days ago. There's no way to find out what that error was.


 2:58 pm on Mar 28, 2012 (gmt 0)

I just fixed a Not Followed error in WMT, I think.

The main site is configured correctly to redirect some urls but the mobile version doesn't do the redirect.


 2:21 am on Mar 29, 2012 (gmt 0)

I have spent a few days going through the lists as each day brings up new errors to look at. It is finding legitimate errors to fix that I didn't know I had. That part is great. The frustrating part is it grabbing pages from years ago. They are long deleted, and it tells me each one is "linked from" a current page that either dropped the link long ago or maybe never had it at all. I check and mark them fixed but of course they just come back again and get in my way. How is Google digging up links from 5-6 years ago that have been gone for years and attributing them to some other current page?

I understand the ones that are linked to from some other website I don't control. Although some easier way of distinguishing those would be nice. I can't control other sites, so don't push those in my face unless I want to recover the links.

If they could just fix that issue and have it be valid crawling and let me kill the reported "problem"

rango: Select one and then go to the bottom. Hold down the shift key and select the last one. This selects them all...10, 50, 500...then mark fixed.


 3:53 am on Mar 29, 2012 (gmt 0)

I keep seeing errors from other search engines and from Google now.

For example: small search engine has a SERPs page for example.com/folder/web-page.html

It links to the title of the page and has a meta description. Beneath the description, the url is displayed in plain text (no link). And usually the url is broken, such as example.com/folder/web. It strips out the -page.html, and G ends up reporting it as a 404 error. I have hundreds of these from around 4 different small search engines.

Then I also have weird 404 errors which seem to be from G crawling its own SERPs with XML errors. Getting 404 errors for example.com/folder/web-page.html<web: and a bunch of junk after that which also includes whatever url is listed beneath me in the SERPs.

I've just been ignoring those errors, no idea if that is the right thing to do or not.
