| 5:49 am on Jun 7, 2010 (gmt 0)|
It's not totally new - although this particular method of reporting may be. I never let my sites return a soft 404 - it's one of the first things I check - so I won't see this kind of report.
A "soft 404" is a page whose content looks like a 404 error message, but which is actually served with a "200 OK" HTTP status. Some IIS servers, in particular, are configured to serve custom error pages this way, usually via a 302 Temporary redirect from the requested URL.
It is a technical error, because the 404 Not Found status never gets sent to the requesting user agent - googlebot in this case.
A few years back, this situation caused some sites to pile up a huge amount of duplicate content over time, because every "bad" URL request or link would be indexed with the text of the error message. A 302 redirect normally means the original URL is indexed, but with the text of the redirect target. Then Google began actively testing servers for their 404 handling, especially IIS servers. They don't want to pile up that kind of data in their index.
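A rough sketch of how a probe for that pattern might classify a server's response to a deliberately bad URL - the function name and the error-phrase list are my own illustration, not anything Google has published:

```python
# Classify the response to a request for a known-bad URL.
# A "soft 404" is error-page text served with a 200 OK status;
# a 301/302 needs a follow-up probe of the redirect target.
ERROR_PHRASES = ("page not found", "404", "does not exist", "cannot be found")

def classify_response(status, location, body):
    """Return 'hard 404', 'redirect', 'soft 404', or 'ok'."""
    if status in (404, 410):
        return "hard 404"          # correct: the status itself says not-found
    if status in (301, 302) and location:
        return "redirect"          # probe the Location target next
    lowered = body.lower()
    if status == 200 and any(p in lowered for p in ERROR_PHRASES):
        return "soft 404"          # error text delivered with a 200 OK
    return "ok"
```

Real detection would obviously be fuzzier than a phrase list, but this is the shape of the problem.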
| 6:13 am on Jun 7, 2010 (gmt 0)|
Thanks for the reply tedster.
In our case, 404s are also being reported as usual, and pages with 404 errors are serving the proper 404 response.
Also, the URLs listed under Soft 404 are completely different from the 404 pages. These pages have content and links like any normal page, so I'm not sure why they are being reported there.
| 7:00 am on Jun 7, 2010 (gmt 0)|
Maybe try "Fetch as googlebot" for one of those URLs - under the Labs section of the menu.
| 7:25 am on Jun 7, 2010 (gmt 0)|
Just did that for two of the URLs. One of them says Not Found even though the URL is working fine and returning a 200 response in other tools. The other URL is a success, with no errors reported.
| 6:13 pm on Jun 7, 2010 (gmt 0)|
For one account I see 14 Soft 404s. All 14 of them return a 301 when entered in Fetch as Googlebot.
The 301s are to the pages they're supposed to be going to (not a 301 to the 404 page).
| 7:38 pm on Jun 7, 2010 (gmt 0)|
Google published a new blog post today about this feature in Webmaster Tools - with a screenshot - and it is indeed new. See [googlewebmastercentral.blogspot.com...]
The blog article does not discuss what causes these apparently false positives, however.
| 8:08 pm on Jun 7, 2010 (gmt 0)|
I see the same as BradleyT
A bunch of pages that have not existed in years, and for which there is no ideal replacement. These all have a lot of inbound links and 301 redirect to the site's homepage (as I think best serves my visitors).
| 1:32 am on Jun 8, 2010 (gmt 0)|
Soft 404s are Google's worst idea of the year (so far). When I first saw it, my initial thought was: this is SO easy to get wrong - why can't you just trust the actual standard HTTP response a site sends you?
Well, just two hours later I'm looking at a page of my site that's labeled as a soft 404 ("404-like content"), yet it's an actual content page! Maybe it doesn't have much content - someone had posted a link to a relevant site and one sentence explaining what it is - but it's NOT an error page!
Maybe Google engineers should step away from their keyboards, take a deep breath and fix what's broken first before moving on to the next shiny project. I think they are now in a state of euphoria induced by the capabilities of their new Caffeine toys.
Somebody (preferably older than 30 if they can scramble someone like that) should round these kids up and take their toys from them before they break something else!
| 2:22 am on Jun 8, 2010 (gmt 0)|
Looking at my other sites: you are correct! Not only does Soft 404 show up on pages returning 200, it has also started to creep onto properly 301-ed pages as well! This is killing me: I have moved part of a site to a new domain, and because of the large number of redirects I have a script that converts the old URL structure into the new one. The result of that script is a 301 header. No content (Content-Length: 0). I would have understood why they could not get it right if there were content after the 301 header, but I triple-checked everything with HTTP Viewer and it's just a 301. So how come a 301 is no longer a 301?
This is getting ridiculous. I don't think this has been well thought out at all: even the intended design of the feature is not working well. A page with a valid 200 OK header is deemed by Google (my emphasis) to be a 404, and now these 301 redirects are erroneously considered 404s as well? This really must be Google's most botched project of the year (and we are not even halfway into 2010!)
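For reference, a minimal sketch of the kind of old-to-new URL mapping script I described, written as a WSGI app - the path pattern and domain are hypothetical placeholders, but the response shape is exactly the case being mislabeled: a bare 301 with an empty body:

```python
# Hypothetical rewrite: /old/article-123.html -> /articles/123 on a new domain.
NEW_DOMAIN = "http://www.example-new.com"   # placeholder domain

def redirect_app(environ, start_response):
    old_path = environ.get("PATH_INFO", "/")
    new_path = old_path.replace("/old/article-", "/articles/").replace(".html", "")
    start_response("301 Moved Permanently", [
        ("Location", NEW_DOMAIN + new_path),
        ("Content-Length", "0"),            # no body at all
    ])
    return [b""]
```

Everything the client receives is the 301 status line, a Location header, and Content-Length: 0 - nothing that could plausibly be read as "404-like content."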
| 2:23 am on Jun 8, 2010 (gmt 0)|
Sounds like the heuristic Google is using to analyze page content is not quite ready for prime time.
I'm curious about the false positives. Are any of them NOT the target URL of a redirect? I would also hope Google's heuristic would need them to be the target of more than one redirect.
| 2:33 am on Jun 8, 2010 (gmt 0)|
I need to restate that for clarity: the "404-like content" error shows on the dynamic URL that initiates the 301 redirect, not the intended target of redirect. I assume the result is the same though: the link juice (for lack of a better term) is going to stop at the intermediary redirecting URL instead of flowing to the final destination URL.
|I'm curious about the false positives. Are any of them NOT the target URL of a redirect? I would also hope Google's heuristic would need them to be the target of more than one redirect. |
The regular "200 OK" page labeled "404-like content" in my other example is, obviously, the final (and only) URL.
| 2:53 am on Jun 8, 2010 (gmt 0)|
Should there be a specific content-type header in the 301 response? Or none at all? That's about the only thing I can think of...
In any case the behavior is not consistent, unless more of these "soft 404's" start appearing in the wmt console in the next few days...
I note the ones listed now are from May 24 - May 31
| 3:51 am on Jun 8, 2010 (gmt 0)|
In my case, they are from Mar 28 - Mar 30
| 1:36 pm on Jun 8, 2010 (gmt 0)|
|A bunch of pages that have not existed in years, and for which there is no ideal replacement. These all have a lot of inbound links and 301 redirect to the site's homepage (as I think best serves my visitors). |
I just noticed that is what all of my Soft 404s have in common too - pages that are 301'd to the homepage because there's no ideal place for them to go.
| 3:08 pm on Jun 8, 2010 (gmt 0)|
That's interesting. I've long been an advocate of only using 301 to point to true replacement content, but it's been pretty common for SEOs to use a 301 to the home page. If Google now questions that kind of thing as a soft 404, do you think there's a chance that the link juice isn't actually flowing as intended?
| 3:12 pm on Jun 8, 2010 (gmt 0)|
|pages that are 301'd to the homepage because there's no ideal place for them to go. |
Probably not the best option. I've seen quotes from Google that state they do not suggest this at all. Either return a 404, 410 or 301 to a suitable replacement, the Home Page NOT being one of them. There are exceptions to the rule.
I can't find ANY Soft 404s! I feel left out. ;)
| 4:03 pm on Jun 8, 2010 (gmt 0)|
I use custom errors in IIS and I'm pretty sure that they are actually HTTP 500 server errors in my case.
| 4:26 pm on Jun 8, 2010 (gmt 0)|
My 301s-turned-soft-404s are going to the final destination, not just recycled back to the homepage. It is possible, however, that two or three of those 301s are actually going to the same destination. This is by design (due to differences in pagination between the old site and the new) and I need it to work that way!
|That's interesting. I've long been an advocate of only using 301 to point to true replacement content, but it's been pretty common for SEOs to use a 301 to the home page. |
I do not appreciate them adding yet another layer of complexity to the already complicated business of moving sites!
| 4:33 pm on Jun 8, 2010 (gmt 0)|
I wonder what the introduction of the META Refresh Element would do in this instance? Any ideas? For example, a page that returns a 404 Status and then has a 1 second refresh to the home page. ;)
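Something along those lines could be sketched as a helper that builds the error response - the function name and markup here are illustrative only, not a recommendation:

```python
# Build a real 404 response whose body carries a meta refresh
# that sends a (human) visitor to the home page after a delay.
def build_404_with_refresh(home_url="/", delay=1):
    status = "404 Not Found"
    body = (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{delay};url={home_url}">'
        "<title>Page not found</title></head>"
        "<body><p>That page is gone. Taking you to the home page...</p>"
        "</body></html>"
    )
    return status, body
```

The crawler still sees an honest 404 status; only a browser acts on the refresh.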
| 5:03 pm on Jun 8, 2010 (gmt 0)|
@pageoneresults: I have a fair number of 404s with a meta refresh to the homepage in them. None have appeared under Soft 404 yet. Maybe it's just too soon to tell - we are only something like 24 hours into it.
But the Soft 404 issue aside - I don't particularly like these meta refreshes and have been removing them from my sites (quite a few still remain, hence the first paragraph). There has been talk about them being treated as redirects (either 301 or 302 - I could not find any consensus on that). In my case, many of the no-longer-existing URLs were ones I removed for spam in user-generated content. I did not want Google to see pages about any number of unsavory subjects redirected to my homepage and have the homepage tainted that way. Maybe I'm just being paranoid...
| 8:07 pm on Jun 8, 2010 (gmt 0)|
If I detect a bot then I return 404 for a missing page.
If it's not an obvious bot I send the (probably human) customer to the home page where they can look for what they want WITHIN MY SITE! Commercial sense: keep the punter as long as possible.
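A bare-bones sketch of that split handling - the bot list is a tiny illustrative sample, far from complete:

```python
# Known crawlers get a plain 404 for a missing page;
# everyone else (probably human) is sent to the home page.
KNOWN_BOTS = ("googlebot", "bingbot", "slurp")

def missing_page_response(user_agent):
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in KNOWN_BOTS):
        return ("404 Not Found", None)   # honest status for crawlers
    return ("302 Found", "/")            # redirect the visitor home
```

Worth noting that serving crawlers a different response than humans get can look like cloaking, so this approach carries its own risk.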
Interesting that the blog linked to above has three blatant spam responses (still) in it. Do they not know how to trap spam? Oh, no. I remember now, this is google... :)
| 10:45 pm on Jun 8, 2010 (gmt 0)|
I'm liking the idea of a "soft 404 errors" report, especially where it highlights inappropriate mass 301 redirects from many URLs to a single destination. I have long advocated that such redirects are not a good idea.
| 10:53 pm on Jun 8, 2010 (gmt 0)|
This new report also gives us a hint that such redirects may not even be doing any good for the target page.
True soft 404 pages (302 > 200) in the old-days Microsoft style seemed to stop getting websites into deep trouble a good while back. That was when we started to see googlebot test-crawling clearly non-existent URLs to inspect the server response.
Now Google has pushed it one step further. This is a good thing, especially if the reports become quite dependable. It's past time to let go of any obsessive stranglehold on every bit of theoretical link juice. Times have definitely changed.
| 11:10 pm on Jun 8, 2010 (gmt 0)|
google is crawling oscommerce product notification links and calling them soft 404s
| 11:57 pm on Jun 8, 2010 (gmt 0)|
|pages that are 301'd to the homepage because there's no ideal place for them to go. |
I'm not showing any of these (yet), but isn't the above practice a problem? If a page doesn't exist anymore, then why not just 410 it? The only reason I can think of is to try and recapture the link juice.
I'd only use a 301 if I had to rename the page or move it to a different location. I try not to do this...
| 3:26 am on Jun 9, 2010 (gmt 0)|
> This new report also gives us a hint that such redirects may not even be doing any good for the target page.
> The only reason I can think of is to try and recapure the link juice.
Although SEO is a valid consideration, at least one of the posters in this thread is talking about the "traffic retention" aspect of 301-redirecting removed pages to the home page.
The purist's answer -- and, I believe, what Google "wants" -- is a 404 error page with a 404 HTTP response code.
That 404 page can include many if not all of the elements of the home page, along with a "non-scary" explanation that the requested URL was removed and links to the HTML site map, major category pages, site search page or facility, etc.
The same is true for intentionally-removed resources, which should return a 410 Gone status, with an error page almost identical to the one described above for 404.
Just make sure that any bad URL returns an HTTP 404 or 410 status directly, with no intervening redirects of any kind (NB: to include domain canonicalization redirects, etc.)
If that requirement is met, the content can be anything you like, and Google will just have to fix the problems at their end (remove their faulty 'heuristics'), unless they actually want to encourage non-HTTP-compliant error handling (which I doubt).
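That requirement can be stated very simply - a hypothetical helper that looks at the chain of HTTP status codes returned while following a bad-URL request:

```python
# A bad URL passes only if the FIRST response is already a 404 or 410 -
# no redirect hops of any kind (including canonicalization) in front of it.
def bad_url_is_clean(response_chain):
    """response_chain: list of status codes seen, in request order."""
    return bool(response_chain) and response_chain[0] in (404, 410)
```

Any leading 301 or 302, however well-intentioned, fails the check.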
| 6:03 am on Jun 9, 2010 (gmt 0)|
Is this a diagnostic tool Google have added to Webmaster Tools, or something that they are using for indexing and ranking?
| 6:30 am on Jun 9, 2010 (gmt 0)|
It's a report that has been added to Webmaster Tools. Soft 404 handling has long been a ranking problem for websites - not because Google intentionally penalizes for it, but because it creates a technically challenging situation that obstructs optimum crawling, indexing and ranking.
| 3:08 pm on Jun 9, 2010 (gmt 0)|
I've taken a peek into approximately 10 GWT Accounts and found just 1 with Soft 404s. I'm guessing that the Soft 404s link does not appear if you don't have them, I'm almost certain of that.
The 1 site I did find with the Soft 404s is one we consulted on a few years ago. We provided specific instructions on what they should do with 301, 302, 304, 404 and 410. They did not follow those instructions and now have the Soft 404s showing. Message for that client: If you're reading this, I TOLD YOU SO! ;)
If you have flaccid 404s and you think Google is incorrectly reporting, I'd double, triple check everything. If you have any sort of redirect chain in your 404 handling, I think that may present some challenges. I dunno, not my forte, I don't do redirect chains, I think they're poisonous. :)
|This 51 message thread spans 2 pages.|