
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 51 message thread spans 2 pages (page 2 of 2).
Google Displaying 'Soft 404' Errors in Webmaster Tools
AnkitMaheshwari




msg:4147974
 5:41 am on Jun 7, 2010 (gmt 0)

I am seeing some of my site's URLs listed in the Crawl Errors report under the 'Soft 404s' heading. Is this new? It says "404-like content" in the details section.

 

g1smd




msg:4149942
 8:15 pm on Jun 9, 2010 (gmt 0)

Hmmm. I can't find a single "soft error" in any of the reports I have access to.

Just checking... this is available world-wide now, not just the US?

BradleyT




msg:4151245
 2:49 pm on Jun 11, 2010 (gmt 0)

Yesterday we went from 14 to 22 reported soft 404s.

Our 404 page returns a 404. What happens is that a URL no longer exists, so it gets 301'd to the homepage, which returns a 200, and Google incorrectly labels it a Soft 404.

So I guess Google is now the judge and jury of where we can redirect pages to.

The URLs showed up in the webmaster tools 404 report at one time so I created 301s to get rid of them.
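
For anyone auditing their own redirects, the pattern described above (a retired deep URL that ends up at the homepage with a 200) can be checked with a small helper. This is only a sketch of that one pattern, not Google's actual soft-404 logic, which is not described in this thread; the function name and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def lands_on_homepage(requested_url, landed_url, landed_status):
    """Return True when a non-root URL ends at the site root with a 200.

    `landed_url` and `landed_status` describe the final response after
    any redirects have been followed (by whatever tool you use).
    """
    requested_path = urlsplit(requested_url).path
    landed_path = urlsplit(landed_url).path
    return (
        landed_status == 200
        and requested_path not in ("", "/")   # a deep page was requested
        and landed_path in ("", "/")          # but the root was served
    )
```

A URL that lands on a specific replacement page, or on a real 404, would not be flagged by this check.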

helpnow




msg:4151305
 4:12 pm on Jun 11, 2010 (gmt 0)

BradleyT - quick message, top of my head, and I may be wrong... If it 301s to the home page, and returns a 200, doesn't that "bad" URL that you want to kill actually continue to live on, taking on the content of your home page, and thus do you now not have 22 URLs all showing your home page as the content, i.e. 21 duplicate content versions of your home page? I believe 301ing to your home page is one of the worst things you can do, unless I misunderstood your situation. We did this about a year ago, and got thousands and thousands of dupes of our home page, and lost our site links in the process, and our home page didn't come up for a site:domain.com search.

caribguy




msg:4151314
 4:31 pm on Jun 11, 2010 (gmt 0)

From section 10 of RFC 2616 [w3.org]

10.3.2 301 Moved Permanently
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs.

10.3.3 302 Found
The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.


The way I interpret this is that if the content of a resource can now be permanently found at another URL, the server should return a 301 status. In my case, and I guess BradleyT's as well, this happens to be the homepage. Google has no business rewriting the HTTP protocol. There is NO SUCH THING as a "soft 404".

BradleyT




msg:4151327
 4:52 pm on Jun 11, 2010 (gmt 0)

The real funny thing is that 2 of these URLs in the soft 404 report used to be our homepage 2-3 years ago - index_us.php and welcome.php.


I mistakenly had nofollow,noindex on our 404 page up until the beginning of this week. Perhaps that was confusing Googlebot into thinking we didn't have any 404 page at all and these soft 404's will go away on their own in a week or two.

g1smd




msg:4151333
 5:02 pm on Jun 11, 2010 (gmt 0)

The way I interpret this is that if the content of a resource can now be permanently found at another URL, the server should return a 301 status.

Yes, but it should be a "one to one" mapping:
old-page-1 => new-page-1
old-page-2 => new-page-2
old-page-3 => new-page-3
old-page-4 => new-page-4

It might be a "several to one" mapping but it will not be a "many to one" mapping.

In my case and I guess BradleyT's as well, this happens to be the homepage.

No. The new URL for a deep content page will be another deep content page, or at worst a section index page. It will never be the root index page.

In particular, whatever the new page is, it will never be a replacement for dozens or hundreds or thousands of old pages. That's just not a plausible site architecture. The old URLs should return a 404 or 410 and the error message can be customised on a per-directory basis especially for those URLs.
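
A minimal sketch of that policy, assuming a hand-maintained table of moved URLs and a set of retired ones (both hypothetical here, as are all the paths): each old URL either has exactly one true successor and gets a 301, or it is answered with a 410 or 404. Nothing falls through to the root index page.

```python
from http import HTTPStatus

# Hypothetical one-to-one map: each old URL has exactly one real successor.
MOVED = {
    "/old-page-1": "/new-page-1",
    "/old-page-2": "/new-page-2",
}

# Hypothetical set of URLs that were removed with no replacement.
GONE = {"/discontinued-page"}


def respond(path):
    """Return (status, location) for a path that matched no live page."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]
    if path in GONE:
        return HTTPStatus.GONE, None
    # Unknown URL: a plain 404, never a redirect to the homepage.
    return HTTPStatus.NOT_FOUND, None
```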

caribguy




msg:4151362
 5:49 pm on Jun 11, 2010 (gmt 0)

it will never be a replacement for dozens or hundreds or thousands of old pages


Agreed.

index_us.php and welcome.php.


Reflects my situation too. Again, the content from those pages has moved to a new, PERMANENT URI years ago. The content still exists, so there should definitely not be a 404 or 410 response.

BradleyT




msg:4151376
 6:27 pm on Jun 11, 2010 (gmt 0)

It might be a "several to one" mapping but it will not be a "many to one" mapping.


example.com/sale/widgettype1/
example.com/sale/widgettype2/
example.com/sale/widgettype3/
example.com/sale/widgettype4/

Now the sale for all categories is on the homepage. How is that not many to one?

helpnow




msg:4151453
 9:59 pm on Jun 11, 2010 (gmt 0)

Oh, you CAN do many to one if you want, but he meant that if you do, you might run into problems.

He meant that a 301 is intended for a URL that has _moved_. It isn't logical to _move_ a bunch of URLs to 1 place.

Say you used to have your contact page here: example.com/contactus but for some reason you decided to move it over here: example.com/howtocontactus, so, you 301 the old URL into the new URL.

The point is that a 404 or a 410 is probably better suited to the situation of many URLs that no longer exist.

Meaning, if the URL:

- no longer exists -> use a 404 or 410
- has simply moved to a new URL -> use a 301

If the URLs are dead, kill them off with a 404 or 410. If they have moved (which will likely be a 1:1 mapping), 301 them.

If you 301 a bunch of URLs into one destination URL, you will likely have problems.

helpnow




msg:4151454
 10:01 pm on Jun 11, 2010 (gmt 0)

P.S. Note the distinction between several to one and many to one. Several might be 1-8 URLs, say, and that might make sense, but with more than 10 it is probably a 404 or 410 situation rather than a 301. When in doubt, it is safer to kill the old URL than take a chance with a 301.
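
A rough way to spot "many to one" targets in an existing redirect table is to count how many old URLs point at each destination. This is just a sketch: the `limit` of 8 borrows the "1-8 URLs" rule of thumb above and is not an official threshold, and the table format (a dict of old URL to new URL) is an assumption.

```python
from collections import Counter


def fan_in_report(redirects, limit=8):
    """Return {target: count} for targets that receive more than `limit` redirects.

    `redirects` maps each old URL to the URL it is 301'd to.
    """
    counts = Counter(redirects.values())
    return {target: n for target, n in counts.items() if n > limit}
```

Any target that shows up in the report is a candidate for replacing the redirects with 404 or 410 responses.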

g1smd




msg:4151468
 10:47 pm on Jun 11, 2010 (gmt 0)

Yes, it's quite logical for "Acme Widget Mk I" to be redirected to "Acme Widget Mk II" when the former product is no longer available.

It's also not a problem for both "Acme Widget Mk I" and "Acme Widget Mk II" to be redirected to the "Acme Widget Mk III" page.

If both of the old products each had a "specs", "accessories" and "review" page, you might have simply decided to telescope all of those to the main page for the new product... however, you appear to be better off redirecting both "specs" pages for the old products to the "specs" page of the new product, both "accessories" pages for the old products to the "accessories" page for the new product, and both "reviews" pages for the old products to the "reviews" page for the new product.

Taking it further, it's certainly not seen as legitimate for 100 "Acme Widget" pages and 100 "Acme Gadget" pages and 100 "Acme Doodad" pages to all be redirected to a single target URL. That would potentially be seen as trying to manipulate PageRank.

After following just "one" of the redirects, the search engine would have already found the new URL. Spidering all the other redirected URLs, with all of them having the same single destination, would be seen as wasting the search engine's crawl budget.

sublime1




msg:4152325
 1:43 pm on Jun 14, 2010 (gmt 0)

I found one Soft 404 on our site; it was fine, but it took 2 301's to get to the final page, which is completely fine. I suspect they're working out the details.

pageoneresults




msg:4152332
 1:50 pm on Jun 14, 2010 (gmt 0)

But it took 2 301's to get to the final page, which is completely fine.


I'd consider that to be a problem.

301 > 301 > 200

That's a chain and is to be avoided at all costs.
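
To find chains like this in the first place, the hops can be traced one at a time. The sketch below assumes a caller-supplied `fetch(url)` function that returns `(status, location)` without following redirects itself; that function and the `max_hops` default of 5 are assumptions for illustration.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace(url, fetch, max_hops=5):
    """Follow redirects from `url` and return the list of (status, url) hops.

    `fetch(url)` must return (status, location_or_None) for a single request.
    """
    hops = []
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in REDIRECT_CODES or not location:
            return hops  # reached a final (non-redirect) response
        url = location
    raise RuntimeError(f"gave up after more than {max_hops} redirects")


def is_chain(hops):
    """True when more than one redirect precedes the final response."""
    return sum(1 for status, _ in hops if status in REDIRECT_CODES) > 1
```

A trace of 301 > 301 > 200 makes `is_chain` return True, while 301 > 200 does not.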

1script




msg:4152432
 5:18 pm on Jun 14, 2010 (gmt 0)

@page1:

301 > 301 > 200
That's a chain and is to be avoided at all costs.

That's a pretty categorical statement. Would you care to elaborate?
I have this exact setup on many of my sites for technical reasons:

  1. Site 1 redirects to Site 2 but does not "know" Site 2's URL structure so redirects to a "dispatcher" script at Site 2 (first 301->301)
  2. "Dispatcher" script knows Site 2's URL structure and converts a rather generic URI in the first 301 into another 301 leading to the final destination with correct URL (second 301->200)

So, what exactly is wrong with this setup other than MC's hint at each 301 losing a bit of "link juice"?

pageoneresults




msg:4152458
 5:58 pm on Jun 14, 2010 (gmt 0)

So, what exactly is wrong with this setup other than MC's hint at each 301 losing a bit of "link juice"?


Based on my understanding of the Redirect Protocols, that would be incorrect handling.

I believe Google is going to report any chain of redirects as a Soft 404 in this type of scenario, specifically the 301 > 301 > 200.

Anytime you have an additional hop involved, the chain effect, I believe it's the kiss of death. I don't think the final destination page gets much of anything here. Maybe you can prove otherwise, but that second 301 is where things take a turn for the worse.

Based on what I've read in the Redirect Protocols, you don't want any type of chain, at all. Everything should be a 1:1 with no hops in between. The W3 even specifically state in the UA guidelines that a request should be treated as a loop once the fifth redirect is detected. I know, you only have two, but it is that second one that is the concern.

301 > 200
302 > 200
304 > 200

No chains allowed. I'm sure someone will come along and try to make a valid argument for a chain of redirects but I've not come across one yet so I'm ready for ya. :)

tedster




msg:4152463
 6:08 pm on Jun 14, 2010 (gmt 0)

The only real argument is one of what is practical in a real-world situation. For instance, a "www" canonical fix is in place across the domain and then one bit of content actually gets moved to another directory within a very large infrastructure.

Any backlinks that need the canonical fix will go through an "extra" hop -- unless you've got the resources to hand code each individual case. And on very large websites, those resources may just not be available.
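
Where the redirect logic can live in one place, the extra hop can sometimes be avoided by applying both rules before answering, so the client gets a single redirect to the final URL. A sketch under that assumption follows; the hostname and the moved-path table are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumed canonical hostname
MOVED_PATHS = {"/old-dir/page.html": "/new-dir/page.html"}  # hypothetical move


def final_url(url):
    """Return the single redirect target for `url`, or None if it is already final.

    The host fix and the path move are combined, so one 301 covers both.
    """
    parts = urlsplit(url)
    path = MOVED_PATHS.get(parts.path, parts.path)
    target = urlunsplit(
        (parts.scheme, CANONICAL_HOST, path, parts.query, parts.fragment)
    )
    return None if target == url else target
```

On a very large site the moved-path table may not be practical to maintain, which is the real-world constraint described above.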

g1smd




msg:4152513
 7:18 pm on Jun 14, 2010 (gmt 0)

Site 1 redirects to Site 2 but does not "know" Site 2's URL structure so redirects to a "dispatcher" script at Site 2 (first 301->301)

That would be an implementation error. You should rewrite the request to the script and the script then issue a single 301 redirect to the correct URL.
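
As a rough illustration of "the script issues a single 301", here is a minimal WSGI sketch. It assumes the incoming request has already been handed to the script internally (for example by a rewrite rule, not by a redirect); the `lookup` table and every URL in it are hypothetical stand-ins for the script's knowledge of the new URL structure.

```python
def lookup(legacy_path):
    """Hypothetical stand-in for the dispatcher's knowledge of the new URLs."""
    table = {"/section/item-42": "https://site2.example/widgets/item-42/"}
    return table.get(legacy_path)


def app(environ, start_response):
    """Answer a legacy request with one 301 to the final URL, or a 404."""
    target = lookup(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    # The only redirect the client ever sees.
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```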

1script




msg:4152525
 7:39 pm on Jun 14, 2010 (gmt 0)

@g1smd

Site 1 redirects to Site 2 but does not "know" Site 2's URL structure so redirects to a "dispatcher" script at Site 2 (first 301->301)


That would be an implementation error. You should rewrite the request to the script and the script then issue a single 301 redirect to the correct URL.
I don't think that's possible (unless both sites have synchronized databases): the first 301 is issued on a site that has no way of knowing the correct destination URL. If I return 200 and the correct content on that first 301 without issuing another 301, I would create a URL with content that's duplicate of the final destination.

tedster is correct in that in many cases there may even be a third 301 shoved in there because of canonical non-www -> www rewrites on Site 1 although in this case I think this one would be possible to eliminate via a proper condition in .htaccess

Short of syncing the databases of both sites (which may or may not be possible for various reasons), I'm not sure how the second 301 can be eliminated. Compared with the alternative, duplicate content, I think it's the smaller problem.

g1smd




msg:4152559
 8:58 pm on Jun 14, 2010 (gmt 0)

Your original plan was described as "redirect old site to new site", which then "redirects to script", which then "redirects to correct page". The rewrite eliminates one of those steps.

There is potentially another way; set up a proxy rule to pass the request arriving at the old site, silently and directly to the new site, and get the script on the new site to issue the redirect to the new URL.

Now there's just one redirect from old to new.

There are potentially other ways to do this, but it depends on what type of server software you have available.

1script




msg:4152662
 2:12 am on Jun 15, 2010 (gmt 0)

There is potentially another way; set up a proxy rule to pass the request arriving at the old site, silently and directly to the new site, and get the script on the new site to issue the redirect to the new URL.
I'll definitely look into this (do you mean RewriteCond %{HTTP:XROXY_CONNECTION} in .htaccess?). I'm all for limiting the number of the redirects whenever possible. Oh, and in my case I don't redirect the entire Site 1, which adds some complexity. Site 1 is still running, I have just transferred one section of it to Site 2 hence the complicated setup. Thanks!

seoN00B




msg:4152683
 3:09 am on Jun 15, 2010 (gmt 0)

I've got about 203 Soft 404s. Could this be a server problem?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved