Can you redirect with a 301 to a 404 page on a different site?

         

anallawalla

1:24 am on Nov 24, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A variation of this old thread [webmasterworld.com]:

There is a mobile version of a site: m.example.com - it is an abbreviated site
There is a desktop version of the same site at www.example.com - it is richer.

It is currently configured to send any 404 requests from the mobile site with a 301 to the desktop site. These are not mapped 301s to go to corresponding URLs. m.example.com/blah simply goes to www.example.com/blah even if that is not a corresponding page.

In that old thread, tedster (RIP) said that a 301 to a 404 is not a problem (presumably when on the same server), but here is my thinking:

For the mobile (nonexistent - deleted but indexed) URL, a Googlebot visit will give Google a 301 to the desktop site, so it notes briefly the "new" (but possibly nonexistent) URL, then it gets a 404.

Will Googlebot get the message that the mobile URL is now a 404?
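To make the sequence in question concrete, here is a minimal sketch of the two-hop chain as a crawler would see it. It assumes nothing about how Googlebot actually works internally; `trace_chain`, `fake_fetch` and the `FAKE_RESPONSES` table are hypothetical names, and the responses are simulated rather than fetched from a live server.

```python
from typing import Callable, List, Optional, Tuple

# A response is reduced to (status code, Location header or None).
Response = Tuple[int, Optional[str]]


def trace_chain(
    url: str,
    fetch: Callable[[str], Response],
    max_hops: int = 5,
) -> List[Tuple[str, int]]:
    """Follow Location headers one request at a time and record each hop."""
    hops: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status in (301, 302, 307, 308) and location:
            url = location  # new request, new response
        else:
            break
    return hops


# Simulated responses standing in for the two hosts described above.
FAKE_RESPONSES = {
    "http://m.example.com/blah": (301, "http://www.example.com/blah"),
    "http://www.example.com/blah": (404, None),
}


def fake_fetch(url: str) -> Response:
    """Hypothetical stand-in for a real HTTP request."""
    return FAKE_RESPONSES.get(url, (404, None))


chain = trace_chain("http://m.example.com/blah", fake_fetch)
for hop_url, hop_status in chain:
    print(hop_status, hop_url)
```

The mobile URL itself only ever answers 301; the 404 belongs to the second request, on the desktop host.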

3zero

1:56 am on Nov 24, 2015 (gmt 0)



Interesting one. Have you maybe thought of a different approach? You could use "curl" to fetch and display content from www.example.com on the mobile site, and even alter the base URL to m.example.com for all links in the content.

This would mean 404 pages and headers would be printed out if the page doesn't exist on www.example.com

Just an idea ;)

lucy24

2:53 am on Nov 24, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How big is your site? It's always preferable-- both for search-engine purposes and for your human users-- to return the final response on the initial request.

If the site is huge, do the moved/deleted pages fit into any kind of pattern? Obviously nobody is going to suggest that you hand-code several lakhs of individual responses.

Do the two versions, mobile and plain, physically live in the same directory on the same server? Under normal circumstances, this is irrelevant to search engines and invisible to humans-- BUT it can make a difference in the kinds of response options that are available to you.

nonexistent - deleted but indexed

Is it really impossible to return a 410 to these requests?

I think you should start by spending some time thinking about what would be most useful and beneficial to your human visitors. This doesn't seem like a case where you have to throw a few humans under the bus in order to serve the greater good ;) Consider, in particular, that some of your human mobile visitors will be on limited and/or expensive data plans, or in the middle of nowhere with slow connections.

anallawalla

7:33 am on Nov 24, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The URLs are in the style of m.example.com/green-widgets, m.example.com/pretty-blue-widgets and so on. No pattern to make a smaller .htaccess file.

More of an academic question from Google's viewpoint. The Mobile site has fewer than 11 domain links, some being from low-value forums, so I am not too concerned about link juice being lost. Soon there will be just one responsive site.

Google's blog posts say that anything other than a true 404 response is a soft 404 and this 301 > 404 on the parent domain smells like it.

lucy24

7:18 pm on Nov 24, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sure, they don't like 301 > 404, (or, of course, anything with > in the middle) but really the only way they can manifest their dislike is by reducing the status of ... the URL that no longer exists at all. So unless their overall crawling experience is so miserable that they take a dislike to your entire site, which is probably a different thread, what's the worst that can happen?

I don't see how you (where "you" = google) can call something a "soft 404" if, well, it is a 404.

Andy Langton

12:38 am on Nov 26, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



301 to a 404 is a 404. A 301 is stored during crawling only as a pointer to a new URL. So, Googlebot might still hit the 301 occasionally, but if the end result is a 404, then it's a 404.

Look at it another way. Google doesn't list 301s in the search results, so from that point of view, anything that 301s can be considered irrelevant for actual ranking purposes. The value from the 301 gets passed to the next URL. If the next URL is a 404, then that's the end of the matter ;)

anallawalla

1:30 am on Nov 26, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm asking about a 301 to another site that delivers the 404.

Would Google eventually "accept" that the deleted page on the first site no longer exists?

Ranking isn't part of the question.

tangor

2:46 am on Nov 26, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The fact that the previous site has a 301 will indicate it is still "live". Whether there is a downside to doing that can only be determined by doing it and seeing what happens.

Me... if the 301 is intended to be a 404, no matter what site it is directed to, I'd return a 410 and not have any worries.

Andy Langton

8:43 am on Nov 26, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> Would Google eventually "accept" that the deleted page on the first site no longer exists?

Does Google ever accept that a 404 is completely gone? :)

>> Ranking isn't part of the question.

I meant in the broader sense of any ranking 'value'.

>> Would Google eventually "accept" that the deleted page on the first site no longer exists?

Google will keep asking for the 301, but as it resolves to a 404, any value is dropped. But, as above, Google will keep asking for 404s and 410s, too.

There are many cases where a subdomain will 301 to a 404 on another subdomain. Most sites have a blanket redirect from non-www to www. Asking for example.com/broken will usually 301 to www.example.com/broken, then 404. Strictly speaking, the redirect shouldn't happen, but there isn't really any problem that can occur because it does.

As to the 'soft 404' question, this is not the case here. A soft 404 is broken content with a 2xx status code. The content might be broken because it says "File not found" or because it shows completely different content from what is expected (e.g. a bulk redirect to a homepage). But it has to end in a 2xx status code to be 'soft'. What you have is, essentially, a broken link.

anallawalla

11:30 am on Nov 26, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I knew that returning a 2xx was a soft 404, but this statement on a Google page conflicts with that:

A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn’t exist.


[googlewebmastercentral.blogspot.com.au...]

There are plenty of other Google references that are more precise. :)

Andy Langton

11:50 am on Nov 26, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think the intent is clear from what they've written, even if it is technically imprecise (e.g. the URL absolutely exists, it's the content that doesn't). On the help pages the reason to avoid this is given as the below:

Firstly, it tells search engines that there’s a real page at that URL. As a result, that URL may be crawled and its content indexed.

[support.google.com...]

This obviously can't happen with a redirect. There's no "secondly", incidentally!

Certainly, my opinion is that the 'soft' part of a 'soft' 404 is the 2xx status code.

That said, I've checked on the data, and Google does not agree with me. At least, Webmaster Tools definitely lists (some, doesn't seem to be all?) 301s to 404 pages as soft 404s. So, if it's soft 404s you want to avoid, you would likely need to ditch the redirect.

lucy24

7:10 pm on Nov 26, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A soft 404 is broken content with a 2xx status code.

No. A soft 404 is nonexistent content with anything other than a 404/410 code. Google themselves say so unambiguously; I don't see where the confusion lies.

The 301 version is most common in sites that handle all requests for nonexistent pages-- that is, requests that ought to receive a 404 or 410-- by redirecting to, for example, the front page.

If your logs ever show a Google request for a garbage URL like jmtfgkhjyvuihgft7dyj.html, that's the Googlebot verifying that the site is capable of returning a 404 response. It's programmatically triggered any time a site passes some threshold of 301s (as can happen from natural causes if you've done a major site redesign).
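You can run the same kind of check against your own site. The sketch below is only an illustration of the idea, not a description of what Googlebot does: `random_probe_path`, `returns_real_404` and the two stand-in sites are hypothetical, and the status lookup is passed in as a function so that no real request is made here.

```python
import random
import string
from typing import Callable, Optional


def random_probe_path(length: int = 20, rng: Optional[random.Random] = None) -> str:
    """Build a garbage path that should not exist on any normal site."""
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase + string.digits
    name = "".join(rng.choice(alphabet) for _ in range(length))
    return "/" + name + ".html"


def returns_real_404(first_status: Callable[[str], int]) -> bool:
    """True if the first response to a garbage path is a 404 or 410."""
    return first_status(random_probe_path()) in (404, 410)


def blanket_redirect_site(path: str) -> int:
    """Stand-in for a site that redirects every unknown path."""
    return 301


def well_behaved_site(path: str) -> int:
    """Stand-in for a site that answers unknown paths with 404."""
    return 404


print(returns_real_404(well_behaved_site))
print(returns_real_404(blanket_redirect_site))
```

In practice `first_status` would wrap a real request with redirect-following turned off, so that the status of the initial request is what gets tested.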

Andy Langton

7:17 pm on Nov 26, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google themselves say so unambiguously


I don't think that help page is technically precise at all, hence why I think a 2xx response is required at some point for the error to be "soft".

See also:

A "soft 404" occurs when a webserver responds with a 200 OK HTTP response code for a page that doesn't exist rather than the appropriate 404 Not Found.

[googlewebmastercentral.blogspot.co.uk...]

Instead of returning a 404 response code for a non-existent URL, websites that serve "soft 404s" return a 200 response code.

[googlewebmastercentral.blogspot.co.uk...]

incorrectly return an HTTP 200 code in the case of soft 404 errors

[googlewebmastercentral.blogspot.co.uk...]

The "anything other than a 404 or 410" language, I believe, is to discourage mass-redirecting of 404s.
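The disagreement comes down to two different tests applied to the same chain of responses. As a rough sketch, assuming the requested content is already known not to exist, and using the hypothetical function names `is_soft_404_strict` and `is_soft_404_broad`:

```python
from typing import List


def is_soft_404_strict(statuses: List[int]) -> bool:
    """Narrow reading: the chain for missing content ends in a 2xx."""
    return 200 <= statuses[-1] < 300


def is_soft_404_broad(statuses: List[int]) -> bool:
    """Broad reading: the first response is anything other than 404/410."""
    return statuses[0] not in (404, 410)


# Each list is the sequence of status codes seen for one missing URL.
examples = {
    "plain 404": [404],
    "error page served with 200": [200],
    "301 to homepage": [301, 200],
    "301 to a 404": [301, 404],
}

for label, statuses in examples.items():
    print(label, is_soft_404_strict(statuses), is_soft_404_broad(statuses))
```

The two readings agree on the first three cases and differ only on "301 to a 404", which is exactly the case under discussion.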

keyplyr

4:56 am on Nov 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A successful 301 redirect to any 404 page is a 200.

lucy24

5:31 am on Nov 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the page is 404 how can the redirect be called successful?

Sooner or later, the shorthand "404 page" is going to lead to grief, because it can mean EITHER "an URL that generates a 404 response" OR "the physical page shown to humans along with a 404 code" (which, in the case of a malformed ErrorDocument directive, really can be requested by name).

tangor

6:13 am on Nov 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Going back to OP, as this thread is all over the place.

For the mobile (nonexistent - deleted but indexed) URL, a Googlebot visit will give Google a 301 to the desktop site, so it notes briefly the "new" (but possibly nonexistent) URL, then it gets a 404.


This is all about g... so taking it apart:

A non-existing page, now deleted but once was indexed (and because g never forgets a url), the webmaster wants to pass a 301 to a new site which also does not have that page to return a 404 because it does not exist MERELY TO SAY THERE IS A NEW SITE?

Just checking.

Meanwhile:

If the page was deleted (or does not exist) from the original site it should return 410. No 301 required.

keyplyr

8:21 am on Nov 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, one can certainly redirect to any page (even one used by the server as a 404 if it exists & the server allows it) but the server response would still be a 301 (or respectively 302.) To get an actual 404 the redirect must point to a non-existent page on that server, then there would be 2 distinct server responses, a 301 & a 404.

If the OP just wants to redirect to another page on another server account, that's entirely doable also but there would not be a 404 response code given.

Note that some server configs don't follow standards. Several servers I've worked at don't handle custom 404s properly, giving 200's instead of 404 responses. This is more common than one might think.

Andy Langton

4:37 pm on Nov 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the webmaster wants to pass a 301 to a new site which also does not have that page to return a 404 because it does not exist MERELY TO SAY THERE IS A NEW SITE?


I don't believe this is correct - there is a 'blanket' redirect from one subdomain to another (mobile to non-mobile) which redirects even if a corresponding page doesn't exist. The system used does not 'know' if the request will eventually be a 404. The implication is, does this matter, and if so is it worth fixing? I say, no, unless you're concerned about "soft 404s".

anallawalla

10:34 pm on Nov 28, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yes, these are Drupal 6 and 7 sites respectively. Apache platform. I was trying to get into the mind of the server administrators, which is a near-impossible thing to do in an enterprise that uses an offshore IT service where the staff change all the time. I like 404s to be served by the same platform without any redirect.

Thank you all.

whitespace

4:50 pm on Nov 29, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



...there is a 'blanket' redirect from one subdomain to another (mobile to non-mobile) which redirects even if a corresponding page doesn't exist.


I thought the "blanket" redirect is only for 404s, i.e. "which redirects [only] if a corresponding page doesn't exist."

FWIW, the linked Webmaster Central Blog regarding "soft 404s" suggests checking the "proper HTTP Response by using Fetch as Googlebot in Webmaster Tools". Well, the Fetch as Googlebot tool simply reports the 301 redirect response.

lucy24

8:11 pm on Nov 29, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the Fetch as Googlebot tool simply reports the 301 redirect response

Makes sense. They can only show one response: the one to the initial request. After all, the very definition of a 300-class response is "direct the user-agent to make a new request". New request, new response.

The last time I looked into it-- that means looking at actual behavior in logs, not at google's documentation-- the Googlebot follows up 301 responses more or less immediately unless it already happens to have visited the new target within the past hour or so. "More or less" means it isn't instant, as with a human browser, but within a minute or so. Presumably the delay is where it checks whether it has already been there, implying that the lookup is less work for the Googlebot than making the request.

I don't think anyone is going to disagree with the contention that it's best to give the ultimate response, such as a 410, on the first request. The question is how to balance the extra work of coding this response against the possible damage from a chained redirect.
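For what the "extra work" might look like, here is a minimal sketch of deciding the final response on the first request. It assumes the mobile host can consult a list of known desktop paths and a list of deliberately deleted paths; the sets, the paths in them and the `respond` function are all hypothetical, and a real Drupal/Apache setup would do this differently.

```python
from typing import Optional, Tuple

# Hypothetical inventories; in practice these would come from the CMS.
MOBILE_PATHS = {"/green-widgets"}
DESKTOP_PATHS = {"/green-widgets", "/pretty-blue-widgets"}
GONE_PATHS = {"/old-red-widgets"}

DESKTOP_ORIGIN = "http://www.example.com"


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a request to the mobile host."""
    if path in MOBILE_PATHS:
        return (200, None)  # page exists on the mobile site
    if path in GONE_PATHS:
        return (410, None)  # deliberately deleted: say so directly
    if path in DESKTOP_PATHS:
        return (301, DESKTOP_ORIGIN + path)  # mapped redirect to a real page
    return (404, None)  # unknown everywhere: no redirect at all


for p in ("/green-widgets", "/pretty-blue-widgets", "/old-red-widgets", "/blah"):
    print(p, respond(p))
```

The redirect is only issued when the target is known to exist, so a request never has to travel to the desktop host just to collect a 404.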