Forum Moderators: phranque

Redirecting all 404s to home page - good or bad?

brix76

6:34 pm on Nov 28, 2011 (gmt 0)

10+ Year Member



Hi all,

I have been wondering if redirecting all 404s directly to my homepage is good?
Or having a custom 404 page is better.
Some say, redirecting everything to my homepage could make the robots think I have a lot of duplicate content and rank me lower.
Others say it is indeed search engine friendly - fewer 404s, better ranking.

What do you guys think - redirect all to homepage or nicely done custom 404 page?

Thanks a lot.

enigma1

5:29 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know exactly the size of the returned 404 or 410 response for those requests. It's only a few hundred bytes or less

So I take it that it is not such a nice custom 404 page. And since you know the exact size, it also means it is totally useless for the visitor. Same 404 for everyone, right? The default Apache 404 is a few hundred bytes too. Why bother to hook it? Just leave the default.

It's not everyday you see ASP running on Apache.

It's not every day you see webmasters who don't know how to interpret headers. So did you finally figure out what type the server is?

when was the last time you checked for broken images on your site?

There are no broken images. You better pick up a browser or a better tool to check sites or fix your code to properly test something. Didn't you know, unused entries in a stylesheet are not processed by a browser. And spiders don't process them either. So while you're bloating the .htaccess to determine if the visitor is human or bot with endless referrers and agents, there are way simpler methods.

Patterns: Regular Expressions. There's a whole bunch of stuff that doesn't even get through the front door.

What bunch of stuff? Every single character in the ASCII table may be used in a request. It's application specific. Start with multilingual requests if that makes you more comfortable. You cannot tell from a server script what the application needs to do. I see these errors very often with websites. Yeah, just dump in a general filter and break various functions of the application. Then call somebody else to clean up the mess.

Even if you know the application specifics, while you duplicate code to filter urls on a server script and then filter POST type requests or other types inside the application I do everything at the same level. More portable, more efficient. The code can work on a different environment with virtually no changes.

We shall start on the definition of a Soft 404 error.

Better start with something you understand how to read. 301 to 200 is not a 404 soft error. At least bring up a relevant reference.

Can you say RewriteMap?

You cannot use a lookup table to parse the web content and make decisions what strings to replace in a request. The application knows about it and can process the request. From your comments, I see you have great difficulties to place a visitor to an appropriate page on any request actually.

pageoneresults

6:36 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's not every day you see webmasters who don't know how to interpret headers. So did you finally figure out what type the server is?


I did. It took me a while too. I had to call a friend to help me understand the header results I was reviewing. ;)

There are no broken images. You better pick up a browser or a better tool to check sites or fix your code to properly test something.


Let me be more literal, there are four broken image references in the external CSS file located at /stylesheet.php. In fact, there are two each for a total of 8 301>200 redirects. That's what, 16 Round Trips to the server?

images/design/so_18hv.gif
images/design/box_root_folder.png
images/design/box_root_pages.png
images/design/box_root_mail.png


Question: Why would you 301>200 invalid image references to the home page?


Didn't you know, unused entries in a stylesheet are not processed by a browser.


I didn't! And you're confusing me now. I look at those entries in the stylesheet and they are being used. If you were able to record 404s, you'd know that there are four broken images being referenced in your style sheet. Are you saying the browser isn't processing those? Bear with me here, I'm new to this stuff.

This has been an interesting discussion. It's made me think about all sorts of things and you've given me permission to run some tin foil hat theories I've had for years. Let's see how all of that pans out. What is it that you know that the rest of us just haven't caught on to yet? That's a serious question, you've got me wondering.

You've also mentioned that you are using the 301 to send visitors to a relevant page based on their query, URI string, etc. I don't see that being done with your application? It appears that everything is 301>200 to the home page, no matter what the invalid referring string/query is?

enigma1

7:24 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



there are four broken image references in the external CSS file located at /stylesheet.php.

Yes, I know some entries with background images are left in the stylesheet. Those CSS rules aren't used in the pages, so if the images are requested it means something's wrong with the request. The very first one is also inside a comment in the stylesheet. You can check with FF->Tools->Page Info->Media: they aren't pulled in.

The Apache 404 handler is hooked to the application. The particular implementation of SEO URLs on this site is old, but you can see relevant redirections if you alter a couple of characters in the request string without altering the extension. The code I have on sourceforge.net via a CMS is more efficient; it can parse extensions too and can use extensionless URLs.

Then depends on the requirements. You can put any kind of complexity parsing the keywords of the links running queries in the database to find the most appropriate page. Pretty much like when you do a search for products but this tries to match web pages because the links are stored in a database.
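
To make the idea of matching a mistyped request against stored links concrete, here is a minimal sketch in Python. It is not the code from the site or the CMS mentioned above; the path list, the `closest_path` helper and the 0.8 cutoff are all assumptions for illustration, and a real application would pull the known links from its database.

```python
import difflib

# Hypothetical list of valid paths; a real site would load these from its database.
KNOWN_PATHS = [
    "/products/blue-widget.html",
    "/products/red-widget.html",
    "/contact.html",
]

def closest_path(requested, known=KNOWN_PATHS, cutoff=0.8):
    """Return the most similar known path, or None if nothing is close enough."""
    matches = difflib.get_close_matches(requested, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A request that differs by a couple of characters would resolve to the nearest stored link, while a request that resembles nothing returns None and can be answered with a plain 404.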

The other thing is you can expand the algorithm to keep track of old pages that were removed. There is a redirection database table in the code which I believe is easier to manage than having it in the .htaccess. I could not keep url signatures for instance or log headers for requests if the request wasn't processed inside the application. There are link relationships and other details that make things extremely complex with server scripts alone.
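
A redirection table of this kind might look roughly like the following Python sketch. The table contents, the `resolve` function and the choice to answer 410 for pages removed without a replacement are assumptions, not a description of the actual code; the point is only that moved pages get a 301 to a specific target and everything unknown still gets a 404.

```python
from http import HTTPStatus

# Hypothetical redirect table: old path -> new path, or None if removed for good.
REDIRECTS = {
    "/old-article.html": "/articles/new-article.html",
    "/discontinued-product.html": None,
}
# Hypothetical set of pages that currently exist.
LIVE_PATHS = {"/", "/articles/new-article.html"}

def resolve(path):
    """Return (status, location) for a request path."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK, None
    if path in REDIRECTS:
        target = REDIRECTS[path]
        if target is None:
            # Removed with no replacement: tell clients it is gone.
            return HTTPStatus.GONE, None
        # Moved: permanent redirect to the specific new URL.
        return HTTPStatus.MOVED_PERMANENTLY, target
    # Never existed: plain 404, no redirect.
    return HTTPStatus.NOT_FOUND, None
```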

incrediBILL

9:20 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Better start with something you understand how to read. 301 to 200 is not a 404 soft error. At least bring up a relevant reference.


The topic of the thread is, and continues to be, "Redirecting all 404s to home page".

A 404 being redirected into 200 OK was, and still *IS*, a soft-404.

Not my definitions, Google's.

Please keep the thread on topic.

tangor

10:10 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Any request which should produce a 404 that is redirected 301 to a 200 response is an incorrectly handled 404. Returning an index/home page for every 404 is incredibly bandwidth intensive... most index pages are significantly larger than any standard 404 or most custom 404 pages.

Returning to OP's original question, yet again, serving a custom 404 with navigation/suggestion to the user is a pretty good idea. Gives you one more chance with the visitor before they click the back button.

incrediBILL

10:38 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



serving a custom 404 with navigation/suggestion to the user is a pretty good idea.


Actually, you can serve any content as a result of the 404, even as enigma1 suggests, something closely relevant to the original query, but it should be on a 404 page.

Maybe something like "Sorry, we couldn't find the requested page but this might be of interest:" followed by customized content for that request.

The SE won't index the content of the 404 page so it's not making any virtual web space, unlike redirects to 200 OKs have the potential of doing, so it's really unlimited what you can do on custom 404 pages.
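
As a rough illustration of serving helpful content with a 404 status, here is a small WSGI sketch in Python. The `not_found_app` name and the single hard-coded suggestion link are assumptions; a real page would plug in whatever customized content fits the request.

```python
from html import escape

def not_found_app(environ, start_response):
    """WSGI app: helpful HTML body, but the status line still says 404."""
    path = environ.get("PATH_INFO", "")
    body = (
        "<h1>Page not found</h1>"
        f"<p>Sorry, we couldn't find {escape(path)}, "
        "but this might be of interest:</p>"
        # Hypothetical suggestion list; real content would be chosen per request.
        '<ul><li><a href="/">Home</a></li></ul>'
    ).encode("utf-8")
    start_response(
        "404 Not Found",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

The visitor sees suggestions, while the status line still tells crawlers that the URL does not exist.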

aakk9999

11:18 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If enigma1 is so keen to send his visitors to the home page when an invalid URL is requested, he should render the home page in his script as usual and send the home page back, but change the headers response being 404.

Of course this means a little bit of extra logic in his script that accepts all requests (to execute/duplicate the script that renders the HTML for the home page, but return a 404 response with the home page HTML), slightly more work than just firing a 301 back.

But if he is really keen for visitor to land on the home page, then this would satisfy his requirement, and the 404 response would satisfy search engines.
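
The suggestion above could be sketched like this in Python. `render_home`, `VALID_PATHS` and `handle` are hypothetical stand-ins for the application's own renderer and routing; the only point is that the same HTML goes out with a different status line.

```python
def render_home():
    # Stand-in for the application's real home page renderer (hypothetical).
    return "<h1>Welcome</h1><p>Home page content</p>"

# Hypothetical set of URLs that legitimately serve the home page.
VALID_PATHS = {"/", "/index.html"}

def handle(path):
    """Return (status_line, body): same HTML, different status for unknown URLs."""
    body = render_home()
    if path in VALID_PATHS:
        return "200 OK", body
    # Unknown URL: the visitor still gets the home page, crawlers get a 404.
    return "404 Not Found", body
```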

lucy24

11:40 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I also don't see why should I waste resources serving a nice 404 page.

I went back and checked. The word "waste", either of resources* or bandwidth, has come up about half a dozen times in this thread and I'm just not getting it. Sending someone to a custom 404 page is only wasteful if every last one of those people then proceeds, via a link on that 404 page, to the front page.

If even a handful of the 404's instead goes directly to some other page-- linked from the custom 404-- then it is no longer a waste, because the 404 page is vastly smaller than the index page. Same if the 404s simply leave the site because they've already been to the index page so they know it doesn't have what they're looking for. (Index-page links by themselves don't count, because those are also on the 404 page.)


* I understand about bandwidth but I don't get the resources. Your files don't wear out, do they? I guess if you're getting an absolutely colossal number of 404s, servicing them could shorten the life of your server, but that's pretty far-fetched.
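
The bandwidth argument can be put into rough numbers with a short Python sketch. All sizes below are made up for illustration, and the two helper functions are assumptions rather than measurements from any site in this thread.

```python
def redirect_cost(home_bytes, redirect_bytes):
    """Bytes sent when every bad URL is 301-redirected to the home page."""
    return redirect_bytes + home_bytes

def custom_404_cost(error_page_bytes, home_bytes, share_clicking_home):
    """Average bytes sent when a custom 404 is served and only some
    visitors then click through to the home page."""
    return error_page_bytes + share_clicking_home * home_bytes

# Hypothetical sizes, chosen only for illustration.
HOME, REDIRECT, ERROR_PAGE = 50_000, 300, 5_000
for share in (0.0, 0.5, 1.0):
    print(share, redirect_cost(HOME, REDIRECT), custom_404_cost(ERROR_PAGE, HOME, share))
```

With these made-up sizes the custom 404 only costs more once roughly nine in ten visitors click through to the home page.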

pageoneresults

12:20 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Better start with something you understand how to read. 301 to 200 is not a 404 soft error. At least bring up a relevant reference.


Do you want to put some money on it enigma1? ;)

Tell me more about “Soft 404s.”
A soft 404 is when a site redirects any unknown URLs to their homepage instead of returning 404s. This can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.
[GoogleWebmasterCentral.BlogSpot.com...]

^ I've removed all the irrelevant stuff from the above quote and trimmed it down to that which we are discussing. The above comes straight from Google. If you 301>200 what should typically be a 404, Google considers those Soft 404s. It says it right on the tin!

I don't understand why you would argue this fact when it is in writing?

Your 404s are flaccid!

P.S. I launched a test page yesterday and you've probably already started to see referrers from it. This first test is just to see if I can disrupt the indexing patterns of your site. Since I don't have access to stats, I can only guess and will need to rely on you to report back to us. I'm expecting at some point I'll get a Sticky Mail asking me to take down the test pages. ;)

I'm also expecting that once you detect the flaws I'm exposing that you'll be quick to fix them. Once fixed, I'll move on to the next test, right now we're on Test 001.

but you can see relevant redirections if you alter a couple of characters in the request string without altering the extension. The code I have on sourceforge.net via a cms is more efficient it can parse extensions too and can use extensionless urls.


I noticed that, good one. You may want to address that "Extensionless" issue on your site? < Hint.

enigma1

6:10 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually you're only posting what you find convenient. How did you miss this from the same reference?

it’s likely that someone intended to link to you and simply made a typo. Instead of returning a 404, you could 301 redirect the misspelled URL to the correct URL and capture the intended traffic from that link.

It's right there, and I mentioned earlier that a redirect is not only 301. I am not talking about 302s.

I launched a test page yesterday and you've probably already started to see referrers from it

To achieve something first you need to drive a spider to follow these links and there will be no referrer. Accessing them with a browser or tool has no effect, if there is a flaw you need to make it public in some way.

I'll get a Sticky Mail asking me to take down the test pages

These requests are like a walk in the park so don't count on it. You don't want to see my logs and what others try to do.

I understand about bandwidth but I don't get the resources

Didn't you guys mention so many times about a custom 404? What do you mean a custom 404. The apache default 404 is not custom. It's a default. When I refer to a custom 404 with a search, contact link, navigation, etc. I am talking of a page similar to the other site's pages.

In which case having a custom 404 handling would require:
1. Pass control to the web application
2. Connect to the database, parse db content, parse request, load the template with error message, to give some hints to the user.
3. display html. (navigation, content etc).

That's a custom 404, it will consume as much as a regular page. How do you compare this vs the header bytes of the 301? At least that's what I understand if someone asks what's a custom 404 page.

because the 404 page is vastly smaller than the index page

If you only output some static/fixed text or html you cannot help the user locate what he's looking for. If he's a real user.

If enigma1 is so keen to send his visitors to the home page when an invalid URL is requested, he should render the home page in his script as usual and send the home page back, but change the headers response being 404.

Yep that's what I am saying about b/w waste with the 404.

Many years ago I was using custom error pages extensively (not only for 404s). I checked the logs, I checked the stats, I checked what spiders were doing and honestly I didn't like what I saw.

A 404 being redirected into 200 OK was, and still *IS*, a soft-404.

Yes, but where do you see the 404, since the header I send out shows 301? There is no 404 header.

pageoneresults

6:21 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes but where do you see the 404, since the header I'll send out shows 301? There is no 404 header.


You are correct. You perform a 301>200 to the home page for invalid queries. You do capture as many of those as you can based on existing URI strings and 301>200 to the appropriate document. But, when it comes to non-existent docs, you 301>200 all of those to your home page. Google considers those a Soft 404. That is what they are saying with this statement...

Q: Tell me more about “Soft 404s.”
A: A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn’t exist. A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code. Not so! You can return a 404 response code while serving whatever content you want. Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.


Another example of a Soft 404 is when a site redirects any unknown URLs to their homepage instead of returning 404s.


I separated out the part of the reference that is specific to this 301>200 process for all invalid unmapped requests. Google considers those Soft 404s even though your header responses return 301>200.
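
Going only by the definition quoted above, a rough classifier for what a crawler observes could look like this Python sketch. It is a heuristic for illustration, not Google's actual detection; the `classify` function, its labels and the idea of passing in the observed redirect chain are all assumptions.

```python
def classify(chain, home="/"):
    """Classify the response chain seen for a URL known not to exist.

    chain: list of (status, url) hops, in order, ending with the final response.
    """
    final_status, final_url = chain[-1]
    if final_status in (404, 410):
        return "hard 404"
    redirected = any(300 <= status < 400 for status, _ in chain[:-1])
    if final_status == 200:
        if not redirected:
            return "soft 404 (200 for missing content)"
        if final_url == home:
            return "soft 404 (redirect to home page)"
        return "redirect to a specific page"
    return "other"
```

A 301 that lands on the home page is flagged, while a 301 that lands on a specific corrected URL is not, which matches the distinction being argued in this thread.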

incrediBILL

6:29 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes but where do you see the 404, since the header I'll send out shows 301? There is no 404 header.

Fine.

That's a quibble and you know it as the page would normally generate a 404.

Let's call it cloaking then.

Per Google's webmaster tools help:
Serving different content to search engines than to users.

Google shows one thing in their index, you deliver something else.

They really don't like that behavior even more than they don't like soft-404s.

If it were me, I'd call it a soft-404 before I admit to cloaking ;)

enigma1

6:45 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's call it cloaking then.

Nope, both spiders and users will get exactly same server response for the same request. Both will see exactly the same 301 headers. So where's the cloaking?

Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s

Yes I've seen that first time you posted, but redirects aren't only via 301. Many users make the mistake of not even serving a redirect header. They just output a location and that's a problem.

The argument you have against me is I cannot redirect on every request. But my point is it's possible knowing the web-content, where the users should be pointed to, in all cases.

You know full well the web is becoming more and more dynamic and users are becoming more demanding. They expect to find relevant results even if a request is old, a product is discontinued or an article has been removed, so show them something similar with an appropriate message. I am not saying it's easy, but to a certain degree it is doable.

g1smd

8:01 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



but redirects aren't only via 301.

There's also 302 and 307 responses in the HTTP headers, javascript location redirects and the good old meta refresh. Care to elaborate?

incrediBILL

9:58 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nope, both spiders and users will get exactly same server response for the same request. Both will see exactly the same 301 headers. So where's the cloaking?


THE CONTENT IS DIFFERENT! THAT'S CLOAKING!

When what you see in the Google index for your search results isn't what the site delivers, it's CLOAKING! Your site redirects them to something other than what Google says is on the actual page, it's bait & switch.

netmeg

10:04 pm on Dec 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



this post is the beast that would not die

The Shower Scene

10:46 pm on Dec 12, 2011 (gmt 0)

10+ Year Member



enigma1, admit you are wrong and move on. Your inability to see the truth in this matter is staggering, despite reams of links and citations of Google. Your continued obstinance is a drag on this discussion and turns you into a freak show.

lucy24

12:41 am on Dec 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For comparison purposes:

Government:
[fcc.gov...]
[ssa.gov...]
[nyc.gov...]
[af.mil...]

Education:
[harvard.edu...]
[cornell.edu...]

Nonprofit:
[aspca.org...]
[amnestyusa.org...]
[liveunited.org...]
(redirected from unitedway.org)

dot com:
[microsoft.com...]
(don't blink or you'll miss it!)
[apple.com...]
[amazon.com...]
[google.com...]
(c'mon, youse two, you're not even trying)
[nationalgeographic.com...]

Unclassifiable:
[webmasterworld.com...]

enigma1

8:55 pm on Dec 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When what you see in the Google index for your search results isn't what the site delivers, it's CLOAKING! Your site redirects them to something other than what Google says is on the actual page, it's bait & switch.

Where do you see the cloaking? The 301 headers are identical for both humans and bots. The destination url is the same for both. The content of the destination is the same for both.
I don't see the invalid links in the Google index. How can you see that? Google won't index pages with 301 headers.

Your inability to see the truth in this matter is staggering, despite reams of links and citations of Google.

Thank you for the fine in-depth analysis and technical contribution to the thread, but where are your references? I see none.

For comparison purposes:

Yes, these are custom 404 pages, so what are the results of your comparisons? Do you see the page content itself as a couple of hundred bytes? I see it goes to many KBs and varies: 20K, 30K, 50K for each 404. Are you comparing this with the few bytes of a 301 redirect? And in order to output what you see, these sites connect to databases and pull in the common stuff (see my earlier post) to process the request before deciding to send out the 404.

Some of these sites have their headers completely messed up.
[cornell.edu...]
that's what a soft 404 is.

There's also 302 and 307 responses in the HTTP headers, javascript location redirects and the good old meta refresh. Care to elaborate?

Yes and also the canonical tags that force redirects for spiders. Do you mean the ones spiders will process, or only browsers or both?

The 302 is the default one. If you don't specify the redirect type, 302 is assumed. It's also assumed if the location header is sent and no redirect type is specified (for example, if you output 200 OK with a location underneath, it is a 302). Many people omit the redirect type, with lots of consequences (or benefits, depending on what you do). For browsers any kind of redirect has the same result: it redirects to the location specified. For spiders it is different, as indexing, ranking and other factors are involved. Used in some ways, a 302 can serve for cloaking at various levels, as spiders may keep indexing the original URL. The redirect hijack several years back was one way, but given that spiders are forced to keep 302 redirects in their index even today, it can still do lots of things, although unreliably.

307 is similar to 302, another temporary redirect type, but it uses the HTTP 1.1 protocol. Whether or not to output a 307 can be confusing: it is more specific than 302, but many requests fake the protocol.

Meta-refresh for spiders may also be unreliable. They claim a 0 secs refresh is equal to 301 for some spiders but I don't know the consequences. In my view 301 is at the moment the clean way to do redirects for SEO purposes, but you may not always be able to use it, so a meta-refresh or canonical or JS could replace it to a certain extent.
[youtube.com...]

enigma1

11:46 am on Feb 3, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@pageoneresults, did you finish with your tests?

I launched a test page yesterday and you've probably already started to see referrers from it


I don't see anything different several weeks later. I log the requests and monitor the SERPs, so nothing there. Any news from your end?