Webmaster General Forum

This 110 message thread spans 4 pages (page 1 of 4).
Redirecting all 404s to home page - good or bad?
brix76




msg:4391736
 6:34 pm on Nov 28, 2011 (gmt 0)

Hi all,

I have been wondering whether redirecting all 404s directly to my homepage is a good idea, or whether having a custom 404 page is better.
Some say that redirecting everything to my homepage could make the robots think I have a lot of duplicate content and rank me lower.
Others say it is indeed search engine friendly: fewer 404s, better ranking.

What do you guys think - redirect all to homepage or nicely done custom 404 page?

Thanks a lot.

 

g1smd




msg:4391760
 7:39 pm on Nov 28, 2011 (gmt 0)

Extremely bad. Don't do it.

URLs that don't exist should return 404, not be redirected; at least, not unless there is content at the new URL that is a good replacement for that found on the original page.

lucy24




msg:4391842
 11:12 pm on Nov 28, 2011 (gmt 0)

:: cough, cough ::

I realize this is completely irrelevant and not what you asked, but do humans like it? This human can't stand it-- same for 403s if I go poking into directories where I don't belong-- but I have no idea whether there is a clear majority opinion. That's assuming your average human has even thought about it.

tangor




msg:4391855
 12:29 am on Nov 29, 2011 (gmt 0)

The nicely done custom 404 can be beneficial in providing a link to the homepage, or a localized site search, or a short directory/menu to the site which a HUMAN can use to find information ON YOUR SITE. It's better to keep 'em as long as you can instead of having them click back to the SERPs.

If a page does not exist a 404 should be returned.

lucy24




msg:4391875
 2:21 am on Nov 29, 2011 (gmt 0)

Oops. Looking back at my post I realized it's hopelessly ambiguous, mainly due to reading the original question too fast. ("Is it A or B?" "Yes.") I meant that I can't stand the blanket redirection to the home page, whether triggered by a 404 or a 403. Give me a nice 404 instead.

And if you happen to have any 410s, make sure they get a page too, even if it's the same physical page as the 404. The Apache default 410 is even more intimidating than the default 404.

enigma1




msg:4391962
 10:14 am on Nov 29, 2011 (gmt 0)

Your site's home page has all the navigation one needs to get anywhere within your site. So a nice 404 just adds more work for you to do.

The thing to remember is the type of redirect. If you want to get rid of an old page and there is no similar page, do a 301 redirect to the home page. Not just any redirect. The 301 means permanent redirect: it tries to funnel the traffic of a non-existing page to the home page. A 404 doesn't do that. It just says nothing here.

If you do have a similar page use it for the redirect so visitors can find the replacement.

"I can't stand the blanket redirection to the home page"

You mean the extra connection due to the redirect headers? That's transparent, it's the same domain. And if you check the server logs, lots of 404s are generated because of exploits. Why waste resources querying the database, parsing templates etc. when a bot won't even index the content? Having the redirect there helps because they may not even follow it, and it's only headers you send out.

g1smd




msg:4392164
 7:45 pm on Nov 29, 2011 (gmt 0)

Mass redirection, especially to the root home page, is a "signal of low technical quality" for a web site.

Return the correct 404 or 410 code and a page explaining the error and with useful navigation to other parts of the site.

lucy24




msg:4392205
 9:28 pm on Nov 29, 2011 (gmt 0)

"I can't stand the blanket redirection to the home page"

"You mean the extra connection due to the redirect headers?"

No. I mean that when I am visiting someone else's site it annoys me no end when I attempt to go to a nonexistent page-- whether by mistyping or by following a bad link-- and instead of being handed off to an "Ain't no such page" I just get bounced to the front page. If I want to go to the front page, I'll go there. I've probably been there already.

It also makes it well-nigh impossible to contact the webmaster-or-equivalent and say there's a bad link on page such-and-such. By the time you've been redirected to the home page, it's too much trouble to go back to those two other pages: the one that doesn't exist, and the one that linked there.

Then again, last time I tried a webmaster@ e-mail I got hit with a 5.7.1, which struck me as a fairly ridiculous response. Especially when the Contact form-- which clearly wasn't intended for technical errors-- led only to

The requested URL /cgi-bin/formmail was not found on this server.

Additionally, a 302 Found error was encountered while trying to use an ErrorDocument to handle the request.
Apache/1.3.37


Yes, you read that last number right ;) Fortunately I've got another e-mail address I can use.

enigma1




msg:4392220
 10:02 pm on Nov 29, 2011 (gmt 0)

What do you mean, mass redirection is a signal of low technical quality? I am not sure I follow you very well. In my case I will try to redirect the user to an associated page; if nothing is similar to his request, I redirect him to the home page. There is no 404.

In your case you bring up a 404, you try to guess what the visitor wants based on his request and display some options which may be completely irrelevant, where in my case I put him on the home page and if there is a cookie or a session present, I can append extra info about what just happened.

But with the nice 404 page you recommend you will waste resources connecting to the database, processing images, scripts and so forth on all hack attempts. With 301 you only emit headers.

"whether by mistyping or by following a bad link"

Mistyping can be handled by 301 redirects. If your code decodes SEO links of a certain format, you reformat the request and do another 301 redirect so the SEO decoder can handle the request if required.

If it's a bad link on the other hand it means bad site management. Internal site links should not be hard-coded but generated dynamically to avoid this kind of problem (template link placeholders etc can avoid it). Many navigation problems are caused because of hard-coded links in the content. So you remove a page at some point, now you have to go around on every other page checking and fixing links manually, not a good idea.

g1smd




msg:4392222
 10:12 pm on Nov 29, 2011 (gmt 0)

Read up on the negative effects of infinite URL space within a site: where any and every URL returns either content or a redirect and especially where nothing returns 404.

Google does not trust such sites. It is one signal of low technical quality.

lucy24




msg:4392256
 11:23 pm on Nov 29, 2011 (gmt 0)

Does that mean that mod_speling is a bad idea? (Looking only from the SEO rather than the security side.) Or is it OK so long as it uses redirect rather than rewrite?

I don't have it-- or at least I didn't in April when I last checked. But even if you only allow one error per request, that's an awful lot of possibilities.

enigma1




msg:4392260
 11:40 pm on Nov 29, 2011 (gmt 0)

Last I checked, GWT will display an error if your site has links that lead to 404s. Not a good sign. When you have no errors (and a 404 is basically an error and bad for SEO) it's better. I don't know where you gather this information from.

301 redirects will occur if one of my pages is no longer there and the equivalent or relevant content is located elsewhere. That's not infinite. Otherwise browsers will halt on a single redirect you know.

g1smd




msg:4392261
 11:43 pm on Nov 29, 2011 (gmt 0)

So, if I randomly type example.com/foo-quux for your site, it will return 404 HTTP status for some values of foo-quux or not?

enigma1




msg:4392271
 11:57 pm on Nov 29, 2011 (gmt 0)

No, it won't; it will return a 301 redirect. And the destination depends on the query.

g1smd




msg:4392278
 12:39 am on Nov 30, 2011 (gmt 0)

You do have infinite URL space and the sooner you recognise that is a problem, the better.

lucy24




msg:4392283
 1:00 am on Nov 30, 2011 (gmt 0)

"And the destination depends on the query."

You've got a pre-selected destination file for foo-quux ?!

I don't think anyone is saying that redirecting nonexistent files is always wrong. But there has to be some reason for it. I've got one directory where, thanks to a horrendous blunder on my part, the files and subdirectories all have the same names: cats/ and cats.html, rats/ and rats.html and so on by pairs. It seemed safest to put in a generic rule, so requests for bare "cats" are redirected to "cats.html". (The subdirectories don't contain index files, either by name or by function.) But that's only for names that actually exist; if someone asks for "foobzzt" in that same directory, they get the directory-specific 404 page.
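
For readers wondering what such a rule might look like: the thread does not show the actual rule, so the following is only a sketch of one possible approach, using mod_rewrite in the directory's own .htaccess file. The directory name /animals/ is a placeholder, and the example assumes mod_rewrite is enabled and that nothing else in the configuration handles these requests first.

```apache
# Hypothetical .htaccess placed inside the directory (assumed here to be /animals/)
RewriteEngine On

# Only act when a matching .html file really exists on disk,
# so "cats" qualifies (cats.html exists) but "foobzzt" does not.
RewriteCond %{REQUEST_FILENAME}.html -f

# Match a bare name with no slash or dot, and redirect it to name.html.
RewriteRule ^([^/.]+)$ /animals/$1.html [R=301,L]
```

A request for a name with no matching file falls through untouched and still receives the normal 404.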

pageoneresults




msg:4392291
 1:22 am on Nov 30, 2011 (gmt 0)

Welcome to WebmasterWorld brix76!

I believe Google refer to these default 301s to the home page for "everything" as...

Soft 404 Errors
[google.com...]

"Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Firstly, it tells search engines that there’s a real page at that URL."


Related discussion from Jun 6, 2010...

Google Displaying 'Soft 404' Errors in Webmaster Tools
[webmasterworld.com...]

"What do you guys think - redirect all to homepage or nicely done custom 404 page?"


Definitely serve a nicely done custom 404 for documents not found. You may want to go one step further and serve a custom 410 for documents that are gone forever.

enigma1




msg:4392297
 1:45 am on Nov 30, 2011 (gmt 0)

"You've got a pre-selected destination file for foo-quux ?!"

Literally foo-quux will redirect to the home page. I don't see that as infinite or irrelevant.

"Definitely redirect to a nicely done custom 404 page."

Double trouble. Not only will you give out a 404 page, but you will do a redirect first.

"the sooner you recognise that is a problem, the better."

I'd like to see where the infinite URL space is. None of these links exist. You can make invalid requests to the server, but I don't see why Google would ever report it as an error if foo-quux doesn't exist anywhere in the domain. I also don't see why I should waste resources serving a nice 404 page.

pageoneresults




msg:4392302
 2:08 am on Nov 30, 2011 (gmt 0)

"Definitely redirect to a nicely done custom 404 page."

"Double trouble. Not only you will give out a 404 page but you will do a redirect first."


Ummm, I fixed that while you were drafting your reply. I was paraphrasing and caught the "redirect" after saving the first time. ;)

The process you describe of using a 301 for non existent content is specifically addressed in Google's Webmaster Tools Help.

Soft 404 Errors
[Google.com...]

"Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic."

MichaelBluejay




msg:4392352
 7:40 am on Nov 30, 2011 (gmt 0)

Hey! You can redirect *and* give a 404 error! That's what I do. Just put this in the .htaccess file:
ErrorDocument 404 /site-index.html

That takes visitors to my Site Index where they can find what they want. I think that's more helpful than giving them a generic page that just tells them, "Page not found, nanny-nanny-boo boo, and I'm not gonna do anything to help you find what you were looking for."

g1smd




msg:4392359
 8:19 am on Nov 30, 2011 (gmt 0)

PageOne used the word "redirect" in error and removed it from his post within minutes.

The 404 HTTP status code should be served at the originally requested URL.
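
As a sketch of the distinction being drawn here, Apache's ErrorDocument directive behaves differently depending on whether it is given a local path or a full URL. The file name /site-index.html comes from the post above; the example.com URL is a placeholder.

```apache
# Local path: Apache serves this page internally as the body of the error
# response, so the client still receives a 404 status at the URL it requested.
ErrorDocument 404 /site-index.html

# Full URL (commented out, shown for contrast): Apache would instead send the
# client a redirect to that URL, and the original 404 status would be lost.
# ErrorDocument 404 http://example.com/site-index.html
```

With the local-path form nothing is actually redirected: the visitor sees the site index, but the response is still a 404.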

enigma1




msg:4392361
 8:23 am on Nov 30, 2011 (gmt 0)

The soft 404 is a combination of a 302 redirect ending in a 404 page. It's not what I described. The document talks about bad practices where your server emits a link to a non-existing page. That's a problem and needs to be fixed either in the content or web application/code.

I am talking about a site that exposes no problematic links. And what the OP asked was quite general. The argument was: if someone requests a non-existing page (one that is found outside your domain, or one that was created in order to hijack the server, or one that was mistyped, or one that existed in the past but is no longer present etc.), how should the server best respond?

And my point is that you do a 301 in those cases; it's not bad practice and has advantages over the 404. You are not going to see errors in GWT either.

brix76




msg:4392388
 9:55 am on Nov 30, 2011 (gmt 0)

Wow, I seem to have started quite a discussion. :)

I'm quite new to this, but I do get most of the things everyone has said so far.
Let me clarify my situation. Basically, my company is a magazine.
So, all the 404s that GWT finds are for articles or news stories that have been deleted.
Is it better to do 410 instead of 404 for those? And how is that set up?
And is it possible to have it set up so they land on a 410 and then in a few seconds get redirected to my homepage?
Thanks.

lucy24




msg:4392398
 10:36 am on Nov 30, 2011 (gmt 0)

If a page used to exist and is now gone, and there isn't a replacement, then a 410 is unequivocally the correct response.

Unlike 404 and 403, a 410 does not happen automatically. You have to put it in your htaccess or config file using mod_alias or mod_rewrite, or equivalent if your setup doesn't speak Apache. This, in turn, means you have to keep track of the deleted files in some way. Either feed in the exact names, or cheat by saying that anything containing the string /2009/ or whatever is now gone. (Even if it never existed in the first place. Which is where we came in ;)) Mechanics will depend on how your site is currently coded and how big it is.

What human visitors physically see when they hit a 410 is entirely up to you. As noted above, it's done with the ErrorDocument directive. You can have a single one for the entire site, or different ones for the separate directories if you think that's appropriate. If you have a separate section for archives, that's the kind of thing you would prominently link from a 410.

Apache has a default 410 but it's quite scary and intimidating, even more so than the default 404. So make sure you've got a custom document in place before you start returning 410s. You might decide to use the same physical ErrorDocument for both 404 and 410. Again that depends on your site.
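
The options described above might look something like this in an .htaccess file. This is a sketch only: the paths and file names are made up for illustration, and it assumes mod_alias and mod_rewrite are available. In practice you would pick one approach rather than use them all.

```apache
# Custom page shown to visitors for 410 responses (hypothetical file name).
# Pointing this at the same file as the 404 ErrorDocument is also an option.
ErrorDocument 410 /errors/gone.html

# mod_alias: mark one deleted article as gone (exact name fed in).
Redirect gone /news/old-story.html

# mod_alias with a pattern: anything under /2009/ is treated as gone,
# whether or not it ever existed.
RedirectMatch gone ^/2009/

# mod_rewrite alternative to the pattern above; the G flag returns 410.
# In .htaccess the pattern is matched without the leading slash.
# RewriteEngine On
# RewriteRule ^2009/ - [G]
```

Make sure the ErrorDocument file itself does not sit under a path that one of the rules marks as gone.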

enigma1




msg:4392419
 11:31 am on Nov 30, 2011 (gmt 0)

For a page that no longer exists: 301 redirect to the most relevant page. In your case latest articles, sitemap, home page are good candidates.

Also do not use a timed auto-refresh. And it's best to do the redirect with some code in your application, where you have greater flexibility.

pageoneresults




msg:4392428
 11:59 am on Nov 30, 2011 (gmt 0)

"The soft 404 is a combination of a 302 redirect ending in a 404 page."

According to Google and the topic I referenced, a Soft 404 is also registered when redirecting (en masse/by default) what should be a 404 to one location, in this case, the home page. I mean, it says it right there in the guidelines?

"Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic."


I wonder what would happen if I set up a page for indexing that contained 1,000s of non-existent destinations to your site. I could probably get rather creative and use specific path names, anchor text, etc. What do you think would happen to your site after Google got a hold of all the links I threw out there? Do you think I could have an influence on anything knowing that you 301 all invalid requests to your home page?

Don't use a 301 when you should be serving a 404 or 410 - period. I'm sure those reading along will hopefully know better after reading this topic.

And yes, you want to 301 whatever you can if there is a logical replacement for something that was removed/replaced, or moved.

I'd sure hate to manage a site that was void of 404s due to a 301 in its place. I see all sorts of challenges in that type of environment.
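
To illustrate the distinction, a configuration along these lines redirects only the specific URLs that have a logical replacement and lets every other missing URL return a normal 404. The paths and the example.com host are placeholders, not taken from any site in this thread.

```apache
# A moved article: permanent redirect to its direct replacement.
Redirect 301 /articles/old-title.html http://example.com/articles/new-title.html

# A renamed section: map each old URL to the same file under the new path.
RedirectMatch 301 ^/news-archive/(.*)$ http://example.com/archive/$1

# Deliberately no catch-all rule: any other missing URL is left alone
# and gets the server's ordinary 404 response.
```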

enigma1




msg:4392441
 12:27 pm on Nov 30, 2011 (gmt 0)

"I wonder what would happen if I set up a page for indexing that contained 1,000s of non-existent destinations to your site."

On my site? Nothing will happen. You see, none of these 1000 links you created exists on my site. So Google or any spider won't give any errors whatsoever (and that's in their documents too).

Will they follow your links and try them on my domain? Yes they will and they will get a permanent redirect to the relevant pages. If the link is totally irrelevant that will go to the home page. There will be no penalty.

Now let's say you repeat the same experiment with another site. You could actually be successful and propagate the links within the site, if the code doesn't handle them properly, and I have lots of examples of various "popular" applications which are basically vulnerable to this type of injection. Search pages, listing page splitters, GET forms, to name a few, have this potential.

And yes, in these cases you can artificially create lots of non-existing pages that Google will index because it finds them within the domain, give errors, hurt the domain ranking etc. Are you using such an application? If so, the first and most important step is to fix the code.

"you want to 301 whatever you can if there is a logical replacement for something that was removed/replaced, or moved."

That's precisely my point. The website exposes specific content, so as the owner or webmaster you expect every single request to your website to be related to the content you expose, don't you?

Let's take it a step further. The thing here is not the visual output, which with a 404 can be the home page, a sitemap or a custom page, and with a 301 redirect can be exactly the same, but the HTTP header. So assume the output is exactly the same in both cases. The visitor is confronted with the same layout whether it is a 404 or a 301.

Which one is better? For the human visitor it is exactly the same, right? He's not going to read the HTTP header. Now for the spiders, the 404 tells them to drop everything about this page, including past incoming traffic. With a 301 you tell them to drop the page and send the traffic to another page.

phranque




msg:4392450
 1:00 pm on Nov 30, 2011 (gmt 0)

when google verifies your site for webmaster tools it looks for a file you are supposed to upload with a known name and expected content.
(and yes i know there are other verification options.)

google also requests a file named in the following format:
http://example.com/google[longrandomalphanumericstring].notfound
google REALLY wants to see 404 Not Found for this type of request.

enigma1




msg:4392456
 1:37 pm on Nov 30, 2011 (gmt 0)

"google also requests a file named in the following format:
http://example.com/google[longrandomalphanumericstring].notfound
google REALLY wants to see 404 Not Found for this type of request."


Where do you get this information? There is no reference in the sitemap in the GWT of a couple of sites I just checked. There are also no access attempts to generate a 404 on domains where I submit sitemaps to google via the server logs. And no errors regarding it in the GWT.

And finally, why would Google want my site to be problematic? I am trying to have the site's pages return valid content, not errors.

astupidname




msg:4392458
 1:47 pm on Nov 30, 2011 (gmt 0)

Nine nine... nine.

"I've got one directory where, thanks to a horrendous blunder on my part, the files and subdirectories all have the same names: cats/ and cats.html, rats/ and rats.html and so on by pairs."

Directory obviously named arc/ within another directory named flood/
:D

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved