Custom 404 page is index.htm, good or bad?

Is this a bad idea? Will Googlebot frown on it?

MThiessen

5:22 am on Aug 18, 2002 (gmt 0)

I put these in my .htaccess
ErrorDocument 404 /index.htm
ErrorDocument 403 /index.htm

Sometimes people just seem to request the page wrongly; in the logs I see requests for blah.htm.htm.

Why the extra .htm sometimes appears I have no idea; there is nothing in the source code that asks for these.

To keep from losing visitors who get the occasional 404, I simply have it default to the home page.

Will this have any effect on search engine placement? I don't think it will, unless maybe you have a bad link that throws Googlebot into a loop?

What are your thoughts on this? Is it a good idea? Or could it cost you?

netcommr

5:27 am on Aug 18, 2002 (gmt 0)

I have had good results and not seen any negatives from the crawlers.

Though, I do prefer to use an actual 404 page with a quick redirect to the index or site map. I think this would come across as a little more professional.

WebGuerrilla

5:31 am on Aug 18, 2002 (gmt 0)

As long as you use relative URLs (a local path such as /index.htm), you will be fine. On Apache, if you make the mistake of using the complete URL, like:

ErrorDocument [example.com...]

The server will not return a 404 header. As long as you return a proper 404, you shouldn't have any problems with any spiders.
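WebGuerrilla's rule can be written down as a tiny classifier. This is a rough Python model, not Apache itself: it assumes only the three target forms listed in the Apache ErrorDocument documentation (a local path beginning with "/", a full URL, or literal message text), and the function name error_document_behavior is made up for illustration. For a full URL, Apache sends the client a temporary (302) redirect, which is why the 404 status is lost.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def error_document_behavior(target: str, original_status: int = 404) -> tuple[int, str]:
    """Rough model of how Apache handles an ErrorDocument target.

    Returns (status code the client receives, short description).
    This is an illustrative sketch, not a call into Apache.
    """
    target = target.strip()
    if target.startswith("/"):
        # Local path: Apache serves it internally and keeps the original status.
        return original_status, "internal: serves " + target
    parts = urlsplit(target)
    if parts.scheme and parts.netloc:
        # Full URL: Apache redirects the client, so the 404 status is lost.
        return 302, "redirect to " + target
    # Anything else is treated as literal message text sent with the error.
    return original_status, "message text"


print(error_document_behavior("/index.htm"))
# (404, 'internal: serves /index.htm')
print(error_document_behavior("http://www.example.com/index.htm"))
# (302, 'redirect to http://www.example.com/index.htm')
```

In .htaccess terms, the safe form is the one MThiessen posted (ErrorDocument 404 /index.htm); the risky form is anything that starts with http://.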

jdMorgan

1:42 am on Aug 19, 2002 (gmt 0)

I also prefer a "real" custom 404 page, for reasons outlined in this thread [webmasterworld.com].

After allowing enough time for the user to read the 404 page, I provide a meta-refresh redirect
to the index page (or another more appropriate page). A plain text link to the index page is also
provided, so the user doesn't have to wait for the meta-refresh to kick in.
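A page like the one jdMorgan describes could be generated along these lines. This is a sketch only: the function name custom_404_page, the 10-second default delay and the wording are assumptions, not his actual page. The page is still meant to be served with a 404 status (for example through a local ErrorDocument path); the meta refresh only helps human visitors.

```python
from html import escape


def custom_404_page(requested_path: str, target: str = "/", delay_seconds: int = 10,
                    link_text: str = "home page") -> str:
    """Build a small 404 page that meta-refreshes to `target` after a delay.

    Includes a plain text link so the visitor need not wait for the refresh.
    """
    path = escape(requested_path)          # never echo the raw URL into HTML
    url = escape(target, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{delay_seconds};url={url}">\n'
        "<title>Page not found</title>\n"
        "</head><body>\n"
        f"<p>Sorry, the page <code>{path}</code> could not be found.</p>\n"
        f'<p>You will be taken to the <a href="{url}">{escape(link_text)}</a> '
        f"in {delay_seconds} seconds.</p>\n"
        "</body></html>\n"
    )


print(custom_404_page("/blah.htm.htm"))
```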

A lot of these malformed URLs are due to incomplete cut-and-pasting into the browser's address bar,
while others may be typographical errors on sites which link to yours. If there is one variant
that you see recurring often, it's worth a search to see if you can find out where (what site)
the bad link is on.
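If the server writes Apache's "combined" log format (which includes the Referer field), a short script can show which bad URLs recur and which pages send visitors to them. This is a sketch assuming that format; the function name recurring_404s and the sample lines are made up. A referer of "-" usually means no referring page was sent, which fits the typed or pasted URLs jdMorgan mentions.

```python
import re
from collections import Counter

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)"'
)


def recurring_404s(lines, limit=10):
    """Count (missing path, referring page) pairs among 404 responses."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match is None or match.group("status") != "404":
            continue  # not a combined-format line, or not a 404
        counts[(match.group("path"), match.group("referer"))] += 1
    return counts.most_common(limit)


# Hypothetical sample log lines.
sample = [
    '10.0.0.1 - - [18/Aug/2002:05:22:00 +0000] "GET /blah.htm.htm HTTP/1.1" 404 512 "http://other.example/links.html" "Mozilla/4.0"',
    '10.0.0.2 - - [18/Aug/2002:06:10:00 +0000] "GET /blah.htm.htm HTTP/1.1" 404 512 "http://other.example/links.html" "Mozilla/4.0"',
    '10.0.0.3 - - [18/Aug/2002:07:00:00 +0000] "GET /index.htm HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
    '10.0.0.4 - - [18/Aug/2002:08:00:00 +0000] "GET /blah.ht HTTP/1.0" 404 512 "-" "Mozilla/4.0"',
]
for (path, referer), n in recurring_404s(sample):
    print(n, path, referer)
```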

For more info on WebGuerrilla's cautionary note, see the Apache Documentation [httpd.apache.org].

Jim

ikbenhet1

10:39 am on Aug 19, 2002 (gmt 0)

I have a supplementary question on this.

Say your 404 points to index.html,
and then you put a link on your site like:
yoursite.com/notanexistingfile

Google comes along, sees this link and indexes
yoursite.com/notanexistingfile, but gets the content of index.html.

Will Google think this is a page of its own (/notanexistingfile)?

Or will Google know that it's actually the /index file and not list it?

KakenBetaal

2:29 pm on Aug 19, 2002 (gmt 0)

ikbenhet1, GoogleBot should get the 404 response in the headers and know it's getting an error document.
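The point that a crawler sees the status code, not just the page content, can be checked locally. This Python sketch imitates a local ErrorDocument with the standard library's http.server; it is not Apache, and the page content, ErrorDocHandler and request_through_demo_server are hypothetical names. A request for /notanexistingfile gets the home-page body but a 404 status, which is what should tell Googlebot not to index it as a page of its own.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content; a real server would read files from disk.
PAGES = {"/index.htm": b"<html><body>Home page</body></html>"}


class ErrorDocHandler(BaseHTTPRequestHandler):
    """Serves known pages with 200 and everything else as index.htm with 404."""

    def do_GET(self):
        if self.path in PAGES:
            status, body = 200, PAGES[self.path]
        else:
            # Like a local ErrorDocument: home-page content, but status stays 404.
            status, body = 404, PAGES["/index.htm"]
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch(url):
    """Return (status, body); urllib raises HTTPError for 404, so catch it."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def request_through_demo_server(path):
    """Start the demo server on a free local port, fetch `path`, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), ErrorDocHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch(f"http://127.0.0.1:{server.server_port}{path}")
    finally:
        server.shutdown()
        server.server_close()


print(request_through_demo_server("/notanexistingfile"))
# (404, b'<html><body>Home page</body></html>')
```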

c3oc3o

2:48 pm on Aug 19, 2002 (gmt 0)

I believe using index.htm as the error page is a bad idea because it will confuse users. They'll be expecting another page, and if you just throw them back to the front page, many won't even notice it's not the page they wanted and will get frustrated when they don't find the expected information there.

I believe the ideal solution would be the front page with a short but immediately visible message saying that the requested file something.html couldn't be found and that you are showing them the front page instead. This could be inserted with a scripting language (even client-side JavaScript) by using the index page with a query string as the 404 document.
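A server-side variant of c3oc3o's idea might look like this sketch. It swaps the query string for a plain parameter: on Apache, a script used as a local ErrorDocument can read the originally requested path from the REDIRECT_URL environment variable and pass it in. The function name index_with_notice, the "notfound" CSS class and the message wording are all hypothetical. The response should still carry the 404 status.

```python
import re
from html import escape
from typing import Optional

BODY_TAG = re.compile(r"<body[^>]*>", re.IGNORECASE)


def index_with_notice(index_html: str, missing_path: Optional[str]) -> str:
    """Return the front page, with a 'not found' notice if a path was missing.

    missing_path could come from os.environ.get("REDIRECT_URL") when this
    runs as an Apache ErrorDocument script (assumed setup).
    """
    if not missing_path:
        return index_html  # ordinary visit to the front page: no notice
    notice = (
        '<div class="notfound">Sorry, <code>'
        + escape(missing_path)
        + "</code> could not be found, so here is our front page instead.</div>"
    )
    match = BODY_TAG.search(index_html)
    if match is None:
        return notice + index_html  # no <body> tag: just prepend the notice
    pos = match.end()
    return index_html[:pos] + notice + index_html[pos:]


home = "<html><body><h1>Welcome</h1></body></html>"
print(index_with_notice(home, "/blah.htm.htm"))
```

WebGuerrilla's approach later in the thread (a copy of the home page with a message box) is essentially the static version of the same idea.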

richlowe

2:58 pm on Aug 19, 2002 (gmt 0)

Personally, I prefer having a unique error page. I get very annoyed when I put in a URL and just get an index page because then I have no clue what happened. I usually leave the site at that point.

Why? I asked for page blahblah and I got index. What happened to blahblah? Why am I on a page that I didn't ask for?

An error page gives the visitor (ideally) some information: he typed the wrong URL, or the search engine indexed a page that no longer exists, or whatever.

Richard Lowe

MThiessen

3:41 pm on Aug 19, 2002 (gmt 0)

Actually, looking through my error logs, it is common to see links like:

[blahblah.com...]

And other entries like

[blahblah.com...]

Since the main page links to everywhere, including blah.htm, I like the idea of the 404 being the index.htm page. Why?

What do you do when you get a 404? You hit back 99% of the time if you are like most folks.

The very slight confusion of seeing the index.htm page instead of blah.htm?

The knee-jerk instant conditioned response of hitting the back-button the second you see a 404?

I would choose the first. The idea of a custom 404 page is also good, as long as it does not literally say "404 File Not Found". Maybe:

Oops, did you mean to come to one of these pages?

(list of the most common URLs on the site, clickable)

However, this leaves them feeling that what they are looking for was intentionally removed, since such a page looks so orderly. Going back to the index.htm page when the misworded blah.htm is clicked, on the other hand, leaves hope that it was a glitch and the page is still there.
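For the "Oops, did you mean..." page, one way to pick which links to list is sketched below. It first handles the doubled-extension case from the logs (blah.htm.htm) and otherwise falls back to difflib string similarity; that similarity ranking is my substitution, not something MThiessen described, and SITE_PAGES and suggest_pages are hypothetical. Whatever the page shows, it should still be sent with a 404 status.

```python
from __future__ import annotations

import difflib

# Hypothetical list of the site's real pages (the "most common URLs").
SITE_PAGES = ("/index.htm", "/blah.htm", "/products.htm", "/contact.htm", "/sitemap.htm")


def suggest_pages(requested: str, pages=SITE_PAGES, limit: int = 3) -> list[str]:
    """Return up to `limit` existing pages that resemble the requested path."""
    # Case seen in the logs: a doubled extension such as /blah.htm.htm.
    for ext in (".htm", ".html"):
        if requested.endswith(ext + ext):
            fixed = requested[: -len(ext)]
            if fixed in pages:
                return [fixed]
    # Otherwise fall back to plain string similarity.
    return difflib.get_close_matches(requested, pages, n=limit, cutoff=0.6)


print(suggest_pages("/blah.htm.htm"))  # ['/blah.htm']
print(suggest_pages("/prodcts.htm"))
```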

Remember, we always validate our sites to make sure we have no dead links, so this should be rare.

The main reason I started this topic is that I was unsure how Googlebot would behave if it found such a page. The answer about relative links seems correct: the server still returns a 404 status even though it shows the index.htm page, so this potential loop should be avoided.

Interesting to note, too: if you have "Show friendly HTTP error messages" turned on in IE6, you won't see the index page but rather an IE6-generated 404 page...

Mike

gsx

5:04 pm on Aug 19, 2002 (gmt 0)

I have a custom page for 404 errors. A 404 can occur while a visitor is using a site if you have accidentally renamed or deleted a page, or are temporarily modifying your site while the visitor is trying to view that same page. Being thrown back to the index page in that case is not a good idea.

For 403, I am using index.html as the file reference. Research has shown me that a 403 only ever occurs when a user clicks on a link in a search engine to a file type restricted by my host (.cgi, .pl, etc.). In this case, it is the first page the visitor will see, and the home page is not a bad place to start (there are possibly better places, but I would need to work out what they had searched for on the engine and then find them the correct page... all a bit too complicated for the small number of 403s I get).

WebGuerrilla

9:28 pm on Aug 19, 2002 (gmt 0)

I like to use a combination of the suggestions in this thread. The custom 404 is a copy of the home page with the addition of a box containing a message telling visitors that we were unable to locate the page they requested, so we have sent them to our home page.

ikbenhet1

11:34 pm on Aug 19, 2002 (gmt 0)

I would like to add here that my 404 error file actually has PR1.

(not index.html)

MThiessen

12:04 am on Aug 20, 2002 (gmt 0)

ikbenhet1

Are you using a relative path or the full [www....] format?

A relative path such as /my404file.html will send a 404 header, but calling it by an external path like [blah.com...] makes Apache send a redirect instead. I don't know whether Google will ignore it or not; maybe so. It's safer to use the relative path, IMO.

Mike

ikbenhet1

12:20 am on Aug 20, 2002 (gmt 0)

I don't know.

I had a lot of trouble with .htaccess; it just didn't work,
so they (my web hosting company) set up this 404 on the server for me, and I don't know how they did that.