Sometimes people just seem to request the page wrong; in the logs I see requests for blah.htm.htm
Why the extra .htm shows up I have no idea. There is nothing in the source code that asks for these.
To keep from losing visitors who get the occasional 404, I simply have it default to the home page.
Will this have any effect on search engine placement? I don't think it will, unless maybe you have a bad link and throw googlebot into a loop?
What are your thoughts on this? Is it a good idea? Or could it cost you?
ErrorDocument [example.com...]
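If ErrorDocument is given a full URL like that, the server will not return a 404 header; Apache sends the client a redirect to that URL instead. Point it at a local path so that a proper 404 is returned, and you shouldn't have any problems with any spiders.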
After allowing enough time for the user to read the 404 page, I provide a meta-refresh redirect
to the index page (or another more appropriate page). A plain text link to the index page is also
provided, so the user doesn't have to wait for the meta-refresh to kick in.
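To make that concrete, here is a minimal sketch of such a page, written as a toy Python handler rather than the real Apache setup. The function name `build_404_page`, the class `NotFoundHandler`, the 10-second delay and the page wording are all assumptions for illustration. The point it shows is that the response keeps its 404 status while the body carries both the meta-refresh and the plain link.

```python
from html import escape
from http.server import BaseHTTPRequestHandler


def build_404_page(home_url="/", delay_seconds=10):
    """Return HTML for an error page that later forwards to home_url."""
    href = escape(home_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        "<title>Page not found</title>\n"
        f'<meta http-equiv="refresh" content="{int(delay_seconds)}; url={href}">\n'
        "</head><body>\n"
        "<h1>Sorry, that page was not found</h1>\n"
        f'<p>You can go straight to the <a href="{href}">home page</a>, '
        "or wait a few seconds to be taken there.</p>\n"
        "</body></html>\n"
    )


class NotFoundHandler(BaseHTTPRequestHandler):
    """Toy handler: every GET request is treated as a missing page."""

    def do_GET(self):
        body = build_404_page().encode("utf-8")
        self.send_response(404)  # status stays 404 even with a custom body
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass `NotFoundHandler` to `http.server.HTTPServer` on a spare port and call `serve_forever()`. On Apache the same HTML would simply be saved as a static file and named in ErrorDocument with a local path.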
A lot of these malformed URLs are due to incomplete cut-and-pasting into the browser's address bar,
while others may be typographical errors on sites which link to yours. If there is one variant
that you see recurring often, it's worth a search to see if you can find out where (what site)
the bad link is on.
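If your logs record the referrer, they can often answer that question directly. The sketch below counts 404 requests by path and referrer. It assumes the Apache "combined" log format, so the regular expression would need adjusting for anything else, and `count_404s` is a name invented for this example.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)"'
)


def count_404s(lines):
    """Count (path, referer) pairs for requests that returned 404."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referer"))] += 1
    return hits


# Made-up sample lines, using documentation-only addresses and hosts.
sample = [
    '192.0.2.1 - - [10/Oct/2002:13:55:36 -0700] "GET /blah.htm.htm HTTP/1.1" '
    '404 209 "http://other.example/links.html" "Mozilla/4.0"',
    '192.0.2.2 - - [10/Oct/2002:13:56:01 -0700] "GET /blah.htm.htm HTTP/1.1" '
    '404 209 "http://other.example/links.html" "Mozilla/4.0"',
    '192.0.2.3 - - [10/Oct/2002:13:57:12 -0700] "GET /blah.htm HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0"',
]

for (path, referer), count in count_404s(sample).most_common():
    print(count, path, referer)
```

Run against a real access log (open the file and pass the file object to `count_404s`), the top entries show which bad URLs recur and which page is sending them. A referrer of "-" usually means the URL was typed or pasted rather than clicked.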
For more info on WebGuerrilla's cautionary note, see the Apache Documentation [httpd.apache.org].
Jim
say your 404 points to index.html
then you put a link on your site like:
yoursite.com/notanexistingfile
google comes along, sees this and indexes:
yoursite.com/notanexistingfile, but gets the content of index.html
will google think this is a page of its own (/notanexistingfile)?
or will google know that it's actually the /index file and not list it?
Why? I asked for page blahblah and I got index. What happened to blahblah? Why am I on a page that I didn't ask for?
A 404 page (ideally) tells the visitor something: he typed the wrong URL, or the search engine indexed a page that no longer exists, or whatever.
Richard Lowe
[blahblah.com...]
And other entries like
[blahblah.com...]
Since the main page has a link to everywhere, including blah.htm, I like the idea of 404 being the index.htm page. Why?
What do you do when you get a 404? You hit back 99% of the time if you are like most folks.
The very slight confusion of seeing the index.htm page instead of blah.htm?
OR
The knee-jerk instant conditioned response of hitting the back-button the second you see a 404?
I would choose the first. The idea of a custom 404 page is also good, as long as it does not actually say "404 file not found". Maybe:
Oops, did you mean to come to one of these pages?
(list of most common urls in site, clickable)
However, this leaves them feeling that what they were looking for was intentionally removed, since the page looks deliberate. Going back to the index.htm page when the wrongly worded blah.htm is clicked leaves hope that it was a glitch and the page is still there.
Remember, we always validate our sites to make sure we have no dead links. So this should be rare.
The main reason I started this topic is that I was unsure how googlebot would behave if it found one of these. The answer about relative links seems correct: with a relative path the server still returns a 404 header even though it shows the index.htm page, so this potential loop should be avoided.
It's also interesting to note that if you have "show friendly error messages" turned on in IE6, you won't see the index page, but rather an IE6-generated 404 page...
Mike
For 403, I am using index.html as the file reference. My research has shown that a 403 only ever occurs when a user clicks a search engine link to a filetype that my host restricts (.cgi, .pl, etc.). In that case it is the first page the visitor will see, and the home page is not a bad place to start. There are possibly better places, but I would need to look at what they had searched for on the engine and then find them the correct page, which is all a bit too complicated for the small number of 403s I get.