Forum Moderators: DixonJones
1>users typed in a wrong URL (this is not the website's fault)
2>a referrer provides a wrong URL (the file never existed at all)
3>a referrer provides an out-of-date URL (the file used to be there, but was renamed or moved)
4>the URL is right, but the file is missing
...
Actually, only the fourth case is really a problem with the website itself, though case 3 can be one as well.
Now my problem is how to tell these different cases apart from the log file. Or is there a tool available for this kind of analysis?
Thanks!
to check your site for missing files (no internal links to non-existent files), you need a link-checker. i don't know which operating system you use, but many tools are available. these kinds of tools scan your website for missing files/links and report to you which page links to which non-existing file. such a tool can be very useful.
in your list, that covers points 3 and 4, and that's all you can do in the short term.
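the core idea behind such a link-checker can be sketched in a few lines of python. this is just an illustration, not a real tool: it assumes you already have the site's pages as a URL-to-HTML mapping (a real checker would fetch them over HTTP), and all the names and URLs here are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """collects the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_broken_links(pages):
    """pages maps URL -> HTML for every file that exists on the site.
    returns (page, missing_url) pairs: internal links with no target."""
    host = urlparse(next(iter(pages))).hostname
    broken = []
    for page, html in pages.items():
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            url = urljoin(page, href)       # resolve relative links
            if urlparse(url).hostname == host and url not in pages:
                broken.append((page, url))  # internal link, no target
    return broken
```

a real tool does the same thing, just with live HTTP requests and recursion instead of a prepared dictionary.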
another way is to analyze your logfiles and check the referring pages for the wrong link. you can then write the webmaster/webmistress of that page an email and notify them about the wrong link.
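pulling those referring pages out of the log can be automated. a minimal sketch, assuming logs in the common apache "combined" format (the regex and the sample URLs are illustrative assumptions, adjust them to your actual log layout):

```python
import re
from collections import Counter

# one combined-format log line: host, ident, user, [time],
# "method url protocol", status, size, "referrer", "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def broken_links_by_referrer(lines):
    """count 404 hits per (referrer, url), so you can see which
    page points at which missing file and how often."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("status") == "404" and m.group("referrer") != "-":
            counts[(m.group("referrer"), m.group("url"))] += 1
    return counts
```

the referrer "-" is skipped here because it means the browser sent none at all (typed-in or bookmarked URLs), so there is nobody to notify.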
there are also some files, requested from time to time, that you can simply place on your webserver: for example a favicon.ico or the robots.txt file. these files are requested automatically and will produce a 404 error-log entry if they do not exist. just place them in the website root.
in your next log analysis you'll then see a reduction in 404 error entries, and you can continue to refine the hunt ;)
Fortunately, I have already filtered out the errors caused by favicon.ico and robots.txt.
The problem is that I'm not working with a live log file, and I'm not supposed to contact any webmasters. All I have is a log file from years ago, and the analysis needs to be done automatically.
Currently I just count all the 404 errors (excluding favicon.ico and robots.txt) without distinguishing the cases.
I hope to find some way to improve on this.
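For an automatic, offline analysis, one rough heuristic is to bucket each 404 entry by its referrer. This is only a sketch of an assumption, not an established method: SITE_HOST is a placeholder for your own hostname, and cases 2 and 3 cannot be told apart from the log alone (that would need a list of URLs that once existed on the site).

```python
from urllib.parse import urlparse

SITE_HOST = "www.example.com"            # assumption: your own hostname
IGNORE = {"/favicon.ico", "/robots.txt"}  # already handled separately

def classify_404(url, referrer):
    """heuristic bucket for one 404 log entry, based on its referrer."""
    if url in IGNORE:
        return "ignored"
    if referrer in ("", "-"):
        # no referrer sent: typed-in or bookmarked URL
        return "case 1: no referrer (likely mistyped/bookmarked)"
    if urlparse(referrer).hostname == SITE_HOST:
        # the broken link is on our own site
        return "case 4: internal referrer (broken link on our site)"
    # an outside page links to a URL that is wrong or out of date
    return "case 2/3: external referrer (bad or stale external link)"
```

Counting entries per bucket instead of counting all 404s together would at least separate the site's own problems (case 4) from everything else.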