Custom 404 Getting Hammered

Can't find the source

         

pcarlow

5:39 pm on Jan 18, 2005 (gmt 0)

10+ Year Member



I have a custom 404 page which I set up using the ErrorDocument 404 directive in .htaccess.

I am getting several thousand hits a month to this page and can't figure out what's causing it.

I put a form on the page for people to email me if they see it and tell me how they got there, but in 3 months only one person has emailed me.

I don't think I'm really getting that many 404s displayed. Could it be a broken image or something? How can I find the source?

CaseyRyan

6:03 pm on Jan 18, 2005 (gmt 0)

10+ Year Member



If you have access to your server logs, they will record each 404 error along with the page the user was trying to access when they got the error.

-=casey=-

pcarlow

6:28 pm on Jan 18, 2005 (gmt 0)

10+ Year Member



Yes, I have access to the log files but I'm not sure how to extract that information.

I'm using AWStats, which shows that my custom 404 page was the most popular page on my site last year.

cgrantski

7:05 pm on Jan 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Looking in the log directly is pretty easy to do. Open the log file with a text editor. A log is just a flat text file, one "hit" or request per line, with fields delimited by spaces. Look at the top few lines in the file. If there are several lines starting with #, you probably have IIS logs, which makes things easier because one of those # lines will contain field headers. Figure out which field in each line contains the status code, which is probably headed "sc-status". This field will have values like 200, 302, 404.

With the text editor, just do a "find" for " 404 " (blanks before and after) and look at each line it finds. A few lines will probably have " 404 " in other fields, so check. For those " 404 " lines, look at what appears elsewhere in the line, probably earlier in the line, right after something like "GET". That's the file that got requested and returned a 404.

If you don't have IIS logs you won't have the helpful headers for fields. Just examine a few lines and figure out which field only or mostly contains values like 200, 302, 404.

Example of one line in a log:

2004-09-02 16:05:22 123.45.67.890 GET /images/header_r1.gif - 80 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.2;+.NET+CLR+1.1.4322) 404 0 0

See the "404" near the end of the line? That's the status code field. This line tells you that the image "/images/header_r1.gif" returned a 404.
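The manual "find" described above can also be scripted. Here is a minimal Python sketch, assuming an IIS-style log in which a "#Fields:" line names the columns. The function name count_404s, the field order in the sample header, and the second sample line are assumptions made up for illustration; your own log may list the fields in a different order, which is why the code reads them from the header instead of hard-coding positions.

```python
from collections import Counter


def count_404s(lines):
    """Count how often each requested path returned a 404.

    Assumes a space-delimited log with a '#Fields:' header line that
    names the columns, including 'sc-status' and 'cs-uri-stem'.
    """
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the '#Fields:' comment line matters; skip the others.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Skip lines seen before any header, or lines that do not match it.
        if not fields or len(values) != len(fields):
            continue
        row = dict(zip(fields, values))
        if row.get("sc-status") == "404":
            counts[row.get("cs-uri-stem", "?")] += 1
    return counts


# Sample data: the header and the second request line are hypothetical.
SAMPLE = [
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status",
    "2004-09-02 16:05:22 123.45.67.890 GET /images/header_r1.gif - 80 - "
    "127.0.0.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.2;"
    "+.NET+CLR+1.1.4322) 404 0 0",
    "2004-09-02 16:05:23 123.45.67.890 GET /index.html - 80 - "
    "127.0.0.1 Mozilla/4.0+(compatible;+MSIE+6.0) 200 0 0",
]

if __name__ == "__main__":
    # List the missing files, most frequently requested first.
    for path, hits in count_404s(SAMPLE).most_common():
        print(hits, path)
```

To run it against a real log, pass an open file object instead of SAMPLE, since a file yields one line at a time just like the list does.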

Hope this helps.

pendanticist

7:13 pm on Jan 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



While Xenu is a great tool for checking broken links, it does not determine how those 404s took place.

For example, someone may type in the wrong URL or, as has been the case with my site in the past, all the files are mixed case and the request was for all lower case. A lower case request on my site returns a 404, as would an all upper case one.

Since you said you do have access to those log files, simply download them, open them in your favorite text editor and start looking around. You will be amazed at the varying types of requests that return 404 error codes without being caused by a broken link.

Another thing to check: do you set up 301 redirects when you've killed an old file? If not, those orphaned files are still being requested from time to time, for example through older listings that may still contain your link(s).

As an aside, if you use Xenu on any other site, be advised that Xenu is heavily banned for not requesting robots.txt. When I used it on my domain, I found it to be an excellent tool, especially the site map function.
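The mixed case problem is easy to test for once you have a list of the paths that returned 404. A rough Python sketch follows, assuming you can also produce a list of the paths that really exist on the site. The function name case_mismatches and both input lists are hypothetical.

```python
def case_mismatches(requested_paths, existing_paths):
    """Map each failed request to real paths that differ only by case."""
    existing = set(existing_paths)
    # Index the real paths by their lower-cased form.
    by_lower = {}
    for path in existing_paths:
        by_lower.setdefault(path.lower(), []).append(path)

    mismatches = {}
    for requested in requested_paths:
        if requested in existing:
            continue  # exact match, so case was not the problem
        candidates = by_lower.get(requested.lower())
        if candidates:
            mismatches[requested] = candidates
    return mismatches
```

Any request that shows up in the result would have succeeded if the letter case had matched, which makes it a candidate for a redirect or a rename.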


PatrickDeese

7:38 pm on Jan 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



While Xenu is a great tool for checking broken links, it does not determine how those 404s took place.

Actually, if you right-click on the Xenu result (before it generates the report), it will tell you the page(s) that have a link to the missing page, so I think that it would be perfect for his needs.

pcarlow

8:30 pm on Jan 18, 2005 (gmt 0)

10+ Year Member



cgrantski, thank you very much for the guidance. Exactly what I needed to fix the problem!

While the log files didn't tell me exactly what was causing the error, they showed me the pages that were causing the 404 to be called and I was able to track it down.

I was correct that it was a broken image. I'm using a template on hundreds of pages, and the image was referenced in the background tag, so it was being requested every time a page was loaded! The image did not exist.

I have used Xenu quite a bit, and while it does have its uses, it wouldn't have done me any good in this case. Thanks again for all your help!
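For anyone wanting to do the same thing without reading the log by eye, here is a small Python sketch that groups 404s by the page that triggered them. It assumes the log is in Apache's "combined" format, which records the referring page in each line; if your server logs a different format, the pattern will need adjusting. The function name referers_of_404s is made up for this example, and lines that do not fit the pattern are simply skipped.

```python
import re
from collections import Counter, defaultdict

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referers_of_404s(lines):
    """Return {missing path: Counter of referring pages} for 404 lines."""
    result = defaultdict(Counter)
    for line in lines:
        match = COMBINED.match(line)
        if match is None or match.group("status") != "404":
            continue
        # The request looks like 'GET /path HTTP/1.1'; keep the path part.
        parts = match.group("request").split()
        path = parts[1] if len(parts) >= 2 else match.group("request")
        result[path][match.group("referer")] += 1
    return result
```

A missing image that is referenced from a shared template shows up as one path with many different referring pages, which is the pattern described above.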