Forum Moderators: DixonJones

404 in logs

What do I do?

         

DerekH

8:07 pm on Jan 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Knock Knock

Hello Forum 39, I'm DerekH - I tend to live in Forum3, where you can see that I've posted quite a lot.

I've just recently put up a new site with a new ISP.
It sits on Unix/Apache and offers me Webalizer stats.

I put up a site just before Christmas, using Dreamweaver, and I'm totally sure the site has no broken links.

I put up a custom 404 page that gives visitors a way back into the site index.

I can see from the Webalizer stats that the 404 page is being served up about 20 times a day.
I can see from the webalizer.current file that the references seem to be coming from outside my site.

And there I'm drawing a blank...
The referrers show up as bare IP addresses rather than URLs, and I can't see WHAT they are trying to access.

Can someone help me find out a little bit more about *why* my 404 page is being served?

Thanks!
DerekH

storevalley

10:04 pm on Jan 8, 2005 (gmt 0)

10+ Year Member



Have you got access to raw logs, DerekH?

You should be able to see what is being requested by scanning them.
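If the host does hand over raw access logs, the scan storevalley suggests can be automated. This is a minimal Python sketch. It assumes the logs use Apache's "combined" format, so check the host's LogFormat. The sample lines, IP addresses and paths are made up:

```python
import re
from collections import Counter

# Apache "combined" format (an assumption about the host's configuration):
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def find_404s(lines):
    """Yield (requested_path, referer, client_host) for each 404 line."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("status") != "404":
            continue
        # The request field looks like "GET /path HTTP/1.1".
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else m.group("request")
        yield path, m.group("referer") or "-", m.group("host")


sample = [
    '192.0.2.10 - - [08/Jan/2005:20:01:02 +0000] "GET /favicon.ico HTTP/1.1" 404 512 "-" "Mozilla/4.0"',
    '192.0.2.11 - - [08/Jan/2005:20:05:09 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
    '198.51.100.7 - - [08/Jan/2005:20:07:44 +0000] "GET /oldpage.htm HTTP/1.0" 404 512 "http://example.org/links.html" "Mozilla/5.0"',
]

# Tally which missing paths are requested most often.
counts = Counter(path for path, _, _ in find_404s(sample))
for path, n in counts.most_common():
    print(n, path)
```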

ncw164x

10:37 pm on Jan 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It could be a spider that is finding an error in a hyperlink, or it could be a genuinely broken link; try checking your site using Xenu.

DerekH

11:04 pm on Jan 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you got access to raw logs, DerekH?

No, only the Webalizer logs - I'm on shared virtual hosting for this site.
They're tantalisingly close to telling me what I need, but not quite!
DerekH

DerekH

11:06 pm on Jan 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



try checking your site using xenu

The author seems scared of Apple Macs <smile>

Just checked the site in Dreamweaver again, and it's 100% watertight...

Since it's a new site, I can't imagine where erroneous URLs are coming from...

Thanks for the help - if I can't do it without raw logs, then you've answered my questions!
DerekH

Sanenet

11:15 pm on Jan 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Log the referrer in the 404 page.

storevalley

11:16 pm on Jan 8, 2005 (gmt 0)

10+ Year Member



I can't imagine where erroneous URLs are coming from

I quite often see requests in client site logs for pages that have never existed. 404s can also be caused by...
  • Missing robots.txt files
  • Worm attacks
  • Design tricks (the old trick of setting a TD background image to "null" to stop table backgrounds tiling in NN4)
If you have checked your site for broken links, I wouldn't worry about it too much :)
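Once you can see the requested paths, a rough first pass is to sort them into the buckets storevalley lists. The Python helper below is hypothetical. Its patterns are illustrative assumptions, not an exhaustive list. The worm signatures are the classic IIS probes: Code Red requested /default.ida, and Nimda probed for cmd.exe and root.exe. A "null" background makes the browser request a file literally named null, relative to the page.

```python
# Hypothetical classifier for the 404 causes listed above.
WORM_SIGNATURES = ("default.ida", "cmd.exe", "root.exe")


def classify_404(path):
    """Return a rough label for a requested path that produced a 404."""
    p = path.lower()
    if p == "/robots.txt":
        return "missing robots.txt"
    if any(sig in p for sig in WORM_SIGNATURES):
        return "worm probe"
    # background="null" resolves to <page directory>/null
    if p.rstrip("/").endswith("/null"):
        return "null background trick"
    return "unexplained"


for path in ("/robots.txt", "/default.ida?NNNN", "/images/null", "/about.html"):
    print(path, "->", classify_404(path))
```

Anything left as "unexplained" is what deserves a closer look.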

cgrantski

11:59 pm on Jan 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, favicon.ico requests will produce a 404 unless you have one.

Does your 404 page respond to 404s of any kind, or just to missing pages?

What would happen if you removed the 404 page for a while? Does Webalizer individually list all the 404 files?

DerekH

9:30 am on Jan 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



sanenet wrote
Log the referrer in the 404 page

Now that sounds like the proper solution, but I haven't the foggiest how to do it. I'm quite happy to roll my sleeves up and do what needs doing, but how might I do this? I don't have any databases on the server that I can use to store things. Or have I overlooked a simple way to do this? As I said, I have access to .htaccess, and also to PHP and Perl, but not to the raw logs.

Sorry to look so helpless <grin>
DerekH

storevalley

10:35 am on Jan 9, 2005 (gmt 0)

10+ Year Member



Log the referrer in the 404 page

You could (e.g.) log $HTTP_REFERER to a text file if you use a PHP 404 page.

Just did a quick test though, and the results weren't very promising. $HTTP_REFERER just gives you the page that requested the offending URL ... not the missing item itself (which is shown on the line above the 404 page access in the raw log file).
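storevalley's observation matches how a custom 404 page sees the request: the referrer only names the page that linked to the missing item, so the missing item has to be read from a different variable. Here is a minimal sketch written as a CGI-style helper. Python is only the illustration language; the same variables are typically visible to a PHP 404 page through $_SERVER. The sketch assumes Apache's ErrorDocument behaviour, where a local error document receives the originally requested path in REDIRECT_URL (and usually REQUEST_URI). The log file name is hypothetical.

```python
import time

# Hypothetical log location; it must be writable by the web server's user.
LOG_FILE = "404log.txt"


def describe_404(environ):
    """Build one tab-separated log line from CGI-style environment variables.

    Assumes Apache ErrorDocument behaviour: REDIRECT_URL (or REQUEST_URI)
    holds the path that was actually requested, while HTTP_REFERER is only
    the page that linked to it.
    """
    missing = environ.get("REDIRECT_URL") or environ.get("REQUEST_URI", "-")
    referer = environ.get("HTTP_REFERER", "-")
    client = environ.get("REMOTE_ADDR", "-")
    agent = environ.get("HTTP_USER_AGENT", "-")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp}\t{client}\t{missing}\t{referer}\t{agent}\n"


def log_404(environ, path=LOG_FILE):
    """Append the line to a plain text file, so no database is needed."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(describe_404(environ))
```

In the script that an `ErrorDocument 404` directive points at, you would call `log_404(os.environ)` before printing the page itself. Each line of the resulting text file then shows both the missing path and who linked to it.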

kevinpate

12:26 pm on Jan 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> on shared virtual hosting

Have you asked your host for access to your raw logs? Even a day's worth would tell you a lot.

Or, barring that, perhaps your host could add access to, oh, say, the 300 most recent error log entries to your cPanel, or whatever other interface the host supplies.

Sanenet

12:51 pm on Jan 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In your 404 page, log the HTTP_REFERER and QUERY_STRING server variables to a text file.

HTTP_REFERER will tell you the referring page; QUERY_STRING should contain something like "404;http://yoursite.com/missingpage.htm" (that's under IIS, although there should be something similar under Apache).

Voila, no need to muck around trying to get the logs (although it's always useful to have them!)
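For the IIS case Sanenet describes, the missing URL can be pulled out of that query string. This small Python sketch assumes the `404;http://host/path` shape quoted above, and IIS may include a port such as `:80` in the URL. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def missing_from_iis_query(query_string):
    """Split an IIS-style "404;http://host/path" query string.

    Returns (status_code, path_with_query), or None if the string
    doesn't have the expected "<digits>;<url>" shape.
    """
    status, sep, url = query_string.partition(";")
    if not sep or not status.isdigit():
        return None
    parts = urlsplit(url)  # copes with an explicit port like :80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return int(status), path


print(missing_from_iis_query("404;http://yoursite.com/missingpage.htm"))
```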

DerekH

1:54 pm on Jan 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for all that - I need to go and mug up on PHP now <grin>
DerekH

cgrantski

4:11 pm on Jan 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seriously, what about just turning off your 404 page for a day or so? If you can assume the same thing is happening daily, then one day of visitors seeing a plain 404 (only 20 instances) would tell you exactly what's being requested.

Sanenet

3:06 pm on Jan 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Trouble is that DerekH doesn't have full access to his logs. It doesn't matter whether you use the standard 404 or a customised 404, you see the same info in the logs.

Derek, did you solve the problemo?

DerekH

6:48 pm on Jan 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you, Sanenet - yes, you're right: turning off the custom 404 page doesn't help me.

Well, I'm taking it in stages and I'll see what gives...
I get fresh Webalizer stats every 24 hours, so I'm making one change a day and seeing what happens.

The first ploy was to move my 404 file somewhere else so I could start counting hits on the new page from zero.

Next was to add a favicon.ico file to see whether that was part of it.

Next I'll add an empty robots.txt.

If that doesn't cure it, I'll get into the PHP.

Thanks everyone - I can get there from here now!
DerekH

larryn

9:00 pm on Jan 12, 2005 (gmt 0)

10+ Year Member



Derek,

I have found that you need to have BOTH a robots.txt and a favicon.ico file on your site, as they will be requested by robots and browsers no matter what. I would suggest that instead of using trial and error, you just add both of those files right away, as each of them will cause 404s if it does not exist.

In more detail:

Any well-behaved bot will periodically check for a robots.txt file by making an HTTP request for it. If you don't have one, that request will generate a 404, just like any other missing file (you will also see 404s for missing collateral files such as JPGs, GIFs, etc.). The simple solution is to put up a blank or simple robots.txt file:


# Standard - Allow Everyone Everywhere Robots Policy File
# server does not parse this file, so no includes allowed

User-agent: *   # all robots
Disallow:       # empty value = allow everywhere
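To check offline that a file like this really permits every robot, you can parse it with Python's standard-library urllib.robotparser. This is a minimal sketch, and the example.com URLs are placeholders. The classic robots.txt convention expects each record to carry at least one Disallow line, and an empty Disallow value means nothing is off-limits.

```python
from urllib.robotparser import RobotFileParser

# "Allow everyone everywhere" policy with the Disallow line active.
ROBOTS_TXT = """\
# Standard - Allow Everyone Everywhere Robots Policy File
User-agent: *   # all robots
Disallow:       # empty value = allow everywhere
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs; every path on the site should come back as fetchable.
for url in ("http://example.com/", "http://example.com/private/page.html"):
    print(url, parser.can_fetch("Googlebot", url))
```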

As for favicon.ico, MSIE will request that file from the current directory or the root directory of your site when someone bookmarks the page. Many other browsers (Safari, Mozilla, etc.) also use that image in the address or tab bar for the site. If you don't have a logo for your site, you can find lots of royalty-free icons on the web by searching the usual places.

Ideally, if you had access to your raw logs, you could tell the analysis tool to ignore requests for those files, so that your analysis shows the details you need rather than items you don't care about.
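The same filtering works on any list of missing paths you collect yourself, for instance from a home-made 404 log. This is a minimal Python sketch. The ignore set is an assumption about what counts as routine noise, and the favicon check also catches requests in subdirectories, since browsers may look in the current directory.

```python
from collections import Counter

# Requests that browsers and robots make on their own (an assumption).
EXPECTED_NOISE = {"/robots.txt", "/favicon.ico"}


def interesting_404s(paths, ignore=EXPECTED_NOISE):
    """Count missing paths, skipping the ones every site sees regardless."""
    counts = Counter()
    for path in paths:
        base = path.split("?", 1)[0]  # ignore any query string
        if base in ignore or base.endswith("/favicon.ico"):
            continue
        counts[path] += 1
    return counts


print(interesting_404s(["/robots.txt", "/old.htm", "/dir/favicon.ico", "/old.htm"]))
```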

Hope this helps,

Larry

DerekH

9:42 pm on Jan 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Larry - the robots.txt and favicon went up yesterday, so this morning's Webalizer logs were inconclusive - I'll check again tomorrow.
I also tracked down a piece of PHP to go in the 404 file to append the referrer to an ASCII file - I'll give that a go at the weekend when I have enough time to bang my head repeatedly on the wall!
Thanks!
DerekH