
Forum Moderators: phranque

Preventing a 404 from showing up in the error log

7:14 am on May 27, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


I've had a system set up for a while so that if someone goes to, for example:

https://www.example.com/foo

and /foo doesn't exist, then they'll be redirected to 404.php via .htaccess:

ErrorDocument 404 /404.php


Then, 404.php dissects the URL to find /foo, then checks a MySQL database to see if I have it in a table; if so, it includes the page that corresponds. So, very likely, it will include:

https://www.example.com/whatever/index.php?id=foo

To the end user it looks like /foo/ exists, but it's really going through the ErrorDocument.

This has worked fine for ages, but I've recently noticed that every one of these requests shows up in my error log. That isn't a big deal, really, until I need to find something in the log and have to dig through a million of them!

I forced a response code, like:

$id = 'foo';

header("HTTP/1.0 200 OK");
include "/home/example/www/whatever/index.php";
exit;


and confirmed with get_headers($url) that it's definitely sending the 200 OK. But it still shows up in the error log, so I'm guessing that maybe it is logged as soon as it hits the .htaccess?

I thought I would be slick and use this instead of ErrorDocument in the .htaccess:

RewriteCond %{REQUEST_URI} !-f
RewriteCond %{REQUEST_URI}/index\.php !-f
RewriteCond %{REQUEST_URI} !-d
RewriteRule .* 404.php [QSA,L]


Buuuut, then all of the other files that I'm redirecting before that in the .htaccess break; eg:

# now this breaks
RewriteRule ^this(.*) that/index.php?id=this [L]

# maybe because it keeps on reading and finds this?
RewriteCond %{REQUEST_URI} !-f
RewriteCond %{REQUEST_URI}/index\.php !-f
RewriteCond %{REQUEST_URI} !-d
RewriteRule .* 404.php [QSA,L]


Can you guys suggest how I might prevent the 404s that are found in MySQL from being logged?
4:45 pm on May 27, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


they'll be redirected to 404.php via .htaccess
Please don't say this. Redirecting a 404 is incorrect, but what you’re doing is correctly defining an error document.

Digression at this point: I honestly think you should stick with what you were doing. It’s trivial to globally delete all 404s from your ErrorLog before doing whatever else you want to do with the log offline.

I'm guessing that maybe it is logged as soon as it hits the .htaccess?
Exactly. I can remember it taking a long time to wrap my brain around the fact that the response the server sends out is not necessarily the response the user receives. You see this especially in a CMS, where all requests are 200 as far as the server is concerned.

I thought I would be slick and use this instead
But, good grief, then you’re putting the server through all the extra work it would have to do if you were running a CMS. If you do want to do this, you should put a RewriteRule at the very beginning of all your RewriteRules--in the same place you have the rule letting everyone see the 403 document--saying something like

RewriteRule ^404\.php - [L]

Oh, and do please put a / at the front of all your rewrite targets. In an ideal world it will make no difference, since / is the default RewriteBase, but it provides added protection against the “all your RewriteBase are belong to us” exploits.
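
A minimal sketch of how that ordering might look in the .htaccess, assuming the handler lives at /404.php and that /that/index.php exists on disk. The file tests here use %{REQUEST_FILENAME} instead of the %{REQUEST_URI} from the original post; that swap is an assumption on my part, made because -f and -d check filesystem paths rather than URL paths.

```apache
RewriteEngine On

# Requests already pointing at the handler pass through untouched,
# so the rules below never loop back onto it.
RewriteRule ^404\.php$ - [L]

# A specific internal rewrite, with a leading slash on the target.
RewriteRule ^this /that/index.php?id=this [L]

# Catch-all: only requests that match no real file or directory
# are handed to the handler.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /404.php [L]
```

With this approach the server no longer generates the 404 itself, so 404.php would have to send the 404 status on its own whenever the database lookup finds nothing.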
9:21 pm on May 27, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 15, 2003
posts:2645
votes: 7


Just to be sure..... are you talking about the Apache error log?
11:18 pm on May 27, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


Yup, the Apache error log at /usr/local/apache/logs/error_log

It rotates away regularly, but it rotated about 24 hours ago and the current log is already 5M in size. I just wish I could find a better way to keep pages that don't technically exist but still show something appropriate from being logged.
11:49 pm on May 27, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


ErrorDocument 404 /404.php

this doesn't internally rewrite a potential 404 response to /404.php.
it tells the server to send a non-default response message with the 404 status code.

Then, 404.php dissects the URL to find /foo, then checks a MySQL database to see if I have it in a table; if so, it includes the page that corresponds.

you really want to 301 redirect to the url instead of including it in the custom 404 response page.

header("HTTP/1.0 200 OK");

too late.
the 404 status code has already fired.
11:50 pm on May 27, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


pages that don't technically exist but still show something appropriate

please explain.
12:04 am on May 28, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


instead of a bunch of these:
RewriteRule ^this(.*) that/index.php?id=this [L]

you should do something like this after all external redirects and any more specific internal rewrites:
# -f and -d test filesystem paths, so check REQUEST_FILENAME rather than REQUEST_URI
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* /index.php [L]


if https://www.example.com/foo is requested and gets rewritten to index.php, there are 3 possible responses:
- you send a 404 status code if the content doesn't exist
- you send a 301 status code and a Location header if /foo should be redirected to a more suitable url
- you send a 200 status code and "include" or otherwise supply a suitable document for the requested url
12:06 am on May 28, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


RewriteRule .* 404.php [QSA,L]

btw it doesn't hurt but the [QSA] flag would only be necessary if there was a query string in the rewrite target.
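
To illustrate the point about [QSA], here is a small sketch of the three cases. The /foo, /bar and /baz paths and the page=2 parameter are made up for the example; the targets reuse the paths from earlier in the thread.

```apache
# No query string in the target: the original one is passed through unchanged.
# /foo?page=2  ->  /404.php?page=2
RewriteRule ^foo$ /404.php [L]

# Target has its own query string: the original one is replaced.
# /bar?page=2  ->  /whatever/index.php?id=bar
RewriteRule ^bar$ /whatever/index.php?id=bar [L]

# Same target with QSA: the original query string is appended to the new one.
# /baz?page=2  ->  /whatever/index.php?id=baz&page=2
RewriteRule ^baz$ /whatever/index.php?id=baz [QSA,L]
```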
12:13 am on May 28, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Mar 15, 2013
posts: 1205
votes: 120


What I was trying to explain before, phranque.

For example, let's say you go to a Facebook profile and it looks like:

https://www.facebook.com/csdude55 (which isn't me, I just plugged in my username here for the example)

I assume there's not really a page at facebook.com/csdude55/index.php, they're just using something like a RewriteRule to show the results from something like:

https://www.facebook.com/user.php?username=csdude55

I have something similar set up on my end. I don't want the user to be redirected, I want them to have a pretty address that they can easily tell people and/or share.

[edited by: phranque at 12:28 am (utc) on May 28, 2019]
[edit reason] unlinked urls [/edit]

12:27 am on May 28, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


I don't want the user to be redirected, I want them to have a pretty address that they can easily tell people and/or share.

hence option 3 of the internal rewrite to index.php
12:49 am on May 28, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


this doesn't internally rewrite a potential 404 response to /404.php
Actually, it does, because an error document is really just a special kind of rewrite. Your browser's address bar still shows the originally requested, blocked or nonexistent URL, while your screen displays the error document--which could perfectly well be a php page subject to all the usual php behaviors.
2:20 am on May 28, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11873
votes: 245


as far as i'm concerned, an internal rewrite happens before an error code is thrown.

does the use of a custom error document show up in a rewrite log?
5:36 pm on May 28, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


does the use of a custom error document show up in a rewrite log?
No, because it doesn’t involve mod_rewrite. That’s why I said “a special kind of rewrite”: it’s functionally the same thing, although it’s done in a different way. Come to that, you could also think of a DirectoryIndex as a type of rewrite.

At first I misread “rewrite log” as “error log”, leading to: If you don’t have a defined ErrorDocument for some type of error, but your server thinks you do--as when shared hosting sets a default name such as “missing.html”--then that will show up in error logs as File Doesn’t Exist, immediately after the original error.
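
If an inherited setting like that is what produces the second log entry, one possible way around it (a sketch, assuming the host allows ErrorDocument in .htaccess) is to override it explicitly, either with a file that really exists or with the server's built-in message:

```apache
# Point the 404 handler at a document that actually exists...
ErrorDocument 404 /404.php

# ...or, alternatively, fall back to Apache's hardcoded message
# instead of an inherited file name such as missing.html.
# ErrorDocument 404 default
```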