Forum Moderators: phranque


ErrorDocument 404 with 100K+ domains

(treat docs & images differently)

         

timhavens

6:04 pm on Jun 14, 2008 (gmt 0)

10+ Year Member



I would like to get some help setting up how to handle the following.

I have well over 100k domains.

People point domains to my servers randomly. This brings in requests for things on my servers which do not actually exist. In LARGE quantities.

Basically, all that is currently in place is:

ErrorDocument 404 /index.php

and not much else. Although this works, it's certainly NOT efficient, nor is it the best approach, since we may want to decide what to return to a browser based on the filetype of the request.

For example:

For GIF, JPG, and PNG requests that 404, I'd like to return NOTHING, effectively ignoring the request... or return something that is certainly NOT index.php.

.JS requests could return JavaScript that attempts to redirect the browser to the front door (index.php).

And basically anything else that 404s and is document-like (i.e. not an image...) should be redirected to index.php.

There is really very little in the .htaccess currently.

I was hoping to get some initial starting pointers from you on how best to handle this sort of thing.

T

jdMorgan

7:40 pm on Jun 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First of all, if you're using a script, then that script can handle all the various cases.

But stop and reconsider: you're violating the HTTP protocol by returning the home page for all missing pages, and you're essentially creating a massive duplicate-content issue for your home page, since that home-page content gets returned for a potentially infinite number of URLs.

A 404 error page should gently and somewhat apologetically acknowledge that an error has occurred, and then provide a list of resources to aid the visitor in finding what they were looking for. This is usually a link to the home page, site map, site search facility, and/or category page, as applicable. This is for human users.

I always look at a 404 error as if it is my fault. Barring a user-agent 'inventing' bogus URLs (as may be your case here), seeing a 404 in my logs means I put up a bad link, or forgot to add a removed page or object URL to my list of 410-Gone URLs. A 404 means "Missing for an unknown reason, and may come back," while a 410 means "It's gone, we removed it on purpose, and it won't be back."
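For reference, a 410 for deliberately removed URLs can be sent straight from .htaccess. This is a minimal sketch, not something from the thread. It assumes mod_alias and mod_rewrite are available, that the rules sit in the DocumentRoot .htaccess, and that AllowOverride permits them. The paths /old-page.html and /discontinued/ are hypothetical placeholders:

```apache
# mod_alias: answer "410 Gone" for one page that was removed on purpose.
# With the "gone" status, no target URL is given.
# (/old-page.html is a hypothetical example path)
Redirect gone /old-page.html

# mod_rewrite: answer "410 Gone" for everything under a removed directory.
# In a DocumentRoot .htaccess the pattern is matched without the leading slash.
# (/discontinued/ is a hypothetical example path)
RewriteEngine On
RewriteRule ^discontinued/ - [G]
```

The "-" substitution with the [G] flag leaves the URL alone and returns the 410 status.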

Now considering requests for non-existent non-HTML resources, how about sending a blank, transparent image (i.e. a 1x1 transparent gif) in response to any bogus image-format request?
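One possible way to wire up the blank-image idea while still answering with a 404 status is a per-filetype ErrorDocument inside a <FilesMatch> container. This is a sketch under assumptions: a 1x1 transparent image has been uploaded as /blank.gif (a hypothetical name), and AllowOverride includes FileInfo:

```apache
# For missing GIF/JPEG/PNG requests (any letter case, via the (?i) flag),
# use a tiny transparent image as the error body instead of a full page.
# The status stays 404 because the target is a local path, not a full URL.
<FilesMatch "(?i)\.(gif|jpe?g|png)$">
    ErrorDocument 404 /blank.gif
</FilesMatch>
```

Because /blank.gif itself exists, serving it as the error document cannot trigger the rule again.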

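Your JS idea won't necessarily work, because you cannot know whether the JS is creating part of the page that can legitimately invoke a redirect, or whether it is just a little counter down in the corner. So, it might be better to simply send a script that does nothing, such as an empty file or one containing only a comment. (A bare top-level "return false;" would actually be a JavaScript syntax error, since return is only valid inside a function.)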
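For missing .js files, the same <FilesMatch> technique can hand back a tiny do-nothing script instead of a full page. This is only a sketch, assuming a hypothetical /empty.js that contains nothing but a comment:

```apache
# For missing .js requests, return a near-empty script as the 404 body,
# so pages that include it never receive HTML or a redirect.
<FilesMatch "(?i)\.js$">
    ErrorDocument 404 /empty.js
</FilesMatch>
```

Modern browsers generally won't execute a script delivered with an error status anyway. The main gain is that each bogus request costs a few bytes rather than a full page.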

If you opt not to use your script to handle errors (and I don't recommend using scripts for error handling, because if the script or the PHP configuration is broken, you can get into a really nasty error loop), then you can use small, stripped-down static HTML files to handle HTML-page errors, and perhaps rewrite requests for other filetypes to return more appropriate content.
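Pointing ErrorDocument at static files rather than at PHP might look like the following. This is a sketch assuming hypothetical pages /errors/404.html and /errors/410.html. A local path (leading slash) keeps the original status code. A full http:// URL would instead make Apache send the client a redirect, and the client would never see the 404:

```apache
# Default handler for missing documents: a small static page that keeps
# working even if PHP is broken, so there is no script to fail in a loop.
ErrorDocument 404 /errors/404.html

# Optional static page for URLs deliberately marked 410 Gone.
ErrorDocument 410 /errors/410.html
```

<Files> and <FilesMatch> sections are merged after the plain directory-level directives. A per-filetype ErrorDocument inside one of them therefore overrides this default for matching requests.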

Some handy 'tools' here are the <Files> and <FilesMatch> containers to create conditional handling based on the filename/filetypes, and mod_rewrite when you need even more flexibility.
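As an example of the extra flexibility, mod_rewrite can test whether a requested file actually exists before deciding what to return. This is a sketch, not a recommendation from the thread. It assumes mod_rewrite is enabled, the rules live in the DocumentRoot .htaccess, and a hypothetical /blank.gif exists:

```apache
RewriteEngine On

# Only act on requests that don't map to an existing file or directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Missing images (case-insensitive): internally serve the placeholder image.
# /blank.gif exists, so the rewritten request fails the !-f test and stops here.
RewriteRule \.(gif|jpe?g|png)$ /blank.gif [NC,L]
```

Unlike the ErrorDocument approach, this internal rewrite answers with a 200 status. That reintroduces the protocol concern raised above, so it only makes sense if a success status is really what you want.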

Kind of a high-level view here, but details depend on exactly what you decide to do.

Jim