Need tips on how to force a 404 error with .htaccess on an Apache server

         

JeffOstroff

3:09 pm on Dec 20, 2013 (gmt 0)

10+ Year Member



I need to take down one of my sites. I removed all the HTML pages, hoping the Apache server would just return a 404 in the headers, but instead it gives a directory listing like this:

Index of /

Name Last modified Size Description

[DIR] Parent Directory 28-Jun-2007 03:00 -

Apache/1.3.42 Server at



So my question is, what can I do to force a 404 for every URL that someone requests on my site? The headers must show 404 so I can submit it to the Google URL removal tool.

g1smd

6:41 pm on Dec 20, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Assuming the "404 page" is in a file called /404.html in the root:

RewriteRule !^404\.html - [G]


This will serve the "410 Gone" status code.

I'd expect Google to treat 410 and 404 the same.

lucy24

8:39 pm on Dec 20, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google, unlike Bing, seems to recognize that gone means gone.

Index of /


Yikes! This can only happen if auto-indexing is on. I can't imagine any site where you would want this to happen globally. In your main htaccess-- both for the old site and the new one-- add the line

Options -Indexes


Then, if there is some specific directory that you do want indexed, put

Options +Indexes


in a new htaccess file for that directory. The setting is inherited by every subdirectory below until another htaccess file changes it.
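As a rough sketch of that layout, assuming a hypothetical /downloads/ directory is the one you want listed, and assuming the host allows Options to be overridden in htaccess:

```apache
# File: /.htaccess (site root)
# Turn auto-indexing off here and in everything below.
Options -Indexes

# File: /downloads/.htaccess (hypothetical directory name)
# Turn listings back on for this directory and its subdirectories only.
Options +Indexes
```

These are two separate files shown together; each Options line goes in its own htaccess.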

Edit: An auto-index can only happen if the directory (folder) still exists. It sounds as if you deleted the files but left the empty directories.

RewriteRule !^404\.html - [G]

One of the very rare cases where it's appropriate to put ! in the body of a rule :)

JeffOstroff

8:51 pm on Dec 20, 2013 (gmt 0)

10+ Year Member



Thanks for the tip on the directory. We still need the directory to be there: I don't want Google to see any HTML page on the site, but I still want to use the /images directory to serve up images for my eBay auctions. The web site is basically acting as a photo server, but I don't want Google indexing anything.

Here is my complete .htaccess file now:

Options -Indexes
RewriteEngine on
RewriteRule !^404\.html - [G]


The problem is this works in stopping the index, but returns a 410, when I really need a 404. So we are getting warm, but how do I return a 404?

JeffOstroff

9:01 pm on Dec 20, 2013 (gmt 0)

10+ Year Member



We used this script on one of our other sites today and it resulted in a 404 error for us just fine, but I cannot get it to give a 404 error on the current site that I am trying it on, also hosted on Verio:

RewriteEngine on
RewriteRule ^(.*)/$ /$1/index.htm [NC,L]

The current site I am trying this on apparently just ignores it and still shows the directory listing.

lucy24

9:32 pm on Dec 20, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



but returns a 410, when I really need a 404

It returns a 410 because g1smd, who --ahem, cough-cough-- knows more about this stuff than you or me, suggested a rule with [G] flag. A 410 [G] is the appropriate response when you have intentionally removed a file. It tells the search engine that you took it away on purpose and it won't be back. The only time it might not be appropriate is if you have "Now you see them, now you don't" URLs that come and go at random. But in that situation you should work up an alternative way of handling it. (For example, in commerce if a page is associated with a temporarily unavailable product, or a permanently discontinued one.)

By default you don't need to do anything to return a 404; it simply means "The server can't find the file". If the request is for a directory, and there's no available index file and auto-indexing is disabled, the automatic response will instead be 403. This is probably not what you want, since the message is "The content exists but you can't have it." Empty directories should be deleted.
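If an explicit 404 is still wanted instead of the 410, something along these lines might do it. This is only a sketch: it assumes Apache 2.4, where the mod_rewrite documentation allows status codes outside the 3xx range in the R flag, and it may not apply to the Apache 1.3.42 server in the original post. It reuses the /404.html exclusion from g1smd's rule.

```apache
RewriteEngine on
# Answer every request except the error page itself with 404 Not Found.
# For a status outside 300-399 the substitution is dropped and rewriting
# stops, so "-" is used as a placeholder target.
RewriteRule !^404\.html$ - [R=404,L]
```

Note that this states "not found" rather than "removed on purpose", so the 410 from the [G] flag remains the more accurate response for a site taken down deliberately.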

I cannot get it to give a 404 error on the current site that I am trying it on, also hosted on {hostname}:

RewriteEngine on
RewriteRule ^(.*)/$ /$1/index.htm [NC,L]

I don't understand what you're doing here. I've never heard of an Apache installation that doesn't have mod_dir installed. It has only two jobs; one of them is to serve up directory-index files if they exist. Now, if you have an amazingly inept host you may need to add a line to htaccess

DirectoryIndex index.htm


although seriously I can't imagine any shared-hosting config file that doesn't already list-- at an absolute minimum-- .htm and .php alongside the default .html.

Edit:
The line
Options -Indexes
may need to be listed separately for each domain. In my shared-hosting setup, the line won't work in the htaccess file belonging to my userspace (occupied by all domains, each in its own directory). It has to go at the domain level. I don't know if this is a universal Apache fact or specific to my host's config file. Auto-indexing is on by default in 2.2, off in 2.4. But afaik it's perfectly safe to say
Options -something
if the option in question happens to be off already.

JD_Toims

11:46 pm on Dec 20, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The problem is this works in stopping the index, but returns a 410, when I really need a 404.

A 404 is defined as "not found" and can result from a number of conditions, including things like a server "glitch", an FTP upload "stalling" after the previous version of the remote file has been deleted but before the new file is saved to the server, etc. -- 404 Not Found simply means "Not Found" with no "concrete reason" why, no length of time for persistence of the issue, no reason behind the issue, nothing other than "not found [implicitly including: at this time]".

There is also no "intent" associated with a 404 and it may or may not be a permanent condition -- a 404 is a *much worse* response than a 410 in this case, because a 410 indicates: "The resource you are requesting has been intentionally, permanently removed; Please remove all references to it."



Options -Indexes

RewriteEngine on
RewriteRule !^the-directory-you-want-to-use/ - [G]
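Filled in for the case described earlier in the thread, a minimal sketch might look like the following. It assumes the directory to keep is /images/ (as JeffOstroff mentioned) and that the error page is /404.html (as in g1smd's rule). The ErrorDocument line is an added assumption, not something posted above.

```apache
Options -Indexes
# Assumption: reuse /404.html as the body shown with the 410 response.
ErrorDocument 410 /404.html

RewriteEngine on
# Leave the image directory and the error page itself alone.
RewriteCond %{REQUEST_URI} !^/images/
RewriteCond %{REQUEST_URI} !^/404\.html$
# Everything else is answered with 410 Gone.
RewriteRule ^ - [G]
```

With auto-indexing off, a request for the bare /images/ directory would normally get a 403 rather than a listing, while the individual image files keep being served.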

JeffOstroff

11:58 pm on Dec 20, 2013 (gmt 0)

10+ Year Member



Ok, thanks all, I am content with the 410. I just confirmed from Wednesday's Google Webmaster news release that the 410 code is good. I submitted my URLs to Google's URL removal tool, which accepted them and confirmed the site is gone, so hopefully the links will be out of the index by tomorrow.

Question that I still have not seen answered on Google or anywhere else is:

Do we need to submit both the www version and the non-www version of our web site to their URL removal tool? And will it automatically kill all the URLs from our site, or do we have to submit every URL from our site one at a time too?

Yes, folks, every time we answer one question a million more pop up. I really miss the Alta Vista days. I could make a change today and be back at #1 tomorrow. The good old days...

g1smd

6:31 am on Dec 21, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Submit both non-www and www just to be sure.