Using 410/Gone in .htaccess

         

Odatpup

4:31 pm on Jan 19, 2005 (gmt 0)

10+ Year Member



Howdy all:

First post here, and very new to this stuff.

I have a subdomain that used to be a blog, complete with "archives" (old blog posts, some of which end in .html, some in .php). I don't care about traffic or being listed in SEs. I do like using .htaccess to keep out bad guys and spamthings.

My logs are full of requests for the old blog archives (due to links from ages ago) like this:

xx.xx.xx - - [19/Jan/2005:07:26:27 -0500] "GET /archives/000157.html HTTP/1.1" 403 769 "http://somespammyspammersdomain.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)"

Sometimes the request is for the same thing but ending in .php (i.e., "GET /archives/000639.php").

The archives cover a large range of sequentially-numbered files: "archives/00001.html", "archives/00002.html", etc. etc. So I need to "Gone" the whole danged range of file numbers AND in both .html and .php formats. I think it might be easiest to just specify that the whole "archive" directory and everything in it (regardless of number or extension) is GONE, but I don't know how to do that.

After much searching and reading, I found this:

RewriteCond %{REQUEST_FILENAME} !-U
RewriteRule ^archives\.html$ - [G]

And this:

RewriteCond %{REQUEST_FILENAME} !-U
RewriteRule ^archives/[^\.]\.html$ - [G]

but I don't think either is exactly what I need (again, I need to specify a range of file numbers/extension, both php and html, or maybe just the whole "directory" called /archives/*.*?)

Can anyone help? Sorry this is long but it does give you an idea of how totally confused I am at the moment.

jdMorgan

7:09 pm on Jan 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Odatpup,

Welcome to WebmasterWorld!

Regular-expression patterns don't have to be complete: a pattern with no end anchor matches anything that starts with it. So you can just leave the filename and filetype out of the pattern, and let it match any filename and filetype that is missing in /archives.

I'd suggest you also check for the presence of the HTTP/1.1-defined Host header in order to avoid returning a 410 response to an HTTP/1.0 client (which would not understand it). Because many search engines 'advertise' as HTTP/1.0 but are actually capable of handling HTTP/1.1 responses, you can just check for the Host header instead of examining the request's protocol.

For true HTTP/1.0 clients, the code will do nothing, and the error will be handled by the server's default 404 error handler.


# Check for HTTP/1.1 request Host header
RewriteCond %{HTTP_HOST} .
# Make sure the requested file does not exist
RewriteCond %{REQUEST_FILENAME} !-f
# Return 410-Gone for HTTP/1.1 requests for non-existent archive files
RewriteRule ^archives/ - [G]
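
If you would rather target only the numbered posts in both formats, a narrower pattern along these lines might work. This is a sketch, not tested on your server: it assumes mod_rewrite is available, that the code sits in the .htaccess of the site's own document root, and that every archive filename is digits followed by .html or .php. It also leaves out the file-exists check, so it applies whether or not the files are still on the server.

```apache
# Sketch only: assumes mod_rewrite is enabled and that this .htaccess
# is in the document root of the site that has the /archives folder.
RewriteEngine On
# Skip requests with no Host header (true HTTP/1.0 clients)
RewriteCond %{HTTP_HOST} .
# Return 410-Gone for numbered archive files ending in .html or .php
# (assumed naming scheme: digits only, e.g. archives/000157.html)
RewriteRule ^archives/[0-9]+\.(html|php)$ - [G]
```

Anything else under /archives (an index page, images) would be left alone by this version, which may or may not be what you want.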

Jim

[edited by: jdMorgan at 8:14 pm (utc) on Jan. 19, 2005]

Odatpup

8:08 pm on Jan 19, 2005 (gmt 0)

10+ Year Member



Thanks so much for the help Jim!

Unfortunately, I tried the code you provided (cut and pasted) and it returned the dreaded 500 internal server error. I checked the Error Log in cpanel:

[Wed Jan 19 14:24:08 2005] [alert] [client xx.xx.xx] /home/mymaindomainname/public_html/.htaccess: RewriteCond: bad argument line '%{REQUEST_FILENAME}!-f'\n

Hmmm, what the heck am I doing wrong? Was I supposed to change something in the code? If so, I feel really stupid but I am very new at this so I don't know what to change.

Could it have anything to do with the site in question being a subdomain? My htaccess file is in the root folder (for "www.mymaindomain.com"), so, as I (barely) understand it, that means my htaccess file affects everything/all subdomains below (there are others), including the one with the archives that I want to send a 410 for (www.mysubdomain.com). My host calls my subdomains "add-on" domains. Does this make any difference as far as what I'm trying to do?

Also, the old archive pages/posts are still there on the server -- don't know if I made that clear. I suppose I could delete them all, but I'd just as soon let them be (may want to print them or something, someday) IF there's a way to serve 410s every time anything/anyone requests one of them.

My goal, then, is to give 410s to anything and anyone requesting archives/#####.php/html until they all figure out that the stuff is "gone" for good and they shouldn't keep coming by and asking for it.

BTW, the subdomain site in question is a blank page, yet they all keep coming and requesting this stuff because of links out there somewhere on the 'net.

I'm sorry to be dense but like I said, I'm new to all this. You pretty much lost me with the "HTTP 1.0" etc parts of your answer. :-)

Thanks again for taking the time to try to help. I don't want to be a pest, but if you get time and want to try again, you should assume I know nothing (which is basically the truth!).

jdMorgan

8:20 pm on Jan 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the files still exist, you'll need to remove the second RewriteCond. That line probably caused the 500 error anyway, due to a typo: posting on this forum deletes spaces preceding "!" unless special measures are taken, and the space is required by mod_rewrite (see the correction to the code above).

# Check for HTTP/1.1 request Host header
RewriteCond %{HTTP_HOST} .
# Return 410-Gone for all HTTP/1.1 requests to /archives, whether or not the files exist
RewriteRule ^archives/ - [G]

That will return a 410-Gone for all requests to the /archives subdirectory, whether the files exist or not. However, as long as the files still exist, there is no fallback to a 404 for HTTP/1.0 clients: a request without a Host header skips the rule and is served the old file. The easiest fix for that is to simply rename the /archives folder on your server, which will make all those URLs go missing.
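
If the Host-header test doesn't matter to you, a blanket 410 can probably also be done without mod_rewrite. The sketch below uses mod_alias, which is a different technique from the RewriteRule above and not a drop-in equivalent: it assumes mod_alias is enabled and allowed in .htaccess on your host, and it sends the 410 to every client, HTTP/1.0 included.

```apache
# Sketch only: assumes mod_alias is enabled and permitted in .htaccess.
# Returns 410-Gone for every URL-path at or below /archives,
# for all clients, with no Host-header check.
Redirect gone /archives
```

Because this matches on the requested URL-path rather than on the file, it makes no difference whether the old files are still on the server.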

See our forum charter for references to mod_rewrite-related documentation.

Jim