Forum Moderators: Ocean10000 & phranque

Code 410 is replaced by 500

     
7:01 pm on May 17, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


I have a small site on shared hosting. I am intentionally using the following .htaccess rule for certain conditions:
RewriteRule .* - [G]

For about two weeks now, that rule has resulted in a 500 in each and every case. There are no longer any 410s in the log, as there used to be. The server and the .htaccess otherwise work OK and I have noticed no true 500 errors.

Any idea what could cause this? A recent server misconfiguration? (Customer service is clueless.) Any changes in Apache? Maybe I should use "410" instead of the "G" in the rule?

What possible negative impacts could there be if I simply allow the server to continue serving 500s instead of 410s?
7:33 pm on May 17, 2016 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11888
votes: 250


usually a 500 status code corresponds to an entry in the server error log file - have you checked that for clues?
8:16 pm on May 17, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15956
votes: 898


Any changes in Apache? Maybe I should use "410" instead of the "G" in the rule?

Apache does change periodically, but if your host upgraded-- say, from 2.2 to 2.4-- without telling you then you really, seriously need to look into changing hosts. (The "clueless" line is another warning sign.) But unless you have something really unusual in your conditions, mod_rewrite shouldn't be affected by any current updates. There are a couple of new flags, and some added inheritance options, but afaik any existing rules should work the same as always.

There is absolutely no difference between 410 and G, assuming you mean replacing the single letter [G] with the element [R=410] (really). But why bother, when all you're doing is adding four bytes to your htaccess?
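For anyone wanting to see the two spellings side by side, here is a minimal sketch. The filename old-page.html is a made-up placeholder, and you would use one form or the other, not both. As far as I know, when the R flag is given a status outside the 3xx range, mod_rewrite drops the substitution and just returns that status, which is why the two behave the same.

```apache
RewriteEngine On

# Short form: the G flag returns 410 Gone.
# "old-page.html" is a hypothetical filename for illustration only.
RewriteRule ^old-page\.html$ - [G]

# Long form (alternative to the line above, not an addition to it):
# R with a non-3xx status drops the substitution and returns that status.
# RewriteRule ^old-page\.html$ - [R=410]
```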

:: pause for moment's delayed wild excitement as I find phranque back in the land of the living ::

Are you absolutely positive you haven't changed anything in the conditions which lead up to the rule? Better come back with a fuller post, including the entire RewriteCond block. (And that's not something you hear every day. Most of the time we have to beseech people not to post their entire htaccess, or their entire HTML, or whatever it may be.)
8:31 pm on May 17, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


A quick note: from a search engine point of view, 5xx errors are broadly equivalent to "keep trying".
10:00 pm on May 17, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15956
votes: 898


5xx errors are broadly equivalent to "keep trying"

I don't know if there exists hard evidence, but you would also think that repeated 5xx errors-- especially when mixed with ordinary 200/304 responses-- are a warning of low technical quality, and not something you want to make a habit of. If it consistently happens on certain filename requests, you can only hope that the search engine will figure it out and not tar the whole site with the same brush. In this specific case, the intended response luckily is a 410, meaning that you wouldn't want them in the index anyway.
10:40 pm on May 17, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


hard evidence


I have 'soft' evidence. I used a 503 on an old (and reasonably well-visited) web app that I broke with a site upgrade and never got around to fixing (in 2004). Google ranked it for a good few months despite the server error. The URL still exists (and still uses 503). Google still asks for it (occasionally) but treats it the same as permanently deleted content. I don't believe there is any effect on the rest of the site. I should probably remove the 503 or fix the stupid code and see what happens ;)
10:24 am on May 18, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10707
votes: 1151


I'd avoid returning a 503 if at all possible. The average user (if seen by a human) might look it up on Wikipedia and find:

503 Service Unavailable
The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state.


This does not inspire much confidence. :)
11:35 am on May 18, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


Hey, I wasn't recommending it! It's the right server response to match my "I will get round to fixing this at some point" theory.
3:55 pm on May 18, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


I have been studying the server log entries carefully, and really, all 500s during the last two weeks are the result of my own .htaccess rules that are intended to produce a 410.

I want to totally block certain IPs and in those very cases give them a 410 instead of the proper 403. SE bots are in no way affected. There are no SEO implications.

I just ran an experiment: for nine hours I replaced the current .htaccess file with a backup from last November which had previously functioned perfectly. Now it produced the very same 500 codes I have been talking about above. It is as if the server was reconfigured in some way two weeks ago, but I cannot believe that.

Changing hosts would not help, unfortunately. In Scandinavia all shared hosting customers receive a similar level of service: rudimentary and elementary. The operators would not normally inform about an Apache server upgrade. Still, my present host is one of the big and most reliable ones. Going to some one-man enterprise could result in better service, but less reliability. So that is life here, and I have to sort this out myself or pay maybe ten times more a month.
4:06 pm on May 18, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


Have you got access to the error log? Anything there more specific about the error?
5:24 pm on May 18, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


In the cPanel Error Log the last 300 entries at present consist of 6 attempts by hackers to access non-existent files and 2 requests for apple-touch-icons of a type I do not bother to provide. Nothing more.
5:32 pm on May 18, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


Are you certain it's the [g] rule? What's the rest of it?

You could also try changing [G] to something that definitely works. E.g.
RewriteRule .* /500.php [R=301,L] 
6:42 pm on May 18, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


The complete last three lines are:
RewriteCond %{REMOTE_ADDR} ^xxx\.(17[0-9]|18[0-9]|19[0-9])\. [OR]
RewriteCond %{REMOTE_ADDR} ^xxx\.
RewriteRule .* - [G]

As an experiment I have now changed the Rule to
RewriteRule .* - [F]
and will see tomorrow what happens.

As the present 500 codes are not true server errors, can I simply disregard the strange wrong code? People I definitely do not want to access my site will see a 500 instead of a 410. But does it matter?
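For readers following along, the shape of such a ruleset can be sketched as below. The addresses are made-up placeholders standing in for the xxx'd-out ones, not the poster's real ranges. Two things worth noticing: the three alternatives 17[0-9]|18[0-9]|19[0-9] can be collapsed into 1[7-9][0-9], and the last condition in an [OR] chain must not carry [OR] itself.

```apache
RewriteEngine On

# Placeholder addresses for illustration only.
# 1[7-9][0-9] matches 170-199 in a single group.
RewriteCond %{REMOTE_ADDR} ^203\.1[7-9][0-9]\. [OR]
# Final condition in the chain: no [OR] flag here.
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.
RewriteRule .* - [G]
```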
6:57 pm on May 18, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15956
votes: 898


Edit: Oops, we overlapped. New material at end of this post.
You could also try changing [G] to
<snip>
[R=301,L]
Er, wouldn't that lead to a slew of soft 404s? Manually redirecting to your 500 page may give human users the intended message, but it won't make search engines happy. In any case, if the problem is in the conditions, I don't think it would make any difference.

I have been studying the server log entries carefully, and really, all 500s during the last two weeks are the result of my own .htaccess rules that are intended to produce a 410.

Can you quote some rulesets in full? That is, not just the rule but the conditions. Clearly that's where the problem is happening so that's what we need to look at.

You are not allowed to name your own domain (it also causes problems with unwanted autolinking) but it's generally fine to keep your actual pathnames when quoting rules.* If you need to talk about more than one domain, you can say example.no, example.se and so on.

Even if your host won't tell you what Apache version you're using, there are some simple tests that will let you see exactly what mods are available to you. This, in turn, essentially tells you what version you're on. Oh, and if you don't happen to have a custom 500 page, the default error screen will generally say what Apache version is running. (So neener-neener to the host!)
-----
The complete last three lines are:

Did you mean that that's the entire ruleset? One RewriteRule, accompanied by two [OR]-delimited conditions?

As the present codes 500 are not true errors of the server, can I simply disregard the strange wrong code?

Now, wait. There are only two ways to achieve a 500 error. Either the server has encountered something it can't deal with (including a syntax error in htaccess), or you're returning the code manually. You don't have any reason to believe the host is stepping in and doing something on purpose, do you?

Based on the latest post: Are you returning the 410 to all requests from certain IP addresses? In general, you want to avoid rules in .* because it forces the server to evaluate conditions on every single request ever. If possible, constrain the rule to specific filenames, or at least to requests for pages.

Final question: I started out assuming the rule involved requests for specific pages. So I was going to ask if those pages live in their own directory and, if so, whether that directory has any further rules of its own, such as a supplementary htaccess file.


* I vividly remember one thread from some years ago whose pathnames strongly implied that the originator was running an escort service. But it can't be helped.
7:25 pm on May 18, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


Er, wouldn't that lead to a slew of soft 404s?


To clarify, I meant to do this to see whether there were still errors, rather than as a permanent fix. In the absence of a meaningful log, trial and error may be all that's left!
8:12 pm on May 18, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


Did you mean that that's the entire ruleset? One RewriteRule, accompanied by two [OR]-delimited conditions?


No. That was just the last two lines of the Conditions. Over the years I have been adding more and more IPs without removing possibly unnecessary old ones.

Are you returning the 410 to all requests from certain IP addresses?


Yes, exactly. I want to be able to fully deny those IP ranges access to any file on my site.

In general, you want to avoid rules in .* because it forces the server to evaluate conditions on every single request ever.


Now you perhaps said it, lucy24. Maybe I have simply made the server finally choke by the last IP addition made. I never imagined I could be able to reach the capacity limit of the server, which still appears to respond very fast. Also, the undesired visitors keep being blocked, but now with a 500. If so, this issue could be a useful reminder for other members too.

In that case my changing [G] to [F] would of course make no difference whatsoever. Obviously I will just have to delete all present Conditions and start "harvesting" a fresh and up to date IP list of offenders to stop. I will immediately delete half of those Conditions and see if it makes any difference. But many webmasters completely deny whole country ranges, so is it the number of Condition lines that are crucial?
11:55 pm on May 18, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15956
votes: 898


Try making this simple change to your existing rule. Where you currently have .* change it to (^|/|\.html)$ like this:
RewriteRule (^|/|\.html)$ - [G]
Replace "html" with "htm" or "php" or whatever extension(s) you really do use. Don't omit the closing anchor! The formulation means: any request for (a) the root ^$ or (b) any directory-index page (URL ending in /) or (c) an ordinary page (URL ending in .html). There are alternative wordings if your site happens to be extensionless.

The premise here is that your unwanted visitors will never get as far as requesting images or stylesheets if you never let them see the page that invoked them, so you don't need to keep checking. (It doesn't matter if your 410 page itself uses a stylesheet. At this point it's probably less work for the server just to hand it over than to evaluate all those conditions all over again-- especially if the visit is a robot who never asks for the stylesheet anyway.) Unwanted robots requesting images are so rare that they can generally be handled on a case-by-case basis on top of whatever hotlinking protection you've already got.

But I'm not sure we have come to the root of it. You said earlier that the 500 only crops up on requests that were supposed to get a 410. If your server is simply crashing from having to evaluate all those conditions, it would crash all the time, not just when the condition evaluates to true. In fact, in a long [OR] list, a True result would mean less work for the server, since it doesn't have to keep testing all the way to the end.

Is there any possibility that your server is having a problem with the 410 / G response itself? Try a made-up rule like
RewriteRule foobar - [G]
(no conditions) and request any made-up nonsense URL with "foobar" in it. Do you get a 410 or a 500? Look at two things: what you see right then and there on your screen, and what your access logs show.

A more serious question is whether a 410 response through mod_rewrite is the most efficient means of access control at all. mod_rewrite should really be your last resort; you're in shooting-flies-with-an-elephant-rifle territory. What's the problem with the ordinary system of Allow/Deny directives (or equivalent in Apache 2.4) using CIDR ranges?
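A rough sketch of what that could look like, assuming the host allows these directives in .htaccess. The CIDR ranges are documentation placeholders, not real offenders. Note that this route returns a 403, not a 410. On Apache 2.2 (mod_authz_host):

```apache
# Apache 2.2 syntax. Placeholder CIDR ranges for illustration only.
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24
```

And the rough equivalent on Apache 2.4 (mod_authz_core / mod_authz_host), where negated Require lines have to sit inside a RequireAll container:

```apache
# Apache 2.4 syntax. Same placeholder ranges as above.
<RequireAll>
    Require all granted
    Require not ip 192.0.2.0/24
    Require not ip 198.51.100.0/24
</RequireAll>
```

One CIDR line can stand in for a whole stack of regex conditions, and the server compares addresses numerically instead of running a pattern match on every request.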
8:23 am on May 19, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


(a)
I have now analysed yesterday's access logs. First [G] was replaced by [F] for two hours. That did not change anything. The log contained the usual 500 codes instead of the expected 403s. Next I removed half of the remote_addr conditions. Neither did that change anything: the 500s continued to come in the log like before.

So obviously my long list of IP conditions is not a performance issue for the server. Maybe the 500s are, after all, some server misconfiguration. The server is not crashing. But I really do not think the nerds at the hosting company would listen to me. Frustrating.

(b)
Added the rule RewriteRule foobar - [G], with no conditions. The result on the screen was (I have no custom 410 page, only a 404 one):

Gone
The requested resource
/foobar.html
is no longer available on this server and there is no forwarding address. Please remove all references to this resource.
Additionally, a 410 Gone error was encountered while trying to use an ErrorDocument to handle the request.


However, in the raw access log the error code for that request is 404.

Next I removed nearly everything from the .htaccess file, leaving only

Options +FollowSymLinks
RewriteEngine on
RewriteRule foobar - [G]
ErrorDocument 404 /notfound.html


The result was the same as previously: 410 according to the screen, 404 in the log, but my ErrorDocument was not served. Cache was cleared. (After removing RewriteRule foobar - [G] and then requesting /foobar.html the ErrorDocument was properly displayed).

(c)
I have now replaced my old Rule with RewriteRule (^|/|\.html)$ - [G] as lucy24 recommended

I started using mod_rewrite many years ago and as it functioned as I wanted until now I have only been adding to the .htaccess file. I also block a few referrers and user agents and have the ordinary www/non www rules and hotlinking protection. I do not know how to use Allow/Deny directives in practice. That is the only problem. Is it not in the mod_access module? The server's Apache version is 2.2.31.
7:18 pm on May 19, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15956
votes: 898


However, in the raw access log the error code for that request is 404.

This is sounding more and more as if everything is happening the way it's supposed to, and the real problem is in the logging-- which is out of your control. That's a bit infuriating, but not terribly serious otherwise.

This thread has become fairly long, so I now can't remember if I have already asked (or you have already answered) this:
If you add your own exact IP to the series of RewriteCond, so you yourself would get served with a 410, what response do you see onscreen? (If you don't know your current IP, there are plenty of free utilities. I'm pretty sure WebmasterWorld itself has one.) If you see a 410 onscreen, then you can safely conclude the only problem is with logging.
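If it does come down to logging, one thing the host could compare is the two status fields that mod_log_config can record. This is only a sketch of something to suggest to them: LogFormat and CustomLog belong in the server or virtual-host config, not in .htaccess, and the nickname and log path here are made up.

```apache
# Server or virtual-host config only; not valid in .htaccess.
# %s  = status of the original request
# %>s = status of the final request after any internal redirect
#       (for example, to an ErrorDocument)
# "status_compare" and the log path are hypothetical names.
LogFormat "%h %l %u %t \"%r\" %s %>s %b" status_compare
CustomLog logs/status_compare_log status_compare
```

If the two columns differ for the blocked requests, that would point at error-document handling rather than at the rewrite rule itself.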
2:59 am on May 20, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


I put my own IP to the rewrite conditions, and was served with a 410 on the screen. The access log, however, says 404. So it really seems to be just the logging. Infuriating, indeed. How should I express the issue professionally when I inform the hosting company?

(For the record: in my previous post I had forgotten that I was using my ISP's proxy, which caches pages. So clearing the browser cache was useless. But now I did it properly with my individual IP number.)

Could you please give some general guidance on how to put mod_rewrite and mod_access (in order to be able to use Allow/Deny directives) in the same .htaccess file?
5:29 am on May 20, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5507
votes: 5


I put my own IP to the rewrite conditions, and was served with a 410 on the screen. The access log, however, says 404. So it really seems to be just the logging. Infuriating, indeed. How should I express the issue professionally when I inform the hosting company?


FWIW, were you served with a custom 410 (as defined in your own htaccess) or the standard 410 provided by either Apache or your browser?
5:47 am on May 20, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15956
votes: 898


how to put mod_rewrite and mod_access (in order to be able to use Allow/Deny directives) in the same .htaccess file

Each module is an island. So there's absolutely no reason you can't have directives from mod_rewrite, mod_setenvif, mod_auththingummy (you said mod_access, but that was its 1.3 name), mod_dir and lots of others in the same htaccess. The only two mods you should not try to combine are mod_rewrite with mod_alias ("Redirect" or "RedirectMatch" directives), because they'll get mixed up and things will happen in the wrong order.

If your host permits mod_rewrite, the chances are pretty overwhelming that they also permit mod_authzwhatsit. (You will have deduced that I can never remember its exact name.) They work under different Override settings, but I can't imagine someone permitting FileInfo but not Limit. (Confusingly, that's the name of the Override category as set in config; nothing to do with <Limit> envelopes.)

:: detour to look up ::

mod_authz_host. No wonder I can never remember it; there are two lowlines.

Order Allow,Deny
Allow from all
Deny from aa.bb.cc.dd

where aa.bb.etcetera is your own IP. If you find yourself locked out, then you know you can use Allow/Deny directives. (There are other ways to find out, but nothing beats direct experimentation-- at least when you can do it without bringing your site crashing down.)

<tangent>
Oh, how interesting. AllowOverride has got to be the first Apache directive I've ever encountered that can only be used in <Directory> sections. Not in vhost, not loose in config. (And of course not in htaccess, since the whole point of this directive is to say whether you can use htaccess at all.) Never noticed that before.
</tangent>

In general, modules operate in reverse alphabetical order. (Not a hard-and-fast rule, but a good enough approximation.) So you can have
mod_setenvif
mod_rewrite
mod_authzzzz
each building on each other's actions. Mod auth_etcetera then does the final mopping-up, and issues any 403s that haven't already been issued by mod_rewrite. In my own htaccess, I use mod_authblahblah in conjunction with mod_setenvif, so the lines will say things like
Deny from env=keep_out
Deny from env=bad_agent
Deny from env=bad_ref
and so on.
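A minimal sketch of how those pieces fit together in Apache 2.2, assuming mod_setenvif is available. The variable names are the ones mentioned above; the patterns ("badbot", the referrer hostname, the address prefix) are invented placeholders.

```apache
# mod_setenvif: tag unwanted requests with environment variables.
# All patterns below are placeholders for illustration only.
SetEnvIf Remote_Addr "^192\.0\.2\." keep_out
SetEnvIfNoCase User-Agent "badbot" bad_agent
SetEnvIfNoCase Referer "spam-example\.invalid" bad_ref

# mod_authz_host (Apache 2.2 syntax): refuse anything that was tagged.
Order Allow,Deny
Allow from all
Deny from env=keep_out
Deny from env=bad_agent
Deny from env=bad_ref
```

Requests matching any of the three patterns get a 403; everything else passes through to whatever mod_rewrite rules remain.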

Once a 403 has been issued by any mod that's empowered to issue it, no other mod can override that decision. In particular: if one mod issues a redirect, and some other mod issues a flat-out 403, then the visitor will never know about the redirect, no matter which mod executed first. All they see is the 403.
5:56 am on May 20, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15956
votes: 898


Follow-up:
Additionally, a 410 Gone error was encountered while trying to use an ErrorDocument to handle the request.

I overlooked this before, and it's interesting, because why on earth would an internal request for an error document lead to a 410? That's not a built-in Apache error; you have to serve it deliberately.

It's quite common for an ErrorDocument request to result in an internal 404: it happens especially when the host has a list of standard ErrorDocument names like "missing.html" or "forbidden.html", and then if you never got around to making your own custom page--or if you put it in the wrong directory--the server looks but doesn't find it. (But the request still gets the right numerical response.)

At the time you got this "410 Gone" message, was your own IP still on the experimental list of RewriteConds-- the ones for the rule that's supposed to end up issuing a 410? If so, that's a simple explanation. And a doubly interesting one, because it would again demonstrate that your intended 410 responses are being served.

a custom 410 (as defined in your own htaccess) or the standard 410

I think he talked about this earlier in the thread. No custom 410 page; just the Apache default.

Which reminds me: If you're serving up a genuine 410 that is intended for humans ("I'm sorry, I did formerly have the page you asked for, but I took it down") then you really should have a custom 410 page. Or at least share your existing 404 page:
ErrorDocument 410 /missing.html
As you've seen, the Apache-default 410 is not something you'd ever want a human to see. It's a bit intimidating. But if your 410 is only intended for unwanted robots, then it doesn't matter. We will not talk about responses in the area of "You're not my friend any more and I've changed the locks" :)
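If you do go the custom-page route while also blocking by IP, the error page itself needs to be exempt from the blocking rule. Otherwise the internal request for the page runs into the same rule again, which would fit the "a 410 Gone error was encountered while trying to use an ErrorDocument" message quoted above. A sketch, with /gone.html and the address as made-up placeholders:

```apache
# "/gone.html" is a hypothetical custom error page.
ErrorDocument 410 /gone.html

RewriteEngine On
# Let the error page through, so serving it does not trigger a second 410.
RewriteCond %{REQUEST_URI} !^/gone\.html$
# Placeholder address for illustration only.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule (^|/|\.html)$ - [G]
```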
6:58 am on May 21, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 7, 2007
posts:44
votes: 1


Additionally, a 410 Gone error was encountered while trying to use an ErrorDocument to handle the request.
At the time you got this "410 Gone" message, was your own IP still on the experimental list of RewriteConds-- the ones for the rule that's supposed to end up issuing a 410? If so, that's a simple explanation.


Yes, my own IP was still [G] blocked.

As could be seen from the experiments with foobar.html and my own IP the screen correctly shows a 410 error although the raw access log lists a 404 code for that same request. The 404s continue in the log also for IPs, referrers and user agents that are blocked in .htaccess by [G].

The false but alarming problem I originally noticed also continues: misleading 500s instead of 410s. Whether the log records a 500 or a 404 seems random, but a 410 is never logged nowadays (although a [G] has been served). During the six months before mid-May there was not a single 500 error in the log, but numerous correct 410s.

Those requests blocked are for example scrapers, mainly Chinese, downloading even the whole site (I never get any legitimate traffic from China) and automatic referrer spammers usually from Ukraine and Russia. They do not deserve to be served a custom 410 page. My site is entirely in our local language and the contents are of a national interest only. I can hardly see any real reasons for humans from abroad to visit it.

In order to make the .htaccess file less resource intensive I will now try to implement mod_authz_host too.

As it looks like I am not yet eligible to vote here I want to say a special thank you to lucy24.
 
