More fun with htaccess: tracking down a possible infinite loop

         

csdude55

8:26 pm on Apr 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My sandbox for the rebuild is in a subdomain, ww2. I'm using CentOS with WHM/cPanel, so the actual directories are:

// Live public directory
/home/example/www (an alias for /home/example/public_html)

// Subdomain
/home/example/ww2.example.com


I've been working on the .htaccess file in the subdomain, so it SHOULDN'T affect the live directory. Which is why I'm surprised and lost right now.

Yesterday, I uploaded a new .htaccess to the subdomain. No errors, it was working fine except that it was redirecting a specific page when I expected it to rewrite, so there's a logic error somewhere but not a fatal error.

About an hour later, I found that my server wasn't responding! I had to reboot, and the server did a full filescan so it took an hour to reboot. During peak time!

But it finally came back up, and everything seemed OK. I watched my server loads, everything was normal. Then about an hour later, I found that my main site was unresponsive, with an error that it's taking too long to respond. Other sites on the server were loading fine, it was just this one site.

On a hunch I renamed the .htaccess in the subdomain (literally the only thing that had changed), and then the live site came up immediately.

Not sure if it was a coincidence (since it didn't make sense), I waited a few minutes and renamed it back. This was at around 11pm, and by 3am everything was still fine so I went to bed.

I woke up at 9am to check on things, only to find that the site became unresponsive again at around 6am. I renamed the .htaccess in the subdomain again, and again, the site began to respond immediately.

I'm at a complete loss. The htaccess at the subdomain shouldn't be impacting the live directory at all... but it does. No one accesses the subdomain except for me, so there wouldn't have been any traffic on it while I was in bed. And even if there was, there were no errors when I tested it and everything ran fine for several hours.

My only guess is that something is stuck in a loop that runs in the background, and after a few hours it locks something up? But according to Munin, the CPU load was normal right before it stopped responding. RAM was higher than usual, though; it's usually at around 5.25G during that time, and this time it was at 6.75G. I only have 4G installed, though, so I'm not sure exactly what that means... swapping, I guess?

How in the world do I track down a possible infinite loop that's not throwing an error until it crashes?

Or do you guys and gals think it could be something else entirely?

lucy24

9:50 pm on Apr 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: grasping at straws ::

What do error logs say? Crank up the LogLevel for the subdirectory to the highest level (or, er, lowest level, I always forget which order it goes, but you know what I mean) and see if there's anything unexpected.

Another thing to try is, instead of renaming the htaccess, comment-out the entire thing and see if anything changes. If yes, you can start un-commenting line by line. (Voice Of Experience says the problem will be in some line that you hadn't even considered looking at, because there is absolutely zero possibility of any way on earth it could ever be causing any kind of problem.)

An alternative to the preceding--if you can do it without putting the server out of commission for another hour--is to change the offending directory's AllowOverride settings, one by one, and see if there's some particular category of override that's causing the trouble. (This is assuming it doesn't lead to an instant 500 if the htaccess includes a rule it isn't allowed to have. I don't think that's supposed to happen, though.)

w3dk

10:20 pm on Apr 6, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Assuming the `.htaccess` file for the subdomain is located at "/home/example/ww2.example.com/.htaccess"? And the ".htaccess" file for the main site is at "/home/example/public_html/.htaccess"? And you have no other .htaccess files? Or directives in the server config / virtual host?

The main site does not reference the subdomain in any way?

Any server-side caching? Proxy caches? Make sure you are not seeing any cached responses... 301s are persistently cached by the browser (test with 302s and/or with the browser cache disabled).

> The htaccess at the subdomain shouldn't be impacting the live directory at all... but it does.


In the scenario painted above, the .htaccess file at the subdomain cannot influence the live directory. They would appear to be in entirely separate directory trees? "Never the twain shall meet" as they say.

> No one accesses the subdomain except for me, so there wouldn't have been any traffic on it while I was in bed.


Are you sure? What do your server logs say?

> And even if there was, there were no errors when I tested it and everything ran fine for several hours.


Although did you test every possible URL combination, including the extreme edge cases and beyond?

> My only guess is that something is stuck in a loop that runs in the background, and after a few hours it locks something up?


It's quite possible for directives in .htaccess to get "stuck in a loop", but not for a few hours. An internal rewrite loop will break within a few seconds (assuming you've not increased the LimitInternalRecursion value from the default 10?). If you are using the "N" (next) flag on the RewriteRule directive then things can break horribly if there is no way for the rule to "fail" - memory and system resources can go through the roof and bring your server to a halt. However, you would see this in a few minutes at most - unless it had been stuck in this "frozen" state for a long time - although I would think Apache would crash before too long?

> How in the world do I track down a possible infinite loop that's not throwing an error until it crashes?


I'm sure the server's error log must hold some answers? Using the "N" flag without an appropriate "fail" state can certainly break a server in the way you describe (by default it fails after 32,000(!) iterations - but many servers will "break" before then).

Have you enabled any debug logging? For example, "LogLevel debug" and/or "LogLevel rewrite:trace4", or whatever is appropriate for the modules/directives you are using.

(Unfortunately, if the LogLevel is set very high and the server gets into a spiralling loop then the detailed error log can almost compound the problem.)

What version of Apache are you running? (I'm assuming 2.4)

csdude55

8:19 pm on Apr 7, 2020 (gmt 0)




> What do error logs say?

Nothing yet... I haven't implemented LogLevel yet because I'm nervous about letting it run long enough to crash everything, I lost a lot of money when it did that! I'm already living on credit cards (in part thanks to coronavirus) so I'm hesitant to risk losing more :'-(

Without LogLevel set up, though, there was nothing in my error logs at all to explain it.

> Another thing to try is, instead of renaming the htaccess, comment-out the entire thing and see if anything changes. If yes, you can start un-commenting line by line. (Voice Of Experience says the problem will be in some line that you hadn't even considered looking at, because there is absolutely zero possibility of any way on earth it could ever be causing any kind of problem.)
>
> An alternative to the preceding--if you can do it without putting the server out of commission for another hour--is to change the offending directory's AllowOverride settings, one by one, and see if there's some particular category of override that's causing the trouble. (This is assuming it doesn't lead to an instant 500 if the htaccess includes a rule it isn't allowed to have. I don't think that's supposed to happen, though.)

Good ideas... OK, I'll make myself initiate this at around 10pm tonight and see what happens. Last time it didn't mess up until the next day, though, so it's gonna be hard to be confident in any of the results for a while!

> Assuming the `.htaccess` file for the subdomain is located at "/home/example/ww2.example.com/.htaccess"? And the ".htaccess" file for the main site is at "/home/example/public_html/.htaccess"? And you have no other .htaccess files? Or directives in the server config / virtual host?

That is all correct. There's actually another .htaccess at /home/example, created by cPanel for compression. It hasn't been changed in 2 years, though, so it shouldn't be important.

(side note, the same compression text appears to be in post_virtualhost_global.conf, so I don't know if it's even relevant here or if I could delete it)

And I haven't made any custom changes to httpd.conf in several years, other than setting a few globals via WHM. I did it manually once many years ago, then lost it all when updating Apache! So I've been gun-shy to change anything else manually. I know that there are protections now, but still... fool me once, blah blah blah.

> The main site does not reference the subdomain in any way?

Correct, the only people that know about it are my office staff. I checked the error logs and there haven't been other accesses, BUT! It's relevant to note that I did have a tab open with the /ww2/ page, and it runs a script in the background every 30 seconds to check for new private messages. So that's a potential source, but nothing in the .htaccess references the directory for those messages, so while it shouldn't have been a problem... it could be.
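
To check whether that open tab (or anything else) really was the only traffic on the subdomain, one option is to tally requests per client and path from the subdomain's access log. The following is a minimal sketch, assuming Apache's common/combined log format; the sample lines, the IP addresses and the path /pm/check.php are made up for illustration and do not come from the actual site.

```python
import re
from collections import Counter

# Start of an Apache common/combined log line:
# client ident user [timestamp] "METHOD path protocol" status
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def tally_requests(lines):
    """Count requests per (client, path, status); lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["client"], match["path"], match["status"])] += 1
    return counts

# Hypothetical sample data: a polling script plus one unexpected visitor.
SAMPLE = [
    '203.0.113.7 - - [08/Apr/2020:03:00:00 +0000] "GET /pm/check.php HTTP/1.1" 200 12 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [08/Apr/2020:03:00:30 +0000] "GET /pm/check.php HTTP/1.1" 200 12 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [08/Apr/2020:03:00:41 +0000] "GET /some/page HTTP/1.1" 302 0 "-" "bot"',
]

for key, hits in tally_requests(SAMPLE).most_common():
    print(hits, key)
```

Passing an open file object instead of SAMPLE works the same way. A path that shows up far more often than every 30 seconds, or with a 3xx status where a 200 was expected, would be worth a closer look.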

> Any server-side caching? Proxy caches? Make sure you are not seeing any cached responses... 301s are persistently cached by the browser (test with 302s and/or with the browser cache disabled).

Not at this point, no. While in production I'm trying to do the opposite, limiting any caching other than images.

> Although did you test every possible URL combination, including the extreme edge cases and beyond?

I did not, I just made sure that the page loaded and that there weren't any fatal flaws :-(

> It's quite possible for directives in .htaccess to get "stuck in a loop", but not for a few hours. An internal rewrite loop will break within a few seconds (assuming you've not increased the LimitInternalRecursion value from the default 10?).

To my knowledge I haven't changed this... it's not in my .htaccess and I don't see it in httpd.conf.

But now you have me thinking about the private messages script that runs on a setInterval(), that's the only thing I can think of that would have been on a loop... but I haven't changed it in a month and there's no reference to it in the .htaccess, so unless something is inadvertently hitting it...?

> If you are using the "N" (next) flag on the RewriteRule directive then things can break horribly if there is no way for the rule to "fail" - memory and system resources can go through the roof and bring your server to a halt. However, you would see this in a few minutes at most - unless it had been stuck in this "frozen" state for a long time - although I would think Apache would crash before too long?

I went through and checked, I don't use [N] anywhere. I use [NE] a few times and [NC] a lot so there was the potential for a typo to have accidentally created [N] when I meant [NC], but I'm not seeing it anywhere.
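
Eyeballing flags is easy to get wrong when [NC] and [NE] are everywhere, so a scripted check can back it up. This is only a sketch: it assumes the flags are the final [...] group on a RewriteRule line, matches flag names case-insensitively, and does not handle line continuations. The sample rules are invented.

```python
import re

# Flags are taken to be the last [...] group at the end of the line.
FLAGS = re.compile(r'\[([^\]]*)\]\s*$')

def find_next_flags(htaccess_text):
    """Return (line_number, line) for each active RewriteRule that uses N or next."""
    hits = []
    for number, raw in enumerate(htaccess_text.splitlines(), start=1):
        line = raw.strip()
        # Commented-out rules start with "#" and are skipped here.
        if not line.lower().startswith("rewriterule"):
            continue
        found = FLAGS.search(line)
        if not found:
            continue
        for flag in found.group(1).split(","):
            # N may carry an iteration limit, e.g. N=50, so drop anything after "=".
            name = flag.strip().split("=", 1)[0].lower()
            if name in ("n", "next"):
                hits.append((number, line))
                break
    return hits

# Hypothetical rules for illustration only.
SAMPLE = """\
RewriteEngine On
RewriteRule ^old/(.*)$ /new/$1 [NC,NE,L]
RewriteRule ^a(.*)$ b$1 [N]
# RewriteRule ^x$ y [N]
RewriteRule ^c(.*)$ d$1 [NC,N=50]
"""

for number, line in find_next_flags(SAMPLE):
    print(number, line)
```

Because each flag is compared as a whole token, NC and NE never count as N, which is exactly the typo this check is meant to catch.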

> What version of Apache are you running? (I'm assuming 2.4)

Correct, v.2.4.41

csdude55

3:54 am on Apr 8, 2020 (gmt 0)




Well, my friends... I have an update, but little info to go on.

First, let me note that I had to enable LogLevel via WHM in the Apache configuration, not in .htaccess. It didn't give "trace" as an option, but I was able to set it to "alert".
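
A note on the ordering, since it came up earlier in the thread: Apache's LogLevel values run from emerg (quietest) through debug to trace8 (most verbose), and the default is warn. So "alert" records less than the default, not more. The sketch below only models that ordering; it is not Apache code, and it ignores per-module settings such as rewrite:trace4.

```python
# Apache 2.4 LogLevel values, ordered from least to most verbose.
LEVELS = (
    ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]
    + [f"trace{n}" for n in range(1, 9)]
)

def is_logged(message_level, configured_level):
    """True if a message at message_level passes a LogLevel of configured_level.

    Simplified model: it ignores the special case noted below for "notice".
    """
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)

def more_verbose(a, b):
    """True if LogLevel a records more than LogLevel b."""
    return LEVELS.index(a) > LEVELS.index(b)

print(more_verbose("alert", "warn"))   # is "alert" chattier than the default?
print(is_logged("error", "alert"))     # would an ordinary error be recorded?
print(is_logged("trace4", "debug"))    # would rewrite tracing be recorded?
```

One caveat the model leaves out: according to the Apache documentation, notice-level messages are always written when logging to a regular file, which would explain why startup notices still appear under a stricter setting.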

I enabled the .htaccess in /ww2/ at around 7:30pm, and let it run until 8:15pm with no apparent problems. I did have a typo in the .htaccess file that was written to the error log, though, so I know that was working. But then I had to step away, so I renamed it again.

Then I came back at 11pm (when the server load is usually pretty low), and within a few minutes found the site to be unresponsive. I opened "top -c" via SSH and found that the server load was at around 80 (I have 2 CPUs, so 2 would be a high load... 80 is off the charts). Nothing looked particularly high, though! %CPU for MySQL was at around 20 and %MEM was at around 10, both normal numbers when something is running.
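
For reference, on Linux the load average counts tasks that are runnable or in uninterruptible sleep (typically waiting on disk), so a very high load alongside modest %CPU is consistent with many processes queued behind I/O such as swap, although that is only one possible reading. A small sketch for putting the number in per-CPU terms:

```python
import os

def load_per_core(load_average, cpu_count):
    """Normalise a load average by the number of CPUs."""
    return load_average / cpu_count

def current_load_per_core():
    """1-minute load average per CPU on this machine (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)

# The figures quoted in the post: a load of 80 on 2 CPUs.
print(load_per_core(80, 2))
```

Anything well above 1.0 per core means work is queuing; logging current_load_per_core() at intervals while the .htaccess is active could show whether the climb is gradual or sudden.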

I renamed the .htaccess again and restarted Apache, then everything came back to normal pretty quickly.

From there I looked at /var/log/apache2/error_log, and nothing new was there. After the error from 7:30pm, the only thing was from the reboot:

[Tue Apr 07 23:22:16.277433 2020] [suexec:notice] [pid 1103] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Apr 07 23:22:16.293067 2020] [:notice] [pid 1103] ModSecurity for Apache/2.9.3 (http://www.modsecurity.org/) configured.
[Tue Apr 07 23:22:16.293094 2020] [:notice] [pid 1103] ModSecurity: APR compiled version="1.7.0"; loaded version="1.7.0"
[Tue Apr 07 23:22:16.293492 2020] [:notice] [pid 1103] ModSecurity: PCRE compiled version="7.8 "; loaded version="7.8 2008-09-05"
[Tue Apr 07 23:22:16.293542 2020] [:notice] [pid 1103] ModSecurity: LUA compiled version="Lua 5.1"
[Tue Apr 07 23:22:16.293562 2020] [:notice] [pid 1103] ModSecurity: LIBXML compiled version="2.9.7"
[Tue Apr 07 23:22:16.293582 2020] [:notice] [pid 1103] ModSecurity: Status engine is currently disabled, enable it by set SecStatusEngine to On.
[Tue Apr 07 23:22:18.938193 2020] [mpm_prefork:notice] [pid 1225] AH00163: Apache/2.4.41 (cPanel) OpenSSL/1.1.1f mod_bwlimited/1.4 PHP/5.6.40 configured -- resuming normal operations
[Tue Apr 07 23:22:18.938374 2020] [core:notice] [pid 1225] AH00094: Command line: '/usr/sbin/httpd'
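
When the log is mostly startup notices like these, filtering them out makes anything else stand out. A minimal sketch, assuming the Apache 2.4 error-log layout shown above; the third sample line is a hypothetical .htaccess typo entry added for illustration, not one taken from the real log.

```python
import re

# [timestamp] [module:level] [pid N] message
# The module part may be empty, as in "[:notice]".
ERROR_LINE = re.compile(
    r'^\[(?P<when>[^\]]+)\] \[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] '
    r'\[pid (?P<pid>\d+)(?::tid \d+)?\] (?P<message>.*)$'
)

def entries_other_than(lines, ignored_levels=("notice",)):
    """Parse error-log lines and keep those whose level is not in ignored_levels."""
    kept = []
    for line in lines:
        match = ERROR_LINE.match(line.rstrip("\n"))
        if match and match["level"] not in ignored_levels:
            kept.append(match.groupdict())
    return kept

SAMPLE = [
    "[Tue Apr 07 23:22:16.277433 2020] [suexec:notice] [pid 1103] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)",
    "[Tue Apr 07 23:22:16.293067 2020] [:notice] [pid 1103] ModSecurity for Apache/2.9.3 (http://www.modsecurity.org/) configured.",
    # Hypothetical entry, for illustration only:
    "[Tue Apr 07 19:31:02.000000 2020] [core:alert] [pid 999] [client 203.0.113.7:50000] "
    "/home/example/ww2.example.com/.htaccess: Invalid command 'RewriteRul', perhaps misspelled "
    "or defined by a module not included in the server configuration",
]

for entry in entries_other_than(SAMPLE):
    print(entry["when"], entry["module"], entry["level"], entry["message"])
```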


I also have an error log and a slow log for MySQL in /tmp/, so I checked those out. But again, there was nothing relevant, just the notice for the reboot.

So now the only thing I know to do is plug in a few lines at a time, then let it run for a while and see if the load skyrockets again. I hate that type of debugging because it takes foreeeeever, and doesn't really help at all if there's more than one issue (or if one issue makes a second issue occur, but it only happens if they're both active). But with nothing in the logs, I don't know what else to do.

lucy24

4:43 pm on Apr 8, 2020 (gmt 0)




> I hate that type of debugging because it takes foreeeeever

You could start by doing one module at a time: comment-out all the mod_rewrite lines, all the mod_dir lines, all the mod_alias lines, all the mod_authzthingummy lines. Or, heck, just the mod_rewrite lines vs. the mod_everything-else lines, since your typical htaccess is probably three-quarters mod_rewrite ;)

That's assuming your htaccess is all nicely organized with all the different modules' directives grouped together. The server doesn't care, but at times like this, it makes your life easier.
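
If the file is not neatly grouped, the one-module-at-a-time approach can be scripted. The sketch below is an assumption-laden illustration: the directive-to-module table is deliberately partial, line continuations are not handled, and containers such as <IfModule> are left as they are. The sample rules are invented.

```python
# Partial, illustrative mapping from directive name to the module providing it.
DIRECTIVE_MODULES = {
    "rewriteengine": "mod_rewrite",
    "rewritebase": "mod_rewrite",
    "rewritecond": "mod_rewrite",
    "rewriterule": "mod_rewrite",
    "redirect": "mod_alias",
    "redirectmatch": "mod_alias",
    "directoryindex": "mod_dir",
    "expiresactive": "mod_expires",
    "expiresbytype": "mod_expires",
    "header": "mod_headers",
}

def comment_out_module(htaccess_text, module):
    """Return the text with every directive belonging to `module` commented out."""
    result = []
    for raw in htaccess_text.splitlines():
        stripped = raw.strip()
        name = stripped.split(None, 1)[0].lower() if stripped else ""
        if DIRECTIVE_MODULES.get(name) == module:
            result.append("# " + raw)
        else:
            result.append(raw)
    return "\n".join(result) + "\n"

# Hypothetical file contents for illustration only.
SAMPLE = (
    "DirectoryIndex index.php\n"
    "RewriteEngine On\n"
    "RewriteCond %{HTTPS} off\n"
    "RewriteRule ^(.*)$ https://example.com/$1 [R=302,L]\n"
    "Redirect 302 /old /new\n"
)

print(comment_out_module(SAMPLE, "mod_rewrite"))
```

Writing the result to a copy and swapping it in keeps the original intact. Disabling a whole module at once also keeps each RewriteCond together with the RewriteRule it belongs to, which line-by-line commenting can break.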