Forum Moderators: phranque


Setting up logresolve?

         

lappert2001

7:03 pm on Apr 21, 2019 (gmt 0)

10+ Year Member



Our server runs Apache 2.4.10 and has approximately 30 virtual servers. Some are very active and some are not. There is one domain in the less active category (it's new) for which we want to log resolved hostnames, not just IP addresses. To test, I enabled HostnameLookups in apache2.conf and restarted Apache. As expected, it looked up and resolved all new HTTP requests.

That's great, but as I understand it, that turns on resolving for all domains on the server and adds a performance hit. We do not want that.

Some research tells me to use logresolve as a post-processing program:
logresolve [ -s filename ] [ -c ] < access_log > access_log.new
But I'm not clear where and how to use this. Do I put it in apache2.conf?
Or in /sites-available/mysite.conf in one of the directives?
Or in a cron job?

Thanks for any information on this.

lucy24

7:47 pm on Apr 21, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, there’s always cheating. In the .htaccess or <Directory> section for the relevant site, include an invalid IP address (in a Require line or similar). This will throw the server into lookups mode for that site only. At least it did in 2.2. (Three guesses how I know.)

But can't you put the HostnameLookups directive in a section that applies only to that one site, rather than loose at the top level of the config file?

There will be a performance hit--there's no way around it, since you are explicitly asking the server to do extra work--but that's the tradeoff you've chosen to make. And I really doubt it's significant enough to be noticeable, unless the site in question gets millions of daily requests. In which case you can kick them out to a physical server of their own.

lappert2001

9:17 pm on Apr 21, 2019 (gmt 0)

10+ Year Member



Thanks, I'll need to think about all this. We're not in the million-per-day category, but certainly in the 5,000-10,000 range. If the hit isn't that significant, perhaps using HostnameLookups for the entire server would not kill us. On all our sites, it's mostly text and photos; we don't stream video or audio.

Any links showing implementations and exact language of the cheating method you've described?

lucy24

9:45 pm on Apr 21, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the cheating method
Well, I discovered it by accident once when I'd modified a couple of Deny lines (I'm in 2.2 on shared hosting) and ended up with something like

Deny from 1.2, 1.29

That lone comma, having no place in an IPv4 (or, for that matter, IPv6) address, and not needed by the module's syntax, was sufficient to throw the whole site's logs into lookups mode. I can say with certainty that only the one site was affected, because I had other sites on the same server, in fact in the same userspace, and their logs carried on as normal.

But that's the way to do it by accident. I should think that anything which can be done by accident can also be done on purpose.*

:: detour to Apache docs ::

Yup. The HostnameLookups directive, aka Doing It On Purpose, can be used in directory sections and also in vhost envelopes.
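Doing it on purpose, then, means putting "HostnameLookups On" inside the one site's <VirtualHost> block and reloading Apache. The stock Debian apache2.conf ships with a top-level "HostnameLookups Off" line, and that line should go back to Off, because an On at the top level applies to every vhost that doesn't set its own value. A minimal sketch for checking where the directive actually appears (list_hostname_lookups and the vhost shown in the comment are hypothetical):

```shell
# Target layout (hypothetical vhost file /etc/apache2/sites-available/mysite.conf):
#
#   <VirtualHost *:80>
#       ServerName mysite.example
#       HostnameLookups On
#       ...
#   </VirtualHost>
#
# list_hostname_lookups FILE...
# Print each active HostnameLookups line with its file name and line number,
# so you can confirm On appears only in the one site's file. Commented-out
# lines (starting with #) are not matched; -i because directive names are
# case-insensitive.
list_hostname_lookups() {
  grep -Hin '^[[:space:]]*HostnameLookups[[:space:]]' "$@"
}
```

For example, list_hostname_lookups /etc/apache2/apache2.conf /etc/apache2/sites-enabled/*.conf should show Off in apache2.conf and On only in the one site's file.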

Finally, if you have hostname-based Require directives, a hostname lookup will be performed regardless of the setting of HostnameLookups.
I think that's the part that comes into play in the Doing It By Accident version: if it isn't a valid IP address, the server assumes it's meant to be a hostname. And once you're looking up one name, you have to look up all of them.
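That classification rule can be sketched roughly. What follows is a simplified approximation, not Apache's own parser, modelled on the 2.2 Allow/Deny behaviour described above: a word that looks like a full or partial dotted IPv4 address (optionally with a /bits suffix) counts as an address, and any other word is assumed to be a hostname, which is what switches lookups on. IPv6 and address/netmask forms are deliberately left out, and the function names are made up.

```shell
# looks_like_ip TOKEN
# Succeeds for dotted decimal such as 10.1 or 192.168.0.0/16 (full or partial
# IPv4, optional /bits). Anything else would be treated as a hostname.
looks_like_ip() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){0,3}(/[0-9]{1,2})?$'
}

# report_hostname_tokens WORD...
# Print each argument that would not pass as an address.
report_hostname_tokens() {
  for tok in "$@"; do
    looks_like_ip "$tok" || printf 'treated as hostname: %s\n' "$tok"
  done
}
```

Run against the accidental line above, report_hostname_tokens 1.2, 1.29 prints "treated as hostname: 1.2," and says nothing about 1.29.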

phranque? Can you find where in the Apache docs it says where the logresolve directive goes? The documentation seems to consist of about three lines, telling us what to say but not where to say it.


* Except in the children's room of the library, where small children routinely cause the computers to do things they have been explicitly set up not to be able to do.

lappert2001

10:28 pm on Apr 21, 2019 (gmt 0)

10+ Year Member



Thanks, I'll start looking at this.

phranque

10:54 pm on Apr 21, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



But I'm not clear where and how to use this. Do I put it in apache2.conf?
Or in /sites-available/mysite.conf in one of the directives?
Or in a cron job?

phranque? Can you find where in the Apache docs it says where the logresolve directive goes?


it's not a directive per se.
it's just a standalone program that slurps an access log file on stdin and spits out records with IPs resolved on stdout.
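In the synopsis "logresolve [ -s filename ] [ -c ] < access_log > access_log.new", the brackets mark optional flags, but "<" and ">" are shell redirections rather than arguments to logresolve: "<" connects the file to the program's standard input and ">" sends its standard output to a new file. Handed file names as plain arguments, logresolve presumably just waits for input from the terminal, which would explain the "hang" reported further down. A small wrapper sketch (resolve_log is a made-up name) that also guards against redirecting into the file being read, which empties it before logresolve ever sees it:

```shell
# resolve_log INPUT OUTPUT
# logresolve takes no file names: "<" feeds INPUT to its standard input and
# ">" sends its standard output to OUTPUT.
resolve_log() {
  if [ "$#" -ne 2 ]; then
    echo "usage: resolve_log INPUT OUTPUT" >&2
    return 2
  fi
  # "logresolve < log > log" would truncate the log before it is read.
  # (This compares the names only, not the files they point to.)
  if [ "$1" = "$2" ]; then
    echo "resolve_log: INPUT and OUTPUT must be different files" >&2
    return 1
  fi
  if [ ! -r "$1" ]; then
    echo "resolve_log: cannot read $1" >&2
    return 1
  fi
  logresolve < "$1" > "$2"
}
```

For example: resolve_log /var/log/apache2/mysite.access.log.1 /var/log/apache2/mysite.access.log.1.resolved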

i would suggest a cron job for this.
you would typically have a cron job to rotate log files.
(using rotatelogs [httpd.apache.org] perhaps?)
you would add a new cron job subsequent to the log rotation that would do the DNS resolutions for the newly rotated log file, outputting the results to a new file.
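A sketch of that post-rotation job, assuming Debian's stock /etc/logrotate.d/apache2 (rather than rotatelogs), which uses delaycompress and so leaves the previous day's log uncompressed as mysite.access.log.1. The function name is hypothetical:

```shell
# resolve_rotated LOG
# For a cron job scheduled after the nightly rotation. Assumes the rotated
# copy of LOG is LOG.1 and still uncompressed.
resolve_rotated() {
  rotated="$1.1"
  out="$rotated.resolved"
  if [ ! -r "$rotated" ]; then
    echo "resolve_rotated: cannot read $rotated" >&2
    return 1
  fi
  # Resolve into a temporary file and rename it only on success, so a log
  # analyzer never picks up a half-written file.
  if logresolve < "$rotated" > "$out.tmp"; then
    mv "$out.tmp" "$out"
  else
    rm -f "$out.tmp"
    return 1
  fi
}
```

A script ending in resolve_rotated /var/log/apache2/mysite.access.log can then be called from a crontab entry timed to run after logrotate. Since the rotated file no longer changes, it doesn't matter how long the lookups take.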

if you needed to observe something for the current access log file, you would have to run logresolve ad hoc from a command line or however you do such things (run programs on your server) in your configuration.
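Because logresolve only reads standard input, it can also sit at the end of a pipe, which suits the ad hoc case: resolve just the newest part of the live log rather than the whole file. A minimal sketch (the function name is made up):

```shell
# resolve_recent N LOG
# Resolve only the last N lines of a log (which may still be growing) and
# print them, instead of waiting for the whole file to be processed.
resolve_recent() {
  tail -n "$1" "$2" | logresolve
}
```

For example: resolve_recent 200 /var/log/apache2/mysite.access.log | less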

another option would be to download log files from the server and run logresolve locally.

lappert2001

10:15 am on Apr 22, 2019 (gmt 0)

10+ Year Member



So either I'm dense or something is wrong. I'm attempting to use logresolve, at first running it from the command line (perhaps later from cron). Pretty much every page on the subject uses this construct: logresolve [ -s filename ] [ -c ] < access_log > access_log.new

My access logs are at /var/log/apache2/mysite.access.log

I have tried various combinations:
logresolve mysite.access.log mysite.access.log.resolved
logresolve mysite.access.log > mysite.access.log.resolved
logresolve < mysite.access.log > mysite.access.log.resolved
logresolve -s mysite.access.log.stats -c < mysite.access.log > mysite.access.log.1.resolved

... and it just hangs. What the heck am I doing wrong? At first I assumed the "<" and ">" were just placeholders around access_log in the example, not literally part of the command. But after several tries, I put them in to see whether they actually are part of the command. Still nothing.

Thank you.

lappert2001

10:55 am on Apr 22, 2019 (gmt 0)

10+ Year Member



UPDATE: OK, perhaps I'm impatient. Still, ten minutes is a long time to wait for completion.

My original log file was 326,627 bytes. After a long wait, it finally finished, producing mysite.access.log.resolved at 346,498 bytes.

The command I used was: logresolve < mysite.access.log > mysite.access.log.resolved
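One likely reason for the long run: every distinct client address means a reverse DNS query, and logresolve makes them one at a time (it caches answers, so repeat visitors cost nothing extra). Counting distinct addresses gives a feel for how many lookups are involved. A sketch, assuming the client address is the first field, as in Apache's common and combined log formats (the function name is made up):

```shell
# count_unique_ips LOG
# Number of distinct client addresses in an access log, i.e. roughly the
# number of DNS lookups a caching resolver has to make.
count_unique_ips() {
  # Count each first-field value the first time it is seen.
  awk '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}
```

Multiplying the result of count_unique_ips /var/log/apache2/mysite.access.log by a typical reverse-lookup time (which can be several seconds for addresses whose lookups time out) gives a rough idea of the run time.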

It worked, but when I added -s filename and -c to the command, I got a segmentation fault.

So now I'm trying it using the -s and -c, but removing the < and > symbols, and we'll see what happens.

30 minutes later, this one is still hanging:
logresolve -s mysite.access.log.stats -c mysite.access.log mysite.access.log.resolved

I'm ready to give up.

lappert2001

11:34 am on Apr 22, 2019 (gmt 0)

10+ Year Member



Found this:
[manpages.ubuntu.com...]

Installed it and will look at it later. (We run Debian on our server, not Ubuntu, but that shouldn't matter.)

lappert2001

11:46 am on Apr 22, 2019 (gmt 0)

10+ Year Member



OK, that took about 10 seconds. I think this means problem solved. I use the info in a program called Accesswatch (similar to AWStats); it's very old, but it works. I have Accesswatch read the log file a few times a day, so now I can set up ip2host to run in cron before that.
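Commands in a shell script run one after another, each waiting for the previous one to exit, so the simplest way to make ip2host finish before Accesswatch starts is to make it the first step in access.sh. A sketch under two assumptions: ip2host works as a filter like logresolve (raw log on standard input, resolved log on standard output), and Accesswatch has been pointed at the .resolved file. The function name is hypothetical; the script paths are the ones from this thread.

```shell
# run_access_report LOG ACCESS_DIR
# Hypothetical body for access.sh. Each step starts only after the previous
# one has exited, and the chain stops at the first failure.
run_access_report() {
  log=$1
  dir=$2
  # Assumption: ip2host reads the raw log on standard input and writes the
  # resolved log on standard output, like logresolve.
  if ! ip2host < "$log" > "$log.resolved.tmp"; then
    rm -f "$log.resolved.tmp"
    return 1
  fi
  mv "$log.resolved.tmp" "$log.resolved" &&
    perl "$dir/accesswatch.pl" &&
    perl "$dir/saveday.pl"
}
```

access.sh would then end with run_access_report /var/log/apache2/mysite.access.log /home/mysite/public_html/access, and the crontab entry stays as it is. Alternatively, because cron hands each command line to /bin/sh, the two steps can be chained in the crontab entry itself with &&, e.g. ip2host < /var/log/apache2/mysite.access.log > /var/log/apache2/mysite.access.log.resolved && sh /home/mysite/public_html/access/access.sh (same filter assumption).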

phranque

11:52 am on Apr 22, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The square brackets mean that those are optional to the command. You do not use the square brackets when typing the command.

lappert2001

1:38 pm on Apr 22, 2019 (gmt 0)

10+ Year Member



Thanks. I wasn't using the square brackets. I did understand that :)

The questions I had were about the greater than ">" and less than "<" symbols. I was unclear about using:

logresolve -s filename -c access_log access_log.new
or
logresolve -s filename -c < access_log > access_log.new

lappert2001

3:30 pm on Apr 22, 2019 (gmt 0)

10+ Year Member



So now I have two follow-up questions that are related to all this.

1. In setting up a cron job: the log file analyzer I'm using is the Perl script Accesswatch (http://www.accesswatch.com/; old, but it does what I want it to do). I run it three times a day from a cron job:
sh /home/mysite/public_html/access/access.sh ... where access.sh is a small script that runs two scripts via perl:
perl /home/mysite/public_html/access/accesswatch.pl
perl /home/mysite/public_html/access/saveday.pl

Now let's say one instance executes at 3:58 AM. But I want ip2host to run before I run access.sh.
Is there a way to hold off on the second command (accesswatch) until the first one (ip2host) is finished? Can I somehow combine the two commands in the crontab, or add ip2host to the access.sh file?

I hope this all makes sense.

2. Second question: once I run ip2host, I have a new file, mysite.access.log.resolved. How would I go about rotating the resolved files in the same manner that the regular log files are rotated? Would that be a directive in /etc/apache2/sites-available/mysite.conf, or somewhere else?
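The resolved files are written by ip2host, not by Apache, so no directive in mysite.conf or apache2.conf will rotate them. Either give them their own logrotate entry (logrotate manages any file it is pointed at, via a file under /etc/logrotate.d/) or rotate them in the same script that creates them. A sketch of the second route, assuming one date-stamped copy per day is wanted and that copies older than a chosen number of days can be deleted (archive_resolved is a hypothetical name):

```shell
# archive_resolved FILE KEEP_DAYS
# Rename FILE to FILE.YYYY-MM-DD (a later run on the same day replaces that
# day's copy), then delete archived copies of FILE older than KEEP_DAYS days.
archive_resolved() {
  file=$1
  keep_days=$2
  mv "$file" "$file.$(date +%Y-%m-%d)" || return 1
  # -mtime +N matches files last modified more than N whole days ago.
  find "$(dirname "$file")" -maxdepth 1 -type f \
    -name "$(basename "$file").????-??-??" -mtime +"$keep_days" -delete
}
```

Call it only after Accesswatch has read the file, e.g. archive_resolved /var/log/apache2/mysite.access.log.resolved 14 to keep roughly two weeks. If a date command is written directly into a crontab line instead, each % must be escaped as \%, because cron treats a bare % as a newline.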

Again, thanks to all who have responded.