|Access Log IP field format|
Changes when I modify .htaccess
My Apache access log has always logged the first field, the IP, in the standard #*$!.#*$!.#*$!.#*$! format. Recently, I started seeing different data in that spot, like:
crawl-66-249-72-205.googlebot.com - - [14/Feb/2012:00:00:18 -0500] "GET etc."
I turned some things on and off in my .htaccess file until I finally figured out what is causing the change - it's this:
deny from 18.104.22.168
deny from 22.214.171.124
deny from 117.135.129 # SosoSpider
deny from 126.96.36.199 # Romania hacker
allow from all
When those lines are in place, my log format changes to the expanded hostname as above. When I comment them out, I get the usual #*$!.#*$!.#*$!.#*$! IP format.
Any idea why this happens? Ideally, I want to keep blocking those IP addresses, but I want to keep the usual IP format. If need be, I could block these IP addresses in the httpd.conf file (or the pre-include through cPanel on my VPS), or at least I think I could.
I discovered this several years ago... it's apparently either/or. Take out the "comment" (# SosoSpider, etc.) and it will return to IP only.
FYI, triple x's will get caught in the forum filters. Use nnn.nnn.nnn.nnn instead.
Yes, that seems to have worked. I put the comments above the deny lines, and that was fine. I actually have a deny that is:
deny from Sosospider
When that is in, I immediately see the expanded format. So, in addition to your suggestion not to put comments at the end of a line (I guess that's not valid in .htaccess), I'd add: don't use "deny from name", which I did read somewhere was OK to do.
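For anyone landing here later, a rough sketch of how the block list might look with the comments moved to their own lines and the spider blocked by user agent rather than by name. This assumes Apache 2.2-style Order/Allow/Deny directives and that mod_setenvif is loaded. The variable name bad_bot is made up, "Sosospider" appearing in the User-Agent header is an assumption, and Order Allow,Deny is my guess at the intended policy (let everyone in, then deny the listed ones).

```apache
# Comments go on their own lines. A trailing "# note" after a directive
# appears to be read as extra arguments to Deny, i.e. as hostnames.

# Flag requests whose User-Agent contains "Sosospider" (case-insensitive).
# "bad_bot" is a hypothetical environment variable name.
SetEnvIfNoCase User-Agent "Sosospider" bad_bot

Order Allow,Deny
Allow from all

# SosoSpider range (partial IP, no hostname involved)
Deny from 117.135.129

# Deny by environment variable instead of "Deny from Sosospider",
# which would be treated as a hostname to match.
Deny from env=bad_bot
```

With only IP-based and env-based arguments in the list, nothing in it should look like a hostname, so the first log field should stay numeric.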
This is an interesting coincidence. Only a day or two back, I discovered that adding even a single Regular Expression to my IP block list has the same effect. Something as rock-bottom simple as changing
(I experimented to confirm it.)
|Regardless of the setting, when mod_authz_host is used for controlling access by hostname, a double reverse lookup will be performed. This is necessary for security. Note that the result of this double-reverse isn't generally available unless you set HostnameLookups Double. For example, if only HostnameLookups On and a request is made to an object that is protected by hostname restrictions, regardless of whether the double-reverse fails or not, CGIs will still be passed the single-reverse result in REMOTE_HOST. |
|This configuration will cause Apache to perform a double reverse DNS lookup on the client IP address, regardless of the setting of the HostnameLookups directive. |
None of this leaves me any the wiser :( I can't think where else to look.
My guess is that anything in an Allow or Deny directive that isn't obviously a simple IP address, including comments and regular expressions, may look like a possible hostname, so the double reverse DNS lookup kicks in.
Once Apache has the remote hostname, it uses that for the %h value in the default common log format.
I would assume you could have it both ways by using a custom log format that specifies %a (remote IP address) in the first column.
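A minimal sketch of that custom format, assuming you can edit httpd.conf or a virtual host (or the cPanel pre-include), since LogFormat and CustomLog are not allowed in .htaccess. The nicknames common_ip and combined_ip and the log path are placeholders, not anything from your setup.

```apache
# Common log format with %a (client IP address) in place of %h (remote host).
LogFormat "%a %l %u %t \"%r\" %>s %b" common_ip

# The same substitution applied to the combined format, if that is what
# the log analyzer expects.
LogFormat "%a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_ip

# Pick one of the nicknames above; the path is a placeholder.
CustomLog logs/access_log combined_ip
```

This only changes what gets written to the log. If a hostname-style Deny is still in place, Apache still does the lookup; %a just keeps the result out of the first column.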
@lucy24: No regex in Deny/Allow, but you can use CIDR blocks:
More info here: [25yearsofprogramming.com...]
|@lucy24: No regex in Deny/Allow, but you can use CIDR blocks: |
Yup, that's where it helps if your father taught you the binary system when you were eight. At first I had to count on my fingers and draw rows of dots and plug in the abacus, but now I can look at a pair of numbers and say /19 or /12 without even counting. Barring those weird ranges that keep nibbling at adjoining blocks until they end up with something like nnn.1.0.0 - nnn.40.255.127
The issue came up in this thread [webmasterworld.com]. I've never used anything but CIDR* blocks. Would never have occurred to me that you could use anything else.
* Detour to g### here. Classless Inter-Domain Routing. Sounds like a modern commuter train.
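To make the CIDR arithmetic concrete, here is a small sketch of the address forms Allow/Deny accept in Apache 2.2-style configs. The addresses are reserved documentation and benchmarking ranges picked for illustration, not real offenders, and Order Allow,Deny is again an assumption about the intended policy.

```apache
Order Allow,Deny
Allow from all

# Partial address: matches 192.0.2.0 through 192.0.2.255
Deny from 192.0.2

# The same range written as network/netmask
Deny from 192.0.2.0/255.255.255.0

# The same range in CIDR notation (the first 24 bits are fixed)
Deny from 192.0.2.0/24

# A /19 fixes 19 bits and leaves 13 free, so it covers 2^13 = 8192
# addresses: 198.18.32.0 through 198.18.63.255
Deny from 198.18.32.0/19
```

A range that does not start and end on a power-of-two boundary cannot be written as a single block and has to be split across several Deny lines.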
I thought the issue had to do with HostnameLookups. I prefer to have it on, but the default is off. From apache2.conf:
# HostnameLookups: Log the names of clients or just their IP addresses
# e.g., www.apache.org (on) or 188.8.131.52 (off).
# The default is off because it'd be overall better for the net if people
# had to knowingly turn this feature on, since enabling it means that
# each client request will result in AT LEAST one lookup request to the
It's faster for your server, and therefore for the visitor, to have it off, since the response won't be returned to the user agent until after the reverse DNS lookup occurs.
If you really need this information, it is far better to post-process the log file.
We decided to have it on because our server is not high-load to start with and our ancient log analyzer does not do post-processing.
Can you recommend an application or utility that would do post processing?