Apache log file missing some IPs

         

Dagger

11:58 am on Jun 21, 2006 (gmt 0)

10+ Year Member



When I checked bbclone today for web traffic, I noticed the IP 208.66.195.2 identifying itself as psycheclone. It appears to be a robot crawling for email addresses.

When I looked for the same IP address in the Apache log, there was no record of it, nor any record in the same timeframe.

I doubt our host has filtered it out of the logs; Googlebot and a host of other bots certainly show up. Besides, it seems to be a very recent bot, first appearing in June from what I can gather by searching the net.

What can cause IPs to be excluded from the Apache log?

jdMorgan

2:08 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1) Are you on shared virtual hosting with non-real-time log file updates? If so, access records may be dropped if the cron job that 'splits up' the main server log fails to execute or if it runs out of time to get its job done.

2) Have you used the CustomLog directive (mod_log_config) to exclude some accesses from being logged? If so, look to the logic (usually implemented with mod_setenvif) used to determine whether a particular access should be logged -- there is probably a bug in it.

Jim

Dagger

2:19 pm on Jun 21, 2006 (gmt 0)

10+ Year Member



I do not have access to the Apache configuration file. The daily log is generated for us once a day. I'm guessing it happens for all the other domains at that time too. I will see if it happens with the same IP address tomorrow, when the job for today is done.

As far as I understand, I can't use CustomLog because I do not have access to the Apache configuration, only .htaccess.

jdMorgan

2:42 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> The daily log is generated for us one time a day.

"Daily log" -- Then it's most likely that scheduled process is failing or running out of time. This is usually done by a cron-scheduled script that goes through the raw server access log and splits up the access records among the hosted sites based on the logged HTTP_HOST header value.
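To make the splitting step concrete, here is a minimal Python sketch of what such a job might do. It is an illustration only, not your host's actual script: it assumes each raw log line starts with the virtual host name (as in a "vhost"-prefixed log format), and the function name `split_by_host` and the sample lines are made up.

```python
from collections import defaultdict


def split_by_host(lines):
    """Group raw access-log lines by the virtual host in their first field.

    Assumption: every line begins with the host name followed by a space.
    Real hosting setups may use a different layout.
    """
    per_site = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        host, _, rest = line.partition(" ")
        per_site[host.lower()].append(rest)
    return dict(per_site)


# Made-up sample records (documentation-only hosts and addresses).
sample = [
    'example.com 192.0.2.10 - - [21/Jun/2006:00:30:01 +0000] "GET / HTTP/1.1" 200 512',
    'example.org 192.0.2.11 - - [21/Jun/2006:00:30:02 +0000] "GET /a HTTP/1.1" 200 100',
    'example.com 192.0.2.12 - - [21/Jun/2006:00:30:03 +0000] "GET /b HTTP/1.1" 404 0',
]
logs = split_by_host(sample)
for host, records in sorted(logs.items()):
    print(host, len(records))
```

If a job like this is killed or times out partway through the raw log, every record after that point never reaches the per-site files, which is one way records can go missing.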

> As far as I understand, I can't use CustomLog because I do not have access to Apache configuration...

Correct.

I would recommend asking your host why access records are being dropped. I doubt that it has anything to do with one specific IP address or user-agent, unless that address or user-agent is flooding their servers and they have blocked it and excluded it from being logged in order to keep the log file size down. If your 'stats' program records response codes, check to see if those requests are getting 200-OK responses, 403-Forbidden, or something else.
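If you can download the raw access log, the response-code check can also be done with a short script. The following is a rough Python sketch, assuming the log uses Apache's common or combined log format; the regular expression, the function name `status_counts_for_ip`, and the sample lines are illustrative, not taken from any real log.

```python
import re
from collections import Counter

# Matches the leading fields of Apache's common/combined log format:
# client IP, identd, user, [timestamp], "request line", status, size.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def status_counts_for_ip(lines, ip):
    """Tally HTTP status codes for all requests logged from one client IP."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("ip") == ip:
            counts[m.group("status")] += 1
    return counts


# Made-up sample records using documentation-only addresses.
sample = [
    '192.0.2.2 - - [22/Jun/2006:00:30:01 +0000] "GET /doc.pdf HTTP/1.1" 404 210',
    '192.0.2.2 - - [22/Jun/2006:00:30:01 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.9 - - [22/Jun/2006:00:30:02 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.2 - - [22/Jun/2006:00:30:02 +0000] "GET /doc.doc HTTP/1.1" 404 210',
]
counts = status_counts_for_ip(sample, "192.0.2.2")
print(dict(counts))
```

An empty result for an address that your stats program did record would suggest the records were dropped before they reached your copy of the log.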

Jim

Dagger

7:22 am on Jun 22, 2006 (gmt 0)

10+ Year Member



The hit from psycheclone came about 30 minutes past midnight, making it the first hit on our site for that day; we got a lot of hits after that time. Wouldn't the job go through the log sequentially?

When I check the logs for last night, there are a lot of hits from psycheclone, several per second in fact. Strangely, all its requests for our PDF and Word documents return 404, even though the addresses are correct and the documents exist there. Other bots get normal response codes.

Another thing about psycheclone: 0-2 days after it hits our site, there are smaller bursts of hits in quick succession from lots of different countries, such as India, Korea, Vietnam, Jordan and Poland. I'm guessing this program goes through proxy servers for some reason to check some more?
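For anyone wanting to measure a rate like "several hits per second", here is a small Python sketch. It assumes the combined log format (with quoted referer and user-agent at the end) and one-second timestamp resolution; the function name `hits_per_second` and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Pulls the timestamp and the trailing user-agent out of a combined-format line.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def hits_per_second(lines, agent_substring):
    """Count requests per logged second for user-agents containing a substring."""
    per_second = Counter()
    needle = agent_substring.lower()
    for line in lines:
        m = LINE_RE.search(line)
        if m and needle in m.group("agent").lower():
            per_second[m.group("time")] += 1
    return per_second


# Made-up sample records; addresses are documentation-only.
sample = [
    '192.0.2.2 - - [22/Jun/2006:00:30:01 +0000] "GET /a.pdf HTTP/1.1" 404 210 "-" "psycheclone"',
    '192.0.2.2 - - [22/Jun/2006:00:30:01 +0000] "GET /b.doc HTTP/1.1" 404 210 "-" "psycheclone"',
    '192.0.2.2 - - [22/Jun/2006:00:30:02 +0000] "GET /c.pdf HTTP/1.1" 404 210 "-" "psycheclone"',
    '192.0.2.9 - - [22/Jun/2006:00:30:02 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]
per_second = hits_per_second(sample, "psycheclone")
peak = max(per_second.values(), default=0)
print(peak)
```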