Annoying, strange entries in my log file

Requesting domains I don't have ...

         

physics

5:39 pm on Aug 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is showing up in my access_log. The strange thing is that I don't own www.datecam.com or www.happysky.us and they don't resolve to the same IP as my server (not a shared server).


211.XXX.13.110*[03/Aug/2004:06:09:54 -0700]*GET http://www_datecam_com/dchit.php?cid=712425 HTTP/1.1*http://www_happysky_us*Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)*302*38*-

Does anyone know what is going on here?

Thanks!

(dots replaced with underscores in the domain names)

[edited by: DaveAtIFG at 5:49 pm (utc) on Aug. 3, 2004]
[edit reason] IP obscured too [/edit]

physics

7:17 pm on Aug 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see yet more domains requested in a similar way. They appear to be owned by the same company. My guess is it's a log file spamming technique designed to get webmasters' attention. It worked in that sense, but do they really think I'll buy something from them? I added that IP to my .htaccess block list. Can I post the whole IP with _ instead of dots or whatever so others can block it also?

drbrain

7:56 pm on Aug 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are your statistics pages spiderable? They may be doing referrer spamming to try to get more PR. Search Google for 'allintitle:"usage statistics for"' and you'll find plenty of example sites.

nutsandbolts

9:02 pm on Aug 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seen this on some of my other site logs. Most of them are links to forums. You cannot view the forum until you register, and you register because you want to know whether someone is talking about your site. Once you do, you find about 10 pages of people saying "don't spam my log files", angry at having had to register just to see nothing but a spam thread, and the webmaster saying "I Thought it was a good idea!..Sorry!"

physics

9:23 pm on Aug 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, I suspect you guys are right ... spammers all the way. My stats aren't spiderable but I imagine they're hoping they are. I added the lines

# LOG FILE SPAMMERS
RewriteCond %{REMOTE_ADDR} ^211\.100\.13\.110$ [NC,OR]
RewriteCond %{HTTP_REFERER} (happysky\.us|mytravelrates\.com|thetravel\.us|blazerunner\.com|abcsearch\.com) [NC,OR]
RewriteCond %{THE_REQUEST} (happysky\.us|mytravelrates\.com|thetravel\.us|blazerunner\.com|abcsearch\.com) [NC]
RewriteRule ^.* - [F]

to my .htaccess so they will be banned (htaccess gurus feel free to correct).

kapow

1:11 pm on Aug 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is log spamming. A lot of sites do this, using Whois data to find domain names. They run a bot that requests their own domain name from every domain on Whois. They do this to get millions of links from the spiderable stats pages and thus increase their link pop. The search engines are aware of it and, I believe, are trying to filter it out. There is lots of info on WebmasterWorld about log spamming. I hate the stuff, it makes my nice stats pages look ugly :(

physics

6:36 pm on Aug 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right, it's really annoying me because it's throwing off my referral numbers. What I posted above doesn't really work because the entries are still stored in access_log. Does anyone know how to use rewrites to tell Apache not to even log certain transactions? Another option is to set up a post-processing script that removes all of the entries from these referrers from access_log hourly or daily, but that's certainly a pain.
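
For anyone who wants to try the post-processing route mentioned above, here is a minimal sketch in Python. It assumes the asterisk-delimited log format shown in this thread (ip*[time]*request*referrer*agent*status*bytes*-) and reuses the domain list from the .htaccess rules; the function names are made up for illustration, and your log layout may differ.

```python
import re

# Domains taken from the .htaccess rules earlier in the thread.
SPAM_DOMAINS = [
    "happysky.us",
    "mytravelrates.com",
    "thetravel.us",
    "blazerunner.com",
    "abcsearch.com",
]

# One case-insensitive pattern that matches any of the listed domains.
SPAM_RE = re.compile("|".join(re.escape(d) for d in SPAM_DOMAINS), re.IGNORECASE)


def is_spam_entry(line):
    """Return True if the request or referrer field mentions a listed domain."""
    # Assumed field layout: ip*[time]*request*referrer*agent*status*bytes*-
    fields = line.rstrip("\n").split("*")
    if len(fields) < 4:
        return False
    return bool(SPAM_RE.search(fields[2]) or SPAM_RE.search(fields[3]))


def strip_spam(lines):
    """Return only the log lines that are not spam entries."""
    return [line for line in lines if not is_spam_entry(line)]
```

To use it, read access_log into a list of lines, pass the list to strip_spam, and write the result to a new file before feeding it to the stats program. Working on a copy rather than the live log is the safer choice.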

physics

12:29 am on Aug 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I think I may have the solution.


# Forbid requests whose request line carries a full URL, and flag them as nolog
RewriteCond %{THE_REQUEST} ^GET\ http.*$ [NC]
RewriteRule ^.* - [F,E=nolog:1]

This should block all requests of the type
GET [......]
unless you use requests like this on your server.
For me, normal requests are
GET /foo/index.html
(i.e. no http).
The E=nolog:1 part is supposed to tell Apache not to log it. I haven't really tested all this though. Any mod_rewrite gurus out there?

kapow

5:42 pm on Aug 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Physics

I don't know much about how requests work (or about mod_rewrite) but wouldn't an initial request from a link in a search engine or on another site - include the 'http://' bit?

I've been hoping for a solution to log spamming for a few years now. Please tell me I'm wrong :)

physics

10:18 pm on Aug 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Kapow. I tested that by clicking on a link to my site from Google and it worked OK. So there's some thorough testing for you ;) Well, also we've gotten orders since I've changed .htaccess so that should tell you something... Your mileage may vary of course! IMHO, it's worth a shot anyway.

The above didn't work as far as not logging the requests. I'm going to try changing my CustomLog line from


CustomLog /usr/local/apache/logs/access_log custom

to

CustomLog /usr/local/apache/logs/access_log custom env=!nolog

as suggested by a tutorial [perlcode.org] I read on this topic... will let everyone know if it works.

kapow

12:20 pm on Aug 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Physics
"The above didn't work as far as not logging the requests"

Do you mean the log spams still appear on your web stats?

"will let everyone know if it works."

I would be very excited to know if you find a solution + a lot of people on WebmasterWorld would use it too :)

physics

7:54 pm on Aug 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi. I'm sad to report that it didn't work (i.e. log spams still show up in stats) but am determined to find something that does... You'll be the first to know (well, after me).

physics

8:30 pm on Aug 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have also tried, in httpd.conf


<IfModule mod_setenvif.c>
# try to stop log spam
SetEnvIf Request_URI "^http.*" nolog
# doesn't work!

# for testing ... worked :
SetEnvIf Request_URI "^/fubar.*" nolog
</IfModule>

The first one didn't work, but I think that's because of how Request_URI is handled: in a request like GET [foo.com...], only the goo.htm part seems to be contained in Request_URI.
apache mod_setenvif documentation [httpd.apache.org]
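
A small Python sketch of why the "^http.*" pattern would never match, assuming the guess above is right and Request_URI holds only the path portion of the request target. The host name www.example.com and the function name are placeholders, and urlsplit stands in for whatever parsing Apache does internally.

```python
import re
from urllib.parse import urlsplit


def path_seen_by_request_uri(request_target):
    """Return the path part of a request target, absolute URL or plain path."""
    parts = urlsplit(request_target)
    return parts.path or "/"


# A proxy-style target, as in the spam requests (placeholder host).
spam_path = path_seen_by_request_uri("http://www.example.com/goo.htm")

# An ordinary target, as in the test rule that worked.
normal_path = path_seen_by_request_uri("/fubar/index.html")

# The first pattern has nothing to match once scheme and host are gone.
first_rule_matches = re.match(r"^http.*", spam_path) is not None

# The second pattern matches because the path really starts with /fubar.
second_rule_matches = re.match(r"^/fubar.*", normal_path) is not None
```

Under that assumption both kinds of request reach the pattern as a path starting with a slash, which would explain why only the /fubar test rule fired.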

I'd really like to get the rewrite version working anyway though. Does anyone know how to replicate these bogus get requests? If so could you please tell me/us (I won't spam your logs, promise ;). The reason is so that I can test things instead of waiting for them to spam me before knowing if things work. I've turned up the logging on mod_rewrite and left those mod_rewrite directives in there so we'll see what those logs show.

BTW, if you want to try any of this make sure you have mod_setenvif and mod_env and mod_rewrite installed.
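
On replicating the bogus GET requests: one way that should work is to write the request line by hand over a plain socket, so the full URL ends up in it just like in the log entry at the top of the thread. This is an untested sketch in Python; the function names, the User-Agent string and the example URLs are all made up, and it should only be pointed at a server you run yourself.

```python
import socket


def build_proxy_style_request(target_url, referer, host_header):
    """Build a raw HTTP/1.1 request whose request line carries a full URL."""
    lines = [
        f"GET {target_url} HTTP/1.1",
        f"Host: {host_header}",
        f"Referer: {referer}",
        "User-Agent: log-spam-test",  # hypothetical agent string, easy to spot
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw_request(server, port, payload, timeout=10.0):
    """Send the payload to your own server and return the response status line."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(payload)
        response = sock.recv(4096)
    return response.split(b"\r\n", 1)[0].decode("latin-1")
```

For example, build a payload with build_proxy_style_request("http://www.example.com/test.php", "http://www.example.org/", "www.example.com"), pass it to send_raw_request with your own server's address and port 80, then check the returned status line and whether the request shows up in access_log.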

physics

2:27 am on Aug 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi. Still haven't got this working (but not for lack of trying). In the meantime I've added

grep -ve "GET http"

in the appropriate places in my log rotation script so that these no longer clutter up my logs. But before doing that I actually made a report of who is log spamming me, so I'll know if and when my efforts have become successful. BTW, if I were you I'd give the mod_rewrite method a shot. Someone I spoke to said they have the same thing as me and it's working, so maybe it will work for you...

kapow

9:47 am on Aug 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Re. grep -ve "GET http"

Surely legitimate visitors from a SE, directory or link would produce a request with 'http...'?

physics

10:09 pm on Aug 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not on my site anyway. Legit visitors' requests look like

66.66.66.66*[16/Aug/2004:01:55:21 -0700]*GET /foo/goo.html HTTP/1.1*http://www.google.com/search?hl=en&ie=ISO-8859-1&q=foo+goo*Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)*200*17853*-

Even though the link they click has [yoursite.com...] the actual GET request only requires the URI part of that URL.
I can confirm by looking at my reports that all of the "GET http" requests are junk. AFAIK a request like "GET http" is only legitimately used if you are making some sort of proxy request but even then you need mod_proxy installed, etc. In short, you probably never get requests like this. Have a look in your logs and I bet you'll find the same.
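
To check your own logs for this, a rough Python sketch that counts "GET http" requests per client IP could look like the following. It assumes the same asterisk-delimited log format as the entries in this thread, and the function name is made up for illustration.

```python
from collections import Counter


def count_proxy_style_requests(lines):
    """Count requests per client IP whose request field starts with 'GET http'."""
    counts = Counter()
    for line in lines:
        # Assumed field layout: ip*[time]*request*referrer*agent*status*bytes*-
        fields = line.split("*")
        if len(fields) >= 3 and fields[2].lower().startswith("get http"):
            counts[fields[0]] += 1
    return counts
```

Calling counts.most_common(10) on the result lists the ten busiest sources, which gives a quick picture of who is sending these requests before you filter them out.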