Pesky bot/snooper with a bogus referrer.

         

littleman

6:57 pm on May 2, 2001 (gmt 0)



Someone hit on this one before, but I can't find the thread.

Some of the IPs:
208.139.196.241
195.166.232.153
172.138.54.66 -> AC8A3642.ipt.aol.com
216.232.242.23 -> bkgk47gpy1ql.bc.hsia.telus.net
209.214.144.134 -> host-209-214-144-134.msy.bellsouth.net
208.12.29.237 -> host-29-237.dsl-sea.seanet.com
172.167.154.111 -> ACA79A6F.ipt.aol.com
209.86.200.206 -> user-38ldi6e.dialup.mindspring.com
63.102.245.89
66.47.165.130 -> user-112v9c2.biz.mindspring.com
63.102.245.89
66.26.167.26
12.84.111.118 -> 118.chicago-08rh16rt.il.dial-access.att.net
66.26.167.26 -> ilm26-167-026.ec.rr.com
208.12.29.237 -> host-29-237.dsl-sea.seanet.com

Obviously, they are covering their tracks here. They are going through either dial-ups or proxies. Perhaps the only way to track these guys down would be to contact the ISPs.
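If you go the "contact the ISPs" route, a rough first step is to group the resolved hostnames above by provider. Below is a minimal Python sketch that guesses the ISP domain from the last two DNS labels. The helper names `isp_domain` and `group_by_isp` are hypothetical, and the two-label rule is an assumption that breaks for country-code domains such as `.net.au` or `.co.uk`.

```python
from collections import defaultdict


def isp_domain(hostname: str) -> str:
    """Crude ISP guess: keep the last two DNS labels (e.g. 'aol.com').

    Assumption: two labels identify the provider. This is wrong for
    names under country-code second-level domains like '.net.au'.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def group_by_isp(resolved: dict[str, str]) -> dict[str, list[str]]:
    """Map each guessed ISP domain to the IPs that resolved under it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ip, hostname in resolved.items():
        groups[isp_domain(hostname)].append(ip)
    return dict(groups)


# A few of the IP -> hostname pairs from the list above.
resolved = {
    "172.138.54.66": "AC8A3642.ipt.aol.com",
    "172.167.154.111": "ACA79A6F.ipt.aol.com",
    "209.86.200.206": "user-38ldi6e.dialup.mindspring.com",
    "66.47.165.130": "user-112v9c2.biz.mindspring.com",
    "208.12.29.237": "host-29-237.dsl-sea.seanet.com",
}

for isp, ips in sorted(group_by_isp(resolved).items()):
    print(isp, ips)
```

With the sample pairs, the AOL, MindSpring, and Seanet addresses each end up in their own bucket, which gives you one abuse contact per group rather than one per IP.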

The referrer is:
[iaea.org...] -> There's a very good chance the referring site has nothing to do with the bot. My guess is that the bogus referrer is there so they can track the buzz about the spider's activity (which probably means they will be reading this), and also to get around referrer-based cloaking.

There is also a very good chance they are template sniffing.

UA -> Mozilla/3.0
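Going by the two traits above (a referrer pointing at iaea.org and a bare `Mozilla/3.0` user agent), a log filter could flag these hits. This is a minimal sketch only. The `Hit` record layout, the function name, the 10.0.0.x addresses, and the referrer paths are all hypothetical, because the real referrer URL is truncated in the log.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Hit:
    ip: str
    referrer: str
    user_agent: str


def looks_like_iaea_bot(hit: Hit) -> bool:
    """True only if both traits match: iaea.org referrer and bare Mozilla/3.0 UA."""
    host = (urlparse(hit.referrer).hostname or "").lower()
    from_iaea = host == "iaea.org" or host.endswith(".iaea.org")
    return from_iaea and hit.user_agent.strip() == "Mozilla/3.0"


hits = [
    # Hypothetical referrer paths: the real one is truncated in the log.
    Hit("63.102.245.89", "http://www.iaea.org/hypothetical/page", "Mozilla/3.0"),
    Hit("10.0.0.1", "http://www.iaea.org/hypothetical/page", "Mozilla/4.0 (compatible; MSIE 5.5)"),
    Hit("10.0.0.2", "http://example.com/", "Mozilla/3.0"),
]

print([h.ip for h in hits if looks_like_iaea_bot(h)])
```

Requiring both traits keeps ordinary visitors who happen to arrive from iaea.org, or who run an old browser, out of the results.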

BoneHeadicus

7:14 pm on May 2, 2001 (gmt 0)

10+ Year Member



I've been getting hit by that forever. I kept going there looking for a SERP on the site or something, but I can't find one. They keep looking for stuff that hasn't been there for a year or more. If you can figure out a way to block them, let me know.

littleman

7:36 pm on May 2, 2001 (gmt 0)



;)
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]
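The idea of the rule is to answer 403 Forbidden (the `[F]` flag) whenever the Referer starts with the iaea.org address. Here is a rough Python model of just that referrer test, not of mod_rewrite itself. The pattern names and sample URLs are assumptions. It also shows why a trailing `$` anchor matters: with the anchor, only a referrer of exactly `http://www.iaea.org` matches, so a referrer that carries a path slips through.

```python
import re

# Prefix match, case-insensitive like the [NC] flag: any path may follow.
PREFIX_PATTERN = re.compile(r"^http://www\.iaea\.org", re.IGNORECASE)
# Anchored with $: only the bare address, with nothing after it, matches.
ANCHORED_PATTERN = re.compile(r"^http://www\.iaea\.org$")


def status_for(referer: str, pattern: re.Pattern) -> int:
    """Return 403 when the Referer matches the pattern, else 200."""
    return 403 if pattern.search(referer) else 200


bare = "http://www.iaea.org"
with_path = "http://www.iaea.org/hypothetical/page.asp"  # hypothetical path

print(status_for(bare, PREFIX_PATTERN), status_for(with_path, PREFIX_PATTERN))
print(status_for(bare, ANCHORED_PATTERN), status_for(with_path, ANCHORED_PATTERN))
```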

BoneHeadicus

3:10 am on May 3, 2001 (gmt 0)

10+ Year Member



Went back and dug this up. I sure would like to know what this is all about.

uc.nombres.ttd.es (212.170.181.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
01 May -- 03:34:06 -- -- Code 404 Not Found
unresolved (148.78.254.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
20 Apr -- 22:04:20 -- -- /
qld.bigpond.net.au (61.9.208.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
27 Mar -- 19:20:45 -- -- Code 404 Not Found
[iaea.org...]
27 Mar -- 19:55:41 -- -- Code 404 Not Found
qld.bigpond.net.au (61.9.208.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
25 Mar -- 16:50:49 -- -- Code 404 Not Found
ipt.aol.com (172.173.82.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
25 Mar -- 12:39:29 -- -- /
unresolved (209.58.116.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
23 Mar -- 06:07:47 -- -- Code 404 Not Found
internetconnect.net (64.148.19.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
21 Mar -- 13:03:20 -- 00:10 -- /
[iaea.org...]
21 Mar -- 13:03:30 -- 00:37 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 13:04:07 -- 04:49 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 13:08:56 -- 00:27 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 13:09:23 -- -- Code 404 Not Found
[iaea.org...]
21 Mar -- 14:16:28 -- 00:15 -- /
[iaea.org...]
21 Mar -- 14:16:43 -- 00:30 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 14:17:13 -- 04:26 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 14:21:39 -- 00:32 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 14:22:11 -- -- Code 404 Not Found
[iaea.org...]
21 Mar -- 14:55:25 -- 00:18 -- /
[iaea.org...]
21 Mar -- 14:55:43 -- 00:30 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 14:56:13 -- 04:08 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 15:00:21 -- 00:30 -- Code 404 Not Found
[iaea.org...]
21 Mar -- 15:00:51 -- -- Code 404 Not Found
dsl.gtei.net (4.40.145.xxx) - Other Agent (Unknown Platform)
[iaea.org...]
17 Mar -- 22:02:05 -- -- /
[iaea.org...]
17 Mar -- 22:37:24 -- -- /

littleman

3:59 am on May 3, 2001 (gmt 0)



Hey BH, what is up with the X'ing out of those IPs? Does your tracking script do that automatically?

BoneHeadicus

5:14 am on May 3, 2001 (gmt 0)

10+ Year Member



Yeah, it's weblog... do I have a setting wrong in there somewhere? It's really a pain, and I just live with it.

littleman

5:52 am on May 3, 2001 (gmt 0)



Find
$TrimmedDomain =~ s/(\d+\.\d+\.\d+)\.\d+/$1\.XXX/;
and put a # in front of it so it looks like this:
#$TrimmedDomain =~ s/(\d+\.\d+\.\d+)\.\d+/$1\.XXX/;

There are two of them, one at line 1061 and another at line 1084.
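For anyone wondering what that line does: it keeps the first three octets of an IP address and replaces the last one with `XXX`. Below is a Python equivalent of the Perl substitution, as a sketch. `trim_domain` is a hypothetical name, and the sample address is made up. Perl's `s///` without `/g` replaces only the first match, which is why `count=1` is used.

```python
import re


def trim_domain(trimmed_domain: str) -> str:
    # Mirrors the Perl line: $TrimmedDomain =~ s/(\d+\.\d+\.\d+)\.\d+/$1\.XXX/;
    # \1 keeps the first three octets; the fourth is replaced by "XXX".
    return re.sub(r"(\d+\.\d+\.\d+)\.\d+", r"\1.XXX", trimmed_domain, count=1)


print(trim_domain("212.170.181.45"))      # hypothetical address
print(trim_domain("qld.bigpond.net.au"))  # no dotted-digit run, so unchanged
```

Commenting the line out in both places skips this masking, so the full addresses show up in the report.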

Edited by: littleman

BoneHeadicus

6:11 am on May 3, 2001 (gmt 0)

10+ Year Member



Thanks, little... BTW, the 404s are requests for an .asp file. My site hasn't had anything ASP on it since Feb 2000.

littleman

10:52 pm on May 17, 2001 (gmt 0)



Another approach; this one will work in the httpd.conf file:
SetEnvIfNoCase Referer "^http://www\.iaea\.org" spam_ref=1
<FilesMatch "(.*)">
Order Allow,Deny
Allow from all
Deny from env=spam_ref
</FilesMatch>
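Here is how this version works. `SetEnvIfNoCase` sets `spam_ref` when the Referer matches, ignoring case. With `Order Allow,Deny`, the Allow directives are evaluated first and the Deny directives last, so a request that matches both `Allow from all` and `Deny from env=spam_ref` is refused. Below is a small Python model of that evaluation. It is a sketch only: the function names are hypothetical, the dots in the pattern are escaped, and everything else Apache does is ignored.

```python
import re

SPAM_REF = re.compile(r"^http://www\.iaea\.org", re.IGNORECASE)


def set_env(referer: str) -> dict[str, str]:
    """Model of SetEnvIfNoCase: set spam_ref=1 on a case-insensitive match."""
    return {"spam_ref": "1"} if SPAM_REF.search(referer) else {}


def order_allow_deny(allowed: bool, denied: bool) -> bool:
    """Order Allow,Deny: an Allow must match, and any Deny match overrides it."""
    return allowed and not denied


def is_served(referer: str) -> bool:
    env = set_env(referer)
    allowed = True               # Allow from all
    denied = "spam_ref" in env   # Deny from env=spam_ref
    return order_allow_deny(allowed, denied)


print(is_served("HTTP://WWW.IAEA.ORG/hypothetical"))  # refused (403)
print(is_served("http://example.com/"))               # served
```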

toolman

10:55 pm on May 17, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You know, I was thinking about this one today... it hasn't been hitting me as much anymore.

Woz

12:32 am on Jan 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Digging up an old topic, but I had visits today from our old friend Larbin, this time with a UA of larbin_2.6_basileocaml+xxxxxxxx.xxxxxxxxxxx@cea.fr

CEA.fr is, you guessed it, the Commissariat à l'Energie Atomique (the French Atomic Energy Commission). Sound familiar?

Question: is it possible to use wildcards in User-agent lines in robots.txt? With so many variations of Larbin around, I would like to do:

User-agent: *larbin*
Disallow: /

Does that work? It validates. (I'm on IIS, so I can't use .htaccess.)

Onya
Woz

Key_Master

1:20 am on Jan 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Woz,

If you are referring to IP 132.166.133.192, then you might like to know that it really does originate from the cea.fr domain. I have no idea of its purpose, though.
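One common way to check a claim like "it really does originate from cea.fr" is forward-confirmed reverse DNS. You look up the PTR name for the IP, check that it falls under the expected domain, and then resolve that name forward to make sure the original IP is among the results. Below is a minimal sketch using Python's standard `socket` module. The resolver arguments exist only so it can be tried offline. The records in the usage example (`192.0.2.10`, `crawler.example.fr`) are hypothetical and are not real DNS data for that address.

```python
import socket
from typing import Callable


def reverse_lookup(ip: str) -> str:
    """PTR lookup via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(name: str) -> list[str]:
    """IPv4 address lookup via the system resolver."""
    return socket.gethostbyname_ex(name)[2]


def fcrdns_matches(
    ip: str,
    domain: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], list[str]] = forward_lookup,
) -> bool:
    """True if ip's PTR name is inside `domain` and that name resolves back to ip."""
    try:
        name = reverse(ip).lower().rstrip(".")
    except OSError:  # socket.herror is a subclass of OSError
        return False
    domain = domain.lower()
    if name != domain and not name.endswith("." + domain):
        return False
    try:
        return ip in forward(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Offline usage with hypothetical records (not real DNS data).
fake_ptr = {"192.0.2.10": "crawler.example.fr"}
fake_a = {"crawler.example.fr": ["192.0.2.10"]}
print(fcrdns_matches("192.0.2.10", "example.fr", fake_ptr.__getitem__, fake_a.__getitem__))
```

The forward step matters because whoever controls an IP block can set its PTR record to any name they like. Only the owner of the domain can make that name resolve back to the IP.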

Woz

1:28 am on Jan 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep! That's the one. Maybe I'll email him and ask.

Onya
Woz