AlkalineBOT

behaving a bit aggressively...

RonPK

3:23 pm on Apr 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My site's been hit today by AlkalineBOT, a spider operating from an IP that belongs to the University of Vienna. It peaked with 3814 HEAD requests in 22 seconds, and 220 GET requests in 30 seconds. Has anyone else been bothered by this spider?

I think I'll try to block it before it starts GETting those 3814 pages ;-)
The documentation at alkaline.vestris.com says "Alkaline robots support can be disabled for individual configurations [...]", so just putting it in robots.txt might not work. I guess a line in httpd.conf or .htaccess will have to do the job.
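
For reference, here is a minimal sketch of what a robots.txt rule for this spider would mean to a crawler that honours it. It uses Python's standard urllib.robotparser. The token "AlkalineBOT" and the rule text are assumptions for illustration. Since Alkaline's robots support can be switched off by whoever runs it, this only shows what a compliant crawler would do, not what this one will do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The token "AlkalineBOT" is assumed to be
# the name the spider would look for; the thread does not confirm this.
ROBOTS_TXT = """\
User-agent: AlkalineBOT
Disallow: /
"""


def is_allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `path`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("AlkalineBOT may fetch /menu.htm:", is_allowed("AlkalineBOT", "/menu.htm"))
    print("SomeOtherBot may fetch /menu.htm:", is_allowed("SomeOtherBot", "/menu.htm"))
```

A rule like this costs nothing to add, but a server-side block is the only option that does not depend on the spider's cooperation.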

pendanticist

7:07 am on Apr 9, 2003 (gmt 0)

Thanks for the 'heads-up' RonPK. :)

Can you provide the UA string for us?

Pendanticist.

RonPK

7:56 am on Apr 9, 2003 (gmt 0)

"AlkalineBOT/1.7 (1.7.1904.0)"

pendanticist

8:07 am on Apr 9, 2003 (gmt 0)

Maybe I should have said: "...the whole string." <chuckle> My bad.

Pendanticist.

RonPK

8:23 am on Apr 9, 2003 (gmt 0)

Maybe I should have said "this is the whole UA string" ;-)
There's no mention of Mozilla/4.0 or anything like it.

Here is an example line from my logs:
131.130.xx.xx - - [08/Apr/2003:09:22:08 +0200] "HEAD /menu.htm HTTP/1.0" 200 0 "-" "AlkalineBOT/1.7 (1.7.1904.0)"
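
To see how hard a spider is hitting a site, log lines like the one above can be tallied per user agent and request method. The following is a rough Python sketch. It assumes the log is in Apache's combined format, as the example line appears to be; the regular expression and the function names are illustrative, not taken from any tool.

```python
import re
from collections import Counter
from datetime import datetime

# The example line from the post, with the IP address masked as in the original.
SAMPLE = ('131.130.xx.xx - - [08/Apr/2003:09:22:08 +0200] '
          '"HEAD /menu.htm HTTP/1.0" 200 0 "-" "AlkalineBOT/1.7 (1.7.1904.0)"')

# Assumed layout: host, ident, user, [time], "request", status, size,
# "referer", "user agent" (Apache combined log format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # %b expects English month abbreviations, so this assumes the C/English locale.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def count_by_agent_and_method(lines):
    """Count requests per (user agent, method) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry["agent"], entry["method"])] += 1
    return counts


if __name__ == "__main__":
    for (agent, method), total in count_by_agent_and_method([SAMPLE]).items():
        print(method, total, agent)
```

Fed a full access log, the counter makes bursts like the HEAD and GET peaks mentioned earlier easy to attribute to a single user agent.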

pendanticist

8:40 am on Apr 9, 2003 (gmt 0)

Hmmmmm. I don't have that IP Number showing in my deny file.

I wonder if it would be best to ban by IP range, or 'AlkalineBOT/1.7 (1.7.1904.0)' or just plain 'AlkalineBOT/1.7'?
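
The trade-off between those options can be sketched in a few lines of Python. This is only an illustration of how broad each match is, not a server configuration. The later version string and the 192.0.2.0/24 network are hypothetical; the spider's real address range is not given in this thread.

```python
import ipaddress

# The only user-agent string actually reported in this thread.
SEEN_AGENT = "AlkalineBOT/1.7 (1.7.1904.0)"


def matches_exact(agent: str) -> bool:
    """Narrowest rule: only the exact build seen in the logs."""
    return agent == SEEN_AGENT


def matches_version(agent: str) -> bool:
    """Middle rule: any build of version 1.7."""
    return agent.startswith("AlkalineBOT/1.7")


def matches_name(agent: str) -> bool:
    """Broadest user-agent rule: any version, case-insensitive."""
    return "alkalinebot" in agent.lower()


def in_banned_range(ip: str, network: str) -> bool:
    """IP-based rule: true if `ip` falls inside `network` (CIDR notation)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network)


if __name__ == "__main__":
    # Hypothetical future version, used only to show which rules keep working.
    later = "AlkalineBOT/1.8 (1.8.0.0)"
    for agent in (SEEN_AGENT, later):
        print(agent, matches_exact(agent), matches_version(agent), matches_name(agent))
    # 192.0.2.0/24 is a documentation-only network, standing in for a real range.
    print(in_banned_range("192.0.2.17", "192.0.2.0/24"))
```

Matching on the bare name survives version bumps and a change of IP address, while an IP range also catches whatever else runs from that network, wanted or not.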

Pendanticist.

RonPK

9:17 am on Apr 9, 2003 (gmt 0)

I'm banning it by 'AlkalineBOT'. If I've read the documentation correctly, the thing was designed to spider the owner's site, and should not be let free on the web ;-)

In my logs it originates from a single IP address. nslookup tells me the IP belongs to a PC in a department that runs a site that links to my site. Maybe they're just spidering their own site and forgot to limit the spider correctly...