$%^$%$# Wget!!

Strange occurrence...

         

icehousedesigns

2:29 pm on Jun 28, 2001 (gmt 0)



Hello,

I just noticed something strange. Everything had been quiet, then all of a sudden I got slammed with tons of Wget requests from 10 different IPs, mostly from universities around the world. They are all banned via .htaccess now. However, I was wondering what your thoughts are on why this happened?

PS. I'm using Froggyman's ban_bot.cgi script, but Wget is ruthless in not giving up.

littleman

5:05 pm on Jun 28, 2001 (gmt 0)



Wget is a command-line spider that is packaged with most *nix distributions. You can read about it here [gnu.org]. It is a very powerful and handy spider, and it is very easy to abuse a site using such a tool. Did it feel like a DoS attack? It could be a simple shell script cron'ed to launch from several locations.

Or, if the requests are coming one after another, it could be a script using wget via proxies. Unfortunately, wget has the ability to change its user agent, so just banning the UA may not stop it if the guy on the other end is clever.
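
Since the user agent can be faked, one alternative is to look at request rate per IP instead of the UA string. Here is a minimal sketch of that idea in Python. The function name, the thresholds (20 hits in 10 seconds), and the input format (a time-sorted list of IP and Unix timestamp pairs) are all assumptions made for illustration, not anything a particular ban script actually does.

```python
from collections import defaultdict, deque


def flag_fast_clients(requests, max_hits=20, window_seconds=10):
    """Return the set of IPs that exceed max_hits within window_seconds.

    requests: iterable of (ip, unix_timestamp) pairs, sorted by timestamp.
    The thresholds are illustrative guesses, not recommended values.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ip, ts in requests:
        hits = recent[ip]
        hits.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and ts - hits[0] > window_seconds:
            hits.popleft()
        if len(hits) > max_hits:
            flagged.add(ip)
    return flagged
```

A check like this catches a fast spider no matter what it calls itself, though it would also flag any legitimate client that happens to fetch pages that quickly, so the thresholds need tuning against real traffic.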

icehousedesigns

3:11 am on Jun 29, 2001 (gmt 0)



Thanks LM. I knew it was *nix related, but wasn't sure if it was a spider or some sort of browser. Actually there is a DOS port of it; I have it around here somewhere. All I know is it requests docs WAY too fast. I'll analyze that bit of the log for you and get back to you tomorrow.
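
For anyone wanting to do the same kind of log check, a rough sketch of tallying Wget hits per IP might look like the following. It assumes the server writes the Apache "combined" log format (the thread only mentions .htaccess, so the log format is a guess), and the function name is made up for this example.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def wget_hits_by_ip(lines):
    """Count requests per IP whose user agent mentions Wget."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines that do not parse or whose UA is not Wget.
        if match and "wget" in match.group(2).lower():
            counts[match.group(1)] += 1
    return counts
```

Calling `wget_hits_by_ip(open("access_log"))` (the file name is hypothetical) and then `.most_common(10)` on the result would list the busiest Wget sources. It only sees clients that still identify themselves as Wget.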

icehousedesigns

12:03 pm on Jun 30, 2001 (gmt 0)



OK, after contacting a guy who has a server running wget (I ended up banning his whole domain via .htaccess), this is the e-mail I got back from him:


The machine is acting as a client for a distributed search engine, and
is crawling sites sent down to it from a central server. You can hit the
web site of the project at www.grub.org, and probably submit your URL to
be placed on a "Do not crawl" list *grin* You might want to email Kord
(the head of the project) with any suggestions as to throttling and
such.

So I e-mailed them at support@grub.org. :)

This seems to explain the problem from my first post.