Is it spider or ........

         

circuitjump

8:54 pm on Jul 5, 2001 (gmt 0)

10+ Year Member



216.183.203.5 - - [05/Jul/2001:08:23:02 -0400] "GET /cables.asp HTTP/1.0" 200 25207 "-" "Wget/1.6"

24.128.27.97 - - [05/Jul/2001:08:23:49 -0400] "GET /mail.asp HTTP/1.0" 200 19565 "-" "Wget/1.6"

Does anyone know whether these are spiders? My stats program claims they are, but I'm not sure.
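
For anyone who wants to check their own logs, here is a rough Python sketch that pulls the user agent out of lines like the two above. It assumes the log is in Apache's Combined Log Format, which is what those lines appear to be. The names `LOG_PATTERN`, `parse_line`, and `is_wget` are made up for illustration, and a match only tells you what the client claims to be.

```python
import re

# Assumed layout (Combined Log Format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_wget(entry):
    """True when the client identifies itself as Wget (the UA can be faked)."""
    return entry["agent"].startswith("Wget/")


# The two log lines quoted in this post.
SAMPLE = [
    '216.183.203.5 - - [05/Jul/2001:08:23:02 -0400] "GET /cables.asp HTTP/1.0" 200 25207 "-" "Wget/1.6"',
    '24.128.27.97 - - [05/Jul/2001:08:23:49 -0400] "GET /mail.asp HTTP/1.0" 200 19565 "-" "Wget/1.6"',
]

for line in SAMPLE:
    entry = parse_line(line)
    if entry and is_wget(entry):
        print(entry["ip"], entry["path"], entry["agent"])
```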

Thanks in advance

littleman

12:38 am on Jul 6, 2001 (gmt 0)



All about Wget [gnu.org].

There have been several posts about it lately. I guess it is an old utility making a comeback?

toolman

12:43 am on Jul 6, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been welcoming them with this:

# Return 403 Forbidden to any client whose User-Agent starts with "Wget"
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Wget
RewriteRule .* - [F]

littleman

2:03 am on Jul 6, 2001 (gmt 0)



That's cool, but the utility does have the ability to change UAs. One way to track Wget is that it always requests the domain with the port number, so in your logs it will look like domain.com:80. So if you see a standard UA with a request like that, there is a good chance that it is Wget. FDSE does the same thing, as do some (but not all) other utilities that use Perl's IO::Socket module.
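
A rough Python sketch of that check, for what it's worth. It assumes your logs actually record the requested Host value (for example by adding `%{Host}i` to a custom LogFormat), which a stock access log may not do. The function names are made up for illustration. Treat a hit as a hint only: a browser talking to a site on a non-default port will also send a port number.

```python
def host_has_explicit_port(host_header):
    """True when a Host value such as 'domain.com:80' carries a port number."""
    name, sep, port = host_header.rpartition(":")
    return bool(sep) and bool(name) and port.isdigit()


def may_be_disguised_wget(host_header, user_agent):
    """Heuristic only: explicit port in Host, but a UA that does not say Wget."""
    return host_has_explicit_port(host_header) and not user_agent.startswith("Wget")


# Hypothetical values, not taken from a real log.
print(may_be_disguised_wget("domain.com:80", "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"))
print(may_be_disguised_wget("domain.com", "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"))
```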

toolman

2:16 am on Jul 6, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Most of the annoyances are straight Wget at my haunts. I didn't realize you can change its UA. AAAAHHH the joys of spider wars and battling bots. What's a good all-around mod_rewrite solution for the most common bots, little?

littleman

4:39 am on Jul 6, 2001 (gmt 0)



I really don't think there is one, not if you want to let some in. You have to take it on a case by case basis.

awoyo

2:14 pm on Jul 6, 2001 (gmt 0)

10+ Year Member



>One way to track wget is that it always requests the domain
>with the port number, so in your logs it will look like
>domain.com:80.

It looks like someone has compiled that nifty little feature out of the copies that have hit me. Out of 458 hits to just my front page in the past month, none has left a reference to a port number in the logs. The only common denominator is that when they come, they hit every page, and they do (did) it often and hard.

I'm using mod_rewrite on some sites and mod_access on another to deny by UA. When that doesn't work (as you pointed out, Littleman, disguising the UA in Wget is very easy), I begin to deny the IP. Since I started denying the UA, the hits have dropped off significantly. There have been a few persistent little kids who are playful enough to change the UA and hit me again, but my thinking is that if it's a bugger, and it's requesting every single page in my internal link structure and only spending 0.3 seconds looking at the page, and doing it over and over, every day, it's got to go. I don't know of any legitimate bots (that I care about) that act so rudely. The nice thing is that the ones that have bothered me are coming from cable modem providers, so it's relatively easy to deny these little pranksters. Static IPs are nice!
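
If you'd rather not eyeball the logs for this, something like the following Python sketch can list the IPs that behave that way. It assumes Common/Combined Log Format timestamps like the ones quoted earlier in this thread. The function names and the thresholds (`min_requests`, `max_mean_gap`) are made up and would need tuning for your own traffic. Log timestamps only have one-second resolution, so the gap is an average over many requests and not a per-page measurement.

```python
import re
from collections import defaultdict
from datetime import datetime

# Only the fields needed here: client IP, timestamp, requested path.
ENTRY = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+)')


def crawl_stats(lines):
    """Return {ip: (request_count, distinct_paths, mean_gap_seconds)}."""
    hits = defaultdict(list)
    for line in lines:
        m = ENTRY.match(line)
        if m:
            when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            hits[m.group("ip")].append((when, m.group("path")))
    stats = {}
    for ip, requests in hits.items():
        requests.sort()
        times = [t for t, _ in requests]
        # Seconds between consecutive requests from the same IP.
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        mean_gap = sum(gaps) / len(gaps) if gaps else None
        stats[ip] = (len(requests), len({p for _, p in requests}), mean_gap)
    return stats


def aggressive_ips(lines, min_requests=50, max_mean_gap=1.0):
    """IPs with many requests and a short average gap (hypothetical thresholds)."""
    flagged = []
    for ip, (count, _paths, mean_gap) in crawl_stats(lines).items():
        if count >= min_requests and mean_gap is not None and mean_gap <= max_mean_gap:
            flagged.append(ip)
    return sorted(flagged)
```

Usage would be along the lines of `aggressive_ips(open("access.log"))`, where `access.log` stands in for whatever your log file is called. The IPs it returns are candidates for a manual look before you deny them, not an automatic block list.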

circuitjump

7:28 pm on Jul 6, 2001 (gmt 0)

10+ Year Member



I saw this post on this site. I think this will help out too.

[webmasterworld.com...]