Just saw this guy fall into a spider trap:
131.107.137.47 - - [11/Apr/2003:01:31:08 -0600] "GET /a/deep/link.html HTTP/1.1" 200 12589 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
No referer, came in on a deep link (as if from a search engine), and downloaded pages but no images. After about 5 hits, he tried to grab a trap and got banned. He grabbed a page every 5 seconds or so...
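The pattern described above (no referer, a deep-link entry, pages but no images, then a hit on the trap) can be checked mechanically from an Apache combined-format log like the line quoted. Here is a minimal Python sketch, assuming the trap lives under a hypothetical `/trap/` directory. It is not the poster's actual trap script:

```python
import re
from collections import defaultdict

# Apache "combined" format:
# host ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

TRAP_PREFIX = "/trap/"  # hypothetical: disallowed in robots.txt, linked invisibly
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png")

def parse(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def find_trapped(lines):
    """Summarise each client, returning only those that requested the trap."""
    stats = defaultdict(lambda: {"pages": 0, "images": 0, "no_referer": 0, "trapped": False})
    for line in lines:
        hit = parse(line)
        if hit is None:
            continue  # skip lines in some other format
        s = stats[hit["ip"]]
        path = hit["path"].split("?", 1)[0].lower()  # ignore any query string
        if path.endswith(IMAGE_EXTS):
            s["images"] += 1
        else:
            s["pages"] += 1
        if hit["referer"] == "-":
            s["no_referer"] += 1
        if path.startswith(TRAP_PREFIX):
            s["trapped"] = True
    return {ip: s for ip, s in stats.items() if s["trapped"]}
```

A client that shows many pages, zero images and no referers, and ends up in the trap, matches the site-downloader profile described in the post.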
IP resolves to Redmond.... did Bill just get himself banned?
dave
>>> security or three-cents-worth of bandwidth
I am pretty happy with the security on my site, so that is not much of an issue for me. I have a firewall that blocks ALL but port 80 (and one other port, but ONLY if the request comes from my IP). My CGI is secure, and I do NOT rely solely on .htaccess password protection. So I am pretty happy there.
My pages do 2-3 gigs a day in bandwidth. The guy I took down yesterday used 79.60 MB (straight from AWStats)... he slipped past all the bells and whistles (and bait) I throw out. That number APPROXIMATELY equals the bandwidth Google has used month-to-date on my site (it is within 2-3 MB).
My site also has a lot of really good (some very rare!) pictures of widgets on it. My widget pix are ALWAYS turning up in forums (as avatars, or just to illustrate a point). One of my sites, which sells widgets, is always having its widget pix show up on e-Gag, too. I am sure THAT bandwidth, unchecked, would exceed pennies a day.
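One common way (not necessarily the one used here) to keep image bandwidth from being spent on other people's forums and auction pages is to check the Referer header on image requests. A minimal sketch, where `example.com` stands in for your own host names and `is_hotlink` is a hypothetical helper. Requests with no Referer are allowed, because many browsers and proxies strip it:

```python
from urllib.parse import urlsplit

# Hypothetical: the host names allowed to embed this site's images.
OWN_HOSTS = {"example.com", "www.example.com"}
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png")

def is_hotlink(path, referer):
    """True if an image request carries a Referer that is not one of our hosts."""
    if not path.lower().endswith(IMAGE_EXTS):
        return False  # only images are protected
    if not referer or referer == "-":
        return False  # no Referer: allow it, many clients strip the header
    host = (urlsplit(referer).hostname or "").lower()
    return host not in OWN_HOSTS
```

A request flagged this way would typically get a 403 or a small placeholder image instead of the real picture.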
But my main site, the NPO, is an informational site with over 500,000 pages. It was a Yahoo site-of-the-day a couple of months ago (great PR boost, btw!). It sells NOTHING, and its only income is from donations and banners on the site. First, since its "product" is information, I need to protect that product. If you sell something, you are able to protect yourself from thieves and such, to the extent and in the manner you see fit. Blocking site downloaders is how one protects an information product. Second, since the site is paid for by banner ads, I want/need people to see those ads on my site, not to download the information and leave me with nothing for my work making the site.
If I had none of this in place, I am sure I would be looking at upwards of a gig a day of extra bandwidth. As it is, I always see some stupid robot chuck down 500-1000 403s before it gets wise. What would that bandwidth be if it were getting real pages (at 12-15k each) rather than a 1.2k 403 page? (Yes, I can do the math!)
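The math from the post, worked through as a quick Python check. The figures are the post's own (500-1000 requests per robot, 12-15 KB pages, a 1.2 KB 403 page), and `kb_saved` is just an illustrative helper:

```python
# Figures from the post: a real page is 12-15 KB, the 403 page is about 1.2 KB.
ERROR_PAGE_KB = 1.2

def kb_saved(requests, page_kb, error_kb=ERROR_PAGE_KB):
    """KB not sent when `requests` hits get a small 403 instead of a real page."""
    return requests * (page_kb - error_kb)

low = kb_saved(500, 12)     # 500 requests of 12 KB pages
high = kb_saved(1000, 15)   # 1,000 requests of 15 KB pages
print(f"saved per robot: {low:,.0f} KB to {high:,.0f} KB")
```

That is 500 × 10.8 = 5,400 KB at the low end and 1,000 × 13.8 = 13,800 KB at the high end, per robot.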
I agree, things like requests for default.ida are a minor annoyance... and I deal with them (in a different way) while I go after the bigger site downloaders and other bots. But I ALSO have another reason for actively blocking bots!
I have been the victim of Nameprotect/Cyveillance. Not just the bandwidth drain... but an actual pending lawsuit based on my "possible" trademark infringement. And it was TOTALLY absurd! COMPLETELY! But I HAD to hire a trademark attorney to defend myself. I consider that money directly lost due to NOT blocking a spider!
So, you see, I have my reasons- and some quite good- for doing this. I think that, for me SPECIFICALLY, it is more than pennies a day. My main site is an NPO, and does accept tax-deductible donations... you are welcome to become a member for only pennies a day! :)
I think it is hard to know how much these rogue spiders drain from an individual site until you block and log them and see for yourself. I know that I have much more of a problem than, say, Jim Morgan (number-wise), so I would say it varies GREATLY depending on the website. And I do not think I am exaggerating at all to say my bandwidth would be at least 25% higher without blocking.
I must say that this forum has made me much more zealous about this topic... I have gone from thinking I am presenting a website to the world, to thinking this is MY private property, and I need to protect it as such. So, I do what I can.
Besides, it is so easy to do!
dave
>>> So, you see, I have my reasons- and some quite good- for doing this.
Hi Dave! Good answer, and I agree with your point entirely. I think some people (not anybody who posted in this thread, though!) can get carried away by the spider hunt...
Thanks for your illuminating comments (I'm usually not illuminated before 5 pm, ya know)!
:) Y
>>> I agree with your point entirely
I was thinking you might, once I made my case. :)
But I also agree that each webmaster needs to make the decisions for their specific website, including to what extent they need to ban (if at all). I am doing what is good for my site. Don and I often disagree about the extent of a ban, but we both learn from each other, too. Same with Jim and me. But I figure if we keep throwing the info out here, it will help some people. I know I have gotten more than a few PMs about how to ban this or that, and I am more than happy to help out.
>>> I think some people (not anybody who posted in this thread, though!) can get carried away by the spider hunt
I disagree... I HAVE gotten overzealous myself, and have had to cut back a bit! I banned myself once (damn!). I also once misunderstood the extent of a ban on "_vti_bin" (or something like that).
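Banning yourself is easy to do by accident. One small guard, sketched in Python under the assumption that bans are kept as a set of IP strings, is to refuse any ban that falls inside a never-ban list. Here `192.0.2.10` is a placeholder documentation address standing in for your own IP, and `add_ban` is a hypothetical helper:

```python
import ipaddress

# Hypothetical whitelist: your own IP (a placeholder from the documentation
# range) plus loopback, neither of which should ever be banned.
NEVER_BAN = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def add_ban(banned, ip):
    """Add `ip` to the `banned` set unless it is inside a whitelisted network.

    Returns True if the address was added, False if the ban was refused.
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in NEVER_BAN):
        return False
    banned.add(str(addr))
    return True
```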
If you are going to ban, you must also look carefully at what you are banning! You HAVE to read those logs and make adjustments. Just because I (or Don, or Jim, or Martinibuster) say we spotted something and WE are going to ban it doesn't mean everyone should. That is why I always post the IP, the UA and what it did... I do not bother banning bots that only ask for robots.txt and move on; for ME, that is a waste of time. For others, that is the first indication of a pending attack, and they act accordingly for their website.
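Picking out the visitors that only fetch robots.txt and move on is easy to automate when reading the logs. A sketch, assuming the log has already been reduced to `(ip, path)` pairs (the addresses are documentation placeholders). What you then do with the list, whether you ignore it, watch it or ban it, is your call for your own site:

```python
from collections import defaultdict

def robots_only_visitors(hits):
    """Return, sorted, the IPs whose every request was for /robots.txt.

    `hits` is an iterable of (ip, path) pairs taken from an access log.
    """
    paths = defaultdict(set)
    for ip, path in hits:
        paths[ip].add(path.split("?", 1)[0])  # ignore query strings
    return sorted(ip for ip, seen in paths.items() if seen == {"/robots.txt"})

hits = [
    ("192.0.2.1", "/robots.txt"),        # reads robots.txt and leaves
    ("192.0.2.2", "/robots.txt"),        # reads robots.txt, then crawls
    ("192.0.2.2", "/a/deep/link.html"),
    ("192.0.2.3", "/index.html"),        # ordinary visitor
]
```

Only `192.0.2.1` qualifies here. The second client read robots.txt and then went on to fetch pages, so it is not a robots.txt-only visitor.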
So it is GREAT advice to take ALL our advice (from whoever) with a grain of salt, and see how it fits the goals and traffic of YOUR website!
dave
Now, since MS has a new product that could compete with my product, I silently removed their name from my newsletter list and banned the bot until it stopped. I don't want to ban MS outright, just in case they want to license our stuff.
Also, just FYI, I went to a big ISP and read their user policy and found out that at least some give users a static IP for DSL. So I have banned bots coming from DSL on those ISPs by simply banning the full IP they came in on.
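The difference between banning one static address and banning a whole dynamic pool can be expressed with Python's standard `ipaddress` module. In this sketch, `make_rule` and `is_banned` are hypothetical helpers, the addresses are documentation placeholders, and `/24` is an arbitrary example block size rather than anything an ISP publishes:

```python
import ipaddress

def make_rule(ip, static=True, prefix=24):
    """Return the network to ban for a visitor at `ip`.

    A static IP is banned on its own (a /32 for IPv4); for a dynamic pool the
    whole /prefix block containing it is banned instead.
    """
    if static:
        return ipaddress.ip_network(ip)
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

def is_banned(ip, rules):
    """True if `ip` falls inside any of the banned networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in rule for rule in rules)

rules = [make_rule("198.51.100.7")]                     # static DSL user: exact address
rules.append(make_rule("203.0.113.45", static=False))  # dynamic pool: whole /24
```

The exact-address rule only catches that one customer, which is why it fits ISPs known to hand out static DSL IPs. A range ban on a dynamic pool would also lock out that ISP's innocent users.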
Wow- this guy is a busy little beaver! Could you possibly PM me the e-mail... I want to see if he signed up for MY newsletter, too.
>>> at least some give users a static IP for DSL
I have one, but I had to ask for it. I have telnet/ssh denied at the firewall for all but my ONE IP... is that security or what?
dave
>>> Could you possibly PM me the e-mail...
131.107.163.47 ... "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
Anyone can get a Hotmail address, but the IP address is owned by Microsoft.
He is crawling a site here now at about one page every 5-10 minutes, always reading /robots.txt and then the page.
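Since this crawler reads /robots.txt before every page, a robots.txt entry is the obvious first lever, though it only works on bots that honor it. The sketch below builds such a file and checks it with Python's standard `urllib.robotparser`. The `MicrosoftPrototypeCrawler` token comes from the UA string above, while the `/trap/` rule and `example.com` are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Shut out the crawler from the log above; everyone else may crawl the site
# except a (hypothetical) /trap/ directory.
ROBOTS_TXT = """\
User-agent: MicrosoftPrototypeCrawler
Disallow: /

User-agent: *
Disallow: /trap/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, url):
    """Would a robots.txt-respecting crawler with this UA fetch `url`?"""
    return parser.can_fetch(agent, url)

for agent in ("MicrosoftPrototypeCrawler", "Googlebot"):
    print(agent, allowed(agent, "http://example.com/a/deep/link.html"))
```

`robotparser` matches a record when its User-agent token appears (case-insensitively) in the crawler's UA string, so the full "MicrosoftPrototypeCrawler (please report ...)" string is caught by the same record.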