Just saw this guy, fell into a spider trap:
131.107.137.47 - - [11/Apr/2003:01:31:08 -0600] "GET /a/deep/link.html HTTP/1.1" 200 12589 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
No referer, came in on a deep link (like from an SE), and d/l'd pages but no images. After about 5 hits, he tried to grab a trap and got banned. Grabbed a page every 5 secs or so...
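Those signals (blank referer, HTML-only fetches, several hits in a row) can be scored straight from the access log. A minimal sketch, assuming Apache combined log format; the threshold values here are illustrative, not anyone's actual rules:

```python
import re

# Combined Log Format, e.g.:
# 1.2.3.4 - - [11/Apr/2003:01:31:08 -0600] "GET /x.html HTTP/1.1" 200 123 "-" "Mozilla/4.0 (...)"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d+) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

IMAGE_EXTS = ('.gif', '.jpg', '.jpeg', '.png')

def bot_signals(lines):
    """Tally per-IP page hits, image hits, and blank-referer hits."""
    stats = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        s = stats.setdefault(m.group('ip'),
                             {'pages': 0, 'images': 0, 'no_referer': 0})
        if m.group('path').lower().endswith(IMAGE_EXTS):
            s['images'] += 1
        else:
            s['pages'] += 1
        if m.group('referer') == '-':
            s['no_referer'] += 1
    return stats

def looks_like_bot(s, min_pages=5):
    # Heuristic: several HTML fetches, zero images, never sends a referer.
    return s['pages'] >= min_pages and s['images'] == 0 and s['no_referer'] == s['pages']
```

A browser would pull the images and send referers on follow-up clicks, so a real visitor fails the check almost immediately.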
IP resolves to Redmond.... did Bill just get himself banned?
dave
Regards...jmcc
I would say that pretty much clinches it as a bot.... on my site, d/l a bunch of HTML and NO Images is WEIRD (My site is PICTURES of WIDGETS!)
Wilderness:
You blocked the whole IP- 131.107.xxx.xxx?
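For scale: 131.107.xxx.xxx is a /16, i.e. 65,536 addresses. A quick sketch of checking membership in that range with Python's `ipaddress` module (the membership test, not anyone's actual ban mechanism):

```python
import ipaddress

# 131.107.xxx.xxx written as a CIDR block -- the whole /16.
redmond = ipaddress.ip_network('131.107.0.0/16')

def is_blocked(ip):
    """True if the address falls inside the banned range."""
    return ipaddress.ip_address(ip) in redmond
```

So `is_blocked('131.107.137.47')` is True, while an address one octet over, like 131.108.0.1, is outside the block.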
What do you think they are doing?
Not that I am at all surprised that a 'bot out of Redmond is not polite (that is, doesn't respect robots.txt), but you would think MS of ALL people would put every possible doo-hicky and thingamabob into something they write. So, do you think the "Redmond Robot" is about 20 megs in source code size? :)
dave
:) :)
Dave
I might better appreciate your despair if I went all the way up to 131. ;)
It doesn't much matter to me what they (whether it's MS or somebody falsely representing themselves as such) are doing.
For me the determining factor is threefold:
1) First they began visits with both referer and UA blank.
2) When the denies began as a result of item one above, they switched to a UA to get around that, while still not identifying themselves as an MS bot or providing a link back to a page that would tell us what the bot is.
3) Now they change IPs.
I've learned time and again that every time I deny short, it comes back to haunt me. In this instance even the 131.107. may be short :(
Don
LOL!
Do you run a spider trap? For me, that has worked really well. I do not want to get too much into it here, but I run a few different versions. The overlap works great, and it catches them in process, not after the fact- I guess that is why I do not go for blocking whole C, B or -GASP- A blocks. I do, on occasion, when traffic is all over an IP range, and I know I do not have to care at all about the range (read Malaysia, Cybervalence, IA, etc)...
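For anyone who hasn't set one up: the usual trap pattern is a link humans never see (and robots.txt tells polite bots to skip) pointing at a trap URL; anything that fetches it anyway gets its IP added to the deny list on the spot. A bare-bones sketch; the trap path and in-memory ban list are made up for illustration:

```python
# Hypothetical trap URL -- linked invisibly from pages and listed as
# Disallow in robots.txt, so only impolite bots ever request it.
TRAP_PATH = '/trap/do-not-follow.html'

banned = set()

def handle_request(ip, path):
    """Ban any client that fetches the trap; refuse anything already banned."""
    if ip in banned:
        return 403
    if path == TRAP_PATH:
        banned.add(ip)   # caught in process, not after the fact
        return 403
    return 200
```

The "in process" point is the whole appeal: the ban lands on the very request that proves misbehavior, instead of after a log review.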
dave
I'm sure there are some bots I haven't seen and perhaps never will due to both the content of my sites and the narrow market. Most of the other malicious ones are already denied.
Thanks for the hint :-)
Don
Well, between Jim and me, I think we have perfected the trap! He comes up with a new idea.... then I add something else.... we have it so it is pretty darn foolproof now. And it gets a lot of what gets past the IP and UA blocks. But then there are some that get by that, too.... and I catch in a bandwidth or CPU throttle. If they get by all that- and I just discovered one that did!- they deserve to get whatever they can! (Kidding)
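That throttle layer can be as simple as a sliding-window request counter per IP. A sketch with invented limits (10 requests per 60 seconds; nothing here is the actual configuration being described):

```python
from collections import defaultdict, deque

WINDOW = 60.0   # seconds -- illustrative value
LIMIT = 10      # max requests per window -- illustrative value

hits = defaultdict(deque)

def over_throttle(ip, now):
    """Record a hit at time `now`; True once the IP exceeds LIMIT in WINDOW."""
    q = hits[ip]
    q.append(now)
    while q and now - q[0] > WINDOW:
        q.popleft()   # drop hits that have aged out of the window
    return len(q) > LIMIT
```

A human clicking around stays well under the limit; a spider grabbing a page every few seconds trips it within a minute.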
Jim is a bit more cautious than I am in regards to the trap... I am a bit more, uh, proactive. I am ALWAYS banning Ask Jeeves (which is a very poorly behaved spider), and I know Jim makes allowances for that one.
Anyway, I just see it as another line of defense, and I would recommend you do it!
dave
Martin,
You have a more varied participation than Dave and myself in these forums.
Not sure how you either missed or misunderstood the concept or method.
Each webmaster makes a determination as part of the goals for their website on visitors and use of their content. In the end it's the overall scheme of things rather than a solitary portion, whether it's pennies or buttons ;)
"My bandwidth" rather than defining pennies might better be interpetd as boundaries.
Don