
Search Engine Spider and User Agent Identification Forum

Spider Hunting or Site Building?
Goes in waves, do you trust your bot blocker?
incrediBILL
msg:3931382 - 4:02 pm on Jun 11, 2009 (gmt 0)

A few of you have probably noticed I haven't been posting much about new spiders lately, because I've been busy doing other work to ramp up revenues on my sites when I'm able. Thanks to debilitating back pain, I've barely been able to stay in my chair for more than a few hours a day for many weeks; it's now being helped with physical therapy.

Anyway, the site is still busy snaring, trapping and logging stuff left and right; I just haven't had enough time to look and see what's in the logs that might be interesting enough to discuss.

Does anyone else feel they've reached the point where their bot blocker is solid enough that you can ignore it for days, weeks, or even months and feel relatively secure that the site isn't being horribly abused while unmonitored?

Samizdata
msg:3931412 - 4:42 pm on Jun 11, 2009 (gmt 0)

Does anyone else feel they've reached the point where their bot blocker is solid enough

I have felt that way many times. And then something new came along...

I am confident, though, that the vast majority of undesirables are thwarted, and I lose no sleep over the occasional one that slips through - I deal with the issue as and when time allows.

Site building is infinitely more important than bot blocking.

...

blend27
msg:3931675 - 12:08 am on Jun 12, 2009 (gmt 0)

I guess I've got it (a random-logic script) to the point where I understand that if they really want to scrape my sites, they will; but one thing I am sure of is that they'll really have to work for it. And if they get through, Lord bless them for their knowledge of how the web works and evolves.

With the above in mind, I don't lose any sleep either over the occasional couple of pages falling victim to a "hit & run", simply because the content of those pages is doomed to be mentioned as snippets on the major SEs anyway, and I have been on the front page for the desired terms for quite some time.

But I have to admit one thing: what I have learned, I have learned here.

Thank You WebmasterWorld, Members, and Participants!

Blend27

enigma1
msg:3931890 - 8:21 am on Jun 12, 2009 (gmt 0)

I think it's an ongoing effort, so no, I cannot say that for my part. There are also different areas that bots target, and the countermeasures in place can detect certain things far more easily than others.

At the application level, scraping is getting harder to detect and block, but hacking attempts at least are a bit easier to spot. Then there is the server: you can never tell in time whether something is wrong with the host itself, which is perhaps another back door.

I am also getting false positives from time to time, so I always need to revise these countermeasures. And quite a few of these defenses are experimental, so I cannot simply recommend them to others.
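
One way to shake out false positives before enforcing a new rule is to run it in log-only mode first. A minimal sketch for Apache 2.2, assuming access to httpd.conf (the libwww-perl pattern is only an illustration, not a recommendation):

    # httpd.conf - dry-run a countermeasure: log matches, deny nothing yet
    # (CustomLog must live in server config, not .htaccess)
    SetEnvIfNoCase User-Agent "libwww-perl" suspect_bot
    CustomLog logs/suspect.log combined env=suspect_bot

    # once the log shows no legitimate visitors matching, enforce it:
    # Order Allow,Deny
    # Allow from all
    # Deny from env=suspect_bot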

Hope you have a quick recovery Bill.

Hobbs
msg:3932113 - 2:52 pm on Jun 12, 2009 (gmt 0)

I get no solid, continuous sleep and no chance to work uninterrupted; I have never looked closely without ending up manually blocking a handful of IPs or ranges. It's becoming a nightmare.

Pfui
msg:3942022 - 6:24 pm on Jun 28, 2009 (gmt 0)

I've been mulling this over since you first posted it, Bill. And mulling. And mulling. And I think my Number One impediment to creating new content, a.k.a. my Number One management problem, is not identifying and stopping search-related bots or notorious UAs and IPs. Rather, it's foiling zombies [en.wikipedia.org] and spam botnets [en.wikipedia.org].

-- Every five minutes or so, tag teams of two, three, five and more coordinated spambots from different infected hosts/IPs make a beeline for specific POST files and launch their code. It used to be a single zombie somewhere, but now, yeow! The numbers and locations of simultaneously controlled machines are a management nightmare. And the magnitude of infestation is also pretty alarming when you think of the potential of Conficker and the like to control, coordinate and attack.

-- If you have POST-possible files, how do y'all handle botnets? Do you lump them in with bad bots and auto-block based on X, Y or Z characteristics? Or--?
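
One common approach (a sketch only, not anyone's script in particular) is to forbid POST everywhere except the few scripts that legitimately accept it; in .htaccess, with /contact.cgi standing in as a hypothetical form handler:

    # .htaccess - refuse POST anywhere it is not expected
    # (/contact.cgi is a placeholder for your own handler)
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} ^POST$
    RewriteCond %{REQUEST_URI} !^/contact\.cgi$
    RewriteRule .* - [F]

That at least turns a coordinated POST flood into a stream of cheap 403s.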

I use a daily-updated combination of mod_rewrite and deny-from rules for notorious hosts/IPs, even entire countries, ditto UAs, plus mod_rewrite checks on REQUEST_URI and REQUEST_METHOD. And I have other blocks and such here and there that catch 99%. (knocks wood) But I'm still overrun by foiled attempts.
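
For anyone wanting to try the same layered setup, a minimal sketch in Apache 2.2 syntax (every IP, host and UA string below is a placeholder; substitute lists from your own logs):

    # .htaccess - layered blocking: IP/host denials plus mod_rewrite checks
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.0/24
    Deny from .bad-host.example

    RewriteEngine On
    # notorious user-agents
    RewriteCond %{HTTP_USER_AGENT} (libwww-perl|lwp-trivial|java/) [NC]
    RewriteRule .* - [F]
    # request methods no browser uses here
    RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST)$
    RewriteRule .* - [F]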

And alas, Key_Master's bot-stopping script [webmasterworld.com] requires writing to .htaccess, and despite re-approaching it every few months I've yet to get it to work, let alone adapt it for botnets. Maybe next time, if I try...

-- So anyway, that's my I'd-rather-be-making-not-blockading tale o' woe, and I commiserate with all of us independent webmasters contending with the daily increase of bots of all kinds.

Heck, even Google can't keep spambots out of its e-mail and groups, ditto Pogo and the spambots in its chats, so the chances of my solo success are slim. But I keep trying!

Hmm...

Maybe Tom Clancy will write a book about computer zombies and botnets, and the hero will solve it all :)

Lord Majestic
msg:3942076 - 8:52 pm on Jun 28, 2009 (gmt 0)

Focus on revenues from your sites and get well soon.

Life is too short to fight every single bad bot out there. Get better hosting with enough bandwidth that you don't have to worry about resources, and only fight those who steal your content to repost it.

dstiles
msg:3942090 - 9:26 pm on Jun 28, 2009 (gmt 0)

Majestic: that was not a useful post. Botnets are not about bandwidth; they are about trying to hack into servers or kill them. There is a vast difference between bots and botnets.

Pfui: last night I got about 200 hits from about 40 IPs within about an hour. All used the same UA - a very basic MSIE one that, judging from my current block-list, is used for nothing BUT bad bots. This time it came with a new combination of headers, but hey, what's one more MSIE casualty! :(

The target files were security types - "buy" folders, SSL cert folders and files, that sort of thing. None of them exist on the target site, but I guess they thought a directory of shops might come up with something. None of our other (more and less popular) sites were hit, so I guess the domain name may have been the magnet. We do get quite a few attempts on it, but not usually this bad.

An hour later the attempts came back. I turned off the site for five minutes and they went away.

A few hours after that attack there was a new one on the same domain, this time with a wide variety of UAs, from mobiles to scrapers. It was only a single Comcast IP, but the files requested were try-your-luck types (e.g. iisstart.asp) rather than security-conscious ones. No idea why they try such things, but it seems to please them.
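
Those try-your-luck probes are easy to refuse outright on an Apache/Linux box, since the file types cannot exist there anyway. A rough sketch (the patterns beyond iisstart are illustrative; mine the rest from your own logs):

    # .htaccess - 403 the IIS-style probes that cannot succeed on this server
    RewriteEngine On
    RewriteCond %{REQUEST_URI} \.(asp|aspx|dll|exe)$ [NC,OR]
    RewriteCond %{REQUEST_URI} (iisstart|_vti_) [NC]
    RewriteRule .* - [F]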

So Bill, with the number of evil and half-cunning sods out there trying either to subvert servers or, in a few cases, to kill them, I would say it pays to stay on guard. I appreciate that your own guard is more mature than most of ours, so you may be able to relax a bit. On the other hand, there's always something new in the offing.
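
For what it's worth, bare-bones MSIE impostors like the ones above can often be caught by checking for a header every real browser sends. A sketch, assuming Accept-Language as the tell (verify against your own logs before enforcing, since some legitimate clients omit it):

    # .htaccess - an "MSIE" claim with no Accept-Language header is suspect
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} "MSIE [0-9]"
    RewriteCond %{HTTP:Accept-Language} ^$
    RewriteRule .* - [F]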
