
Search Engine Spider and User Agent Identification Forum

    
Top reasons to ban bots
Clark




msg:3775708
 12:52 pm on Oct 29, 2008 (gmt 0)

The top n reasons to ban bots are:

  1. Reduce Bandwidth
  2. Stop competitive intelligence on your site
  3. Stop scrapers from competing with your organic traffic on unique phrases
  4. Protect the performance of your web/DB servers
  5. Stop potential script kiddies from finding holes in your scripts
  6. Kind of redundant but, stopping nefarious googlebot spoofers from hammering away at your server

Did I miss any?

 

incrediBILL




msg:3776122
 7:14 pm on Oct 29, 2008 (gmt 0)

"6. Kind of redundant but, stopping nefarious googlebot spoofers from hammering away at your server"

A better way to phrase #6, IMO, would be "Stop SE hijackings of pages on your site", as more often than not I've found it's really the SE being directed to crawl through a proxy server.
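For what it's worth, both cases (a spoofed user agent, or the real crawler arriving through someone's proxy) show up as a "googlebot" request from an IP that isn't Google's. A minimal sketch of the usual reverse-then-forward DNS check follows. It assumes IPv4 and the googlebot.com / google.com hostname suffixes; the function name and the two lookup parameters are made up for illustration so the check can be exercised without a network.

```python
import socket

# Hostname suffixes assumed for Google's crawlers (an assumption of this sketch).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # PTR lookup: returns the primary hostname for the address.
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # A-record lookup: returns the list of IPv4 addresses for the hostname.
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(ip, reverse_lookup=_reverse, forward_lookup=_forward):
    """True only if ip reverse-resolves to a Google host that resolves back to ip."""
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must confirm the same address, since PTR records can be faked.
        return ip in forward_lookup(host)
    except OSError:
        # No PTR record or a failed lookup: treat the visitor as unverified.
        return False
```

Only call something like this when the user agent claims to be googlebot, and cache the answer per IP, otherwise the DNS lookups themselves become a performance problem.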

dstiles




msg:3776158
 8:03 pm on Oct 29, 2008 (gmt 0)

Lately, most of the fake googlebot IPs I've seen are from broadband. Possibly someone is using a proxy hosted on a compromised computer but they never seem to go beyond two or three hits.

Item 5 should be worded more strongly or another item added. Script kiddies ain't too much hassle and fairly easily blocked - one false move and they're toast. The major problem is botnets trying to plant SQL Injection and suchlike exploits on your sites. Although at the moment they seem to be off attacking something else - Georgia, again?
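As a rough illustration of spotting that kind of probe before it reaches a script, here is a minimal sketch that flags query strings containing a few SQL-looking fragments. The patterns and the function name are assumptions chosen for the example, not a complete or authoritative filter, and they will produce both false positives and misses. Parameterised queries remain the real defence.

```python
import re
from urllib.parse import unquote_plus

# Illustrative patterns only; real probes vary widely and are often obfuscated.
SUSPICIOUS = re.compile(
    r"(\bunion\b.+\bselect\b"                # UNION ... SELECT
    r"|\bdeclare\b.+@"                       # DECLARE @variable
    r"|\bcast\s*\("                          # CAST(
    r"|\bexec\s*\("                          # EXEC(
    r"|;\s*drop\s+table\b"                   # ; DROP TABLE
    r"|'\s*or\s+'?\d+'?\s*=\s*'?\d+)",       # ' OR '1'='1
    re.IGNORECASE | re.DOTALL,
)


def looks_like_sqli_probe(query_string):
    """Return True if the URL-decoded query string matches a suspicious pattern."""
    decoded = unquote_plus(query_string)
    return SUSPICIOUS.search(decoded) is not None
```

A hit is better treated as a reason to block or log the requesting IP than as proof of an attack.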

GaryK




msg:3776234
 9:18 pm on Oct 29, 2008 (gmt 0)

For the past 2-3 months all my GoogleBot spoofers have been leaving a referrer that tracks back to a link in a message someone posted on a university-based forum. Weird.

My only reason for blocking bots: Stealing is wrong.

dstiles




msg:3776309
 11:07 pm on Oct 29, 2008 (gmt 0)

Out of the last 29 fake googlebot occurrences only one came with a referer - from Google Australia (I'm in the UK, but the site was relevant). Which, as noted elsewhere, makes me wonder if this is some idiots following some stupid recommendation on a dummies forum somewhere.

Not sure if this reason is a big one: stop sites pretending to have information about your site in order to draw traffic to themselves. For example, aboutus.org has absolutely no information about our domains except the name but claims it has. One of them has never been (legitimately) indexed by any search engine - it's disallowed in robots.txt - and when you try to go to the page it's completely blank. Personally I blame google for letting them get away with this: I see several other similar scams when googling for stuff. It's another of those scams that ebay and kelkoo (used to?) perpetrate: "Buy DSTILES Now" - I KNOW I'm not for sale!

Other "services" to block include NebuAd, Phorm, Barefruit and other site interceptors, who usually download non-compressed data to make their own life easier, hence not only pushing up the bandwidth but making advertising revenue on the side. Unfortunately these hits can't be intercepted without a lot of hassle and relying on such things as javascript, which is (should be) taking a down-turn at present anyway because of security (SQL Injection infected sites etc).

Not sure about some of the "security" services, either, some of which give a single-point data source to hackers. I'm not really keen on the world knowing what OS my server is running and how fast my pages load; punters can discover that for themselves by legitimate browsing.

wilderness




msg:3776311
 11:16 pm on Oct 29, 2008 (gmt 0)

Here's an old thread [webmasterworld.com] on a similar topic.

