Search Engine Spider and User Agent Identification Forum

How To Block Possible Click Bombing
EmptyRoom (msg:4661120) - 1:27 pm on Apr 7, 2014 (gmt 0)


System: The following 3 messages were cut out of thread at: http://www.webmasterworld.com/search_engine_spiders/4660478.htm [webmasterworld.com] by incredibill - 9:41 am on Apr 7, 2014 (PST -8)


I hope you won't mind a rookie question: How do you know if it's a "bad" visitor? How do you know you should ban a certain IP or IP range? Do they send many visits to your website?

I think I am being click bombed, but I can't find anything suspicious in the logs. Normal number of visits... nothing out of the ordinary.

 

lucy24 (msg:4661180) - 4:22 pm on Apr 7, 2014 (gmt 0)

Distinguishing between a human and an ordinary robot is easy. Distinguishing between good and bad humans -- or real humans and pseudo-humans -- is not so easy. Do your logs show nothing but humanoid visits?

webcentric (msg:4661184) - 4:46 pm on Apr 7, 2014 (gmt 0)

I think I am being click bombed


I'd start by banning traffic from Amazon/AWS, as it's a known source of this activity (though not the only one). Then work your way back through this thread and the Server Farm threads for past months/years; click bombing seems to be bot-related, and bots often live in server farms. Beyond that, there's a great sticky post on this board about identifying bot activity. I've also found that blocking entire countries (per recommendations on this board), when they don't contribute worthwhile traffic per your website's business goals and objectives, can help narrow down the problem for you.
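
For concreteness, here's roughly what that looks like in .htaccess (a sketch, assuming Apache 2.2's Order/Deny syntax; the CIDRs below are illustrations only -- pull Amazon's current allocations from their published ranges before copying anything):

# Deny selected hosting ranges wholesale (Apache 2.2 / mod_authz_host).
# The ranges below are examples; verify current AWS allocations first.
Order Allow,Deny
Allow from all
Deny from 54.224.0.0/12
Deny from 184.72.0.0/15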

Added: I'll add that this is advice from another "rookie" in this realm so it's offered for what it's worth. ;)

EmptyRoom (msg:4661203) - 7:13 pm on Apr 7, 2014 (gmt 0)

Thanks for your answers.

I banned a few IPs and a few IP ranges that looked suspicious to me (HostKey servers from the Netherlands and Russia, some SoftLayer servers, etc.).

I checked almost every IP that visited my site and banned all that were suspicious. We'll see whether the "click bombing" pattern continues or not. I'll keep you posted.

webcentric (msg:4661816) - 2:51 pm on Apr 9, 2014 (gmt 0)

@EmptyRoom -- I recently banned 5 server farms (with more to come), including Amazon/AWS, SoftLayer, Rackspace, DigitalOcean and GoGrid/ServePath, and I can't begin to describe the effect it's had on my server resources. Most of those hosts were exposed to me through the actions of a single user agent, and I watched it bounce around as I shut each door. The user agent is now also banned, but the point is that this problem is widespread, and blocking small ranges only causes it to move around and come at you from a new direction.

Blocking an entire hosting company, colo, etc. not only leaves one less place for your current antagonist to hide but also means any other antagonist living there is equally thwarted. Over in the AdSense forum, people have reported temporary reprieves from click bombing after blocking AWS, only to have the issue crop up from elsewhere. IMHO, it's well worth the effort to block the entire host when you find it rather than trying to pick off specific ranges one at a time; I think it will save you time and money in the long run. Again, rookie advice, but as I said above, the results I've experienced recently have been dramatic.

Andy500 (msg:4662880) - 1:43 pm on Apr 14, 2014 (gmt 0)

May I ask, how are you guys blocking such wide ranges of IPs? Are you doing it with iptables or other methods that are very resource-intensive? Or do you have other ways of blocking IPs/countries that don't affect server performance?

Unfortunately, all I have read so far are methods that involve the server/firewall checking thousands if not millions of IPs, delaying every single (good) visitor's access to the site.

Would appreciate any methods other than the resource-intensive ones :-) Thanks!

wilderness (msg:4662894) - 2:03 pm on Apr 14, 2014 (gmt 0)

IP ranges alone are not CPU-intensive for the server, and done properly in htaccess they cause little delay in serving pages.

Even with htaccess files at sizes most noobs would consider extreme, the delay is in milliseconds.
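
To make that concrete, a minimal sketch (assuming Apache 2.2's Order/Allow/Deny syntax; the ranges shown are placeholders):

# A long list of one-line denies like this costs almost nothing per
# request: no DNS lookups, just a linear scan of the list.
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24
Deny from 203.0.113.0/24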

Andy500 (msg:4662924) - 4:26 pm on Apr 14, 2014 (gmt 0)


IP ranges alone are not CPU-intensive for the server, and done properly in htaccess they cause little delay in serving pages.


Thanks for the reply.

I see. I had read what I mentioned in my previous reply on relatively credible blogs and even, as I recall, from HostGator's technical support team, although at the time my main interest was blocking all non-English speaking countries (so 90% of the world's countries). I was told doing so would be crazy and would slow down the page loading of my site dramatically. I know HostGator's technical support is questionable, but I figured that would be the consequence on a shared hosting account. Plus, I'm guessing doing this on a shared account that advertises itself as "unlimited bandwidth" would soon cause trouble.

Fast forward a couple of months, and now we have a solid VPS at a reputable hosting company. The issue is that we have about 10,000 uniques per day (2-3 pages visited per unique), and most of it concentrates at certain peak hours (US evening time). Would we, in our scenario, still be good with blocking countries via IP ranges?

I am not questioning your input but would like to confirm it with regard to our case. I am actually glad of your reply and would trust the opinion of you guys over that of bloggers and whatnot trying to plug their affiliate link to X and Y host. So, if you could please confirm that, it'd be much appreciated.

We are in fact about to fork out 200 bucks a month for CloudFlare to block countries and bad bots, and considering we only want CloudFlare for that, I think it's a bit too high a price to pay if we can simply block countries via .htaccess without impacting our page loading time (and thus on-page SEO), or at least only impacting it trivially.

Lastly, I have read in other forums that using IP ranges is not optimal to block desired countries, since not all countries are restricted to certain IP ranges, and you could be blocking IPs that belong to the USA and UK (for example) by blocking the IP ranges of other countries (e.g. Uganda or India). I can understand that there would be a little blending of some IPs, but would you say that sticking to a good table of IP ranges (e.g. one recommended by forum members) would pretty much block all access from banned countries and only give a couple of false positives?

Sorry to go a bit off topic, although I think this is still relevant to the OP. If it isn't, please let me know and I will post it as its own thread.

Thanks again!

keyplyr (msg:4662934) - 5:37 pm on Apr 14, 2014 (gmt 0)

The server does process the htaccess directives before delivering the page to the user, but as wilderness said, it is not a significant issue unless the file contains complex directives such as database redirects and other resource-intensive rules. If you just have simple rules and IP blocks (within a reasonable amount), the hit on server response time is not an issue. There are many more things to worry about when it comes to server response time.

With a VPS, the IP blocks are done differently and would not be an issue anyway.

IP ranges are not purely region-specific. For example, China uses many ranges assigned to USA IP blocks. If your objective is to block non-English-speaking users, IPs are not a good thermometer; you'll have to be more surgical. Besides, English is the second language of most countries nowadays.
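
One way the blocks can be "done differently" on a VPS (a sketch, assuming Apache 2.2 and root access; the path and range are placeholders) is to move them out of .htaccess and into the main server config, which is parsed once at startup instead of on every request:

# In httpd.conf or an included .conf file, not .htaccess.
<Directory "/var/www/example.com/htdocs">
    Order Allow,Deny
    Allow from all
    # Placeholder range
    Deny from 198.51.100.0/24
</Directory>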

Andy500 (msg:4662936) - 6:26 pm on Apr 14, 2014 (gmt 0)

The server does process the htaccess directives before delivering the page to the user, but as wilderness said, it is not a significant issue unless the file contains complex directives...


I see, thanks for your reply.

So, in the case of a VPS, should we just describe what you've proposed to our webhost, and they should be able to set up the blocking rules?

With regards to our site, the targeted market is the USA and UK, so we'd really concentrate on those countries and maybe a couple of others. We don't have anything against other countries; in fact, the majority of the people on our team are not US/British. But it's a weighing of pros and cons, and blocking just about everything outside the USA and UK is what we'd prefer (maybe also allowing Australia/Canada/Western Europe). It's risk vs. reward, and to make it worse, we don't make money from any traffic outside the USA and UK anyway.

In another thread I created, a forum member mentioned that blocking server farms and data centers would help a lot in blocking scrapers and other bad bots. So, say we want to block countries (maybe not as drastically as above) and lots of server farms (from researching this forum section, asking, and reading): do you think all of this would be possible with a VPS, without affecting our server's performance or page loading time?

I'm fairly certain that our webhost has a good firewall (they have a great reputation, we are happy with them, and they specialize in VPS), so perhaps we could even ask them to do the blocking themselves at the firewall, or anywhere other than the .htaccess file?

P.S. We are looking at both CloudFlare and Incapsula to block countries and bad bots, but if this isn't as difficult or resource-intensive as we initially thought (thanks to you folks!), then we'd be happy to drop the idea of CF and Incapsula and try to do this ourselves with the help of our webhost's support, plus studying and asking in this forum. We'd like to actually learn all of this, not just wing it when the sizzle hits the fan.

wilderness (msg:4662940) - 6:46 pm on Apr 14, 2014 (gmt 0)

my main interest was blocking all non-English speaking countries (so 90% of the world's countries). I was told doing so would be crazy and would slow down the page loading of my site dramatically.


I've been doing the same for more than a decade and it doesn't slow my server/host down.

Unfortunately (at least from your perspective), my sites and pages are simple HTML in design (and my widget content is very focused), NOT using any of the following:
PHP
MySQL
Java
and using only a solitary script.

I've provided multiple examples for beginners of denying key Class A's as preliminary restrictions.
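
For instance, a preliminary Class A restriction looks something like this (Apache 2.2 .htaccess; the /8s shown are examples of APNIC allocations as I understand them -- verify against the IANA IPv4 registry and your own logs before using them):

# Wholesale /8 (Class A) denies as a first-pass filter.
Order Allow,Deny
Allow from all
Deny from 58.0.0.0/8
Deny from 59.0.0.0/8
Deny from 60.0.0.0/8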

wilderness (msg:4662941) - 6:49 pm on Apr 14, 2014 (gmt 0)

Lastly, I have read in other forums that using IP ranges is not optimal to block desired countries since not all countries are restricted to certain IP ranges and you could be blocking IPs that belong to USA and UK


100% accuracy does NOT exist in these issues.
There are always going to be a few innocents sacrificed.
You simply monitor your raw logs and ADD exceptions allowing the innocents back in.
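
In .htaccess terms, that kind of exception is just an Allow that outranks the broader Deny (a sketch; both addresses are placeholders). With "Order Deny,Allow", Apache evaluates the Deny lines first, then lets any matching Allow override them:

Order Deny,Allow
# Placeholder: the offending range
Deny from 203.0.113.0/24
# Placeholder: the innocent you spotted in the raw logs
Allow from 203.0.113.42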

wilderness (msg:4662950) - 7:14 pm on Apr 14, 2014 (gmt 0)

I've provided multiple examples for beginners of denying key Class A's as preliminary restrictions.


This may not suit your purpose [webmasterworld.com] (I'm not interested in any traffic from the UK), however it's a beginning.

Separating the Aussie and Kiwi IPs from the remainder of APNIC is something I've had in place since 2002-03; however, I've not kept the ranges current.

webcentric (msg:4662954) - 7:22 pm on Apr 14, 2014 (gmt 0)

Here's a rookie story. My first attempt at country blocking was to take a ridiculously large list of IP addresses (country by country) and use them as rules in a software firewall. My very excellent hosting company pointed out to me that the firewall was using more server resources than my database, email, and HTTP services combined (and I have a pretty large database with a lot of usage). Oh well, back to the drawing board, which led me to this board.

Now I think more top-down in my approach, or perhaps "layered" would be a better term. Let's say I've decided I don't want traffic from Africa on my site (even if some countries do have large English-speaking populations). I've found a good first step is to simply block the primary AFRINIC ranges (the allocated /8 blocks). It's a short list, and while that approach may not stop every visit from that part of the world, it's a huge step in the intended direction. Don't need South America, the Caribbean, and some US territories? See LACNIC.

Of course, then there are all the legacy ranges, which are a mess, so sometimes I find myself blocking ISPs in various countries because they're not part of an RIR's allocated /8 blocks and have to be dealt with separately. I guess my point here is that if you can deal with any of the RIRs en masse, it's a pretty effective narrowing tool. Not bulletproof by any means, but a simple way to reduce the scope of the problem a bit. It's a sledgehammer approach, but it works in some cases.
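
As a concrete sketch (Apache 2.2 .htaccess; these were AFRINIC's allocated /8s as best I recall -- check the IANA IPv4 address space registry before relying on them):

# Block AFRINIC's primary /8 allocations wholesale.
Order Allow,Deny
Allow from all
Deny from 41.0.0.0/8
Deny from 105.0.0.0/8
Deny from 197.0.0.0/8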

Added: Or just read the post wilderness just posted as I was writing this.

Andy500 (msg:4662960) - 7:41 pm on Apr 14, 2014 (gmt 0)

Thank you so much for your replies!

Well, for starters, the sites we're interested in doing all of this for are forums, so talk about resource-intensive :-D

We are very happy to let benign human visitors into our site from blocked countries (email or tweet us); not a problem. But at this point we are going to take a "kill a fly with a cannonball" approach. Interestingly, wilderness, the thread you linked to was having issues with spam. In our case, we have disabled registration from many countries and disallowed the posting of URLs (plus we clean up inactive accounts), and we haven't had any issues with spam. We used to get spam from Bangladesh, China, India, Thailand, and Ukraine, and as soon as we blocked registration from those countries, the spam stopped (and disallowing URLs killed it). It's the damn scrapers we are after, so now we want to extrapolate what we did with registrations to incoming traffic.

I think it's pretty clear by now what to do, and I think that we're going to go by continent too (can we allow Europe as a continent but then block Eastern Europe with further blocking rules country by country, for example?).

Now, considering we have a VPS and a good host, where would you recommend we place the blocking rules? Firewall? .htaccess? Speak to the host first and ask them to block as high as possible in the chain?

Many thanks again.

P.S. We are still looking at a CDN for further bad bot protection (i.e. those from US, Europe, Australia).

[edited by: Andy500 at 7:43 pm (utc) on Apr 14, 2014]

wilderness (msg:4662961) - 7:43 pm on Apr 14, 2014 (gmt 0)

webcentric,
There's not any one-size-fits-all in these issues.
Each webmaster must determine what is beneficial or detrimental to their own site(s).

I've never liked the country listings supplied by some resources, especially when a webmaster begins adding multiple countries and NOT consolidating the IP ranges. In that case, a multi-country list can become very large very fast, making the entire task unmanageable (as you found out).

wilderness (msg:4662970) - 7:50 pm on Apr 14, 2014 (gmt 0)

(can we allow Europe as a continent but then block Eastern Europe with further blocking rules, for example?).


You may do anything you wish, as long as you're willing to spend the necessary time separating and consolidating the IP ranges.

The odds of locating a copy-and-paste solution to such tasks are virtually nil, and paying your host an hourly rate to do the same would be an absurd cost.

webcentric (msg:4662971) - 7:55 pm on Apr 14, 2014 (gmt 0)

Consolidation of ranges may be one of the most singularly important points this board has injected into my thinking. Definitely one I had to learn the hard way. Finding ranges to merge is like finding buried treasure. Quite a satisfying pastime.

wilderness (msg:4663003) - 10:05 pm on Apr 14, 2014 (gmt 0)

Finding ranges to merge is like finding buried treasure. Quite a satisfying pastime.


RewriteCond %{REMOTE_ADDR} ^24\.(123|153)\.(12[89]|1[3-9][0-9]|2[0-5][0-9])\. [OR]

I'm sure lucy could condense this even further
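
For what it's worth, one possible condensation (untested; it matches the same two blocks, 24.123.128.0/17 and 24.153.128.0/17, relying on the fact that an octet never exceeds 255, so 2\d\d cannot over-match):

RewriteCond %{REMOTE_ADDR} ^24\.1[25]3\.(12[89]|1[3-9]\d|2\d\d)\. [OR]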

not2easy (msg:4663046) - 5:03 am on Apr 15, 2014 (gmt 0)

I bought a countries list with updates a few years ago and regretted it right away. Don't use those: they can't block everything you need to block, and they don't show you what is included, so you end up spending way too much time looking up the details of information you have already bought. Free lists are worse: copied, shared, and not up to date.

The country lists are huge because you are blocking ISPs, residential users, everything, and to no good result. There are server farms/hosting servers everywhere, so blocking everyone in East Serverstan won't ensure no unwanted visits.

Use log analysis to see non-human activity, and maintain your own list of offensive IPs to check against. I don't even share lists between my own sites; what scrapes site 1 never visits site 2 or site 6. There's no reason to have a huge set of IPs to check everyone against when a few hundred can handle it.
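
A per-site list that small fits comfortably in .htaccess (a sketch, assuming Apache 2.2; the IPs are placeholders for whatever your own log analysis turns up):

# Short, site-specific blocklist built from this site's raw logs.
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 198.51.100.77
Deny from 203.0.113.0/24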
