Blocking countries & server ranges

using Windows Server

         

webcentric

3:47 am on Mar 4, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




System: The following 6 messages were cut out of thread at: https://www.webmasterworld.com/search_engine_spiders/4784479.htm by keyplyr - 5:05 pm on Mar 7, 2016 (UTC -8)


Well, after shutting down my dedicated server, I'm back in this game again with my own hand-written blocking and tracking system. Long story short, I'm running into stuff I can't really get a handle on, like this:

91.200.12...
blocked by CIDR 91.200.0.0/20. It's Ukrainian (I'm blocking the whole country), but I'm wondering if it's tied to anything bigger. I can always block by host string, but if there's a larger problem, I might as well deal with it if possible.
host = seo007.heilink.com
UA = Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36
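
For anyone doing range checks in code rather than in server config, here is a minimal Python sketch of testing an address against a CIDR block, using the 91.200.0.0/20 range quoted above. The full client address is elided in the post, so the addresses below are made-up stand-ins, and the function name is hypothetical.

```python
import ipaddress

# The range quoted in the post. A real system would load many ranges
# from a database or file; this single-entry list is only a sketch.
BLOCKED_NETWORKS = [ipaddress.ip_network("91.200.0.0/20")]


def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


# A /20 starting at 91.200.0.0 spans 91.200.0.0 through 91.200.15.255.
print(is_blocked("91.200.12.1"))  # stand-in address inside the range
print(is_blocked("91.200.16.1"))  # first /24 past the end of the range
```

The same containment test carries over to other languages; the point is only that a /20 covers sixteen consecutive /24 blocks, so 91.200.12.x is caught without listing it separately.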

keyplyr

4:17 am on Mar 4, 2016 (gmt 0)




or.net.ua (Orion City, Ukraine 91.200/20) appears to be an ISP to me. But like most ISPs, they lease servers for business & private use. Your problem may be a bad company or an infected account. Personally, I don't block countries, only bad behavior. YMMV.

webcentric

1:47 pm on Mar 4, 2016 (gmt 0)




Thanks keyplyr. I know this probably isn't the thread for getting into the country blocking debate, and I'm down with your approach of blocking bad behavior (which is my only option for my US traffic), but I've found some country blocking to be effective and not detrimental for me (I have a US website with little or no relevance to people in Ukraine). This thread has been most helpful to me over the past few years, but I haven't really had the time to put a lot of it into practice. Search works pretty well on this site when trying to quickly nail down a set of ranges by colo or whatever (sometimes). Other times, the information is hard to follow through the threads.

I don't know how any of the following could be accomplished but here are my thoughts on making this better...

1. Separate out general how-to discussions (log analysis, regex, etc.) from actual host/agent/range info.
2. I've often thought that there needs to be an index and separate threads for at least some of these rascals. I'll use AWS as one of the more complex examples.

Sticky post with a
1. quick summary of the AWS problem
2. A maintained list of all known AWS ranges (with CIDR notation included). Use at your own risk, obviously. Personally, I think there are better ways of dealing with AWS than trying to track its ranges. But this would work for any colo, I suppose.
3. Options for blocking via host or agent information. You may find it extreme but blocking any host with "amazonaws.com" in it has a very dramatic impact and I haven't found it to be detrimental in any way to my operation (yet). I've whitelisted a few things but this makes the point.
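
The host-based option in item 3 can be sketched in a few lines of Python. This assumes the reverse-DNS host name has already been looked up elsewhere; the allow-list entry is a hypothetical placeholder, since the post does not say which hosts were whitelisted.

```python
# Substring that marks a host as AWS, as described in the post.
BLOCKED_HOST_SUBSTRINGS = ("amazonaws.com",)

# Hypothetical allow-list entry, for illustration only.
ALLOWED_HOSTS = {"ec2-203-0-113-10.compute-1.amazonaws.com"}


def host_is_blocked(host: str) -> bool:
    """Block any host containing a listed substring unless it is allow-listed."""
    host = host.strip().rstrip(".").lower()
    if host in ALLOWED_HOSTS:
        return False
    return any(part in host for part in BLOCKED_HOST_SUBSTRINGS)
```

Checking the allow-list first keeps the exceptions cheap to maintain: adding a trusted host never requires touching the block rule itself.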

Then let the thread proceed with discussions related to AWS.
If new ranges are discovered, they can be announced here in this thread as usual but then added to the Sticky in the AWS thread. Bottom line is, it would be helpful to be able to find a summary of a service rather than have to look through reams of historical back and forth on the subject.
Anyway, I know things like this have been wished for around here before, so it's not really a new idea. I'm sure I'm being a bit naive about it, but what bugs me is that this thread isn't an effective way for a beginner to get up to speed on the subject(s).

Enough from me. I'll stop cluttering the thread with philosophy now and let it be what it is (which is useful if you're into this kind of stuff).

Thanks for all the info. It's the best resource out there on this subject.

keyplyr

8:38 pm on Mar 4, 2016 (gmt 0)




@ webcentric - of course blocking (anything) is a personal choice and what works for one person may not for another. Each web site has its own dynamic. I actually *did* block a couple world regions for a while until I learned to put together a comprehensive system of: header info, request behavior, UA and IP range. I found the prejudice of excluding entire geo regions was blocking a lot of users & potential customers.

Yes, I agree that long threads can be tedious to read through, but as you've said, the search utility here at WW is pretty good, so IMO that's the best way to find specific answers. Some forums address some issues while you may need to read through other forums for the rest.

As for changing the thread hierarchy, or adding more stickys... of course everyone has their preferences but the evolution here has been organic to what works best for the most members.


Here are a couple older threads discussing AWS IP ranges & blocking:
[webmasterworld.com...]
[webmasterworld.com...]

FYI - depending on the type of site, blatant blocking of AWS (or any server farm, colo, datacenter, etc) may not be the best approach. A *huge* percentage of server ranges now include cloud computing which is used for apps & mobile connectivity. For example, both iPhone & Android Facebook apps use AWS, so blocking AWS without allowing these users through may not be the result you had planned.

As I mentioned above, what works best for *my* site is a comprehensive system of: header info, request behavior, UA and IP range filters, allowing some through while blocking others.
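
As a rough illustration only, a layered check of this kind might be sketched in Python as follows. Nothing here is keyplyr's actual system: the range, the UA tokens, and the header rule are all placeholder assumptions, and request-behavior tracking (rate, request order, and so on) is left out entirely.

```python
import ipaddress
from dataclasses import dataclass, field

# Every value here is a placeholder for illustration, not a rule from this thread.
SERVER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]  # reserved documentation range
BLOCKED_UA_TOKENS = ("masscan",)
ALLOWED_UA_TOKENS = ("exampleapp",)  # hypothetical app allowed in from server ranges


@dataclass
class Request:
    ip: str
    user_agent: str
    headers: dict = field(default_factory=dict)


def decide(req: Request) -> str:
    """Return "allow" or "block" after checking UA, headers, then IP range."""
    ua = req.user_agent.lower()
    header_names = {name.lower() for name in req.headers}

    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return "block"
    if "accept" not in header_names:  # assumed rule: real browsers send Accept
        return "block"

    addr = ipaddress.ip_address(req.ip)
    if any(addr in net for net in SERVER_RANGES):
        # Server-farm traffic gets through only with an allow-listed UA.
        if any(token in ua for token in ALLOWED_UA_TOKENS):
            return "allow"
        return "block"
    return "allow"
```

The ordering is the design choice worth noting: cheap string checks run first, and the IP range test decides only whether the allow-list is consulted, which is how some server-range traffic can pass while the rest is blocked.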

webcentric

3:34 pm on Mar 7, 2016 (gmt 0)




@keyplyr Again thanks for the feedback. There are a few things I keep in mind ...the type of system you describe requires certain skills and assets...

1. Not everyone runs on Linux so using tools like .htaccess isn't even an option for many. .Net does have the web.config file which can be used effectively for certain things but that can get messy in a hurry (difficult to maintain).
2. Many on shared hosting don't have full access to server logs. I know, get a better hosting company ;)
3. Sites running on shared servers don't typically have a firewall available to them.

I run such sites (and I'm totally happy with my hosting BTW). I develop in .Net primarily on shared hosting and needed to be able to deal with this pretty much purely from a database and in code. I'm lucky enough to have the coding and database skills to get it done but, as I'm certain you're aware, whatever you build to manage the problem, it can take a great deal of time (not everyone's forte) and skill to accomplish.

So, bottom line 1 is that I'm not arguing with you or complaining about the structure of this thread or anyone's approach. Bottom line 2 is that this thread is really a highly specialized topic that seems to only be useful to more-skilled webmasters/programmers (maybe I'm wrong). I've learned a lot here and will most assuredly keep learning. I appreciate everyone who helps keep this going and look forward to following along as the thread evolves.

lucy24

8:09 pm on Mar 7, 2016 (gmt 0)




much funnier than xtglobal

Then again, few things are less funny than most of the robots that crop up in this subforum.

:: back to working on my own behavior-based lockouts, whose code has now advanced one step further, to a site that is-- albeit rarely-- visited on purpose by actual humans ::

> adding more stickys
Noooo! I know one forum that has so many stickys-- some of them dating back as much as 10 years-- that they completely fill the first screen, and established users immediately learn to ignore stickies categorically.

> Not everyone runs on Linux so using tools like .htaccess isn't even an option for many
I kinda think what your fingers typed isn't what your brain intended, but I'm not sure what the intention was :( Surely every type of server offers some type of access control on the directory level?

And yup, if you don't have full access to your logs, and/or can't use htaccess-or-equivalent, it really is time to change hosts. Assuming, of course, that you want either of those things-- and if you didn't, you wouldn't be reading this subforum.

webcentric

2:03 pm on Mar 8, 2016 (gmt 0)




@Lucy24 -- If I understand correctly, htaccess is an Apache component and it's found most commonly on boxes running Linux. My point really was that folks running on a Windows server (for example) don't use this much if at all. Not sure if it's available on a Windows Server install of WordPress for example but who knows? Not me. All I know is that htaccess examples don't work for me and I have to use other means (in .NET, the primary tool is the web.config file). Still, even htaccess examples are translatable to other systems.

Regarding stickies: I was thinking of sticky posts inside threads, but actually all that's required there is to make the first post provide meaningful context to all that follows. So, maybe, never mind ;) Speaking of which, it looks like someone (could it be our new mod @keyplyr?) has taken it upon themselves to clean up some of the info here. Awesome! I think this will make searching and using this info so much more effective. Again, we'll see.

keyplyr

6:21 pm on Mar 8, 2016 (gmt 0)




@webcentric - you might reach out to dstiles with some of your questions regarding blocking on a Windows Server. I know he has a pretty good handle on it.

dstiles

7:08 pm on Mar 10, 2016 (gmt 0)




He does indeed! Problem is: it's homegrown and difficult to migrate.

I'm willing to try a migration but the migratee needs to be a techie familiar with programming ASP Classic (I've been doing this since before .NET).

wilderness

5:03 am on Mar 11, 2016 (gmt 0)




dstiles,
It's my recollection (apologies if I'm mistaken) that you were constantly frustrated with the amount of time and effort this required?

It is also my recollection that in the past few years you settled on a more standard application of htaccess?

lucy24

5:32 am on Mar 11, 2016 (gmt 0)




> htaccess is an Apache component and it's found most commonly on boxes running Linux. My point really was that folks running on a Windows server (for example) don't use this much if at all.

Apache comes in versions for all platforms. (Just recently there was a parallel discussion somewhere hereabouts, about case sensitivity in Apache depending on what platform it's running on.) I don't know how many flavors of IIS there are, and as for nginx, I can't even pronounce it.*

Yes, .htaccess is the part of Apache that can be used by individual sites when it's not your own server and you don't have access to the config file. (Which is presumably called that, or something very similar, regardless of server type.)


* I know how they say to pronounce it, but the recommendation does not appeal.

blend27

5:01 pm on Mar 11, 2016 (gmt 0)




@webcentric

A lot of shared Windows hosting accounts these days offer Helicon ISAPI_Rewrite for IIS. With it you can use .htaccess rules almost the same way as on Apache. That is what I do, actually.

On most of my sites there is a WWW folder where I place two files:

1. web.config - this one contains the following:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<system.webServer>
<staticContent>
<clientCache cacheControlMode="UseMaxAge" cacheControlMaxAge="1.00:00:00" />
</staticContent>
</system.webServer>
</configuration>


2. .htaccess - this file has all the UA and referrer blocks, plus some IP range blocks:

#referrer spam
RewriteCond %{HTTP_REFERER} (pizza|burger|button|for-your-|semalt|seo|--production|x00_|s-anal|responsive)
RewriteRule .* - [F]


#RewriteMap file example for blocking individual IPs
#123.123.123.123 123.123.123.123
RewriteMap blockedips txt:anothersubfolder/blockedips.txt
RewriteCond %{REMOTE_ADDR} (.*)
RewriteCond ${blockedips:%1|NOT_FOUND} !NOT_FOUND
RewriteRule .? - [F]


# Firefox/6.0.2 ? from Japan/NTT IP Ranges
# each pattern ends in an escaped dot so it matches exactly one /24
RewriteCond %{REMOTE_ADDR} ^210\.163\.39\. [OR]
RewriteCond %{REMOTE_ADDR} ^210\.232\.15\. [OR]
RewriteCond %{REMOTE_ADDR} ^211\.0\.145\. [OR]
RewriteCond %{REMOTE_ADDR} ^211\.0\.154\. [OR]
RewriteCond %{REMOTE_ADDR} ^211\.6\.121\. [OR]
RewriteCond %{REMOTE_ADDR} ^211\.123\.206\. [OR]
RewriteCond %{REMOTE_ADDR} ^211\.123\.208\.
RewriteRule .* - [F]

#Block bots that identify themselves and that I don't want to deal with
RewriteCond %{HTTP:User-Agent} (?:CFNetwork|MSIE8|masscan|SeznamBot|NetSeer|proximic|QippoBot|Advisorbot|coccoc|oBot|Comodo-Certificates|Synapse|MLBot|ShopSalad|aihit|Yeti|cuil|Purebot|sai-crawler|LinkCheck|Tasapspider|80legs|java|Made|Toata|ichiro|someone|baidu|yanga|sogou|voila|YoudaoBot|GingerCrawler|copyright|zhewang|QACC|Netcraft) [NC]
RewriteRule .? - [F]




#extra file extensions trap (remove asp and aspx if your app uses them)
RewriteCond %{REQUEST_URI} \.(?:asp|aspx|php|html)$ [NC]
RewriteRule .? - [F]
# or RewriteRule .? /trap/trap.cfm [L]
# note that the trap subdirectory is a separate folder and a separate app that modifies the anothersubfolder/blockedips.txt file, so as soon as a bad IP is identified it gets written to that file and any subsequent requests get blocked by the RewriteMap file example above. Both trap and anothersubfolder sit in my WWW folder.


....etc, etc....
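
The trap step described in the rule comments (an app that appends an offending IP to blockedips.txt) is written in whatever the trap app uses, which the post does not show. As a language-neutral illustration, here is a hedged Python sketch that writes the same "ip ip" line format shown in the RewriteMap example; the function name and the duplicate check are assumptions, not blend27's code.

```python
import ipaddress
from pathlib import Path


def add_blocked_ip(map_file: Path, ip: str) -> bool:
    """Append ip to the map file as 'ip ip'; return False if already listed."""
    # Normalise and validate; raises ValueError on malformed input.
    ip = str(ipaddress.ip_address(ip))

    existing = set()
    if map_file.exists():
        for line in map_file.read_text().splitlines():
            line = line.strip()
            # Skip blanks and comment lines such as the header in the example.
            if line and not line.startswith("#"):
                existing.add(line.split()[0])

    if ip in existing:
        return False
    with map_file.open("a") as fh:
        fh.write(f"{ip} {ip}\n")
    return True
```

A call such as `add_blocked_ip(Path("anothersubfolder/blockedips.txt"), "123.123.123.123")` would add one line, and a repeat call with the same address would leave the file unchanged. Validating the address before writing matters here because the file feeds directly into the rewrite rules.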


The .htaccess rules above are my first line of defense.

The same file also holds a mapping to a folder that sits in the root of WWW for my domain hosted on this account:

# .htaccess main domain to subdirectory rewrite
RewriteCond %{HTTP_HOST} ^(www\.)?mydomain\.tld$ [NC]
RewriteRule !^subdirectory/ /subdirectory%{REQUEST_URI} [L,NC]


----------------------------------------------------------------------------------------------

Then the subdirectory itself contains its own web.config(some of actual web app logic) and .htaccess(rewrites and redirects) and additional logic in the code itself.

Check it out. Helicon ISAPI_Rewrite for IIS.

Hope this helps.

dstiles

8:50 pm on Mar 11, 2016 (gmt 0)




wilderness...

> It's my recollection (apologies if I'm mistaken), that you were constantly frustrated with the amount of time and effort this required?

Only insofar as I believe no one should have to spend time on criminals and idiots. :) It usually takes about 20 minutes a day for two servers (about 4 dozen sites in total). It takes much longer to manage the mail server (which IS Linux) to combat spam and viruses.

> It is also my recollection that in the past few years you determined a more standard application of htaccess?

'Fraid not. I run Classic ASP which does not easily implement htaccess - it can do but it is (at least, was) expensive and per site not per server. I believe newer versions of IIS run a kind of htaccess but it would take longer to convert than it does to maintain this system and I would have to somehow include MySQL and a homegrown Control Panel.

I have a home-grown ASP inclusion script which uses MySQL to store IP details and another script to hold UA, Referer etc info.