
Forum Moderators: Ocean10000 & incrediBILL & phranque

Blocking the IP of a Bot that is hogging resources

     
2:07 pm on Sep 27, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2003
posts: 721
votes: 1


My site has been hit by this fake IP for two months now. I tried blocking the IP and contacted the host, but the requests continue and it appears to be a spoofed IP. They consume so much data that it prevents real users from using the site.

It's a bot without any identifiable name. How can I block it?

It looks like this. 114-103-162-69.static.reverse.example.net

Thank you
2:33 pm on Sept 27, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3104
votes: 120


Before suggesting something that might not help, a few questions to help get this under control. It is important to look at the raw access logs for your site if possible.
1. What do you see in your logs? Are requests receiving 200 responses, 404 or 403 or something else? If they are already getting a 403 response, they are already blocked but you can't prevent them from trying.
2. What is the UA? If the UserAgent is unique, you can try blocking via UA - but not if they are already seeing 403 responses.
3. What methods have you tried that are not effective?
4. Are you working in .htaccess or httpd.conf?
3:11 pm on Sept 27, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2003
posts: 721
votes: 1


I tried htaccess mostly. Now I'm about to try blocking the bot but not sure about how to block a bot with empty strings or random characters.

Do I need to add all possible characters here? How do I indicate empty strings?

RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]
3:34 pm on Sept 27, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3268
votes: 20


As not2easy said, what do your logs say?

If the requests are all coming from the same IP, it is irrelevant whether it is fake; you just block that IP in .htaccess or, if you have access to it, in httpd.conf (a sketch follows below) ... end of story.
(Although there isn't a lot of point making a request from a fake IP - so are you sure it is a fake IP?)
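Something like this, depending on your Apache version - sketch only, the address is a placeholder, substitute the real one from your own logs:

# Apache 2.2 syntax (mod_authz_host)
Order Allow,Deny
Allow from all
Deny from 203.0.113.45

# Apache 2.4 syntax (mod_authz_core)
<RequireAll>
Require all granted
Require not ip 203.0.113.45
</RequireAll>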

What you posted seems to be a hostname. If those numbers are real, then it is a Chinese static broadband account; I doubt you'll be able to get them to stop.

What does a typical line in your logs say?
It is trivial to block a bot with an empty User-Agent or no User-Agent at all - however, is that what you actually meant by a 'bot with empty strings'?
3:41 pm on Sept 27, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 269
votes: 19


I tried blocking the IP...


What did you do (what code did you try) to block the IP?

Now I'm about to try blocking the bot...


What were you doing before (by blocking the IP), if you weren't trying to block the bot?


...How do I indicate empty strings?

RewriteCond %{HTTP_USER_AGENT} ^-?$ 
RewriteRule ^ - [F]


That does block an empty user agent string. I think the -? bit is superfluous, though: the "-" you see in raw access logs is just the log format's placeholder for a missing User-Agent header, not something the bot actually sends.
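So the minimal form, matching a truly empty (or missing) User-Agent header, would simply be:

RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^ - [F]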
4:41 pm on Sept 27, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2003
posts: 721
votes: 1


Blocking all of China? I know it's a controversial thing to do...
4:59 pm on Sept 27, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3104
votes: 120


China has many IP ranges, some of them shared with other countries. It's not one line of code, and it is not static either.

Looking at your raw access logs would make this a simple matter. At this point no one knows whether the offender is seeing more than error pages, or even whether these are "GET" requests.
6:46 pm on Sept 27, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3268
votes: 20


>>blocking all of china

not that controversial, lots of people do it, but it requires using a database that matches IPs to countries (a sketch of one common approach follows below) ... also, as not2easy says, the allocations change over time.
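for illustration only, one common legacy setup uses mod_geoip with a MaxMind country database in the server config - directive names differ if you use the newer GeoIP2/mod_maxminddb tooling, so treat this as a sketch rather than a recipe:

GeoIPEnable On
SetEnvIf GEOIP_COUNTRY_CODE CN BlockCountry
Order Allow,Deny
Allow from all
Deny from env=BlockCountry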

>>Looking at your raw access logs would make this a simple matter,

agreed!
10:09 am on Oct 1, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7743
votes: 262


The IP in the OP is reversed so I would block: 69.162.103.114

But as not2easy said, you'll still see the requested file attempts in the server log even if they are blocked.
3:59 pm on Oct 1, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3104
votes: 120


As keyplyr said, the IP shown in the OP is not the IP to block, that's why checking the logs was suggested. Then you could also check whether all that traffic is from only that IP or whether you should be blocking the CIDR. Blocking a single IP address seldom blocks all related unwanted access.

In the long run, it pays to know who/what you are blocking. An IP lookup can give you more complete information before blocking an IP, to make sure you aren't accidentally blocking some service that you are using and to make sure that blocking it will be effective. In this example an IP lookup shows that the IP 69.162.103.114 belongs to a Limestone/Dallas server, so it is not likely to be a real person - a real person's IP would usually belong to an ISP, not a server hosting company. (Please forgive me if I don't ramble off into headers here.)

The range is shown as NetRange: 69.162.64.0 - 69.162.127.255 and the CIDR (that stands for "Classless Inter-Domain Routing" and would include every number from 69.162.64.0 to 69.162.127.255) would be 69.162.64.0/18

When you "block" IPs with a 403 error (that's what the "[F]" flag does), you should set up your own 403 error page and point to it in .htaccess, the same way you designate a custom 404 page and for the same reason: to give people who experience these errors a way to help you fix them. Sometimes real people get blocked.
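As a sketch of both points together - Apache 2.4 syntax, with a hypothetical /custom-403.html page, and recheck the CIDR against a current lookup before relying on it:

ErrorDocument 403 /custom-403.html
<RequireAll>
Require all granted
Require not ip 69.162.64.0/18
</RequireAll>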
1:43 am on Oct 2, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7743
votes: 262


The IP (69.162.103.114 ) belongs to:

Host: limestonenetworks.com cloud hosting
69.162.64.0 - 69.162.127.255
69.162.64.0/18

Other ranges of theirs include:
63.143.32.0 - 63.143.63.255
63.143.32.0/19
64.31.0.0 - 64.31.63.255
64.31.0.0/18
74.63.192.0 - 74.63.255.255
74.63.192.0/18
192.169.80.0 - 192.169.95.255
192.169.80.0/20
208.115.192.0 - 208.115.255.255
208.115.192.0/18
216.144.240.0 - 216.144.255.255
216.144.240.0/20
216.245.192.0 - 216.245.223.255
216.245.192.0/19

Since it is cloud hosting, this actor is likely to come randomly from various nodes within the above ranges, so it would be prudent to consider blocking them all (I do); a sketch follows below.
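A sketch only (Apache 2.2 mod_authz_host syntax; on 2.4 the same list can go in "Require not ip" lines) - these ranges are as posted in 2016, so re-verify them before use:

Order Allow,Deny
Allow from all
Deny from 69.162.64.0/18 63.143.32.0/19 64.31.0.0/18 74.63.192.0/18
Deny from 192.169.80.0/20 208.115.192.0/18 216.144.240.0/20 216.245.192.0/19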
3:48 pm on Oct 3, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2003
posts: 721
votes: 1


Thank you all. I admit that I am not very technically inclined, so I feel a bit overwhelmed. @keyplyr, you say that you block all of their ranges. Have you been attacked by them too?
8:01 pm on Oct 3, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7743
votes: 262


Server farms & hosting companies usually don't have any legitimate reason to request files from websites. Many webmasters block them as a proactive preventive measure.

There are of course some beneficial agents (may be different for each site) that come from these ranges. These are allowed through exceptions.
8:58 am on Oct 4, 2016 (gmt 0)

Senior Member from LK 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 16, 2005
posts:2626
votes: 75


@keyplyr, a bit OT as it's not the OP's problem, but the trouble is that persuading people you are an exception (or, even worse, persuading the people who provide blocking services) is not easy. Many people running small sites do not even know how to make exceptions through their blocking service.

I have a client who runs a beneficial bot (a search engine for a language/ethnic group/would-be country). Sites in the niche tend to get DDoSed and otherwise attacked (disliked by people from a particular country), so a lot of them use services to block bots, and a good many do not know how to make exceptions or are suspicious and wary of making them.
9:22 am on Oct 4, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7743
votes: 262


Agreed, graeme_p.

Example: many webmasters block all AWS ranges, due to the high volume of bad activity, without considering that both the Facebook Android & the Facebook iOS apps also use those ranges.

Being a webmaster means being informed, or at least hiring a trusted person who is.

RE: persuading people you are an exception... that's what the bot info page linked in the UA string is for. That info page should give compelling reasons why allowing this UA benefits site owners.
10:55 am on Oct 6, 2016 (gmt 0)

Senior Member from LK 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 16, 2005
posts:2626
votes: 75


@keyplyr it should be, but unfortunately there are a lot of good small sites run by people without much in the way of technical skills or understanding - at least in my client's niche - and they are very wary because they are targeted on the net (and persecuted in real life as well, in some places).

Going back to the original question:

1) The Hiawatha web server comes with the ability to ban IPs that make excessive numbers of requests.
2) You can do the same with Apache, but you need to install additional modules (mod_evasive is one example).
3) If you are using Linux, you could use fail2ban to block an IP after a certain number of requests. This is probably the most efficient, but while I am fairly sure it would work, I do not know how to do it off hand (I have only used fail2ban to protect ssh); a rough sketch follows below.
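A very rough illustration of option 3 - the filter name "apache-flood" and the thresholds are invented examples, and the log path is a Debian/Ubuntu default, so verify everything against your own install:

# /etc/fail2ban/filter.d/apache-flood.conf (hypothetical custom filter)
[Definition]
# match every request line in a common/combined format access log
failregex = ^<HOST> .* "(GET|POST|HEAD)
ignoreregex =

# /etc/fail2ban/jail.local
[apache-flood]
enabled = true
port = http,https
filter = apache-flood
logpath = /var/log/apache2/access.log
findtime = 60
maxretry = 300
bantime = 3600
# i.e. ban for an hour any IP that makes more than ~300 requests in 60 seconds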
8:19 am on Oct 9, 2016 (gmt 0)

Junior Member from PE 

10+ Year Member

joined:Mar 28, 2004
posts: 144
votes: 0


Use a bot trap like this one:

[danielwebb.us...]

The idea is simple:

Disallow a trap page in your robots.txt file so good bots (Googlebot, Bingbot, etc.) won't crawl it.

Bad bots don't follow robots.txt rules, and will crawl the page that good bots won't.

As soon as the bad bot requests the trap page, it is blocked by automatically adding a deny line for its IP to your .htaccess file.

I have been using it for almost ten years now.
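For what it's worth, a bare-bones sketch of the same idea (not the linked script): disallow a trap page such as /trap.php in robots.txt, then have the page record whoever requests it. The file name, the .htaccess location and the web server having write access to it are all assumptions here, and that write access is itself a risk to weigh:

<?php
// trap.php - hypothetical bot-trap page, disallowed in robots.txt.
// A visitor that ignores robots.txt and requests it gets its IP
// appended to the site's .htaccess (assumes trap.php sits in the
// document root next to .htaccess).
$ip = $_SERVER['REMOTE_ADDR'];

if (filter_var($ip, FILTER_VALIDATE_IP)) {   // only ever write a well-formed IP
    file_put_contents(__DIR__ . '/.htaccess', "Deny from " . $ip . "\n", FILE_APPEND | LOCK_EX);
}

header('HTTP/1.1 403 Forbidden');
echo 'Access denied.';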
8:58 am on Oct 9, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7743
votes: 262


@rominosj - bot trap scripts have been discussed here many times.
As soon as the bad bot crawls the trap page
Yes, they are good for that, however that is also their limitation; the bot needs to take the bait. There are hundreds of agents that behave differently and would not be blocked with these traps.
9:18 am on Oct 9, 2016 (gmt 0)

Junior Member from PE 

10+ Year Member

joined:Mar 28, 2004
posts: 144
votes: 0


But this PHP script blocks by IP, not by UA.

If the bad bot does follow robots.txt rules, then the trap won't catch it. However, the great majority of spammers don't care a bit about rules, and just crawl as many pages as possible.

You could also allow access to the robots.txt file only to good bots and forbid it to everyone else; then most bad bots will have a hard time figuring out which pages not to crawl.
9:25 am on Oct 9, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7743
votes: 262


"But this php script blocks by IP not by UA."

Yes, by IP. I used one for a while a few years ago.
2:37 pm on Oct 9, 2016 (gmt 0)

Junior Member from PE 

10+ Year Member

joined:Mar 28, 2004
posts:144
votes: 0


OK. But did Harry (the thread starter) already try that? It might help.
2:14 pm on Oct 31, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2003
posts: 721
votes: 1


I blocked IPs and they still came. Finally, the ISP I get my VPS from blocked them at the firewall and then it stopped. Now I need to get rid of another bot that scrapes our content and reposts it, often ranking above us. I've complained to Google and DuckDuckGo many times but they still list them. They seem to be a large content farm. Not sure why Google has not penalized them yet.
2:28 pm on Oct 31, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3104
votes: 120


You need to complain to the site and their host, not to Google and other search engines. Those search engines will remove the results after you contact the offending sites and tell them to remove your content. This is not the forum for a discussion about DMCAs though. You can start a new thread on the Content, Writing and Copyright Forum: [webmasterworld.com...]

we try to keep things on topic ;)
4:39 am on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 56
votes: 2


114-103-162-69.static.reverse.example.net

I know this thread is a bit long in the tooth, but if you have only the host name and wish to block it, you will need the IP address. There's no use in blocking the wrong IP. The easiest way is to find it in your raw access log. Otherwise you need to try various host command lookups (on unix, "host 114-103-162-69.static.reverse.example.net" and see what is returned) to see if one of them matches. You can also throw the host name into Google Search and see if someone else has posted the IP address. If the spammer is prolific they will have left a worldwide audit trail of spammed servers for you to find their IP, or at least how to decode the host name. Their pattern might be quite evident after seeing a couple of Google examples.

There are so many variations in IP addresses for host names; even when it says "reverse" in the host name, it might not be reversed. There is the straight a.b.c.d, the reversed d.c.b.a, but some Indian hosts do b.c.d.a, and I have also seen c.d.a.b. Sometimes they use hex in the host name, which is a bit more complex. These hosts try to hide their IP so they do not get banned.

Since forum rules say you must replace the actual name with "example", it will be difficult to figure out the actual IP address. We could guess but may not be accurate. Post the host name and I'll take a crack at it. I do love obfuscated host name challenges, weird, I know.
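For example, you can test each plausible octet order from a shell and compare what comes back (the host name below is this forum's example placeholder, not the real one):

host 114-103-162-69.static.reverse.example.net    # forward lookup of the name itself
host 69.162.103.114                               # PTR lookup of the "reversed" reading
host 114.103.162.69                               # PTR lookup of the straight reading

If one of the PTR lookups returns the same host name you see in your logs, that ordering is the real IP.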
9:32 am on Feb 12, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2546
votes: 50


Expand the Apache logging to log IP addresses in addition to host names. That way you can easily identify problem IPs and spoofed IPs. Then, if you need to identify scrapers, it can be done with a few lines of script code. When you have the IP, you can do a WHOIS lookup to check the IP's owner. If the IP is in a data centre, VPN or cloud provider, you can make the decision to block the IP or the range of IPs. Don't waste time using Apache config files to block; use iptables to block connections to ports 80 and 443 from the problem IP(s).
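(For the logging half, adding %a alongside %h in your LogFormat records the client IP next to the host name.) A rough sketch of the firewall half - placeholder address, and remember iptables rules are lost on reboot unless you save them, e.g. with iptables-persistent or your distro's equivalent:

# drop web traffic from one problem IP
iptables -A INPUT -s 203.0.113.45 -p tcp -m multiport --dports 80,443 -j DROP
# or drop a whole CIDR range
iptables -A INPUT -s 69.162.64.0/18 -p tcp -m multiport --dports 80,443 -j DROP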

Regards...jmcc
7:06 am on Feb 13, 2017 (gmt 0)

New User

joined:Feb 13, 2017
posts: 12
votes: 2


It took almost 6 months to identify -all- the bad IP ranges and learn about their tech, and into .htaccess they went:
rewrite to 410, 0 bytes served. Best of all, no access to any other part of the website.
If you don't like cookies, then other IPs can go to 403 with a custom error page, requiring someone to read your phone number written out in words, or to click a link. Bots don't like re-clicking links, but if they learn how, then use a dead link requiring copy/paste.
After a couple of years you'll be mostly off their lists; the remainders you can block forever.
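A sketch of the rewrite-to-410 part with mod_rewrite - the ranges here are documentation placeholders, substitute the ones identified in your own logs ([G] answers 410 Gone; the stock error body is tiny, and can be trimmed further with an ErrorDocument directive if you want):

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\. [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.
RewriteRule ^ - [G]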
7:10 am on Feb 13, 2017 (gmt 0)

New User

joined:Feb 13, 2017
posts: 12
votes: 2


Btw, you can give the 403 page its own title so the browser bar shows your chosen title rather than an error.
The 403 page can be a mini HTML page, excellent for mobile viewing.