Forum Moderators: goodroi


Bad bots

         

ownerrim

10:23 pm on Jun 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone have a list of bots that should be banned wholesale?

Dijkgraaf

1:48 am on Jun 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do a search for "bad bot" in your favourite search engine, and you will find various lists, most of them written by website owners who have noticed the behaviour of bots visiting their sites. I don't know of any definitive worldwide list of bad bots.

The definition of a "bad bot" varies depending on who you ask, as there are various behaviours that can be (or are) considered bad,
e.g. not reading robots.txt, disobeying robots.txt, requesting too many pages in a short time span, revisiting pages too often, e-mail harvesting, guestbook spamming, log spamming, munging URLs.
Some of the above are rather subjective measures (e.g. how often is too often) and others require some effort to identify (e.g. e-mail harvesting).
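As a rough illustration of the "too many pages in a short time span" case, here is a minimal Python sketch of a sliding-window check. The thresholds and the name `is_too_fast` are made up for the example, and where you draw the line is exactly the subjective part mentioned above.

```python
from collections import defaultdict, deque

# Hypothetical thresholds; "too many" is a judgment call for each site.
MAX_REQUESTS = 30
WINDOW_SECONDS = 10.0

_recent = defaultdict(deque)  # ip -> timestamps of that client's recent requests


def is_too_fast(ip, now):
    """Record a request from ip at time now (in seconds) and report whether
    the client has made more than MAX_REQUESTS within WINDOW_SECONDS."""
    stamps = _recent[ip]
    stamps.append(now)
    # Drop timestamps that have fallen out of the window.
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()
    return len(stamps) > MAX_REQUESTS
```

In a real site `now` would come from something like `time.time()`; passing it in keeps the sketch easy to try out by hand.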

Trying to use robots.txt against bad bots that either don't read robots.txt or disobey it will of course fail, so other methods, e.g. .htaccess, will have to be used against them.
Some bad bots change both their User Agent and IP address frequently, so the only way to ban these is to have a robot trap (search: robot trap) that bans them automatically.
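For anyone wondering what a robot trap looks like in practice, the idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the `/trap/` path, the `handle_request` function and the in-memory ban set are all hypothetical, and a real trap would usually write the ban to .htaccess or a database instead.

```python
TRAP_PATH = "/trap/"  # hypothetical URL, never linked where humans would click
ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"  # well-behaved bots stay away

banned_ips = set()


def handle_request(ip, path):
    """Return an HTTP status code for a request, banning trap visitors."""
    if ip in banned_ips:
        return 403
    if path.startswith(TRAP_PATH):
        # Only a client that ignored robots.txt should ever end up here.
        banned_ips.add(ip)
        return 403
    return 200
```

Note that this sketch bans by IP address only, so a bot that uses a fresh address for every request would still need to hit the trap again before being stopped.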

ownerrim

4:59 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the tips. I have a site in development that's not on a dedicated server and, tech-wise, even if it were a dedicated box, I know nothing about configuring or maintaining a server (.htaccess: I have no real idea what this is, other than that, from what I've read, it seems to be a means of banning bots and specific IPs), so I guess I'll have to speak with the host regarding security.

My biggest concern is content theft down the road. For this reason, the thought of banning all bots aside from Yahoo, Googlebot, and msnbot seems appealing. Also, since the site will be an English-language one, I have no need for visitors from China, South Korea, or any number of other countries. Ideally, I would like to ban their access, though I haven't a clue how to gather all the various IP ranges for certain countries and ban them wholesale.

DanA

5:35 pm on Jun 7, 2005 (gmt 0)

10+ Year Member



Banning people, countries is quite easy (ip2country, geoipcountry, hostip databases with about a 5% error rate)
AOL users may come from anywhere but seem to be located in the US (or the UK)...
Banning people who read English and do not live in English speaking countries is easy!
Banning robots while still being indexed by Google and Yahoo is really difficult, but searching for robot traps, web spider traps, crawler traps, offline browser traps, site ripper traps and so on may help you.
Banning spammers is also difficult once your site is indexed.
Once content is indexed, Google cache or translate facility will allow anyone to browse your site...

jatar_k

5:38 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



read this whole thread
A Close to perfect .htaccess ban list [webmasterworld.com] long, 3 parts

ownerrim

5:40 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Banning people, countries is quite easy (ip2country, geoipcountry, hostip databases with about a 5% error rate)"

How do you do this? I downloaded a list of IP ranges once (I think it might have been geoip-something), but it didn't give you, say, a whole contiguous list of all the ranges for China. Instead it listed all the ranges and showed which sliver of the range went to China, which piece of the range went to Canada, etc.

From looking at it, it looked as though to ban one country such as China or India (just examples), you'd have to track down and input hundreds and hundreds of chunks of IP ranges.
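The "hundreds of chunks" problem is usually handled with a script rather than by hand. Below is a minimal Python sketch, assuming the downloaded list can be read as (first IP, last IP, country code) rows. The sample rows and the country codes "AA" and "BB" are made up, and the addresses are reserved documentation ranges, not real allocations.

```python
import ipaddress

# Made-up sample rows in (first_ip, last_ip, country_code) form.
# Real databases differ in layout and need their own parsing step.
ROWS = [
    ("192.0.2.0", "192.0.2.127", "AA"),
    ("192.0.2.128", "192.0.2.255", "AA"),
    ("198.51.100.0", "198.51.100.255", "BB"),
    ("203.0.113.0", "203.0.113.63", "AA"),
]


def cidr_blocks_for(country, rows):
    """Return a collapsed list of CIDR networks covering a country's ranges."""
    nets = []
    for first, last, code in rows:
        if code == country:
            # A start/end pair may need several CIDR blocks to cover it.
            nets.extend(ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)))
    # Merge adjacent or overlapping blocks so the ban list stays short.
    return list(ipaddress.collapse_addresses(nets))


for net in cidr_blocks_for("AA", ROWS):
    print(f"Deny from {net}")
```

With the sample rows, the two adjacent "AA" slivers merge into a single block, so three input rows become two deny lines. The list would still need regenerating whenever the underlying database is updated.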

jatar_k

5:49 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



it's fairly easy to do

I use [ip-to-country.webhosting.info...]

in my case I don't want anyone from a list of countries to be able to signup with us

get the ip of the person trying to signup
use that to get their country from mysql database
if that country is not allowed stop them
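The three steps above can be sketched roughly as follows. This is not the actual signup code: it uses Python and an in-memory table in place of the MySQL database, and the rows, the `BLOCKED` set and the function names are all hypothetical.

```python
import bisect
import ipaddress


def _n(ip):
    """Convert a dotted IPv4 address to an integer for range comparison."""
    return int(ipaddress.IPv4Address(ip))


# Made-up rows (range_start, range_end, country_code), sorted by range_start.
# A real table would be loaded from the ip-to-country database.
RANGES = [
    (_n("192.0.2.0"), _n("192.0.2.255"), "AA"),
    (_n("198.51.100.0"), _n("198.51.100.255"), "BB"),
]
STARTS = [r[0] for r in RANGES]
BLOCKED = {"AA"}  # hypothetical list of countries that are not allowed


def country_of(ip):
    """Return the country code for ip, or None if no range contains it."""
    n = _n(ip)
    # Find the last range that starts at or before this address.
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return None


def is_allowed(ip):
    """Step 3: stop the visitor if their country is on the blocked list."""
    return country_of(ip) not in BLOCKED
```

Addresses that match no range are allowed in this sketch; whether unknown addresses should pass or be stopped is a policy choice.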

ownerrim

7:46 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not trying to block individual users from certain countries. I'd like to block WHOLE COUNTRIES. And by that I mean ALL of India, ALL of China, ALL of South Korea and all of any other non-English country which is rife with content theft.

DanA

8:08 pm on Jun 7, 2005 (gmt 0)

10+ Year Member



get the ip of the person trying to get the page
use that to get their country from mysql database
if that country is not allowed stop them

DanA

8:33 pm on Jun 7, 2005 (gmt 0)

10+ Year Member



You can find scripts and links to detect the country there :
[ip-to-country.webhosting.info...]

ownerrim

9:30 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Either I'm not following properly or I'm not making it clear enough. I am not trying to detect the country that a visitor comes from. I want to set up IP bans so that no one from certain countries can EVER get to the site. Not once, not ever.

jatar_k

9:39 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



you could have a small included file that redirects anyone from a specific range using the mysql db we linked to

you could read the thread I mentioned above re: htaccess and use REMOTE_ADDR in a method such as

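# Flag requests whose client address starts with one of these prefixes
SetEnvIf REMOTE_ADDR ^(127\.0\.0\.1|192\.168\.2\.|192\.168\.3\.|10\.) bad-ip
<Directory /docroot>
# Allow,Deny evaluates Allow first, then lets a matching Deny override it
Order Allow,Deny
Allow from All
Deny from env=bad-ip
</Directory>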

ips would need to be changed and some paths fixed for your own site
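Before swapping in your own addresses, it can help to check what a prefix pattern like the one on the SetEnvIf line really matches. The quick Python check below is only an approximation: Python's `re` module is not Apache's regex engine, though for a pattern this simple the two should agree. The test addresses are arbitrary.

```python
import re

# Same pattern as the SetEnvIf line, with "|" as the alternation character.
PATTERN = re.compile(r"^(127\.0\.0\.1|192\.168\.2\.|192\.168\.3\.|10\.)")

for ip in ["127.0.0.1", "127.0.0.10", "192.168.2.77", "192.168.20.1",
           "10.1.2.3", "110.1.2.3"]:
    # match() anchors at the start of the string, like the leading "^".
    print(ip, bool(PATTERN.match(ip)))
```

The pattern is anchored only at the start, so `127\.0\.0\.1` also matches 127.0.0.10. Entries that end in an escaped dot, such as `192\.168\.2\.`, stop at an octet boundary and do not match 192.168.20.1. To match one exact address, end that alternative with `$`.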

you could also read this for reference

[httpd.apache.org...]

GaryK

9:39 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You initially expressed hesitation over having to enter hundreds of IP address blocks. The only way around that is to check the IP address of each visitor to your site. If the IP address is from a country you want to ban, the site can respond with a 403 Forbidden status. :)
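To make the per-visitor check concrete, here is a minimal sketch written as a Python WSGI wrapper. That is an assumption for illustration, since nothing in this thread says what the site runs on. `block_countries`, `hello_app` and `demo_is_allowed` are hypothetical names, and the stand-in check blocks a documentation address range instead of doing a real country lookup.

```python
def block_countries(app, is_allowed):
    """Wrap a WSGI app so that it answers 403 when is_allowed(ip) is false."""
    def wrapper(environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        if not is_allowed(ip):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Visitor is allowed: pass the request through unchanged.
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Tiny placeholder for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def demo_is_allowed(ip):
    # Stand-in for a real country lookup: blocks one documentation range.
    return not ip.startswith("192.0.2.")


application = block_countries(hello_app, demo_is_allowed)
```

Every request pays for one lookup, which is the trade-off against maintaining a long static ban list.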

jatar_k

9:42 pm on Jun 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



the other thing is

>> My biggest concern is content theft

if they want to steal your content they will, they could just use a proxy, spoof an ip or whatever else. You can only make it difficult, not impossible.

ownerrim

4:25 am on Jun 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"if they want to steal your content they will, they could just use a proxy, spoof an ip or whatever else. You can only make it difficult, not impossible."

I agree. Well, thanks all for the info and tips.