Forum Moderators: open


Could you help me identify these robots?

         

skuba

4:45 am on Feb 19, 2005 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have been getting hit hard by these hosts. My stats program doesn't show them as search engines; it probably needs to be updated.

220.181.26.66 - 220.181.26.72 [range]

64.62.168.3 - 64.62.168.78 [range]

The first one is hitting hard.

Do you know what robots these are?

Thanks a lot

volatilegx

6:05 pm on Feb 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think the first belongs to the Chinese search engine Sohu.

The second is Gigablast.

bull

10:01 pm on Feb 19, 2005 (gmt 0)

10+ Year Member



I regularly see sohu from 61.135.130.* and 61.135.131.* as well as from the mentioned 220.181.26.66-72 range.

skuba

6:18 pm on Feb 22, 2005 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks.
What's the best way to block this Sohu robot in my robots.txt?

Could somebody please give me the best syntax?

Thanks

idoc

1:06 am on Feb 24, 2005 (gmt 0)

10+ Year Member



Someone else may know otherwise, but this should be about right:

User-agent: sohu-search
Disallow: /

I can't say for sure, because I don't need or want the web traffic or email from either 61. or 220. So I have 61. and 220., along with a few other Class A address ranges, in hosts.deny, and beyond that I disallow these address ranges at the firewall.
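For reference, a minimal /etc/hosts.deny sketch of that approach (assuming the services in question are built with tcpwrappers; a trailing dot makes an entry match the whole address prefix):

```
# /etc/hosts.deny -- refuse these address prefixes for every wrapped service
ALL: 61.
ALL: 220.
```

Note that an ALL entry applies to every wrapped service on the machine, mail included.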

pendanticist

7:04 am on Feb 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>What's the best way to block this Sohu robot on my robots.txt?

To a bot that does NOT respect robots.txt, banning via the IP ranges bull mentions is the ticket. At least, that's what I had to do some weeks back, as sohu persists despite being fed 403s.

fiestagirl

4:26 pm on Feb 24, 2005 (gmt 0)

10+ Year Member



Yes, blocking the IP is the only way to go with these guys. They've also been known to use the UA "googlebot" at times.

skuba

5:16 pm on Feb 24, 2005 (gmt 0)

10+ Year Member Top Contributors Of The Month



I can't find anything online about how to block IP ranges. Could you please help me with the syntax?

Thanks a lot

wilderness

1:01 am on Feb 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



how to block IP ranges

A Simple Beginning
[webmasterworld.com...]

Close to Perfect htaccess
[webmasterworld.com...]

(edit wilderness)

Using idoc's criteria:

deny from 61.
deny from 220.

idoc

5:27 am on Feb 25, 2005 (gmt 0)

10+ Year Member



The links above are a good place to begin, for sure. Yes, you can block IPs in hosts.deny, in .htaccess, with ipchains, in the Apache config file, etc. If you have access to all of these, the real issue becomes which impacts your server least while still being effective.

As I posted above, I found it better to take out several problematic Class A's in hosts.deny and at the corporate firewall for the LANs. One consideration: if you add an IP to hosts.deny and use tcpwrappers with sendmail, then you are also denying those IPs mail access. Therefore, the domestic ban lists I keep against site scrapers, for example (which include many of the discount shared web hosts), I add directly to the Apache config, because I use tcpwrappers to augment spam control with sendmail and don't want to impact email.

For folks on a shared host, .htaccess is probably all you will have access to.

skuba

7:02 pm on Feb 25, 2005 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks for the advice.

By Class A, you mean you block users from certain countries, right?

Is there a list of known malicious bots somewhere?

and

what's the difference between blocking IPs in .htaccess using the Deny directive versus using RewriteCond %{REMOTE_ADDR}?

Thanks a lot

wilderness

8:15 pm on Feb 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



skuba,
Each IP address is composed of four "blocks or classes," with each "block or class" having a range from 0-255.
An example from one of the Yahoo bots:
66.196.91.166

the 66 is Class "A"
the 196 is Class "B"
the 91 is Class "C"
the 166 is Class "D"

Some time ago, the Class A's may have been associated with specific countries or regions. That is NOT so today. One good example of this exception is the 134 Class A.

The 61 Class A is primarily APNIC (Asia-Pacific); however, there are also some Oceanic ranges in the 61 Class A block.
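Whether a given address falls inside one of these blocks can be checked mechanically. A small sketch in Python (using the modern ipaddress module, which did not exist at the time of this thread; the /8 networks below are simply the "Class A" ranges discussed above):

```python
import ipaddress

def in_blocks(ip, blocks):
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(block) for block in blocks)

# The "Class A" (/8) ranges discussed in this thread:
blocks = ["61.0.0.0/8", "220.0.0.0/8"]

print(in_blocks("220.181.26.66", blocks))   # True: inside 220.0.0.0/8 (sohu range)
print(in_blocks("66.196.91.166", blocks))   # False: the Yahoo bot example above
```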

It is the tradition of this forum that each webmaster makes his or her own decision as to what is detrimental or beneficial to their own websites.
There are many tool sites which you may use to assist you in making those determinations. One is from a dedicated participant in this forum:
[webmasterworld.com...]

The "Close to Perfect Htaccess" thread I supplied a link to covers a wide variety of bots, on many of which a number of us are in agreement.

cooldoug

11:54 pm on Mar 3, 2005 (gmt 0)

10+ Year Member



In htaccess use
order allow,deny
deny from 220.181.26.66
deny from 64.62.168.3
allow from all

wilderness

1:11 am on Mar 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In htaccess use
order allow,deny
deny from 220.181.26.66
deny from 64.62.168.3
allow from all

That method of listing precise IPs is not very effective at stopping bots or harvesters, and you'll learn that very FAST.

Don
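Partial addresses in mod_access match whole ranges, so a range-oriented version of the block above (a sketch, using the ranges reported earlier in the thread) would be:

```apache
Order Allow,Deny
Deny from 220.181.26
Deny from 64.62.168
Allow from all
```

A partial IP such as 220.181.26 matches every address that begins with those octets.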

skuba

6:21 pm on Mar 4, 2005 (gmt 0)

10+ Year Member Top Contributors Of The Month



That method of listing precise IPs is not very effective at stopping bots or harvesters, and you'll learn that very FAST.

Don

So what method is effective?
That's why I started this thread: to find out what method is good.

Thanks

wilderness

2:14 am on Mar 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So what method is effective?
That's why I started this thread: to find out what method is good.

skuba,
I've previously supplied you links to two old WebmasterWorld threads, on two occasions.
On both occasions, you've returned with nearly identical issues to the ones raised before the URLs were provided.
Either you're not taking the time to read the threads, or you're not taking the time to expand on what you read.

Nobody in this forum has the capability to advise you on what the best method is for your website(s).

I learned the very basics of htaccess nearly five years ago, going through page after page of Google references for the solitary term "htaccess".

What has been shared in this forum has allowed me to expand on methods and, in some instances, even provide new methods.

If you have a straightforward question, then by all means, please air it!
Your questions are too vague and leave the door open to replies requiring some serious time and depth.

If you have time, it's best spent in the WW archives, where there is an almost infinite quantity of methods.

If you have a more precise question, then please find a way to communicate precisely what you're looking to achieve.

Don

skuba

5:23 pm on Mar 7, 2005 (gmt 0)

10+ Year Member Top Contributors Of The Month



Well, I have been to your links, and I actually do have some experience with .htaccess.

I think my question is pretty straightforward. A lot of people here have been through the same issues, so if I ask what works better, all people need to do is answer: "Yes, deny works great; I never had agents wasting my bandwidth again." Or: "I tried deny and it didn't work; I found that this other method ... worked great," etc.

I am basically trying to figure out what works better in most cases.

jdMorgan

9:28 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If your Deny directives are set up properly, they will work properly. Some folks are on hosts that don't support mod_rewrite, so for them, using Allow/Deny in mod_access is the only method that will work.

I posted some comments in another recent thread [webmasterworld.com] (msg16) that may be useful to you. Personally, I prefer the automated approach to blocking user-agents that attempt to abuse my sites -- I don't have time to watch over them on an hourly or even daily basis. Therefore, I recommend the scripts I cited in that thread to you.
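To illustrate the two approaches, a side-by-side .htaccess sketch (the mod_rewrite form is only an option where that module is available):

```apache
# mod_access form -- works on virtually any Apache host
Order Allow,Deny
Deny from 220.181.26
Allow from all

# mod_rewrite equivalent -- returns 403 Forbidden to the matched range
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^220\.181\.26\.
RewriteRule .* - [F]
```

For simple IP blocks the two are functionally equivalent; mod_rewrite becomes useful when conditions combine IP ranges with user-agents or other request variables.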

Jim