Is 216.88.158.142 a valid IP for the Zyborg bot, or is somebody spoofing Zyborg's U-A? 216.88.158.142 is assigned to:
OrgName: SAVVIS Communications Corporation
OrgID: SAVV
Address: 1 SAVVIS Parkway
City: Town and Country
StateProv: MO
PostalCode: 63017
Country: US
NetRange: 216.88.0.0 - 216.91.255.255
CIDR: 216.88.0.0/14
There is no reverse DNS configured for 216.88.158.142. The complete U-A of the bot was:
"Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http: //www.WISEnutbot.com)" (I added the space in the URL to prevent linking).
If this is the real ZyBorg bot, I should lift the ban on the IP, right? Isn't Looksmart a desired search engine? If they violate robots.txt, it's going to be a real pain to put in mod_rewrite rules to keep them out of disallowed areas.
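For anyone who wants to double-check the whois output, here is a minimal Python sketch that tests whether the address falls inside the quoted CIDR block. It only confirms that the IP belongs to SAVVIS address space; it says nothing about whether the bot is really LookSmart's. The variable names are just for illustration.

```python
import ipaddress

# Values copied from the whois output quoted above.
savvis_block = ipaddress.ip_network("216.88.0.0/14")
bot_ip = ipaddress.ip_address("216.88.158.142")

# Membership test: is the visiting address inside the assigned block?
in_block = bot_ip in savvis_block

# First and last addresses of the block, to compare against the NetRange line.
first_addr = str(savvis_block[0])
last_addr = str(savvis_block[-1])

print(in_block, first_addr, last_addr)
```

The first and last addresses should line up with the NetRange line in the whois record.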
There is a relationship between Savvis and Looksmart. From an older log:
[code]64.241.243.124 - - [30/Jun/2003:01:39:12 +0200] "GET /indexd.html HTTP/1.1" 200 2786 www.-.net "-" "Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http://www.WISEnutbot.com)" "-"[/code]
-->
SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1)
64.240.0.0 - 64.243.255.255
Looksmart, LTD SAVV-S82358-1 (NET-64-241-242-0-1)
64.241.242.0 - 64.241.243.255
216.88.158.142 - - [27/Jul/2003:17:16:46 -0700] "GET /cgi-bin/weather.pl HTTP/1.1" 200 27432 "-" "Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http:// www.WISEnutbot.com)"
Seen this bot around lately, is it a good guy?
Looksmart is connected to MSN right?
It seems the joke is on me.
Although ZyBorg hasn't visited my main site often so far (fewer than a dozen times in July), it did NOT respect the disallow.
As a result, I have denied the range:
RewriteCond %{REMOTE_ADDR} ^216\.(8[8-9]|9[01])\. [OR]
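A quick way to sanity-check a pattern like this before putting it on a live server is to run it against a few sample addresses. The sketch below uses Python's re module rather than Apache itself, with the alternation written as a plain ASCII `|`; this simple pattern should behave the same in both, but that is an assumption. The sample addresses other than the two from this thread are made up.

```python
import re

# Same pattern as the RewriteCond above: 216.88.x.x through 216.91.x.x.
pattern = re.compile(r"^216\.(8[8-9]|9[01])\.")

def denied(ip):
    """Return True if the address would match the deny pattern."""
    return pattern.match(ip) is not None

samples = [
    "216.88.158.142",   # the address from this thread
    "216.91.255.255",   # top of the SAVVIS /14
    "216.87.0.1",       # just below the range (made-up sample)
    "216.92.0.1",       # just above the range (made-up sample)
    "64.241.243.124",   # the older ZyBorg address, in a different block
]

results = {ip: denied(ip) for ip in samples}
for ip, hit in results.items():
    print(ip, "DENY" if hit else "allow")
```

Note that the pattern covers only the 216.88.0.0/14 block, so the older 64.241.243.124 address would not be affected by it.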
Thanks for the heads up.
Although I'm not at all sure that denying ZyBorg is a good thing. I had a look at their pages, which are fee-based for submissions.
Don
[edited]
These folks have a ton of ranges:
SAVVIS / ClearBlue Technologies, Inc. SAVV-ZZLVNE (NET-64-243-84-24-1) 64.243.84.24 - 64.243.84.31
Savvis Center SAVV-ZBLUES (NET-209-16-204-0-1) 209.16.204.0 - 209.16.204.15
Savvis Center SAVV-ZBLUES (NET-66-100-117-128-1) 66.100.117.128 - 66.100.117.191
Savvis Communications SAVV1-GLZ (NET-206-114-219-32-1) 206.114.219.32 - 206.114.219.63
SAVVIS COMMUNICATIONS FON-34550620161903 (NET-205-240-16-0-1) 205.240.16.0 - 205.240.31.255
SAVVIS COMMUNICATIONS FON-3428859904832 (NET-204-96-64-0-1) 204.96.64.0 - 204.96.95.255
SAVVIS Communications Corporation SAVVIS7 (NET-216-88-0-0-1) 216.88.0.0 - 216.91.255.255
SAVVIS Communications Corporation SAVVIS5 (NET-209-176-0-0-1) 209.176.0.0 - 209.176.255.255
SAVVIS Communications Corporation SAVVIS6 (NET-209-223-0-0-1) 209.223.0.0 - 209.223.255.255
SAVVIS Communications Corporation SAVVIS-1 (NET-209-16-192-0-1) 209.16.192.0 - 209.16.223.255
SAVVIS Communications Corporation SAVVIS2 (NET-209-44-0-0-1) 209.44.0.0 - 209.44.63.255
SAVVIS Communications Corporation SAVVIS4 (NET-209-144-0-0-1) 209.144.0.0 - 209.144.255.255
SAVVIS Communications Corporation SAVV2 (NET-207-15-16-0-1) 207.15.16.0 - 207.15.31.255
SAVVIS Communications Corporation SAVV (NET-66-100-0-0-1) 66.100.0.0 - 66.101.255.255
SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1) 64.240.0.0 - 64.243.255.255
SAVVIS Communications Corporation SAVV1 (NET-206-114-192-0-1) 206.114.192.0 - 206.114.223.255
SAVVIS Communications Corporation SAVIXA-NET1 (NET-206-40-128-0-1) 206.40.128.0 - 206.40.159.255
SAVVIS Communications Corporation SAVSEMAPHORE-BLK3 (NET-204-194-8-0-1) 204.194.8.0 - 204.194.15.255
SAVVIS Communications Corporation SAVVIS-USWASH6 (NET-147-208-0-0-2) 147.208.0.0 - 147.208.31.255
SAVVIS Communications Corporation SAVVIS-USSNTC6 (NET-147-208-128-0-1) 147.208.128.0 - 147.208.191.255
SAVVIS Communications Corporation SAVVIS3 (NET-209-83-128-0-1) 209.83.128.0 - 209.83.255.255
SAVVIS Communications Corporation SAVIXA-NET-CBLK (NET-199-242-16-0-1) 199.242.16.0 - 199.242.25.255
SAVVIS Communications Corporation SAVIXA-NET6 (NET-209-102-0-0-1) 209.102.0.0 - 209.102.127.255
SAVVIS Communications Corporation SAVIXA-NET4 (NET-207-149-0-0-1) 207.149.0.0 - 207.149.255.255
SAVVIS Communications Corporation SAVIXA-NET3 (NET-206-129-0-0-1) 206.129.0.0 - 206.129.255.255
SAVVIS Communications Corporation SAVIXA-NET2 (NET-199-217-64-0-1) 199.217.64.0 - 199.217.95.255
SAVVIS Data Center SAVV-ZDATHZ-1 (NET-209-16-208-128-1) 209.16.208.128 - 209.16.208.143
SAVVIS Data Center SAVV-ZDATHZ-3 (NET-206-114-194-168-1) 206.114.194.168 - 206.114.194.175
SAVVIS Singapore Company PTE Ltd SAVV-ZZTRSG (NET-66-100-48-0-1) 66.100.48.0 - 66.100.48.15
Savvis Communications SBC067066134168030317 (NET-67-66-134-168-1) 67.66.134.168 - 67.66.134.175
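If you want to turn start/end ranges like these into CIDR blocks (for a firewall or a deny list), Python's ipaddress module can do the conversion. This is only a sketch using three of the ranges listed above; range_to_cidrs is a helper name made up for the example.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Convert an inclusive start/end address range into a list of CIDR blocks."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(n) for n in nets]

# Three ranges copied from the list above.
ranges = [
    ("216.88.0.0", "216.91.255.255"),
    ("64.240.0.0", "64.243.255.255"),
    ("199.242.16.0", "199.242.25.255"),
]

for first, last in ranges:
    print(first, "-", last, "=>", range_to_cidrs(first, last))
```

Ranges that do not fall on a clean power-of-two boundary, like the 199.242.16.0 one, come back as more than one CIDR block.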
those ip ranges for savvis
matt
You can do it for most any provider through ARIN.
The inquiry for 64.241.243.124 resulted in:
SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1)
64.240.0.0 - 64.243.255.255
Looksmart, LTD SAVV-S82358-1 (NET-64-241-242-0-1)
64.241.242.0 - 64.241.243.255
The next step is to just cut and paste "SAVVIS" into the ARIN search box, and it returns the results of your inquiry.
I can't believe the other one is a legitimate spider from Looksmart.
Why not?
Both UA's are the same and both backbone providers are the same!
SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1) 64.240.0.0 - 64.243.255.255
SAVVIS Communications Corporation SAVVIS7 (NET-216-88-0-0-1) 216.88.0.0 - 216.91.255.255
This could be an out-of-control beta test, or a different company that is intentionally spoofing Zyborg because they use the same web host, or maybe SkyNet has just achieved self-awareness and is downloading all human knowledge, but it doesn't appear to be the regular LookSmart spider.
There was some misbehaviour on a smallish scale on July 1, and it returned a few more times without incident. Then things began to go haywire on July 25, with the spider running executables in prohibited directories by the thousands.
Reference: [webmasterworld.com...]
Has anyone contacted them about the rude behavior of the ZyBorg bot? I am going to send them an email today.
I am sure that members of LookSmart patronize this site. I am sure that they don't want a lot of angry webmasters, hosts, companies to ban their bot for snooping in unauthorized places.
From Daniele at Looksmart:
<snip>
So hopefully everything will be fixed soon.
-Jazzguy
[edited by: Brett_Tabke at 10:16 pm (utc) on Sep. 30, 2003]
[edit reason] We can NOT copy in content from email to posts. [/edit]
However, the problem with the "promise" of fixing things is just that: promises mean nothing, and solutions are the *only* thing that counts.
It has been two years or more since Kord Campbell, the lead developer for the Grub project (now owned by LS), owned up to the robots.txt issue. It seems that there has been far *more than enough time to fix the problem*.
Yes, deploying faulty code does happen. However, given the track record of the company, call me cynical, but I'll believe they have gotten some respect for US when I see it in my log files ;)
Daniele from Looksmart seemed sincere, so I will give her and Looksmart the benefit of the doubt in this case and hope that they quickly fix the problem. But I agree, ultimately the log files tell all.
This indicates that the ZyBorg robot is specifically configured to read robots.txt, not just to ignore it. To get there, the ZyBorg bot has to request robots.txt, parse it, and locate my "secret" trap directory from what it reads there.
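For comparison, here is roughly what a well-behaved crawler does with the same file, sketched with Python's standard urllib.robotparser. The robots.txt content, the /trap/ directory name, and the example.com URLs are all invented for the illustration; they are not my real ones.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the trap directory is linked from nowhere
# and is only ever mentioned in a Disallow line.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /trap/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

ua = "ZyBorg"
allowed_home = rp.can_fetch(ua, "http://example.com/index.html")
allowed_trap = rp.can_fetch(ua, "http://example.com/trap/page.html")

print(allowed_home, allowed_trap)
```

A compliant robot would skip the second URL. A robot that shows up in a directory it could only have learned about from the Disallow line has read the file and then done the opposite of what it says.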
If you ask for http://www.pbm.com/~lindahl/food.html/
(note trailing slash) then Apache returns the contents of food.html. Now if you are stupid, you think that all of the relative links are based on food.html/, instead of off of lindahl/.
So this distributed set of sites is trying to crawl urls like:
~lindahl/food.html/recipes/cariadoc/articles/recipes/foo/bar/baz/barf/etc.html
Yes, it's an endless loop.
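To see how the loop builds up, here is a small sketch with Python's urljoin. The relative link "recipes/index.html" is a made-up stand-in for whatever links are actually on the page.

```python
from urllib.parse import urljoin

good_base = "http://www.pbm.com/~lindahl/food.html"
bad_base = "http://www.pbm.com/~lindahl/food.html/"   # note trailing slash

link = "recipes/index.html"  # hypothetical relative link on the page

# Resolved against the normal URL, the link lands under ~lindahl/.
good_url = urljoin(good_base, link)

# Resolved against the trailing-slash URL, food.html/ is treated as a directory.
bad_url = urljoin(bad_base, link)

print(good_url)
print(bad_url)
```

Every page fetched under food.html/ yields more relative links that resolve one level deeper, which is where the ever-growing paths come from.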
And they ignore /robots.txt. So I added redirects. I have been averaging 1,000 hits per day of this.
A friend of mine who works for a Major Search Engine said, "Yeah, we don't chase URLs that have '.html/' in them."
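A filter along those lines could be as simple as the sketch below. This is my guess at the idea, not anything the search engine actually runs, and looks_like_path_loop is a name made up for the example.

```python
from urllib.parse import urlsplit

def looks_like_path_loop(url):
    """Flag URLs whose path treats an .html file as a directory."""
    return ".html/" in urlsplit(url).path

normal = looks_like_path_loop("http://www.pbm.com/~lindahl/food.html")
looped = looks_like_path_loop(
    "http://www.pbm.com/~lindahl/food.html/recipes/cariadoc/articles/etc.html"
)

print(normal, looped)
```

It only looks at the path, so a '.html/' that happens to appear in a query string would not trigger it.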
My what a coincidence;)
Mozilla/4.0 compatible ZyBorg/1.0 DLC (wn.zyborg@looksmart.net; [WISEnutbot.com)<...]
DLC = the new Dead Link Checker?
I hope you all consider the possible ramifications of banning Looksmart vis-a-vis disappearing from MSN and possibly other search engine results. While this appears to be a problem related to WiseNut, who knows what the future relationships may be?
The term "race condition" is a specific term which indicates a time-dependent problem existing between two processes which run independently. In this case, Daniele's explanation corresponds with the posted log file evidence.
I agree that the 'bot has misbehaved, and I believe that each webmaster should make their own decisions about which 'bots to allow and which to disallow, but let's give these folks a chance to fix their problem before "convicting" their 'bot and imposing a life sentence without parole.
In the long run, we need (well-behaved) robots to visit our sites in order to get traffic much more than any search engine needs to list our individual sites in their index. If they can't list a few sites because their 'bot is banned, it really won't affect their search results much; it will be unimportant... except to the webmasters of those sites. IOW, we need the 'bots much more than they need our sites.
Personally, I'm glad to see WiseNut crawling; I recently got a new site listing in WiseNut, but the crawl data was from April. I'll appreciate an update, thanks.
Anyway, let's have a dialog here and try to help these folks fix the problems with the 'bot and with the broken robots information link in the user-agent string. If these posts are ignored and the problems persist, then we will have cause to ban these bots "permanently". However, until they've had a chance to fix the problems, I don't think a permanent ban is a very good idea.
Frontpage nailed it in post #17 - The first step is to contact the organization deploying the robot, and report problems. If they do nothing, or if they claim they will correct bad behaviour and then fail to do so, then ban 'em.
MHO - YMMV,
Jim