ZyBorg/1.0 violates robots.txt

came from 216.88.158.142

         

jazzguy

8:50 pm on Jul 27, 2003 (gmt 0)

10+ Year Member



A bot claiming to be ZyBorg/1.0 disobeyed my robots.txt file, and got itself automatically banned (thanks to the Perl scripts posted in the forums). The disallowed file has been in the robots.txt file for over a month and ZyBorg has fetched robots.txt many times since then.

Is 216.88.158.142 a valid IP for the Zyborg bot, or is somebody spoofing Zyborg's U-A? 216.88.158.142 is assigned to:

OrgName: SAVVIS Communications Corporation
OrgID: SAVV
Address: 1 SAVVIS Parkway
City: Town and Country
StateProv: MO
PostalCode: 63017
Country: US
NetRange: 216.88.0.0 - 216.91.255.255
CIDR: 216.88.0.0/14

There is no reverse DNS configured for 216.88.158.142. The complete U-A of the bot was:
"Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http: //www.WISEnutbot.com)" (I added the space in the URL to prevent linking).

If this is the real ZyBorg bot, I should lift the ban on the IP, right? Isn't Looksmart a desired search engine? If they violate robots.txt, it's going to be a real pain to put in mod_rewrite rules to keep them out of disallowed areas.

bull

9:06 pm on Jul 27, 2003 (gmt 0)

10+ Year Member


It was on my site too, and it didn't even ask for robots.txt for two days.

There is a relationship between SAVVIS and Looksmart. From an older log:
[code]64.241.243.124 - - [30/Jun/2003:01:39:12 +0200] "GET /indexd.html HTTP/1.1" 200 2786 www.-.net "-" "Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http://www.WISEnutbot.com)" "-"[/code]

-->

SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1)
64.240.0.0 - 64.243.255.255
Looksmart, LTD SAVV-S82358-1 (NET-64-241-242-0-1)
64.241.242.0 - 64.241.243.255
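
A quick way to compare the two addresses against the WHOIS ranges quoted so far is Python's standard `ipaddress` module. This is only a sketch: the ranges are copied from the posts above, the helper name `in_any` is made up for illustration, and a match says nothing about who actually operates the machine behind an address.

```python
import ipaddress

# Ranges copied from the WHOIS output quoted in this thread.
SAVVIS_216 = [ipaddress.ip_network("216.88.0.0/14")]
SAVVIS_64 = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("64.240.0.0"),
    ipaddress.ip_address("64.243.255.255")))
LOOKSMART_64 = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("64.241.242.0"),
    ipaddress.ip_address("64.241.243.255")))


def in_any(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


for ip in ("216.88.158.142", "64.241.243.124"):
    print(ip,
          "| SAVVIS:", in_any(ip, SAVVIS_216 + SAVVIS_64),
          "| Looksmart sub-allocation:", in_any(ip, LOOKSMART_64))
```

Under these ranges, both addresses sit in SAVVIS space, but only the 64.* address falls inside the block registered to Looksmart by name.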

nwctwx

12:53 am on Jul 28, 2003 (gmt 0)

10+ Year Member



I was just about to ask the question myself...

216.88.158.142 - - [27/Jul/2003:17:16:46 -0700] "GET /cgi-bin/weather.pl HTTP/1.1" 200 27432 "-" "Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http:// www.WISEnutbot.com)"

Seen this bot around lately, is it a good guy?

Looksmart is connected to MSN right?
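
For anyone who wants to pull these hits out of a log automatically, here is a minimal sketch of a parser. It assumes the standard Apache "combined" log format, as in the line quoted above; logs with extra fields (such as a virtual host column) would need a different pattern. The names `LOG_PATTERN` and `parse_line` are illustrative only.

```python
import re

# Apache "combined" format: ip, identity, user, [time], "request", status, size,
# "referer", "user-agent". This layout is an assumption about the log.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('216.88.158.142 - - [27/Jul/2003:17:16:46 -0700] '
          '"GET /cgi-bin/weather.pl HTTP/1.1" 200 27432 "-" '
          '"Mozilla/4.0 compatible ZyBorg/1.0 '
          '(wn.zyborg@looksmart.net; http://www.WISEnutbot.com)"')

hit = parse_line(sample)
if hit and "ZyBorg" in hit["agent"]:
    print(hit["ip"], hit["path"], hit["status"])
```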

wilderness

4:06 am on Jul 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jazzguy,
I went through my July logs in what I thought would provide you with an insight into ZyBorg's RESPECT of my robots disallow.

It seems the joke is on me.
ZyBorg hasn't visited my main site often so far in July (fewer than a dozen times), but it did NOT respect the disallow.
As a result I have denied the range:

RewriteCond %{REMOTE_ADDR} ^216\.(8[8-9]|9[01])\. [OR]
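
As a sanity check, the pattern in that condition can be tested outside Apache. The sketch below assumes that Python's `re` module treats an expression this simple the same way mod_rewrite does, and compares it against the 216.88.0.0/14 block from the WHOIS output. The sample addresses and the helper `rule_matches` are made up for illustration. Note that the rule covers the whole /14, not just the one bot address.

```python
import ipaddress
import re

# The pattern from the RewriteCond, with a solid pipe for the alternation.
pattern = re.compile(r"^216\.(8[8-9]|9[01])\.")
block = ipaddress.ip_network("216.88.0.0/14")


def rule_matches(ip):
    """True if the RewriteCond pattern would match this REMOTE_ADDR string."""
    return bool(pattern.search(ip))


# One sample address per second octet, covering values on both sides of the block.
samples = [f"216.{second}.0.1" for second in range(80, 100)]
mismatches = [ip for ip in samples
              if rule_matches(ip) != (ipaddress.ip_address(ip) in block)]

print("mismatches:", mismatches)
print("bot address matched:", rule_matches("216.88.158.142"))
```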

Thanks for the heads up.

That said, I'm not at all sure that denying ZyBorg is a good thing. I had a look at their pages, which are fee-based for submissions.

Don

[edited]
These folks have a ton of ranges:
SAVVIS / ClearBlue Technologies, Inc. SAVV-ZZLVNE (NET-64-243-84-24-1) 64.243.84.24 - 64.243.84.31
Savvis Center SAVV-ZBLUES (NET-209-16-204-0-1) 209.16.204.0 - 209.16.204.15
Savvis Center SAVV-ZBLUES (NET-66-100-117-128-1) 66.100.117.128 - 66.100.117.191
Savvis Communications SAVV1-GLZ (NET-206-114-219-32-1) 206.114.219.32 - 206.114.219.63
SAVVIS COMMUNICATIONS FON-34550620161903 (NET-205-240-16-0-1) 205.240.16.0 - 205.240.31.255
SAVVIS COMMUNICATIONS FON-3428859904832 (NET-204-96-64-0-1) 204.96.64.0 - 204.96.95.255
SAVVIS Communications Corporation SAVVIS7 (NET-216-88-0-0-1) 216.88.0.0 - 216.91.255.255
SAVVIS Communications Corporation SAVVIS5 (NET-209-176-0-0-1) 209.176.0.0 - 209.176.255.255
SAVVIS Communications Corporation SAVVIS6 (NET-209-223-0-0-1) 209.223.0.0 - 209.223.255.255
SAVVIS Communications Corporation SAVVIS-1 (NET-209-16-192-0-1) 209.16.192.0 - 209.16.223.255
SAVVIS Communications Corporation SAVVIS2 (NET-209-44-0-0-1) 209.44.0.0 - 209.44.63.255
SAVVIS Communications Corporation SAVVIS4 (NET-209-144-0-0-1) 209.144.0.0 - 209.144.255.255
SAVVIS Communications Corporation SAVV2 (NET-207-15-16-0-1) 207.15.16.0 - 207.15.31.255
SAVVIS Communications Corporation SAVV (NET-66-100-0-0-1) 66.100.0.0 - 66.101.255.255
SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1) 64.240.0.0 - 64.243.255.255
SAVVIS Communications Corporation SAVV1 (NET-206-114-192-0-1) 206.114.192.0 - 206.114.223.255
SAVVIS Communications Corporation SAVIXA-NET1 (NET-206-40-128-0-1) 206.40.128.0 - 206.40.159.255
SAVVIS Communications Corporation SAVSEMAPHORE-BLK3 (NET-204-194-8-0-1) 204.194.8.0 - 204.194.15.255
SAVVIS Communications Corporation SAVVIS-USWASH6 (NET-147-208-0-0-2) 147.208.0.0 - 147.208.31.255
SAVVIS Communications Corporation SAVVIS-USSNTC6 (NET-147-208-128-0-1) 147.208.128.0 - 147.208.191.255
SAVVIS Communications Corporation SAVVIS3 (NET-209-83-128-0-1) 209.83.128.0 - 209.83.255.255
SAVVIS Communications Corporation SAVIXA-NET-CBLK (NET-199-242-16-0-1) 199.242.16.0 - 199.242.25.255
SAVVIS Communications Corporation SAVIXA-NET6 (NET-209-102-0-0-1) 209.102.0.0 - 209.102.127.255
SAVVIS Communications Corporation SAVIXA-NET4 (NET-207-149-0-0-1) 207.149.0.0 - 207.149.255.255
SAVVIS Communications Corporation SAVIXA-NET3 (NET-206-129-0-0-1) 206.129.0.0 - 206.129.255.255
SAVVIS Communications Corporation SAVIXA-NET2 (NET-199-217-64-0-1) 199.217.64.0 - 199.217.95.255
SAVVIS Data Center SAVV-ZDATHZ-1 (NET-209-16-208-128-1) 209.16.208.128 - 209.16.208.143
SAVVIS Data Center SAVV-ZDATHZ-3 (NET-206-114-194-168-1) 206.114.194.168 - 206.114.194.175
SAVVIS Singapore Company PTE Ltd SAVV-ZZTRSG (NET-66-100-48-0-1) 66.100.48.0 - 66.100.48.15
Savvis Communications SBC067066134168030317 (NET-67-66-134-168-1) 67.66.134.168 - 67.66.134.175
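
WHOIS ranges in "first - last" form can be turned into CIDR blocks with the standard `ipaddress` module, which helps when a firewall or deny list wants CIDR notation. A minimal sketch follows; the function name `range_to_cidrs` is hypothetical, and the inputs are ranges from the list above.

```python
import ipaddress


def range_to_cidrs(whois_range):
    """Convert a 'first - last' WHOIS range string into a list of CIDR strings."""
    first, last = (part.strip() for part in whois_range.split("-"))
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
    return [str(net) for net in networks]


print(range_to_cidrs("216.88.0.0 - 216.91.255.255"))
print(range_to_cidrs("199.242.16.0 - 199.242.25.255"))
```

A range that is not aligned on a single prefix boundary, like the second one, comes back as more than one block.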

mattdwells

11:58 pm on Jul 28, 2003 (gmt 0)

10+ Year Member



wilderness,

how did you get all those ip ranges for savvis?

matt

wilderness

12:08 am on Jul 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



those ip ranges for savvis

matt
You can do it for most any provider through ARIN.
The inquiry for 64.241.243.124 resulted in:

SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1)
64.240.0.0 - 64.243.255.255
Looksmart, LTD SAVV-S82358-1 (NET-64-241-242-0-1)
64.241.242.0 - 64.241.243.255

The next step is to just cut and paste "SAVVIS" into the ARIN search box, and it returns every range registered under that name.
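
Once the ARIN results have been pasted into a text file, the entries can be pulled apart with a regular expression. This sketch assumes the listing looks like the output quoted in this thread (organization, handle, a NET- identifier in parentheses, then the range, sometimes on the following line). The names `ENTRY` and `parse_entries` are made up for illustration.

```python
import re

# One entry: "<org> <handle> (NET-...) <first> - <last>"
ENTRY = re.compile(
    r"^(?P<org>.+?)\s+(?P<handle>\S+)\s+\((?P<net>NET-[\d-]+)\)\s+"
    r"(?P<first>\d+\.\d+\.\d+\.\d+)\s*-\s*(?P<last>\d+\.\d+\.\d+\.\d+)$"
)


def parse_entries(text):
    """Parse ARIN-style listing text into a list of dicts, one per entry."""
    # Join entries whose range was wrapped onto the line after the NET- handle.
    joined = re.sub(r"\)\s*\n\s*", ") ", text)
    entries = []
    for line in joined.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


listing = """SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1)
64.240.0.0 - 64.243.255.255
Looksmart, LTD SAVV-S82358-1 (NET-64-241-242-0-1)
64.241.242.0 - 64.241.243.255
"""

for entry in parse_entries(listing):
    print(entry["org"], "|", entry["first"], "-", entry["last"])
```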

DavidT

4:42 am on Jul 29, 2003 (gmt 0)

10+ Year Member



Zyborg, as an aside, is very slow on the updating dns front. Just changed hosts and have left the site up on the old host for a week now. The only visits it now gets are from Zyborg.

wilderness

1:57 pm on Jul 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Zyborg, as an aside

No Scooter?
Those folks are chasing old pages I redirected two years ago.

davelms

8:10 pm on Jul 29, 2003 (gmt 0)

10+ Year Member



For what it's worth, ZyBorg/1.0 from 216.88.158.142 disobeyed my robots.txt file yesterday (28th July) too.

kewlbeezer

9:18 pm on Jul 29, 2003 (gmt 0)

10+ Year Member



Same thing happened to me too...

216.88.158.142
Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http:// www.WISEnutbot.com)

Whoever they are, they can no longer visit my page, as they visited a banned directory on my site and now belong in the blacklist.

jomaxx

12:01 am on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Got hit by the same bad robot a couple of days ago, and I also banned the IP. Strangely, I have continued to get visited every day by the "good" Zyborg from 64.241.243.*. I can't believe the other one is a legitimate spider from Looksmart.

wilderness

12:13 am on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can't believe the other one is a legitimate spider from Looksmart.

Why not?
Both UA's are the same and both backbone providers are the same!

SAVVIS Communications Corporation SAVVIS8 (NET-64-240-0-0-1) 64.240.0.0 - 64.243.255.255
SAVVIS Communications Corporation SAVVIS7 (NET-216-88-0-0-1) 216.88.0.0 - 216.91.255.255

Peeress

3:24 am on Jul 30, 2003 (gmt 0)

10+ Year Member



Yes, I was surprised to find that this bot had visited my guestbook in detail today, even the comment forms. It did not look at robots.txt, so I'm going to ban it for now. It was good to find more info about it at this forum.

DavidT

3:25 am on Jul 30, 2003 (gmt 0)

10+ Year Member



Well Zyborg has managed to find the new location of my site but has also found the cgi-bin and is screwing up my affiliate link tracking statistics.

jomaxx

3:35 am on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What I meant was that the 64.* spider has been visiting my site for ages without any problems, and is still doing so. The 216.* spider just showed up and is exhibiting totally different behaviour.

This could be an out-of-control beta test, or a different company that is intentionally spoofing Zyborg because they use the same web host, or maybe SkyNet has just achieved self-awareness and is downloading all human knowledge, but it doesn't appear to be the regular LookSmart spider.

jomaxx

3:59 am on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



P.S. Just looked much further back in my logs, and I can see that the first appearance of the 216.88.158.142 robot was June 30.

There was some misbehaviour on a smallish scale on July 1, and it returned a few more times without incident. Then things began to go haywire on July 25, with the spider running executables in prohibited directories by the thousands.

frontpage

2:49 pm on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Reference: [webmasterworld.com...]

Has anyone contacted them about the rude behavior of the ZyBorg bot? I am going to send them an email today.

I am sure that members of LookSmart patronize this site. I am sure that they don't want a lot of angry webmasters, hosts, companies to ban their bot for snooping in unauthorized places.

jazzguy

5:39 pm on Jul 30, 2003 (gmt 0)

10+ Year Member



I received a stickymail response from a representative of Looksmart's crawler team. They are working out a way to have a formal presence in the forum, but in the meantime she has given me permission to quote her response to me.

From Daniele at Looksmart:

<snip>

So hopefully everything will be fixed soon.

-Jazzguy

[edited by: Brett_Tabke at 10:16 pm (utc) on Sep. 30, 2003]
[edit reason] We can NOT copy in content from email to posts. [/edit]

jeremy goodrich

5:51 pm on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Got the same in stickymail. :)

However, the problem with the "promise" of fixing things is just that - promises mean nothing and solutions are the *only* thing that counts.

It has been 2+ years or so since Kord Campbell, the lead developer for the Grub project (now owned by LS), owned up to the robots.txt issue, so it seems that there has been *more than long enough to fix the problem*.

Ya, deploying faulty code etc - does happen. However, given the track record of the company, call me cynical but I'll believe they have gotten some respect for US when I see it in my log files ;)

jazzguy

6:21 pm on Jul 30, 2003 (gmt 0)

10+ Year Member



The info about Grub and their unfulfilled promises is good to know. The grub crawler (all versions) definitely does not obey robots.txt files for my sites and has been permanently banned. It is unfortunate because Grub seems like a good idea, but it will go nowhere if they are banned from most sites.

Daniele from Looksmart seemed sincere, so I will give her and Looksmart the benefit of the doubt in this case and hope that they quickly fix the problem. But I agree, ultimately the log files tell all.

frontpage

9:12 pm on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Our bad-robots trap is configured so that there is no way a robot could find the "secret" directory where the "trap" is unless it finds the reference to it in robots.txt first. There are no URL links to the directory elsewhere on our sites.

This indicates that the Zyborg robot is specifically configured to look at robots.txt rather than just ignore it. The Zyborg bot has to request robots.txt, parse it, and locate the "secret" trap directory by reading the Disallow line.
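
The trap logic described above can be sketched with Python's standard `urllib.robotparser`. Everything specific here is an assumption for illustration: the directory name `/secret-trap/`, the robots.txt contents, and the helper names. The actual trap scripts discussed in the forums are Perl and are not reproduced here.

```python
from urllib import robotparser

# Hypothetical robots.txt; the trap directory is linked from nowhere else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret-trap/
"""


def build_parser(robots_txt):
    """Build a RobotFileParser from robots.txt text held in memory."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_violation(parser, user_agent, path):
    """True if this user agent requested a path that robots.txt disallows."""
    return not parser.can_fetch(user_agent, path)


parser = build_parser(ROBOTS_TXT)
print(is_violation(parser, "ZyBorg/1.0", "/secret-trap/index.html"))
print(is_violation(parser, "ZyBorg/1.0", "/index.html"))
```

A request that counts as a violation here, for a path that is only discoverable through robots.txt, is the signal a trap script would act on.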

jomaxx

12:08 am on Jul 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What kind of response is that? It appears to be discussing some internal problem with their "robots refresh protocol" (whatever that is).

Daniele makes no mention of Zyborg's ignoring the robots.txt file - except in the first line to acknowledge that's what jazzguy was complaining about!

jomaxx

4:57 am on Jul 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



FWIW, Daniele stickied me a clarification stating that they maintain a database of robots.txt restrictions, and there was a bug that allowed crawlers to ignore this database during the period that it was being refreshed. That's the "race condition" mentioned.
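
To make the described bug concrete, here is a toy model of that kind of race. It is purely hypothetical: the class `RulesCache`, its methods, and the host `example.com` are invented for illustration, and nothing here reflects how Looksmart's crawler is actually built. It only shows how a rules store that is emptied and refilled in place leaves a window where everything looks allowed, while swapping in a complete replacement does not.

```python
class RulesCache:
    """Toy store of robots.txt Disallow prefixes, keyed by host (hypothetical)."""

    def __init__(self, rules):
        self.rules = dict(rules)

    def allowed(self, host, path):
        """A path is allowed unless it starts with a stored Disallow prefix."""
        prefixes = self.rules.get(host, [])
        return not any(path.startswith(prefix) for prefix in prefixes)

    def begin_in_place_refresh(self):
        # Buggy approach: wipe the live data before the new data is loaded.
        self.rules.clear()

    def finish_in_place_refresh(self, fresh):
        self.rules.update(fresh)

    def refresh_by_swap(self, fresh):
        # Safer approach: build the replacement first, then rebind in one step.
        self.rules = dict(fresh)


RULES = {"example.com": ["/cgi-bin/"]}
PATH = "/cgi-bin/weather.pl"

cache = RulesCache(RULES)
before = cache.allowed("example.com", PATH)
cache.begin_in_place_refresh()
during = cache.allowed("example.com", PATH)   # lookup lands inside the window
cache.finish_in_place_refresh(RULES)
after = cache.allowed("example.com", PATH)

safe = RulesCache(RULES)
safe.refresh_by_swap(RULES)
after_swap = safe.allowed("example.com", PATH)

print(before, during, after, after_swap)
```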

bull

6:01 am on Jul 31, 2003 (gmt 0)

10+ Year Member



stickied me...etc

How about Daniele writing something here himself? Too much effort?

lindahl

7:48 am on Jul 31, 2003 (gmt 0)

10+ Year Member


Not only am I being crawled by 216.88.158.142, but I am being crawled by a distributed set of sites. This is extremely visible to me because I noticed that they have the following bug:

If you ask for http://www.pbm.com/~lindahl/food.html/

(note trailing slash) then Apache returns the contents of food.html. Now if you are stupid, you think that all of the relative links are based on food.html/, instead of off of lindahl/.

So this distributed set of sites is trying to crawl urls like:

~lindahl/food.html/recipes/cariadoc/articles/recipes/foo/bar/baz/barf/etc.html

Yes, it's an endless loop.

And they ignore /robots.txt. So I added redirects. I have been averaging 1,000 hits per day of this.

A friend of mine who works for a Major Search Engine said, "Yeah, we don't chase URLs that have '.html/' in them."
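
The relative-link problem is easy to reproduce with Python's standard `urljoin`. The relative link `recipes/index.html` is a made-up example, and the `.html/` check is just the rule of thumb quoted above, written as a hypothetical helper.

```python
from urllib.parse import urljoin

RELATIVE_LINK = "recipes/index.html"  # hypothetical link found in food.html

# Correct base: the page itself, so links resolve against ~lindahl/.
good = urljoin("http://www.pbm.com/~lindahl/food.html", RELATIVE_LINK)

# Trailing-slash base: food.html is treated as a directory.
bad = urljoin("http://www.pbm.com/~lindahl/food.html/", RELATIVE_LINK)


def looks_like_file_with_slash(url):
    """Rule of thumb from the post: skip URLs containing '.html/'."""
    return ".html/" in url


print(good)
print(bad)
print(looks_like_file_with_slash(bad))
```

Each page fetched under the bad base yields more links under `food.html/`, which is how the loop keeps growing.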

Peeress

1:52 pm on Jul 31, 2003 (gmt 0)

10+ Year Member



lol! Well, personally, I don't have time to send out emails asking why a bot is behaving badly and since I'm not an expert on crawler infrastructures, I usually don't understand their explanations! Whether it's a Looksmart bot, Google bot, beta, good or bad, it's banned if it disobeys.

balam

5:03 pm on Jul 31, 2003 (gmt 0)

10+ Year Member



Same IP, but this is new...


Mozilla/4.0 compatible ZyBorg/1.0 DLC (wn.zyborg@looksmart.net; http*://www.WISEnutbot.com)

(* added to break the link to their non-existent bot page...)

wilderness

10:40 pm on Jul 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



4. LookSmart Turns Profit, Brightens Outlook
After giving a cautious outlook last quarter, the paid inclusion company turns a tidy
profit and sees bright year ahead.
[internetnews.com...]

My what a coincidence;)

lindahl

11:34 pm on Jul 31, 2003 (gmt 0)

10+ Year Member



The 64.* and 216.* robots have the same bug, so I'm pretty sure it's all looksmart.

jdMorgan

7:15 am on Aug 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some thoughts:
Mozilla/4.0 compatible ZyBorg/1.0 DLC (wn.zyborg@looksmart.net; http://www.WISEnutbot.com)
DLC = the new Dead Link Checker?

I hope you all consider the possible ramifications of banning Looksmart vis-a-vis disappearing from MSN and possibly other search engine results. While this appears to be a problem related to WiseNut, who knows what the future relationships may be?

The term "race condition" is a specific term which indicates a time-dependent problem existing between two processes which run independently. In this case, Daniele's explanation corresponds with the posted log file evidence.

I agree that the 'bot has misbehaved, and I believe that each webmaster should make their own decisions about which 'bots to allow and which to disallow, but let's give these folks a chance to fix their problem before "convicting" their 'bot and imposing a life sentence without parole.

In the long run, we need (well-behaved) robots to visit our sites in order to get traffic much more than any search engine needs to list our individual sites in their index. If they can't list a few sites because their 'bot is banned, it really won't affect their search results much; it will be unimportant... except to the webmasters of those sites. IOW, we need the 'bots much more than they need our sites.

Personally, I'm glad to see WiseNut crawling; I recently got a new site listing in WiseNut, but the crawl data was from April. I'll appreciate an update, thanks.

Anyway, let's have a dialog here and try to help these folks fix the problems with the 'bot and with the broken robots information link in the user-agent string. If these posts are ignored and the problems persist, then we will have cause to ban these bots "permanently". However, until they've had a chance to fix the problems, I don't think a permanent ban is a very good idea.

Frontpage nailed it in post #17 - The first step is to contact the organization deploying the robot, and report problems. If they do nothing, or if they claim they will correct bad behaviour and then fail to do so, then ban 'em.

MHO - YMMV,
Jim
