Forum Moderators: open
Is 216.88.158.142 a valid IP for the Zyborg bot, or is somebody spoofing Zyborg's U-A? 216.88.158.142 is assigned to:
OrgName: SAVVIS Communications Corporation
OrgID: SAVV
Address: 1 SAVVIS Parkway
City: Town and Country
StateProv: MO
PostalCode: 63017
Country: US
NetRange: 216.88.0.0 - 216.91.255.255
CIDR: 216.88.0.0/14
There is no reverse DNS configured for 216.88.158.142. The complete U-A of the bot was:
"Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http: //www.WISEnutbot.com)" (I added the space in the URL to prevent linking).
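One way to start answering the spoofing question is a reverse-DNS lookup on the requesting IP. As a quick sketch (Python here purely for illustration; a missing PTR record, as with 216.88.158.142, doesn't prove spoofing, it just leaves the question open):

```python
import socket

def reverse_dns(ip):
    """Return the PTR hostname for an IP address, or None if no reverse DNS exists."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror, OSError):
        # Lookup failed or no PTR record configured for this address.
        return None

# 192.0.2.1 is a reserved documentation address used here as a stand-in;
# it typically has no PTR record, so this usually returns None.
result = reverse_dns("192.0.2.1")
```

A hostname ending in the search engine's own domain would be reassuring; no PTR at all, as in this case, forces you back to whois and log analysis.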
If this is the real ZyBorg bot, I should lift the ban on the IP, right? Isn't Looksmart a desired search engine? If they violate robots.txt, it's going to be a real pain to put in mod_rewrite rules to keep them out of disallowed areas.
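For what it's worth, the mod_rewrite rules needn't be painful. A sketch, assuming the disallowed area is a hypothetical /private/ directory (substitute your own paths and the exact U-A substring from your logs):

```apache
# Deny ZyBorg access to the robots.txt-disallowed directory only,
# leaving the rest of the site crawlable.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC]
RewriteRule ^private/ - [F,L]
```

The [F] flag returns 403-Forbidden, and [NC] makes the user-agent match case-insensitive, so variations in the U-A string capitalization still match.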
Doesn't matter if they got the file before or not, they still need to check that it's OK.
After all, it's the traffic that matters to a site, and the content that matters to the search engine. If you don't get the traffic to justify the bandwidth expense of letting a spider in, then by all means, 403 it to someplace else.
216.88.158.142 - - [18/Sep/2003:16:12:02 -0700] "GET /robots.txt HTTP/1.1" 200 67 "-" "Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http://www.WISEnutbot.com)"
Could an XHTML 1.0 Transitional doctype & layout be causing problems with some spiders? It's W3C-valid.
Assuming that Zyborg DLC is a link checker whose job is to check links currently in the index, and that Zyborg DLC therefore does not look at robots.txt, then the problem that started this thread is the likely cause of your problem.
The Zyborg robot that had the problem grabbed your disallowed pages, and they ended up in the index. Then, DLC comes along and tries to verify the index, so it accesses those disallowed pages again.
I think their implementation is a bit weak, in that DLC should either read and obey robots.txt, or it should do HEAD requests instead of GETs if they want to classify it as a link-checker only and not as a robot.
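For comparison, here is roughly what "read and obey robots.txt" amounts to, sketched with Python's standard urllib.robotparser (the Disallow path is made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt like one a site might serve to keep
# crawlers out of a disallowed area.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved robot (or link checker) asks before every fetch:
print(parser.can_fetch("ZyBorg", "/private/page.html"))  # False: disallowed
print(parser.can_fetch("ZyBorg", "/public/page.html"))   # True: allowed
```

A link checker that skipped this step, or at least used HEAD instead of GET, would be far less of a nuisance on disallowed pages.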
However, I still would not apply a blanket user-agent ban to a well-known company's robot. I'd use .htaccess or ISAPI filters or something similar to block the specific problem pages until they get the problem sorted out.
But I haven't had any problems with Zyborg. Maybe those here who have had a problem should write it up, attach a short log file sample and e-mail it to Looksmart as a problem report. I had good luck recently with another company - I actually got a reply from someone 'famous' at director level thanking me for the info. So, some of them listen. I wasn't very happy with what happened because of this bug, but decided to help them out in return for the clicks they'd sent me over the past years.
Jim
<edited for typo>
[edited by: jdMorgan at 8:01 pm (utc) on Sep. 19, 2003]
However, I still would not apply a blanket user-agent ban to a well-known company's robot. I'd use .htaccess or ISAPI filters or something similar to block the specific problem pages until they get the problem sorted out.
In your opinion how long should it take to resolve the problem of the spider accessing disallowed files/folders before it gets banned? This thread started on July 27, nearly two months have passed, the company is clearly aware of the problem, and yet the problem persists to this very day. IMO if they were serious about fixing this bug it could have been done overnight. But if you feel differently about it I'll consider rethinking my position on the subject because you've earned my respect.
It's a fact that people make mistakes, and robots misbehave because of it. It's up to you to decide if you want to ban a robot because it is misbehaving, implement a work-around allowing time for them to fix it, or ignore the problem completely. As Jeremy noted above, it is up to the individual webmaster to make this decision, based on his or her site, the traffic it gets from the search engine in question, and the complexity of the problem.
My personal opinion - which should be taken with a grain of salt - is that in this case, a work-around is preferable. You just never know what will happen next month in this business, or who will be supplying search results to whom. Banning a 'brand-name' robot when more specific measures are possible is just not a good idea -- again, IMHO.
As to how long it takes to fix a problem, it depends. Maybe it's a simple coding problem, but then what about testing? A search engine's index is its product. I would not expect them to release a new robot version with the potential capability of destroying their index without a good long test and evaluation. Since it often takes months to get a site spidered and listed, I'd take that investment into consideration when deciding what your pre-ban time limit will be.
Perhaps a fix is on the way: claus has spotted two new Zyborg variants [webmasterworld.com], and I found a new one myself today. We think they're new, anyway.
When I have pages that I don't want listed in the SERPs, I add them to robots.txt. But I also add them to rules in my .htaccess file, as insurance against the kind of problem described here. If the robot obeys robots.txt, it never sees any effect from .htaccess. But if it has a problem, the .htaccess code will deliver a 403-Forbidden response on a per-resource basis. I do this for the well-known robots that send me traffic. Any robot I don't recognize does not get this selective treatment, though.
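In Apache terms, that insurance can be as simple as matching the user-agent and denying the specific resources. A sketch, with a made-up filename standing in for an actual disallowed page:

```apache
# Tag requests from ZyBorg, then deny it only the robots.txt-disallowed
# page. Robots that obey robots.txt never trigger these rules at all.
SetEnvIfNoCase User-Agent "ZyBorg" blocked_bot
<Files "secret-page.html">
    Order Allow,Deny
    Allow from all
    Deny from env=blocked_bot
</Files>
```

Because the deny is scoped per-resource, the robot keeps full access to everything robots.txt allows, which is exactly the selective treatment described above.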
My main point is that when dealing with search engines which send you traffic, a "shoot first and ask questions later" approach may hurt you more than it ever hurts them. You'll lose that traffic, someone else will move up a position in their results, and hardly anyone but you and that webmaster will notice.
Jim
Three days ago I found Sygate Personal Firewall which has an IP Ban capability and is very easy to configure. This is the only method I have been able to come up with for a Windows/non-Apache set-up since I don't know Perl, PHP or any other language for that matter.
Zyborg may be a biggy in the robots world but I don't want them on my site every 7 minutes. Checking the last two months' logs, ZyBorg has not looked at robots.txt once.
I did e-mail Looksmart and received an inane reply which was very pleasant but said nothing.
Great forum you have here. As an amateur I really appreciate and use it and have learned a WHOLE bunch.
Thnx
Chance
Daniele at Looksmart has been a pleasure to work with and my opinion of the company has improved considerably.
I have no plans to add ZyBorg to the "website strippers" category of my browscap.ini file but I will continue to monitor my logs for any possible problems.
-gary.