Forum Moderators: DixonJones
Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker (wn.zyborg@looksmart.net; http://www.WISEnutbot.com)
Am I the only one noticing this, or just the only one it bugs?
What also confuses me is this: I'm serving the trap page correctly, with a 200 and everything. But they keep coming back (almost every day for the last 10 days). How much more alive do they want this link to be? I haven't checked, but if they grab all my pages on that basis (daily), then I'll have to think about banning them - I can't handle that much traffic just because LookSmart wants to look smart...
Opinions/comments welcome.
With a high degree of regularity I see the DLC ask for the same dead link it looked for the time before. And... as with every other visit (and there have been many), it asks for that same dead link yet again.
The thing about those dead links is that I put up a series of 301 redirects to resolve them (re-structured site - pathways changed) well over a year ago. To date, LS is the only one NOT to follow its own logic.
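For anyone wanting to do the same, a minimal .htaccess sketch using mod_alias would look something like this (paths purely hypothetical):

    # permanently redirect an old pathway to its new home
    Redirect 301 /oldsection/page.html http://www.example.com/newsection/page.html

Any bot that follows its own logic should update its list after seeing that 301 once or twice.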
The thing is, though, that it is NOT a dead link... but a link placed off limits by robots.txt, and this is the first time in almost two years that it's been requested by a bot belonging to a known SE.
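For reference, the exclusion in question is the garden-variety kind (path hypothetical):

    # keep all compliant robots out of this branch
    User-agent: *
    Disallow: /offlimits/

So any robot honouring the Standard for Robot Exclusion should never be requesting anything under that path in the first place.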
But from your statement I take it that this is not unusual for this bot, and given the time frame you mentioned, there is no hope that it will change to better, or at least acceptable, behaviour.
Thanks, and happy new year
By rights, it's important to communicate with the bot owner in the hope that they'll modify it. Sometimes they do, and sometimes they don't even honour you with a reply.
I know 'stupid' isn't the proper way to describe the bot, but if I started talking about the designers I'd get real upset, and since bots are sorta inanimate... Stupid is as Stupid does. <g>
I can only tolerate this 'hostage' activity for a certain time and then the claws are gonna come out...
If I brought out the claws now, I'd risk losing the traffic. Therefore, I am a hostage who is forced to simply ignore it.
@Visi: No, it's not grub - I've had problems with that one too. At least I don't think so, judging by the UA-string I posted and Stefan confirmed. If I banned it (through .htaccess or ISAPI_Rewrite, respectively), I'd run the risk pendanticist mentioned: noticeably losing traffic.
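If the claws do come out, a minimal .htaccess sketch with mod_rewrite (keyed on the ZyBorg token in the UA-string above; untested, adjust as needed) would be along these lines:

    # send a 403 Forbidden to the ZyBorg dead link checker
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC]
    RewriteRule .* - [F]

But as said, that would throw away the WiseNut referrals along with the bot, so it stays in the drawer for now.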
[...]
We do support the Robot Exclusion protocol, and aim to refresh our robot rule data for each host on a weekly basis. This dead link checker only runs against URLs that are in the WiseNut index, so if it is violating your /robots.txt file, the crawler that collects the information for indexing may be violating it as well.
[...]
Curious what the next reply will be.
What they are trying to tell you is that the DLC is not a robot - it does not crawl. It is a list-checker: it checks URLs they already have on their list to see whether each one is still good.
Generally, this means that at the time they generated the list, the file was not excluded by your robots.txt, or your robots.txt had syntax errors, or their crawler had a bug.
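(A classic example of such a syntax error is cramming several paths onto one line,

    Disallow: /private/ /temp/

when the standard wants one Disallow line per path:

    # one path per line - these paths are just for illustration
    Disallow: /private/
    Disallow: /temp/

A parser hitting the malformed line may simply ignore it.)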
They ought to fix this, of course, but the main problem is the one pendanticist refers to - you can feed them a 301, a 404, or a 410 for a long, long time before they finally take your word for it that the resource is moved, not found, or gone.
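For the 'gone' case, mod_alias even has a shorthand - a sketch with a hypothetical path:

    # answer requests for this retired page with 410 Gone
    Redirect gone /old-deadend.html

A checker that handled status codes correctly would drop the URL from its list after a response or two like that.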
They need to respond to 301, 302, 403, 404, and 410 more quickly and correctly, and they need to quit doing full GETs and start doing Conditional GETs or HEADs. But they don't need to check robots.txt for what they are doing.
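For illustration, a conditional GET exchange runs like this (host and date hypothetical):

    GET /page.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Mon, 01 Dec 2003 09:00:00 GMT

    HTTP/1.1 304 Not Modified

The 304 carries no body, so an unchanged page costs a few hundred bytes instead of the whole file; a HEAD request achieves much the same for a simple liveness check.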
My main point here is to clarify that the Dead Link Checker is not a robot as defined by the Standard for Robots Exclusion, and so should not be expected to fetch and obey robots.txt. This is not an excuse, this is a statement that if necessary, you should take steps that do not rely on the user-agent's voluntary cooperation, as robots.txt does.
Jim
<edit> speling </edit>
That does not matter anymore, as they (LookSmart) agreed that it is a bug in the software and promised to fix it and let me know when they have.
Problem solved - as soon as the patch is applied.