"Scooter/3.3_SF" indexes pages.
"Scooter/3.3.vscooter" grabs images.
"vscooter" does not obey my disallowed image directories via robots.txt, but does obey when I disallow "vscooter" itself. The problem is that when I disallow "vscooter", "_SF" obeys the disallow also and will not take my pages.
I emailed the guy who maintains the bots, and he seemed apathetic to my concerns. He pointed out that AV has 200 of my pages. They do, but many are old, defunct URLs that I have to keep 301 redirects for, mainly because of SEs like AV that haven't updated.
They once had a ton of irrelevant pages from a site of mine, just background graphics, and weren't adding the pages I wanted added. But that was long ago; they have refreshed pages lately, and those aren't paid listings, either. Homepages especially are refreshed every 24 or 48 hours, or at least that's what it says.
Are you keeping the 301 redirects just for AV, or is there another reason?
I have been keeping about twenty 301s, for about six months now, because of all the old URLs out there, everywhere, not just AV. When I was new I made the mistake of taking bad advice to name pages with short abbreviations instead of keyword (KW) or otherwise descriptive names. Twenty or more of these redirects fire each day, so I cannot remove them.
Frankly, I was surprised (though not any longer) that SEs keep old listings and defunct, broken links in their indexes. Wisenut, Looksmart, AltaVista, ATW, et al. still have old, now-defunct URLs of mine. There's not much I can do about it except keep the 301s.
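For anyone in the same position, a minimal sketch of such redirects, assuming Apache with mod_alias and hypothetical old/new filenames:

# Hypothetical example: 301 old abbreviated URLs to their
# descriptive replacements so stale SE listings still resolve.
Redirect permanent /abt.html /about-widgets.html
Redirect permanent /prd2.html /blue-widget-product.html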
... homepages, refreshed every 24 or 48 hours...
It annoys me that AV seemingly cannot write a robot that correctly reads a standard Disallow. The robots.txt protocol is, after all, the only standard they are asked to support.
You wrote:
"Scooter/3.3_SF" indexes pages.
"Scooter/3.3.vscooter" grabs images.
Just as an FYI, I've also seen "Scooter/3.3" (non-SF version) out and about recently collecting pages.
If a robots.txt Disallow won't work, then serve a 403 to the non-compliant bot while allowing the compliant ones, if that's what you need to do.
Write your robots.txt correctly, as if the robots never got confused, and then 403 the violations; something like the sketch below would do it. When the violations stop showing up in your logs, you'll know they've brought the 'bot into compliance.
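A minimal sketch of that approach, assuming Apache with mod_setenvif and a hypothetical set of image extensions; the match keys on the "vscooter" token, so "Scooter/3.3_SF" is unaffected:

# Flag only the image bot; "Scooter/3.3_SF" does not match.
SetEnvIfNoCase User-Agent "vscooter" bad_bot

# Return 403 on image requests from the flagged bot.
<FilesMatch "\.(gif|jpe?g|png)$">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</FilesMatch>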
Jim
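For the record, here is "_SF" in my logs, requesting robots.txt and then the homepage: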
216.39.50.144 - - [14/Nov/2003:00:42:20 -0800] "GET /robots.txt HTTP/1.0" 200 685 "-" "Scooter/3.3_SF"
216.39.50.144 - - [14/Nov/2003:00:42:20 -0800] "GET / HTTP/1.0" 200 11187 "-" "Scooter/3.3_SF"