"Scooter/3.3_SF" indexes pages.
"Scooter/3.3.vscooter" grabs images.
"vscooter" does not obey my disallowed image directories via robots.txt, but does obey when I disallow "vscooter" itself. The problem is that when I disallow "vscooter", "_SF" obeys the disallow also and will not take my pages.
I emailed the guy who maintains the bots, and he seemed apathetic to my concerns. He pointed out that AV has 200 of my pages. They do, but many are old, defunct URLs that I have to keep 301 redirects for, mainly because of SEs like AV that haven't updated.
They once had a ton of irrelevant pages from a site of mine, just background graphics, and weren't adding the pages I wanted added. But that was long ago; they have refreshed pages lately, and those aren't paid listings, either. Homepages especially are refreshed every 24 or 48 hours, or at least that's what it says.
Are you keeping the 301 redirects just for AV, or is there another reason?
I have been keeping about twenty 301s, for about six months now, because of all the old URLs out there, everywhere, not just AV. When I was new I made the mistake of taking bad advice to name pages with short abbreviations instead of keyword (KW) or otherwise descriptive names. Twenty or more of these redirects fire each day, so I cannot remove them.
Frankly, I was surprised (though not any longer) that SEs keep old listings and defunct, broken links in their indexes. Wisenut, Looksmart, AltaVista, ATW, et al. still have old, now-defunct URLs of mine. There's not much I can do about it except keep the 301s.
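For anyone in the same position, a minimal sketch of such redirects, assuming Apache with mod_alias and hypothetical old/new filenames:

# Hypothetical example: 301 old abbreviated URLs to their
# descriptive replacements so stale SE listings still resolve.
Redirect permanent /abt.html /about-widgets.html
Redirect permanent /prd2.html /blue-widget-product.html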
... homepages, refreshed every 24 or 48 hours...
It annoys me that AV seemingly cannot write a robot that correctly reads a standard Disallow. The robots.txt protocol is, after all, the only standard they are asked to support.
You wrote:
"Scooter/3.3_SF" indexes pages.
"Scooter/3.3.vscooter" grabs images.
Just as an FYI, I've also seen "Scooter/3.3" (non-SF version) out and about recently collecting pages.
If a robots.txt Disallow won't work, then serve a 403 to the non-compliant bot while allowing the compliant ones, if that's what you need to do.
Write your robots.txt correctly, as if the robots never got confused, and then 403 the violations; something like the sketch below would do it. When the violations stop showing up in your logs, you'll know they've brought the 'bot into compliance.
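A minimal sketch of that approach, assuming Apache with mod_setenvif and a hypothetical set of image extensions; the match keys on the "vscooter" token, so "Scooter/3.3_SF" is unaffected:

# Flag only the image bot; "Scooter/3.3_SF" does not match.
SetEnvIfNoCase User-Agent "vscooter" bad_bot

# Return 403 on image requests from the flagged bot.
<FilesMatch "\.(gif|jpe?g|png)$">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</FilesMatch>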
Jim
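For the record, here is "_SF" in my logs, requesting robots.txt and then the homepage: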
216.39.50.144 - - [14/Nov/2003:00:42:20 -0800] "GET /robots.txt HTTP/1.0" 200 685 "-" "Scooter/3.3_SF"
216.39.50.144 - - [14/Nov/2003:00:42:20 -0800] "GET / HTTP/1.0" 200 11187 "-" "Scooter/3.3_SF"