"(One thing to keep in mind is that AltaVista knows nothing about the design of your site, and therefore cannot evaluate, test, or verify your robots.txt file. In the most general terms, if your site contains pages you don't want us to access, you know where they are, but we don't.) "
I've had Scooter/1.0 requesting disallowed images as well. I sent them a report, along with a relevant portion of my server log showing the 'bot requesting and then ignoring robots.txt. I invited them to take a look at my working, verified robots.txt as well.
Because of the info I sent them, they will not be able to claim that they don't know the structure of my site. Among other things, my robots.txt says:
User-agent: *
Disallow: /images/
The log file clearly shows fetches from this subdirectory subsequent to fetching robots.txt.
We'll see what they respond with. I'll paraphrase it here if it's interesting. Either they've got a bug, or Brett's robots.txt validator and several others on the Web need some improvements. Googlebot is covered by the same User-agent: * directive, and it (as expected) does not request these files.
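For anyone who wants a second opinion beyond the online validators, here is a minimal sketch that feeds the two robots.txt lines quoted above to the parser in Python's standard library and asks whether a given agent may fetch a given path. The helper name is_allowed and the sample paths (/images/logo.gif, /index.html) are made up for illustration, and this only shows how one parser reads the file, not how Scooter itself does.

```python
from urllib.robotparser import RobotFileParser

# The excerpt quoted in the post above; the real file has more sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if this parser thinks user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the Disallow rule.
    for agent in ("Scooter/1.0", "Googlebot"):
        for path in ("/images/logo.gif", "/index.html"):
            print(agent, path, is_allowed(agent, path))
```

Both agents fall under the same User-agent: * section here, so any compliant parser should give them identical answers for /images/.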
Jim
The rest do not resolve, although it appears that AV owns all of 216.39.48.0 - 216.39.63.255.
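When an address will not resolve, one rough way to sort log entries is to test the IP against that range directly. This is only a sketch: the range is the one quoted in this thread, not something verified here against a registry, and the function names are hypothetical.

```python
import ipaddress

# Range as reported in this thread; ownership is not verified here.
AV_FIRST = ipaddress.IPv4Address("216.39.48.0")
AV_LAST = ipaddress.IPv4Address("216.39.63.255")


def in_reported_range(addr):
    """Return True if addr lies between AV_FIRST and AV_LAST inclusive."""
    return AV_FIRST <= ipaddress.IPv4Address(addr) <= AV_LAST


def reported_range_as_cidr():
    """Express the same range as CIDR blocks, e.g. for a firewall or filter."""
    networks = ipaddress.summarize_address_range(AV_FIRST, AV_LAST)
    return [str(net) for net in networks]


if __name__ == "__main__":
    print(reported_range_as_cidr())
    print(in_reported_range("216.39.50.7"))
```

The range collapses to a single /20 block, which is handy if you only want to filter or tag those requests rather than block them.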
Is Scooter requesting files on your site disallowed in robots.txt?
Have you had any problems with any other spiders fetching disallowed files?
Have you verified your robots.txt with Brett's robots.txt checker?
Do you have a special Disallow section for Scooter, as opposed to a general "User-agent: *"?
In my case, the answers are Yes, No, Yes, and No.
I suspect that everyone's answers are the same, but I want to make sure we're all on the same page here, before some tech newsie mines this thread for a story or something...
Thanks,
Jim
But the answers are:
no - Scooter is not requesting files that are disallowed.
no - I haven't had problems with any other legitimate SE spiders.
yes - the robots.txt has been verified.
and no - I don't have a special disallow section for Scooter.
Thanks for the reply.
I sure can't figure this out, then. If Scooter/1.0 is not requesting disallowed files from your site, then that implies that I may have a problem with my robots.txt... I don't know how, though. It's been validated and has been obeyed for years... Everybody else (Google/Fast/Ink/Jeeves/LS/Lycos/Teoma/etc.) seems to be obeying it, and they are all covered by the same User-agent: * section.
I have not blocked Scooter/1.0. I'm going to let it continue as an experiment, and see if it refetches robots.txt and changes behaviour, or if anything else interesting happens. I've sent a form mail to AV tech support and, hopefully, they'll check into it. If I learn anything, I'll post it.
Jim
It's been two weeks since Scooter/1.0 dropped by, but it has ALWAYS disregarded my robots.txt...
To join others, I can say my answers are "Yes, no, yes, no."
The vaguely amusing part is that Scooter has always grabbed robots.txt on each visit - Of course, it then turned around and stuck its fingers where it shouldn't. :(
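If you want to pull that pattern out of your own logs, something along these lines might do it. It is a rough sketch that assumes Apache-style combined log lines; the regex, the function name violations, and the sample entries (documentation IP, invented dates and paths) are all made up for illustration and are not real Scooter log data.

```python
import re

# Assumed layout: Apache "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def violations(lines, agent_prefix="Scooter/1.0", disallowed=("/images/",)):
    """List (host, time, path) for disallowed fetches made after robots.txt."""
    prefixes = tuple(disallowed)
    seen_robots = set()  # hosts that have already requested /robots.txt
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("agent").startswith(agent_prefix):
            continue
        host, path = m.group("host"), m.group("path")
        if path == "/robots.txt":
            seen_robots.add(host)
        elif host in seen_robots and path.startswith(prefixes):
            hits.append((host, m.group("time"), path))
    return hits


# Invented sample entries, for demonstration only.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2000:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Scooter/1.0"',
    '192.0.2.10 - - [01/Jan/2000:10:00:05 +0000] "GET /images/logo.gif HTTP/1.0" 200 2048 "-" "Scooter/1.0"',
    '192.0.2.10 - - [01/Jan/2000:10:00:09 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Scooter/1.0"',
]

if __name__ == "__main__":
    for hit in violations(SAMPLE):
        print(hit)
```

Only requests that come after a robots.txt fetch from the same host are counted, which matches the "grabs robots.txt, then ignores it" behaviour described here.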
I submitted a feedback form to AV on October 8th, reporting the robots.txt violations, and have received only an auto-response.
However, Scooter/1.0 stopped visiting shortly thereafter.
AV may have simply added my site to an exclusion list for Scooter/1.0 to stop it. Scooter/3.2 still visits almost daily, and behaves well. My site listings on AV all carry the "Refreshed in the last 24 (or 48) hours" flag.
Because Scooter/1.0 doesn't seem to be visiting anymore, I'll have to rely on others to report current status.
Jim
Hey Jim,
I mentioned AV in passing in another thread.
I'm not so sure they have ceased? It seems more like a scheduled event?
I was visited by 1.0 throughout October, then the activity ceased perhaps a week after my email inquiry to AV.
Then in early November it began again.
That said, the 1.0 attempts to directly spider images in folders disallowed in robots.txt have, at least momentarily, ceased.
I'm almost willing to wager that it begins again either before the end of the month or just after the 1st.
As I compose this I sit here pondering the possibilities!
Is it possible this is all related to webcollage?
All the webcollage image links come in to my sites with AV as the referrer! These currently result in a 403.
Perhaps it's possible that 1.0 is just doing VERY MISGUIDED honest work?
Yes, it's possible we submitted our problem reports to AV coincidentally with the scheduled end of a spidering run. As a result, I'm still not ready to draw any conclusions.
As to WebCollage, I've had it come in with several different search engines as the referrer, not just AV.
I hope AV gets Scooter/1.0 fixed with regard to robots.txt and gets their marketing act together. The more search engines we have, the less dependence we'll have on any one... It is admittedly possible that my wishes are somehow related to my position in AV SERPs, though. ;)
If I see Scooter/1.0 come back here, I'll post on whether it obeys robots.txt.
Jim