AV Scooter 1.0

images

         

wilderness

12:57 pm on Sep 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Over the past few days AV has been disregarding my robots.txt, spidering images in the process.
As a result I have denied the range 216.39.48.

I may add the entire AV block in the near future.

wilderness

10:23 pm on Oct 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just a follow up on this.
I limited my deny range to 216.39.48.

Over the past few days Scooter 1.0 has returned, gathering images via direct links from 216.39.50.
As a result I added a SetEnvIf rule matching ^Scooter
to cover all Scooters.

Bye AV :-(
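For anyone wanting to check which user-agent strings an anchored pattern like ^Scooter would catch, here is a minimal Python sketch. It only mirrors the regex idea and is not the Apache configuration itself; the function name and the test strings are made up for illustration.

```python
import re

# Same anchored pattern as in the post: user-agent strings that START with "Scooter".
# The match is case-sensitive, as it is with SetEnvIf (SetEnvIfNoCase is the
# case-insensitive variant).
SCOOTER_PATTERN = re.compile(r"^Scooter")


def is_scooter(user_agent: str) -> bool:
    """Return True when the user-agent string begins with 'Scooter'."""
    return SCOOTER_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical user-agent strings, used only to exercise the pattern.
    for ua in ("Scooter/1.0", "Scooter/3.2", "Mozilla/4.0 (compatible; Scooter)"):
        print(ua, "->", is_scooter(ua))
```

Because the pattern is anchored only at the start, it matches every Scooter version, not just 1.0.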

wilderness

1:04 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A portion of a reply from AV's crawl support:

"(One thing to keep in mind is that AltaVista knows nothing about the design of your site, and therefore cannot evaluate, test, or verify your robots.txt file. In the most general terms, if your site contains pages you don't want us to access, you know where they are, but we don't.) "

jdMorgan

1:27 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wilderness,

I've had Scooter/1.0 requesting disallowed images as well. I sent them a report, along with a relevant portion of my server log showing the 'bot requesting and then ignoring robots.txt. I invited them to take a look at my working, verified robots.txt as well.

Because of the info I sent them, they will not be able to claim that they don't know the structure of my site. Among other things, my robots.txt says:

User-agent: *
Disallow: /images/

The log file clearly shows fetches from this subdirectory subsequent to fetching robots.txt.
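As a quick sanity check of how a compliant parser reads those two lines, here is a small sketch using Python's standard `urllib.robotparser`. The file paths are hypothetical, and this shows only what the standard library's parser concludes, not how Scooter itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    # Hypothetical paths, chosen only for illustration.
    for path in ("/images/photo.jpg", "/index.html"):
        print(path, "allowed for Scooter/1.0:", parser.can_fetch("Scooter/1.0", path))
```

With no Scooter-specific section present, the `User-agent: *` rules apply to any robot name passed in, so `/images/photo.jpg` is reported as disallowed and `/index.html` as allowed.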

We'll see what they respond with. I'll paraphrase it here if it's interesting. Either they've got a bug, or Brett's robots.txt validator and several others on the Web need some improvements. Googlebot is covered by the same User-agent: * directive, and it (as expected) does not request these files.

Jim

fiestagirl

9:44 pm on Oct 9, 2002 (gmt 0)

10+ Year Member



Just a heads up..

I also have them coming from 216.39.50.* now.

volatilegx

11:24 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



fiestagirl, are any of those requests resolving to an AV hostname?
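A rough way to answer that kind of question in bulk is a reverse DNS lookup per address. The sketch below uses Python's `socket.gethostbyaddr`. The `av.com` suffix check is based on the hostnames reported in this thread, and both function names are hypothetical.

```python
import socket
from typing import Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR hostname for an IP address, or None if it does not resolve."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname


def is_av_hostname(hostname: Optional[str]) -> bool:
    """Return True if the hostname is av.com or a subdomain of it."""
    if not hostname:
        return False
    host = hostname.rstrip(".").lower()
    return host == "av.com" or host.endswith(".av.com")


if __name__ == "__main__":
    # Needs network access; results depend on whatever DNS returns today.
    for ip in ("216.39.50.5",):
        name = reverse_lookup(ip)
        print(ip, "->", name, "| AV host:", is_av_hostname(name))
```

A PTR record can be set to anything by whoever controls the address block, so a stricter check would also resolve the returned hostname forward and confirm it maps back to the same IP.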

fiestagirl

11:39 pm on Oct 9, 2002 (gmt 0)

10+ Year Member



216.39.50.5 -> vscooter.sv.av.com
216.39.50.7 -> mmbuild7.sv.av.com
216.39.50.16 -> mmbuild2.sv.av.com
216.39.50.21 -> qaserver2.sv.av.com
216.39.50.105 -> mmbuild1.sv.av.com
216.39.50.106 -> mmbuild4.sv.av.com
216.39.50.107 -> mmbuild5.sv.av.com
216.39.50.108 -> mmbuild6.sv.av.com
216.39.50.109 -> mmbuild8.sv.av.com
216.39.50.230 -> bigip7a.sv.av.com
216.39.50.231 -> bigip7.sv.av.com
216.39.50.232 -> bigip8.sv.av.com

The rest do not resolve, although it appears that AV owns all of 216.39.48.0 - 216.39.63.255.
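To see why a deny on 216.39.48. alone misses the 216.39.50.* addresses, the range above can be collapsed into CIDR form with Python's `ipaddress` module. This is a sketch; treating a deny on "216.39.48." as the /24 network 216.39.48.0/24 is my reading of the partial-address match, and the helper name is made up.

```python
import ipaddress

# The range reported above, collapsed into the smallest set of CIDR blocks.
first = ipaddress.IPv4Address("216.39.48.0")
last = ipaddress.IPv4Address("216.39.63.255")
av_blocks = list(ipaddress.summarize_address_range(first, last))

# A deny on "216.39.48." covers only the first three octets, i.e. one /24.
narrow_deny = [ipaddress.ip_network("216.39.48.0/24")]


def in_any(ip: str, networks) -> bool:
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    print(av_blocks)                             # [IPv4Network('216.39.48.0/20')]
    print(in_any("216.39.50.5", narrow_deny))    # False
    print(in_any("216.39.50.5", av_blocks))      # True
```

The whole range fits in a single /20, so one rule for 216.39.48.0/20 would cover every address listed here.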

wilderness

3:35 am on Oct 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hey Jim,
I've sent you a sticky with the full mails.
The reply they sent was a bunch of hogwash.
As you'll see by my final reply to them. :-(

jdMorgan

3:58 am on Oct 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



fiestagirl,

Is Scooter requesting files on your site disallowed in robots.txt?
Have you had any problems with any other spiders fetching disallowed files?
Have you verified your robots.txt with Brett's robots.txt checker?
Do you have a special Disallow section for Scooter, as opposed to a general "User-agent: *"?

In my case, the answers are Yes, No, Yes, and No.

I suspect that everyone's answers are the same, but I want to make sure we're all on the same page here, before some tech newsie mines this thread for a story or something...

Thanks,
Jim

fiestagirl

4:46 am on Oct 10, 2002 (gmt 0)

10+ Year Member



I was just giving a heads up on the new IP address, in case anyone was banning them by IP, not just UA.

But the answers are:
no - scooter is not requesting files that are disallowed.
no - I haven't had problems with any other legitimate SE spiders.
yes - the robots.txt has been verified.
and no- I don't have a special disallow section for scooter.

jdMorgan

5:02 am on Oct 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



fiestagirl,

Thanks for the reply.

I sure can't figure this out, then. If Scooter/1.0 is not requesting disallowed files from your site, then that implies that I may have a problem with my robots.txt... I don't know how, though. It's been validated and has been obeyed for years... Everybody else (Google/Fast/Ink/Jeeves/LS/Lycos/Teoma/etc.) seems to be obeying it, and they are all covered by the same User-agent: * section.

I have not blocked Scooter/1.0. I'm going to let it continue as an experiment, and see if it refetches robots.txt and changes behaviour, or if anything else interesting happens. I've sent a form mail to AV tech support and hopefully they'll check into it. If I learn anything, I'll post it.

Jim

volatilegx

5:13 pm on Oct 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



thanks everybody

tourist

9:24 pm on Nov 19, 2002 (gmt 0)

10+ Year Member



Is there any further news? jdMorgan, maybe?

It's been two weeks since Scooter/1.0 dropped by, but it has ALWAYS disregarded my robots.txt...

To join others, I can say my answers are "Yes, no, yes, no."

The vaguely amusing part is that Scooter has always grabbed robots.txt on each visit. Of course, it then turned around and stuck its fingers where it shouldn't. :(
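One way to pull that pattern out of an access log is to flag any client that requests a disallowed path after it has already fetched robots.txt. The sketch below assumes Apache combined log format. The sample lines are invented for illustration (documentation-range IPs, a made-up ExampleBot) and are not taken from anyone's actual logs.

```python
import re
from typing import Iterable, List, Tuple

# Minimal matcher for the combined log format: IP, request path, user-agent.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def fetches_after_robots(lines: Iterable[str], disallowed_prefix: str) -> List[Tuple[str, str, str]]:
    """List (ip, agent, path) for disallowed fetches made after the same client read robots.txt."""
    seen_robots = set()
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        client = (m["ip"], m["agent"])
        path = m["path"]
        if path == "/robots.txt":
            seen_robots.add(client)
        elif path.startswith(disallowed_prefix) and client in seen_robots:
            hits.append((m["ip"], m["agent"], path))
    return hits


# Invented sample data, in time order.
SAMPLE_LOG = [
    '192.0.2.5 - - [01/Jan/2002:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 45 "-" "Scooter/1.0"',
    '192.0.2.5 - - [01/Jan/2002:10:00:05 +0000] "GET /images/photo.jpg HTTP/1.0" 200 9000 "-" "Scooter/1.0"',
    '192.0.2.10 - - [01/Jan/2002:10:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 45 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [01/Jan/2002:10:01:05 +0000] "GET /index.html HTTP/1.0" 200 1200 "-" "ExampleBot/1.0"',
]

hits = fetches_after_robots(SAMPLE_LOG, "/images/")

if __name__ == "__main__":
    for ip, agent, path in hits:
        print(ip, agent, path)
```

This keys on the IP and user-agent pair, so a crawler that reads robots.txt from one address and fetches images from another would slip through. Matching on user-agent alone would be a looser alternative.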

jdMorgan

9:58 pm on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tourist,

I submitted a feedback form to AV on October 8th, reporting the robots.txt violations, and have received only an auto-response.

However, Scooter/1.0 stopped visiting shortly thereafter.

AV may have simply added my site to an exclusion list for Scooter/1.0 to stop it. Scooter/3.2 still visits almost daily, and behaves well. My site listings on AV all carry the "Refreshed in the last 24 (or 48) hours" flag.

Because Scooter/1.0 doesn't seem to be visiting anymore, I'll have to rely on others to report current status.

Jim

wilderness

10:56 pm on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<I'll have to rely on others to report current status.>

Hey Jim,
I mentioned AV in passing in another thread.

I'm not so sure they have ceased? It seems more like a scheduled event?
I was visited by 1.0 throughout October, then the activity ceased perhaps a week after my email inquiry to AV.
Then in early November it began again.

Although the 1.0 attempts to directly spider images in folders disallowed in robots.txt have, at least momentarily, ceased.

I'm almost willing to wager that it begins again either before the end of the month or just after the 1st.

As I compose this I sit here pondering the possibilities!

Is it possible this is all related to webcollage?
All the webcollage image links come in to my sites with AV as the referrer! These currently result in a 403.
Perhaps it's possible that 1.0 is just doing VERY MISGUIDED honest work?

jdMorgan

11:35 pm on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wilderness,

Yes, it's possible we submitted our problem reports to AV coincidental with the scheduled end of a spidering run. As a result, I'm still not ready to draw any conclusions.

As to WebCollage, I've had it come in with several different search engines as the referrer, not just AV.

I hope AV gets Scooter/1.0 fixed with regard to robots.txt and gets their marketing act together. The more search engines we have, the less dependence we'll have on any one... It is admittedly possible that my wishes are somehow related to my position in AV SERPs, though. ;)

If I see Scooter/1.0 come back here, I'll post on whether it obeys robots.txt.

Jim