Right now AltaVista's crawlers are consuming several hundred gigabytes. Is anyone seeing the same? Although I like AltaVista, if they continue I'm afraid I will have to ban their robots - the traffic they cause isn't justified by the few referrals from this search engine (although I can't complain about the rankings ...).
Here are the crawler IDs/UAs:
Any ideas? (Or promises from AltaVista? :)
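For anyone who wants to check their own logs for the same thing, here is a rough sketch that adds up the bytes served per user agent. It assumes Apache's "combined" log format; the sample lines, hosts, paths and agent names are made-up placeholders, not the real crawler IDs.

```python
import re
from collections import defaultdict

# Apache "combined" log format is assumed here:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def bytes_per_agent(lines):
    """Sum the response sizes (in bytes) per user-agent string."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        sent = match.group("bytes")
        totals[match.group("agent")] += 0 if sent == "-" else int(sent)
    return dict(totals)

# Made-up sample lines; hosts, paths and agent names are placeholders.
sample_log = [
    '192.0.2.10 - - [09/Dec/2003:21:09:00 +0000] "GET /media/a.mpg HTTP/1.0" 200 5000000 "-" "ExampleMediaBot/1.0"',
    '192.0.2.10 - - [09/Dec/2003:21:10:00 +0000] "GET /media/a.mpg HTTP/1.0" 200 5000000 "-" "ExampleMediaBot/1.0"',
    '198.51.100.7 - - [09/Dec/2003:21:11:00 +0000] "GET /index.html HTTP/1.0" 200 2048 "-" "ExampleBrowser/4.0"',
    '198.51.100.7 - - [09/Dec/2003:21:12:00 +0000] "GET /index.html HTTP/1.0" 304 - "-" "ExampleBrowser/4.0"',
]

totals = bytes_per_agent(sample_log)
# Print the heaviest consumers first.
for agent, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{total:>12} bytes  {agent}")
```

Request lines that contain escaped quotes will not match this simple pattern and are skipped, so treat the totals as an estimate.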
We have a new robots.txt forum [webmasterworld.com] where you could ask about keeping Scooter out of your large multimedia files while still allowing it to crawl your other pages.
It'll be interesting to see what the AV-bashers have to say if the recent changes at G drive significant search traffic over to AV... I'm kinda glad to see an old friend back, myself.
[edited by: jdMorgan at 9:09 pm (utc) on Dec. 9, 2003]
And the user agent is NOT Scooter. The machines are listed above; the new user agent is "3.3.vscooter".
The crawler/grabber is "only" interested in multimedia files - if you have large image collections or huge video or audio files, beware!
Another resource includes:
Eventually we really will have to block the entire thing -
not a clever idea from AltaVista. I can't appreciate it if they hit the same huge files over and over (most likely they get confused by all the mirror domains we own ...).
Action: I have deactivated/rerouted most of my mirror domains and banned AltaVista from my multimedia folder via robots.txt.
Result: the bandwidth consumed by AltaVista's multimedia crawlers has been greatly reduced (from several gigabytes to a few hundred megabytes).
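For illustration, a robots.txt ban of that kind might look roughly like the text embedded below, checked here with Python's standard urllib.robotparser. The "vscooter" token, the /multimedia/ path and the example.com URLs are assumptions made for the sketch; which token the real crawler matches on is not confirmed anywhere in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the "vscooter" token and the /multimedia/ path
# are placeholders, not values confirmed by AltaVista.
ROBOTS_TXT = """\
User-agent: vscooter
Disallow: /multimedia/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The multimedia crawler is kept out of the multimedia folder only.
blocked = parser.can_fetch("3.3.vscooter", "http://www.example.com/multimedia/video.mpg")
allowed = parser.can_fetch("3.3.vscooter", "http://www.example.com/index.html")
# Other robots fall through to the "*" record, which allows everything.
others = parser.can_fetch("OtherBot", "http://www.example.com/multimedia/video.mpg")

print(blocked, allowed, others)
```

This only shows how a rule-following parser would read the file. A crawler that caches robots.txt, or ignores it, will behave differently.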
Conclusion: 1 - AltaVista's crawlers are NOT capable of identifying multiple instances of the same file BEFORE they download the entire thing (not sure if they do it afterwards either ...) - this could be a loophole to get multimedia content into AltaVista.
2 - AltaVista's crawlers do not follow the robots.txt protocol, or it takes some time before they reread the file - another flaw in AltaVista's technology.
3 - I wrote to them two weeks ago (corporate marketing) and I am still awaiting an answer - their customer relationship management is not very responsive ...
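On point 1, a crawler could in principle guess that two mirror URLs point at the same file before downloading either, by comparing the metadata a HEAD request returns. The sketch below is only a heuristic under that assumption and says nothing about how AltaVista's crawler actually works; the function names, mirror domains and header values are hypothetical.

```python
from urllib.parse import urlsplit

def fingerprint(url, headers):
    """Guess file identity from the URL path plus HEAD metadata (assumed heuristic)."""
    return (urlsplit(url).path, headers.get("Content-Length"), headers.get("Last-Modified"))

def pick_downloads(head_results):
    """Return the URLs worth downloading from (url, headers) pairs.

    A URL is skipped when an earlier URL had the same fingerprint.
    URLs without a Content-Length are always kept, since nothing can be compared.
    """
    seen = set()
    to_fetch = []
    for url, headers in head_results:
        if headers.get("Content-Length") is None:
            to_fetch.append(url)
            continue
        key = fingerprint(url, headers)
        if key not in seen:
            seen.add(key)
            to_fetch.append(url)
    return to_fetch

# Hypothetical HEAD results for one video served from three mirror domains.
video = {"Content-Length": "52428800", "Last-Modified": "Mon, 01 Dec 2003 10:00:00 GMT"}
head_results = [
    ("http://www.example.com/media/clip.mpg", video),
    ("http://www.example.net/media/clip.mpg", video),
    ("http://www.example.org/media/clip.mpg", video),
    ("http://www.example.com/media/other.mpg", {"Content-Length": "1024"}),
]

to_fetch = pick_downloads(head_results)
print(to_fetch)
```

Matching path, size and date can still collide for different files, so a real crawler would probably confirm with a content hash after download.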