If AltaVista's robots are abusing your server, please contact them at:
Crawl Support <crawl-support@av.com>
If you get no reply, write to: corporate@support.altavista.com
Make sure to really let them know what is happening on your server. The more detail you give them about their crawler's activity, the more likely they are to reply.
- snip your log files (renaming the files, of course)
- let them know if they are not obeying the robots.txt file
- Give them your server IP (THAT IS IF YOU HAVE NOTHING TO HIDE) :)
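Snipping the logs can be partly automated. Here is a minimal sketch, assuming your server writes the Apache "combined" log format and that the crawler has "scooter" somewhere in its user agent (the name used in the robots.txt rule quoted in the reply). The function name, the placeholder names like /file1, and the sample log line are all made up for illustration:

```python
import re

# Assumption: Apache "combined" log format, i.e.
# host ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<head>\S+ \S+ \S+ \[[^\]]+\] "\S+ )'
    r'(?P<path>\S+)'
    r'(?P<tail> [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)")\s*$'
)

def snip_log(lines, agent="scooter"):
    """Return only the lines sent by `agent`, with each requested path renamed."""
    renamed = {}  # real path -> placeholder, so repeated requests stay recognizable
    out = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or agent.lower() not in m.group("agent").lower():
            continue  # unparseable line, or some other visitor
        path = m.group("path")
        alias = renamed.setdefault(path, f"/file{len(renamed) + 1}")
        out.append(m.group("head") + alias + m.group("tail"))
    return out

# Hypothetical sample line; the user agent string is illustrative only.
sample = [
    '192.0.2.1 - - [01/Jan/2001:00:00:01 +0000] '
    '"GET /private/a.html HTTP/1.0" 200 512 "-" "Scooter/3.2"',
]
for line in snip_log(sample):
    print(line)
```

The same real path always maps to the same placeholder, so whoever reads the snippet can still see when the crawler hits one page over and over.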
If all goes well you should receive a reply like:
Thanks for contacting us about the machine at IP address xxx.xxx.xxx.xx.
We have forwarded your message to the crawl engineers, and have started to
process the investigation.
We would like to confirm with you if your web site is domainname.com.
Would you please help?

For not being crawled by AltaVista crawlers, you may set up robots.txt as -
User-agent: scooter # AltaVista web page search
Disallow: /

For the further robots.txt information, you may check the websites at -
[help.altavista.com...]
[info.webcrawler.com...]

In addition, here is some general information about our crawler and how it
should normally behave.
The crawler should not be consuming a large percentage of your server's
capacity. In the past, we have limited crawlers so that they receive only
one page per second, but most modern servers are capable of serving far more
than this, and some webmasters complained that not all of their pages were
being indexed. However, the crawler will still limit itself to one request
at a time, and it should wait for a request to finish before starting the
next one. For example, if your server can process a single request in a
tenth of a second, you may get as many as 10 requests per second from the
crawler, and this should be within the load that your server can routinely
bear.

This wastes our resources and yours, since a given page should appear
in the index only once. However, some URLs are different enough to confuse
the crawler and make it think the pages are unique. For example, the crawler
knows that URLs like this tend to be similar:
[my.site.com...]
[my.site.com...]
However, some websites have URLs that change without using a question mark,
such as
[my.site.com...]
[my.site.com...]
This may confuse the crawler and cause it to request the same script over
and over, using slightly different URLs each time. In this case, the best
approach is usually to prevent the crawler from accessing the script (or,
alternatively, from accessing the entire script directory).

Again, we have started to process the investigation.
Looking forward to your confirmation.
Thanks.

Sincerely,
AltaVista Crawl Support
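Before relying on the robots.txt rules from the reply, you may want to check that they do what you expect. Here is a rough sketch using Python's standard urllib.robotparser module. The first rule set is the one quoted in the reply. The second, which blocks only a script directory, is my own guess at what they mean: the /cgi-bin/ path, the page URLs, and the "othercrawler" name are assumptions, not anything AltaVista supplied.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Rule quoted in the reply: keep scooter off the whole site.
BLOCK_SITE = """\
User-agent: scooter # AltaVista web page search
Disallow: /
"""

# Hypothetical variant: keep scooter out of one script directory only.
# "/cgi-bin/" is an assumed path, not one taken from the reply.
BLOCK_SCRIPTS = """\
User-agent: scooter
Disallow: /cgi-bin/
"""

print(allowed(BLOCK_SITE, "scooter", "http://my.site.com/index.html"))          # False
print(allowed(BLOCK_SITE, "othercrawler", "http://my.site.com/index.html"))     # True
print(allowed(BLOCK_SCRIPTS, "scooter", "http://my.site.com/cgi-bin/script/1")) # False
print(allowed(BLOCK_SCRIPTS, "scooter", "http://my.site.com/index.html"))       # True
```

This only tells you how a well-behaved parser reads the file. It says nothing about whether the crawler actually obeys it, which is what your logs are for.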