The only pages we've found actually added to the database for a new site were from a small batch submitted before the PFI date, plus some submitted up to 20 and 21 June; none of the more recent submissions made it in. Also, not one of several hundred new pages repeatedly spidered by Scooter is getting in. It looks like you have to submit manually (or pay!)
Any word on a turnaround time for them to fix this problem, Brett?
They've locked up one server today grabbing 300k of information per second. Seriously thinking of disallowing everything except the root index pages.
So as of last week, I started banning them in my .htaccess file...they now get a 403 every time they hit my site (and they have 3 times since). I don't know about the rest of the community, but I expect something in exchange for the frustration of watching the thing slow down my site, and fill my logs with useless requests.
If they index me (and they should have at least 10 copies of the whole bloody site) I'll consider taking off the ban. Till then, I'm not letting AltaVista have my stuff. It wastes time for their server, since they won't index anyway, and it wastes my time, because the only reason I go to AltaVista [altavista.com] anymore is to see if the pages got listed. I'm not about to actually "search" for anything there.
sort of ....
the visitor actually came from ask.co.uk which showed a result from altavista.com for an 8 word search term ....
but it's still an altavista result, isn't it? please, someone tell me it's an altavista result ?? i'll be ever so disappointed if it isn't ..... LOL
"We don't consider this activity an "attack" or "security breach" unless any of the following occur: "
Well, 1 and 5 are happening at this moment.
The accesses to your machine are most likely the result of Scooter's
normal operation. Scooter usually finds your site by following a link
from another page somewhere on the Internet. We don't consider this
activity an "attack" or "security breach" unless any of the following
occur:

1) Requests for the same URL over and over, or opening many connections
at once, denying service to other legitimate users. (Scooter opens only
one connection at a time to any given server and waits before visiting
again).

2) "Scanning" multiple ports in sequence to find all services. (Scooter
only visits URLs it has heard of before, and doesn't try to "guess"
URLs. Scooter will usually come to port 80 unless a link was found or
submitted with an alternate port number).

3) "Cracking" or trying to guess passwords. (Scooter will never provide
an http password, even if one is given in the URL, and will never try to
guess passwords).

4) Using any protocol besides HTTP. (Scooter may occasionally connect
to a port that is not a web server, if it finds something that looks
like a URL containing a colon followed by that number. If the service
at that port doesn't respond to HTTP commands, Scooter will break the
connection, and will not attempt to negotiate some other protocol).

5) Ignoring your robots.txt (Scooter requests "/robots.txt" first and
respects the directions given there, and Scooter will periodically
request a new copy of /robots.txt in case it is new or has changed).
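
For anyone who wants to check point 1 against their own logs rather than eyeball it, here's a rough Python sketch. It assumes an Apache-style combined log format and that the crawler puts "Scooter" somewhere in its user-agent string. The sample lines, the `repeated_requests` name and the threshold of 3 are made up for illustration; none of it comes from AltaVista.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def repeated_requests(lines, agent_substring="Scooter", threshold=3):
    """Return {path: count} for paths the matching agent hit at least `threshold` times."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if agent_substring.lower() in match.group("agent").lower():
            counts[match.group("path")] += 1
    return {path: n for path, n in counts.items() if n >= threshold}


# Hypothetical log lines, invented for this example only.
SAMPLE = [
    '192.0.2.10 - - [21/Jun/2001:10:00:01 +0000] "GET /page.html HTTP/1.0" 200 5120 "-" "Scooter/3.2"',
    '192.0.2.10 - - [21/Jun/2001:10:00:09 +0000] "GET /page.html HTTP/1.0" 200 5120 "-" "Scooter/3.2"',
    '192.0.2.10 - - [21/Jun/2001:10:00:17 +0000] "GET /page.html HTTP/1.0" 200 5120 "-" "Scooter/3.2"',
    '192.0.2.10 - - [21/Jun/2001:10:01:02 +0000] "GET /other.html HTTP/1.0" 200 2048 "-" "Scooter/3.2"',
    '198.51.100.7 - - [21/Jun/2001:10:02:30 +0000] "GET /page.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for path, n in repeated_requests(SAMPLE).items():
        print(f"{path} requested {n} times by Scooter")
```

To run it on a real log, pass an open file instead of `SAMPLE`. A high count on its own only shows repeat fetching; whether it amounts to "denying service" still depends on the timing and on how many connections were open at once.
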
By far the best way to get Scooter to leave your site alone is to write
a robots.txt and place it at the top level of the web documents tree.
Scooter (and numerous other robot crawlers) will recognize and obey
these directions. Scooter has no way of knowing whether you consider
something "private" or "unauthorized", it simply follows anything that
looks like a link to a web page. If your host is an "internal use only"
server, you may instead choose to block web access at the router using a
firewall or screen, or at your web server by restricting access by IP
number or subnet. Alternatively, you can establish a "public" service on
one port, and a "private" service on another port, using IP address
rules. This is a more reliable way to protect sensitive or confidential
documents, because it blocks out all access, not just robots access. If
Scooter cannot reach your site because it is blocked at the router, it
will eventually give up and stop trying.
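
As a rough illustration of the robots.txt route described above, the sketch below holds a small robots.txt as a string and uses Python's standard `urllib.robotparser` to check what it permits. The rules themselves, the `www.example.com` URLs and the "OtherBot" agent name are assumptions made up for the example, and the parser only shows how a well-behaved robot would read the file. It says nothing about what Scooter actually does.

```python
from urllib.robotparser import RobotFileParser

# Example rules (an assumption, adjust to taste):
# shut Scooter out entirely, keep everyone else out of /cgi-bin/ only.
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def can_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a robot honouring `robots_txt` may fetch `url` as `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Scooter", "OtherBot"):
        for url in ("http://www.example.com/index.html",
                    "http://www.example.com/cgi-bin/search"):
            verdict = "allowed" if can_fetch(agent, url) else "blocked"
            print(f"{agent:8s} {url} -> {verdict}")
```

The file has to be saved as `/robots.txt` at the top level of the site to have any effect. It is only a request to the crawler, so anything genuinely private still needs the router or IP restrictions mentioned in the email.
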