zero6, the info below has been floating around for about a year now and gives pretty good insight.
Brett, if you feel this post is a liability, please delete it or let me know and I'll take it down.
**** provides a wide range of services to meet the needs of our customers. This list covers the most common issues that must be decided.
When you receive services from ****, you may receive them by means of a Web Mastercluster or a Private Minicluster.
A Mastercluster is a large cluster of interconnected computers, designed to support searching the entire web. When you use a Mastercluster, you share it with other **** customers. By providing shared access to **** resources, **** can deliver search services more efficiently and economically. Our use of cluster technology also allows us to expand capacity easily as our customers grow.
A Minicluster is a smaller cluster of computers. Whereas Masterclusters typically number more than 100 interconnected computers, a Minicluster consists of fewer machines. Miniclusters are usually private and dedicated to a single **** customer. They are well suited to custom search functionality and allow more frequent data refreshes.
When you receive service from ****, we will work with you to reach agreement on the following areas:
**** provides database access in the following sizes:
110 million documents
54 million documents
Many **** customers use multiple database sizes. **** can send a percentage of queries to one database tier and the remainder to a different tier. This can be an economical way to give complicated queries access to a larger body of data than would otherwise be possible. For example, a customer might send 80% of its queries to the 54-million-document database and the remaining 20% to the 110-million-document database.
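One way to picture the percentage split is as a weighted random choice made for each query. The Python below is a minimal sketch under that assumption; the tier labels and the `choose_tier`/`route` helpers are hypothetical, and the text does not say how **** actually implements the split.

```python
import random

# Hypothetical tier labels; the 80/20 split mirrors the example in the text.
TIER_WEIGHTS = {
    "54M": 0.80,   # 54-million-document database
    "110M": 0.20,  # 110-million-document database
}


def choose_tier(weights: dict[str, float], rng: random.Random) -> str:
    """Pick the database tier for a single query according to the split."""
    tiers = list(weights)
    return rng.choices(tiers, weights=[weights[t] for t in tiers], k=1)[0]


def route(num_queries: int, weights: dict[str, float], seed: int = 0) -> dict[str, int]:
    """Route num_queries queries and count how many land on each tier."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    counts = {t: 0 for t in weights}
    for _ in range(num_queries):
        counts[choose_tier(weights, rng)] += 1
    return counts


if __name__ == "__main__":
    print(route(10_000, TIER_WEIGHTS))  # roughly 8,000 vs 2,000
```

Per-query random routing keeps the long-run proportions close to the configured split without any shared counter between front-end machines; a deterministic alternative would hash each query to a bucket instead.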
Currently, all **** Masterclusters are refreshed approximately monthly.
Physical Access Point
**** provides service from multiple databases in several geographical locations. **** customers are expected to connect to **** datacenters in one of the following locations:
Santa Clara, California (Exodus facility)
Herndon, Virginia (Exodus facility)
Europe (Location TBD, availability 1Q99)
Japan (contact ****)
**** can tag a subset of a Mastercluster index. This enables a customer to create a custom search by identifying a useful set of Mastercluster data. For example, if a customer had a list of California web pages, we could tag them all with a single tag, enabling the customer to provide a California-focused search. Only documents that already exist in the **** database can be tagged on the Web Mastercluster.
Tagging is implemented in one of two ways. With the first method, customers attach a metaword to selected documents in the index. This method can be used only when less than 10% of the total web index will be tagged. Metaword tags can be changed only when a page is refreshed on our normal crawling schedule (approximately once per month).
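The two constraints on metaword tagging (only documents already in the index, and under 10% of the index tagged) can be sketched as a check run before a tag is attached. This is a minimal Python sketch; the index layout (URL mapped to a set of terms), the metaword string, and counting the 10% limit per metaword are all assumptions, as are the `apply_metaword` and `search` helpers.

```python
from typing import Optional

# Assumption: the limit is checked per metaword against the whole index.
MAX_TAGGED_FRACTION = 0.10  # tagging must cover less than 10% of the index


def apply_metaword(index: dict[str, set[str]], urls: list[str], metaword: str) -> list[str]:
    """Attach metaword to each requested URL that already exists in the index.

    Returns the URLs that were skipped because they are not in the index.
    Raises ValueError (and tags nothing) if the tag would cover 10% or more.
    """
    present = [u for u in urls if u in index]
    skipped = [u for u in urls if u not in index]
    # Documents that would carry this metaword once the call completes.
    tagged_after = {u for u, terms in index.items() if metaword in terms} | set(present)
    if len(tagged_after) >= MAX_TAGGED_FRACTION * len(index):
        raise ValueError(
            f"{len(tagged_after)} of {len(index)} documents would be tagged; limit is under 10%"
        )
    for u in present:
        index[u].add(metaword)
    return skipped


def search(index: dict[str, set[str]], term: str, metaword: Optional[str] = None) -> list[str]:
    """Return URLs containing term, optionally restricted to a metaword."""
    return sorted(
        u for u, terms in index.items()
        if term in terms and (metaword is None or metaword in terms)
    )
```

Treating the metaword as just another indexed term is what makes the custom search cheap: restricting results is an ordinary term match.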
The second method allows a customer to tag with a series of bits. These flags may be turned on or off for each URL in the database. Customers may then filter search results based on the values of these bit fields. Bitfield tagging may be done either weekly or semi-weekly.
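Bitfield tagging maps naturally onto an integer of flags per URL, with results filtered by bit masks. The Python below is a minimal sketch; the flag names, bit positions, and the `set_flag`/`filter_results` helpers are hypothetical, since the text does not say how many bits a customer gets or what they mean.

```python
# Hypothetical bit assignments for one customer's flags; the real number of
# bits and their meanings would be agreed per customer.
CALIFORNIA = 1 << 0
SHOPPING = 1 << 1
KID_SAFE = 1 << 2


def set_flag(flags: dict[str, int], url: str, bit: int, on: bool = True) -> None:
    """Turn one flag bit on or off for a URL (URLs default to all bits off)."""
    current = flags.get(url, 0)
    flags[url] = (current | bit) if on else (current & ~bit)


def filter_results(results: list[str], flags: dict[str, int],
                   require: int = 0, exclude: int = 0) -> list[str]:
    """Keep results that have every bit in require set and no bit in exclude set."""
    kept = []
    for url in results:
        f = flags.get(url, 0)
        if (f & require) == require and (f & exclude) == 0:
            kept.append(url)
    return kept
```

Keeping the flags in a table separate from the page content is one plausible reason bitfields can be updated weekly or semi-weekly while metawords wait for a page refresh; the text does not state this explicitly.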
Customers can choose whether or not to allow other **** customers to access their tags.
****'s Private Minicluster service delivers maximum flexibility and control to ****'s customers.
We will come to agreement with you regarding the following areas:
Private Minicluster databases can be any size; actual implementations range from 20,000 to 20 million documents. Generally, we will need to estimate the number of pages in advance to determine your hardware needs.
**** can readily accommodate the following data sources:
The customer provides **** with a list of URLs, which **** crawls to generate a search index.
The customer provides **** with web content that has been preprocessed for indexing. This content is indexed directly, without crawling.
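The second data source skips crawling and goes straight to indexing. The Python below is a minimal sketch of that path, building an inverted index from already-extracted text; the one-document-per-line "URL<TAB>text" format, the tokenizer, and the `index_preprocessed`/`lookup` helpers are assumptions, since the text does not describe ****'s file format.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # simple lowercase alphanumeric tokenizer


def index_preprocessed(lines):
    """Build an inverted index (term -> set of URLs) without crawling.

    Assumes a hypothetical input format: one document per line, "URL<TAB>text".
    """
    inverted = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        url, _, text = line.partition("\t")
        for term in TOKEN.findall(text.lower()):
            inverted[term].add(url)
    return inverted


def lookup(inverted, query):
    """Return URLs containing every query term (AND semantics), sorted."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    return sorted(set.intersection(*(inverted.get(t, set()) for t in terms)))
```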
Standard **** refresh options include the following:
**** can apply tags to Minicluster data. This enables a customer to create multiple custom searches out of one piece of crawling infrastructure. Tagging is generally done in conjunction with crawling and indexing.
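Because Minicluster tags are usually applied during crawling and indexing, one index can back several custom searches. The Python below is a minimal sketch that assigns tags from URL-prefix rules at index time; the rule table, the example.com URLs, and the `tags_for`/`build_tagged_index`/`custom_search` helpers are hypothetical.

```python
# Hypothetical tagging rules: URL prefix -> tag. Rules are evaluated while
# each page is indexed, so a single crawl can back several custom searches.
TAG_RULES = {
    "https://docs.example.com/": "docs",
    "https://support.example.com/": "support",
}


def tags_for(url: str) -> set[str]:
    """Return every tag whose URL prefix matches the page."""
    return {tag for prefix, tag in TAG_RULES.items() if url.startswith(prefix)}


def build_tagged_index(docs):
    """docs: iterable of (url, text) pairs. Returns url -> (terms, tags)."""
    return {url: (set(text.lower().split()), tags_for(url)) for url, text in docs}


def custom_search(index, term: str, tag: str) -> list[str]:
    """One custom search: term matches restricted to documents carrying tag."""
    return sorted(
        url for url, (terms, tags) in index.items()
        if term.lower() in terms and tag in tags
    )
```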
Physical Access Point
Currently, all Private Miniclusters are deployed in ****'s California facilities.