Chris_D - 2:05 am on Apr 20, 2004 (gmt 0)

It's actually quite interesting to compare Gigablast with other 'up and coming' engines like Nutch.

Nutch says:

Our current goal is to create a good-sized public demo that can handle moderate traffic. Even this takes a fair amount of hardware and bandwidth. Fortunately, the Internet Archive has donated bandwidth, so all that we need now is hardware. We estimate that a two-hundred-million-page demo system that can handle moderate traffic will require less than $200,000 in hardware.

Gigablast says:

Gigablast is a search engine that I've been working on for about the last three years. I wrote it entirely from scratch in C++. The only external tool or library I use is the zlib compression library. It runs on eight desktop machines, each with four 160-GB IDE hard drives, two gigs of RAM, and one 2.6-GHz Intel processor. It can hold up to 320 million Web pages (on 5 TB), handle about 40 queries per second and spider about eight million pages per day. Currently it serves half a million queries per day to various clients, including some meta search engines and some pay-per-click engines.
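Those quoted specs can be sanity-checked with a bit of arithmetic: eight boxes with four 160-GB drives each works out to roughly the 5 TB claimed, and dividing that by 320 million pages gives the storage budget per indexed page. A minimal sketch (all input figures are from the quote above; the per-page result is derived, not stated):

```python
# Back-of-the-envelope check of the quoted Gigablast hardware figures.
machines = 8
drives_per_machine = 4
drive_gb = 160                      # GB per IDE drive, per the quote

total_gb = machines * drives_per_machine * drive_gb   # raw storage across the cluster
max_pages = 320_000_000                               # claimed capacity

bytes_per_page = (total_gb * 1e9) / max_pages         # storage budget per indexed page

print(f"total storage: {total_gb / 1000:.2f} TB")     # ~5 TB, matching the quote
print(f"per-page budget: {bytes_per_page / 1e3:.0f} KB")
```

So the claim implies about 16 KB of disk per page, a plausible figure for a compressed page plus its index entries.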
and according to www.gigablast.com, Gigablast has

273,661,136 pages indexed
I know which one will have the lower overhead cost structure - Gigablast is already over 200 million pages running on 8 PCs!