200,129,632 pages indexed and still counting. Sure, it is much smaller than ATW or Google, but given the limited resources Matt Wells can be proud of it.
I've got a site that was completely changed about 6 months ago, including the links and directory structure.
The main index page has been updated, but not any of the interior pages.
I remember 'the good old days' when adding a url got your site completely spidered almost immediately!
I've got pages that are showing an index date that is over a year old! :(
Google last changed it in November last year.
Just today Google updated the home page from:
Searching 3,083,324,652 web pages
to:
Searching 3,307,998,701 web pages
I've got pages that are showing an index date that is over a year old!
You may notice pages that have index dates from a long time ago, but that may mean that the spider visited them recently and found them unchanged so it did not reindex them. To make things less confusing, I may soon change the index date to a last visited date, ..
Matt Wells in message 16 of gigablast management [webmasterworld.com]
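To illustrate the difference between an index date and a last visited date, here is a minimal sketch. It is not Gigablast's code: the `PageRecord` fields, the `revisit` function and the use of an MD5 checksum to detect changes are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PageRecord:
    # Hypothetical per-page record; field names are illustrative only.
    checksum: str       # hash of the content as it was last indexed
    index_date: str     # when the content was last (re)indexed
    last_visited: str   # when the spider last fetched the page


def revisit(record: PageRecord, content: str, today: str) -> bool:
    """Update a record after a spider visit. Returns True if the page was reindexed."""
    new_checksum = hashlib.md5(content.encode("utf-8")).hexdigest()
    record.last_visited = today          # every visit moves this date
    if new_checksum == record.checksum:
        return False                     # unchanged: index_date stays old
    record.checksum = new_checksum
    record.index_date = today            # changed: reindex and move the date
    return True
```

Under these assumptions a page that never changes keeps an old index date however often it is visited, which is the confusion described in the quote.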
In the same message he also wrote:
With only about 1.5Mbps of bandwidth, $8k of hardware and while serving 500,000 queries per day it is challenging to keep a two hundred million page index fresh ..
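A rough back-of-the-envelope calculation shows why that is challenging. The 1.5 Mbps and two hundred million pages come from the quote. The 10 KB average page size is purely an assumption, and so is the idea that the whole link could be used for spidering (the quote says the same setup also serves queries), so treat the result as an optimistic lower bound.

```python
def full_recrawl_days(pages: int, bandwidth_mbps: float, avg_page_kb: float) -> float:
    """Days needed to fetch every page once, using the whole link for spidering."""
    bytes_per_second = bandwidth_mbps * 1_000_000 / 8   # megabits/s -> bytes/s
    total_bytes = pages * avg_page_kb * 1000            # assumed average page size
    seconds = total_bytes / bytes_per_second
    return seconds / 86_400                             # seconds per day


# 200 million pages over 1.5 Mbps, assuming 10 KB per page (hypothetical figure)
days = full_recrawl_days(200_000_000, 1.5, 10)
print(f"{days:.0f} days for one full pass")
```

With these numbers a single pass over the index takes roughly four months, before any bandwidth is spent on answering queries.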
It is a popular misconception that Gigablast is default OR.
In reality, Gigablast is default AND and default OR combined. You get the best of both worlds. Default AND results are always displayed before the default OR results.
These two sets of results are separated by a clearly displayed blue bar. This way is better than regular default AND because if you misspell a word or enter a long query that has no results there's a good chance you will get something relevant back without having to do anything else.
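The combined behaviour described above can be sketched roughly as follows. This is only an illustration, not how Gigablast is implemented: the `search` and `tokenize` functions, the whitespace tokenizer and the toy documents are assumptions made for the example.

```python
def tokenize(text: str) -> set[str]:
    # Naive whitespace tokenizer, assumed for illustration only.
    return set(text.lower().split())


def search(query: str, docs: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (and_results, or_results): full matches first, partial matches second."""
    terms = tokenize(query)
    and_hits, or_hits = [], []
    for doc_id, text in docs.items():
        matched = len(terms & tokenize(text))
        if matched == 0:
            continue                          # no query term present at all
        if matched == len(terms):
            and_hits.append(doc_id)           # contains every term: default AND
        else:
            or_hits.append((doc_id, matched)) # contains only some terms: default OR
    # Partial matches with more matching terms come first (the sort is stable).
    or_hits.sort(key=lambda hit: -hit[1])
    return and_hits, [doc_id for doc_id, _ in or_hits]


docs = {
    "a": "gigablast search engine",
    "b": "search tips",
    "c": "cooking recipes",
}
and_results, or_results = search("gigablsat search", docs)  # note the misspelling
print(and_results, "--- blue bar ---", or_results)
```

Here the misspelled query has no AND results at all, but the OR results below the bar still return the pages that mention "search".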
btw, Gigablast should be a little faster now since I finally upgraded most servers to kernel 2.4.21 (right before 2.4.22 came out, sigh...). So now it doesn't swap out my processes for absolutely no apparent reason. yoo-hoo! i've noticed good speed increases as a result.
Matt Wells
yes indeed there are some old pages still in the index. it is my top priority to take care of that asap. i am currently working on some ways to increase the spider rate by about a factor of 10.
speaking of spidering, it is interesting to note that some financially-richer search engines seem to be following Gigablast's lead in the field of continuous index updating. Gigablast, to the best of my knowledge, was the first search engine to continually refresh its entire index automatically. I consider this to be one of, if not the, most complicated technologies in a modern day search engine. There are many many details you have to worry about to make it work and there are many many more things that could go wrong and have disastrous consequences. Once my competitors have a document count that changes in real-time we'll know they've pulled it off, but until then, it's probably not true continuous updating.
matt wells
[webmasterworld.com...]
I do like GigaBlast but am a little disillusioned by all the caching going on by all SE's.