The index is "aware" of 40M pages, although not all of them have content that ends up in the index files. I do not have any major plans to bump that number up significantly; I would prefer to try and keep it fresher (though that is always limited by time anyway).
There are details on shippo, how to optimize (if you so care), and how the technology works in the 'about' section of the web site. I think the technology page also has a link to a file which is output from the spider, so you can see how that works if you (or anyone) want to get involved. My new gig is taking up most of my time, so I am not really focused on new features (hence the semi-open-sourcitizing attempts), but rather on maintaining it and maybe a tweak here and there.