Looks impressive so far indeed. I'm really curious about any increase/decrease in relevance, once there's a significant number of sites indexed.
A few things to note, most of which you probably know already:
Always respect robots.txt for all pages.
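For reference, the standard library already handles this. A minimal sketch (the user-agent string "MySpider" and the rules are just placeholders):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# For the sketch, parse an in-memory robots.txt instead of fetching one;
# in a real spider you'd set_url(".../robots.txt") and call rp.read().
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp.parse(rules)

def allowed(url):
    """Return True if robots.txt permits our spider to fetch this URL."""
    return rp.can_fetch("MySpider", url)

print(allowed("http://www.example.com/index.html"))        # allowed
print(allowed("http://www.example.com/private/page.html")) # disallowed
```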
The spider needs to do some load balancing, so that it doesn't fetch too many pages from the same site in a short time. The recommended rate is about one page per minute per site (http://www.robotstxt.org/wc/robots.html).
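That politeness rule boils down to a per-host timestamp check before each fetch. A sketch, assuming the spider asks this function before dequeuing a URL (the 60-second delay is the rate mentioned above):

```python
import time
from urllib.parse import urlparse

MIN_DELAY = 60.0   # seconds between fetches from the same host
last_fetch = {}    # host -> timestamp of the last request we made to it

def ready(url, now=None):
    """Return True (and record the fetch) if this host hasn't been hit
    within MIN_DELAY seconds; otherwise the URL should be requeued."""
    host = urlparse(url).netloc
    now = time.time() if now is None else now
    if now - last_fetch.get(host, 0.0) >= MIN_DELAY:
        last_fetch[host] = now
        return True
    return False
```

Passing `now` explicitly is just for testability; in the spider you'd let it default to the wall clock.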
Make sure that the images on your site are served with headers for last-modified date, size, and expiry date (Last-Modified, Content-Length, Expires), so that clients can cache them. This will noticeably reduce the bandwidth load on your own system.
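In case it helps, here's a sketch of building those three headers from a file's metadata. The one-day `max_age` is just an illustrative choice:

```python
from email.utils import formatdate

def cache_headers(mtime, size, max_age=86400):
    """Headers that let a client cache an image: mtime is the file's
    modification time (Unix seconds), size its length in bytes."""
    return {
        "Last-Modified": formatdate(mtime, usegmt=True),
        "Content-Length": str(size),
        "Expires": formatdate(mtime + max_age, usegmt=True),
    }

print(cache_headers(0, 1024))
```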
Only list one of www.example.com/ and www.example.com/index.html (or index/default/home with .htm, .asp, .php, etc.), at least if they contain the same text.
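One way to do that is to canonicalize URLs before indexing, collapsing the common default-document names to the bare directory URL. A sketch; the list of filenames is an assumption, not exhaustive:

```python
from urllib.parse import urlparse, urlunparse

DEFAULT_DOCS = {"index.html", "index.htm", "index.php", "index.asp",
                "default.htm", "default.asp", "home.html"}

def canonicalize(url):
    """Strip a trailing default-document name so duplicates like
    /index.html and / map to the same key in the index."""
    p = urlparse(url)
    path = p.path
    last = path.rsplit("/", 1)[-1]
    if last.lower() in DEFAULT_DOCS:
        path = path[: -len(last)]
    return urlunparse((p.scheme, p.netloc, path or "/", "", "", ""))
```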
Cluster the results, so that one site can't dominate the SERPs for any keyword combination.
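Host clustering can be as simple as a cap on results per site when rendering the SERP. A sketch over a ranked list of URLs; the limit of two per host is an assumption, not a recommendation:

```python
from urllib.parse import urlparse

def cluster(results, per_host=2):
    """Keep at most per_host results from any one site, preserving
    the ranked order, so no single site dominates the page."""
    seen = {}
    out = []
    for url in results:
        host = urlparse(url).netloc
        if seen.get(host, 0) < per_host:
            seen[host] = seen.get(host, 0) + 1
            out.append(url)
    return out
```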
I'm sure there's a lot more work waiting for you... ;)
pyst - While clustering improves matters a lot, it can be very useful to spider deeply. Some sites don't detail every topic on the home page, and sometimes a deeper page really is more relevant to a search.
Not spidering sites deeply and paying attention only to the home pages just encourages people to register a different site for each product. Certainly, the more topics your home page covers, the less likely you are to rank well for the specific topics customers will look up. This is exactly what happens in Yahoo and Looksmart, which only pay attention to the homepage: you end up with whole categories full of one-off sites that are obvious domain spam.
The site has been down for me for a couple of days now. Matt, are you still reading posts in the forum? I was just wondering whether there's some kind of major problem, or are you just doing some more fine-tuning?