I'm running Nutch on a number of projects. It's fast and lets one index a huge volume of data, which makes it a great search engine. But it's difficult to set up (it's Java-based instead of the typical LAMP stuff I'm used to) and not so easy to customize without a skilled developer.
I just ran across Sphinx search, another GPL'ed engine, apparently based on MySQL and PHP.
Does anyone have experience with both who can compare or contrast the two? (Sphinx search is claiming terabytes of data now; index size was one of the reasons I didn't go with a PHP/MySQL setup before.)
I just built and installed Sphinx. It definitely uses MySQL as well as some of its own files for indexing. I think it uses MySQL just as a data source (as opposed to the HTML files you'd get from crawling your web site).
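As a rough illustration of the "database as data source" idea, here is a toy sketch in Python. It is not Sphinx's code or API: the `documents` table, the `build_index` and `search` helpers, and the use of SQLite in place of MySQL are all assumptions made up for the example. The point is only that the rows come from a SQL query while the index lives outside the database.

```python
import sqlite3
from collections import defaultdict


def build_index(conn, query):
    """Build an inverted index (word -> set of row ids) from a SQL query.

    The query is expected to return (id, text) rows.
    """
    index = defaultdict(set)
    for doc_id, text in conn.execute(query):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, terms):
    """Return the sorted ids of rows that contain every search term."""
    sets = [index.get(t.lower(), set()) for t in terms.split()]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical table standing in for the MySQL data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO documents (id, content) VALUES (?, ?)",
    [
        (1, "Nutch crawls web pages"),
        (2, "Sphinx indexes database rows"),
        (3, "Sphinx reads rows from MySQL"),
    ],
)

# The index is built once from the query and kept outside the database.
index = build_index(conn, "SELECT id, content FROM documents")
print(search(index, "sphinx rows"))
```

Searching then touches only the index; the database is needed again only to rebuild it or to fetch the full rows for the matching ids.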
I ran through the "Quick Sphinx usage tour" in the instructions and it seems to work, but it doesn't feel very robust. I had to create a couple of the directories it expected, and it wasn't trivial to run it as a non-root user.
I also found that the instructions mean /usr/local/etc when they say /usr/local/sphinx/etc.
I haven't configured Nutch myself, I've only been a user of it, but I think I might install it now for comparison.