I have the following info in the db for every page:
- title
- description (255)
- keywords
- the page text itself
If I use MySQL I can't really customize it, because you have to recompile to change the stopwords or to be able to search for words shorter than 4 characters. I can't do that myself; I'd have to ask my sysadmin to do it, and he probably won't.
Will using a custom search hurt performance compared to MySQL?
If you plan to just use PHP's ereg() or similar to search through text fields, then yes, it's likely that MySQL would be significantly faster, especially as the number of indexed documents grows.
If you write your search routines to pre-score your documents during the indexing process, then you may be able to get PHP to run faster (although at the expense of indexing speed, and maybe precision of search results depending on how you approach it).
Maybe you could go for a hybrid approach? You could pre-filter your documents for stop words before adding them to the MySQL db, which would give you the flexibility to change your stop words every time you re-index if you wanted.
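To make the pre-filtering idea concrete, here's a rough sketch. It's in Python rather than PHP purely for illustration, and the stop word list and the function name strip_stop_words are made up for the example, not anything MySQL or PHP provides. The idea is to run each page's text through a filter like this and store the result in the column you index. Bear in mind this can only add stop words on top of whatever MySQL already ignores.

```python
import re

# Hypothetical stop word list; swap in whatever words you want ignored.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def strip_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, and drop the stop words.

    Returns a space-separated string suitable for storing in the
    column that the full-text index is built on.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(w for w in words if w not in stop_words)


if __name__ == "__main__":
    print(strip_stop_words("The history of the PHP language"))
```

Because the list lives in your own code, changing it only means re-running the filter over your pages and re-inserting them, with no server changes needed.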
You're still left with the problem of searching for words shorter than four characters, but I can't see an efficient way around that (a kludge comes to mind, but I need to think about it for a while ;) )
Nope, you don't have to. There are various routes to take on pre-scoring, but at its simplest it involves doing a density analysis on the document you're indexing and assigning each word a 'score' for that document. When you come to search for a phrase, you can add up the scores for each individual word to get a reasonably relevant result, especially on a document set that you control, where you don't have to worry about spammers playing with your algo ;)
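Here's a minimal sketch of that simplest form of pre-scoring, again in Python just for illustration. It assumes "density" means a word's occurrences divided by the total word count of the document, and all the function names (tokenize, score_document, build_index, search) are invented for the example. In practice the index would live in a db table rather than in memory.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_document(text):
    """Density score per word: occurrences divided by total words."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def build_index(documents):
    """Pre-score every document once, at indexing time.

    documents maps doc id -> text; the result maps word -> {doc id: score}.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, score in score_document(text).items():
            index[word][doc_id] = score
    return index


def search(index, phrase):
    """Add up the stored scores of each word in the phrase, best match first."""
    totals = defaultdict(float)
    for word in tokenize(phrase):
        for doc_id, score in index.get(word, {}).items():
            totals[doc_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {"a": "php search php", "b": "mysql search index tips"}
    print(search(build_index(docs), "php search"))
```

All the expensive work happens in build_index, so a search is just a few lookups and additions. That's the trade-off mentioned earlier: slower indexing in exchange for faster searching.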
We had a good discussion on this subject late last year:
[webmasterworld.com...]
Of course, the major search engines will take a far more sophisticated approach, but I'm afraid that's way beyond the bounds of my knowledge :)