Custom internal search with PHP vs MySQL's fulltext index

         

martin

11:59 pm on Aug 7, 2002 (gmt 0)

10+ Year Member



I am wondering if I should build myself a search engine from scratch or use MySQL's fulltext indexing instead.

I have the following info in the db for every page:
- title
- description (255)
- keywords
- the page text itself

If I use MySQL I can't really customize it, because you have to recompile to change the stopwords or to be able to search for words shorter than 4 characters. I can't do that myself; I'd have to ask my sysadmin, and he probably won't.

Will a custom search perform worse than MySQL's fulltext index?

sugarkane

12:16 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It depends how you write your search routines :)

If you plan to just use PHP's ereg() or similar to search through text fields, then yes, it's likely that MySQL would be significantly faster, especially as the number of indexed documents grows.

If you write your search routines to pre-score your documents during the indexing process, then you may be able to get PHP to run faster (although at the expense of indexing speed, and maybe precision of search results depending on how you approach it).

Maybe you could go for a hybrid approach? You could filter the stop words out of your documents before adding them to the MySQL db, which would give you the flexibility to change the stop word list every time you reindex if you wanted.
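As a rough illustration of that pre-filtering step, here is a minimal sketch. It is written in Python rather than PHP purely for brevity, and the stop word list, the function name and the idea of storing the filtered copy in its own fulltext-indexed column are all assumptions made for the example, not something from this thread.

```python
import re

# Hypothetical stop word list. Because it lives in your own code,
# you can edit it and reindex without recompiling MySQL.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

def strip_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text and drop stop words, keeping the other words in order."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(w for w in words if w not in stop_words)

# The filtered copy is what would be stored in the fulltext-indexed column.
indexed_text = strip_stop_words("The history of the PHP language and its use in search")
print(indexed_text)
```

The same filter would have to be applied to the user's query at search time, otherwise a query word that you treat as a stop word can never match anything.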

You're still left with the problem of searching for words shorter than four characters, but I can't see an efficient way around that (a kludge comes to mind, but I need to think about it for a while ;) )

martin

12:08 am on Aug 9, 2002 (gmt 0)

10+ Year Member



>If you write your search routines to pre-score your documents during the indexing process, then you may be able to get PHP to run faster

Well, the problem is I don't quite know how to pre-score the pages. I can't possibly figure out all keyword searches in advance. Is there any known algorithm for this?

sugarkane

10:39 am on Aug 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> can't possibly figure out all keyword searches

Nope, you don't have to. There are various routes to take on pre-scoring, but at its simplest it involves doing a density analysis on the document you're indexing and assigning each word a 'score' for that document. When you come to searching for a phrase, you can add up the scores of the individual words to get a reasonably relevant result, especially on a document set that you control, where you don't have to worry about spammers playing with your algo ;)
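To make the density idea concrete, here is a minimal sketch, again in Python rather than PHP. It assumes the simplest possible score (a word's count divided by the total number of words in the document), and every function name and sample page in it is hypothetical. In practice the word/document/score triples would live in a database table instead of an in-memory dictionary.

```python
import re
from collections import Counter, defaultdict

def score_document(text):
    """Density analysis: a word's score is its share of the document's words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}

def build_index(documents):
    """Index time: build a map of word -> {doc_id: score}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, score in score_document(text).items():
            index[word][doc_id] = score
    return index

def search(index, query):
    """Search time: add up the stored scores of each query word, best match first."""
    totals = defaultdict(float)
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        for doc_id, score in index.get(word, {}).items():
            totals[doc_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample pages, for illustration only.
pages = {
    "page1": "php search engine written in php",
    "page2": "mysql fulltext search",
    "page3": "cooking with sugar",
}
index = build_index(pages)
results = search(index, "php search")
print(results)  # page1 ranks above page2; page3 does not match at all
```

All the expensive work happens in build_index(), so a search is only a few lookups and additions. That is the trade-off mentioned earlier in the thread: slower indexing in exchange for faster queries.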

We had a good discussion on this subject late last year:

[webmasterworld.com...]

Of course, the major search engines will take a far more sophisticated approach, but I'm afraid that's way beyond the bounds of my knowledge :)

martin

11:42 pm on Aug 9, 2002 (gmt 0)

10+ Year Member



Thanks for the tips.