Ahhh, big cheers ukgimp and andreas, it works fine. I forgot to change the pipes in the second example.
You guys are just too good ;)
What you explained is pretty much what I have written down, though I'd still want to search "self" and "sustainable" as two separate words.
If more than one word is entered as a query, the query is split word by word and the words are pushed into an array.
The whole phrase is searched for first, and if it produces no results, each element of the array is searched for individually.
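Here's a rough sketch of that phrase-first lookup with a per-word fallback. It assumes a hypothetical in-memory `index` dict mapping a lowercased word or phrase to the sites listed under it; the real table layout would obviously differ.

```python
from typing import Dict, List

# Hypothetical index: lowercased word or phrase -> sites listed under it.
Index = Dict[str, List[str]]


def search(query: str, index: Index) -> Dict[str, List[str]]:
    """Search the whole phrase first; fall back to each word if it misses."""
    # Normalise case and collapse repeated whitespace.
    phrase = " ".join(query.lower().split())
    hits = index.get(phrase, [])
    if hits:
        # The full phrase matched, so that result set is returned on its own.
        return {phrase: hits}

    # No phrase match: split word by word and search each element.
    return {word: index.get(word, []) for word in phrase.split()}


if __name__ == "__main__":
    index = {
        "self": ["site-a", "site-b"],
        "sustainable": ["site-b", "site-c"],
    }
    print(search("self sustainable", index))
```

Returning the per-word hits separately (rather than merging them) leaves room to weight each word differently later on.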
andreas, ukgimp, you both know about that directory that uses the wordid list as its category names; this is the same table to be used.
Then the algo would come down to these factors:
1) How many words are in the query (weight each word's relevance by 1/total)
2) How many categories contain each of the words
3) If these words are categories in the directory, what level of the hierarchy they are at, and whether it is a defining category (i.e. the last category). Bonuses are then applied to the websites within these categories that contain the words "self" or "sustainable".
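One way the three factors could be combined is sketched below. The `Category` record, the bonus constants, and the choice to treat a word found in fewer categories as more distinctive are all assumptions for illustration, not settled parts of the plan.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Category:
    name: str      # category name, taken from the word list
    depth: int     # 0 = top level of the directory
    is_leaf: bool  # True for a "defining" (last) category


# Hypothetical tuning constants; real values would need experimenting with.
LEVEL_BONUS = 1.0
LEAF_BONUS = 0.5


def word_score(word: str, categories: List[Category], total_words: int) -> float:
    """Score one query word against the directory's categories."""
    # Factor 1: each word carries 1/total of the query's relevance.
    weight = 1.0 / total_words

    # Factor 2: how many categories are named after this word.
    matching = [c for c in categories if c.name == word]
    if not matching:
        return 0.0
    # Assumption: a word found in fewer categories is more distinctive.
    spread = 1.0 / len(matching)

    # Factor 3: categories higher up (smaller depth) earn a bigger bonus,
    # and defining (leaf) categories earn an extra one.
    bonus = 0.0
    for c in matching:
        bonus += LEVEL_BONUS / (1 + c.depth)
        if c.is_leaf:
            bonus += LEAF_BONUS
    return weight * spread * (1.0 + bonus)


def query_score(words: List[str], categories: List[Category]) -> Dict[str, float]:
    """Score every word of an already-split query."""
    return {w: word_score(w, categories, len(words)) for w in words}


if __name__ == "__main__":
    directory = [
        Category("news", 0, False),
        Category("self", 2, False),
        Category("sustainable", 3, True),
    ]
    print(query_score(["self", "sustainable"], directory))
```

The per-word scores would then be added to the relevance of the websites sitting in those categories.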
I just plan on using the categories of a directory as a heavy influence when weighting search engine results.
For example, a search on "news" brings up international news, because the directory will most likely have a category called "news" that sits high up in the category structure and thus gives extra relevance to a generic term.
If someone searches for "$country $region news", the words (country, region, news) are first searched for as a whole phrase and compared against the word dictionary. In this case there would be no match, but once the words are cleaned of dashes and such, they can be pushed into an array and re-examined for a match.
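A minimal sketch of the clean-up step, assuming the word dictionary can be held as a plain Python set. `clean_words` and `match_query` are hypothetical names, and "Scotland Highlands-news" just stands in for a real $country $region news query.

```python
import re
from typing import List, Optional, Set, Tuple

# Anything that isn't a letter, digit or whitespace (dashes, commas, etc.),
# plus underscores, is treated as a word separator.
_NON_WORD = re.compile(r"[^\w\s]|_")


def clean_words(query: str) -> List[str]:
    """Lowercase, turn dashes and punctuation into spaces, split into words."""
    return _NON_WORD.sub(" ", query.lower()).split()


def match_query(query: str, dictionary: Set[str]) -> Tuple[Optional[str], List[str]]:
    """Try the whole phrase first; otherwise return the cleaned words that match."""
    phrase = " ".join(query.lower().split())
    if phrase in dictionary:
        return phrase, []
    return None, [w for w in clean_words(query) if w in dictionary]


if __name__ == "__main__":
    words = {"scotland", "highlands", "news"}
    print(match_query("Scotland Highlands-news", words))
```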
If there are categories for $country, $region and news, the elements of the array will match up well with category paths such as:
country > region > news
region > news
anothercountry > region > news
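Roughly, the matching against those paths could be as simple as counting how many query words turn up in each path. This sketch assumes category paths are stored as tuples of names from root to leaf, and uses the placeholder names from the example above.

```python
from typing import List, Sequence, Tuple

# A category path from root to leaf, e.g. ("country", "region", "news").
Path = Tuple[str, ...]


def overlap(words: Sequence[str], path: Path) -> int:
    """How many distinct query words appear somewhere in the category path."""
    return len(set(words) & set(path))


def rank_paths(words: Sequence[str], paths: List[Path]) -> List[Tuple[Path, int]]:
    """Rank paths by overlap, best first; ties keep their original order."""
    scored = [(p, overlap(words, p)) for p in paths]
    scored.sort(key=lambda item: -item[1])  # list.sort is stable
    return scored


if __name__ == "__main__":
    paths = [
        ("country", "region", "news"),
        ("region", "news"),
        ("anothercountry", "region", "news"),
    ]
    for path, score in rank_paths(["country", "region", "news"], paths):
        print(" > ".join(path), score)
```

Only the first path contains all three words, so it ranks above the two partial matches.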
I'm sure you see where I'm going :) I think things like word order in the search phrase will also have to be taken into account, and generally anything else that moves!
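Word order could be folded in as a small bonus when the query words appear in the same order as the category path. `ORDER_BONUS` is a made-up value, and checking relative order (rather than exact adjacency) is just one possible choice.

```python
from typing import Sequence

ORDER_BONUS = 0.5  # hypothetical weight for matching word order


def same_order(words: Sequence[str], path: Sequence[str]) -> bool:
    """True if the query words found in the path keep their relative order."""
    positions = [list(path).index(w) for w in words if w in path]
    return positions == sorted(positions)


def ordered_score(words: Sequence[str], path: Sequence[str]) -> float:
    """Count matched words, plus a bonus when two or more appear in order."""
    matched = [w for w in words if w in path]
    score = float(len(set(matched)))
    if len(matched) > 1 and same_order(words, path):
        score += ORDER_BONUS
    return score


if __name__ == "__main__":
    path = ("country", "region", "news")
    print(ordered_score(["country", "region", "news"], path))
    print(ordered_score(["news", "region", "country"], path))
```

Both queries contain the same words, but only the first follows the path's order and so gets the bonus.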
A Punnett square might come in handy ;)
At least with the regex provided, there is a better chance that a query will match a word or phrase, and with another layer of script handling stemming, the searches should turn out OK.
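As a rough idea of that extra stemming layer, the sketch below strips one common suffix from each query word and joins the stems into a pipe-separated alternation, so "sustaining" will also match "sustainable". The regex from earlier in the thread isn't reproduced here, and `crude_stem` is a deliberately naive stand-in, not the Porter algorithm or any real stemmer.

```python
import re
from typing import List

# Crude suffix list; a real stemmer such as Porter's is far more careful.
_SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_pattern(words: List[str]) -> "re.Pattern[str]":
    """Alternation of stems joined with pipes, each allowed any word ending."""
    if not words:
        raise ValueError("no query words")
    # sorted() just keeps the generated pattern deterministic.
    stems = sorted({crude_stem(w.lower()) for w in words})
    alternation = "|".join(re.escape(s) for s in stems)
    return re.compile(r"\b(?:" + alternation + r")\w*\b", re.IGNORECASE)


if __name__ == "__main__":
    pattern = build_pattern(["self", "sustaining"])
    print(pattern.findall("Self-sustainable farming"))
```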