Forum Moderators: open

Message Too Old, No Replies

MSSQL [taboo / vulgar] word search implementation

how can I find a set of characters anywhere in the text columns of a table

         

emsaw

12:32 am on May 26, 2006 (gmt 0)

10+ Year Member



I've been looking around for a while now, and can't seem to come to terms with the options I've found so far.

FreeText/FreeTextTable ¦¦ Contains/ContainsTable only appear to work if the value searching for is surrounded by a word boundary, i.e. searching for 'mit' only returns rows that have the distinct value of 'mit' in it, but I want 'kismit', 'smith', and 'mittens'. I know I can use *, but that appears to be only available to the suffix, so '"mit*"' could return 'mittens', but not 'kismit' or 'smith'.

So am I stuck with using LIKE, and looping through my dictionary of words to search for?

If I have to I will, but performance is going to, uh, smell, isn't it?

Does somebody have a magic wand for me? Ideas?

Thanks,

Mark

aspdaddy

5:50 pm on Jun 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can use LIKE in a join with % or * either side, but it will do a table scan so its only any use with a small list of stopwords. You can also use (charindex > 0) in SQL Joins.

I dont have any SQL code with me at the momment but heres one in Access that is used on a few forums:

SELECT * FROM tblStopWords INNER JOIN tblPosts ON tblPosts.txtPost LIKE "*" & tblStopWords.txtWord & "*"

emsaw

5:45 pm on Jun 2, 2006 (gmt 0)

10+ Year Member



aspdaddy,

I did end up just going with the

... WHERE fieldName LIKE '%' + stopWord + '%'

I am only processing a small number of rows each time, but I am disappointed that I couldn't find another (better, more efficient) solution.

I'll come back to it when I see noticeable performance degradation.

Thanks ,

Mark

lovethecoast

4:36 pm on Jun 7, 2006 (gmt 0)

10+ Year Member



Another solution to this would be to catch bad words *before* they go into the database.

Setup a table with all your bad words and what their replacement should be (if you want it unique -- else, don't bother).

When a post is made, run it through this table via code (vb, stored procedure, trigger, etc -- however you put your post into the DB). Replace bad words with their replacement and be done with it.

Looking for bad words on display means you have to do it 500 times per post. The method above means you do it once -- when the post was made. Much, MUCH less server intensive.

If you go with the above suggestion, you could then write some code (again, vb, stored procedure, etc) that will go through and globally replace any existing bad words. Again - a one time shot verses doing it every, single, time a post is read.

S