related stories/text algorithm/code?

   
12:10 am on Oct 26, 2007 (gmt 0)

5+ Year Member



Hey guys, I haven't been able to figure this out. Does anyone have code that compares, let's say, a paragraph of description text to the other descriptions in the db and selects the records whose descriptions are related?

I have been thinking about stripping out certain words, just to keep the subject-type words in, but haven't been able to get close.

Any help would be greatly appreciated, thanks!

7:47 am on Oct 26, 2007 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



1 - store the posted description in a variable
2 - use SELECT * FROM db_table WHERE description LIKE '%storedVar%'

If you want to do it yourself, then follow the above steps. If you want us to do it for you, then keep waiting ;)
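
For anyone following the two steps above, here is a minimal sketch of the LIKE lookup with a bound parameter instead of pasting the variable into the SQL string. It is written in Python with an in-memory SQLite database purely so it runs on its own; the same query shape applies from PHP against MySQL. The table name db_table and column description come from the post, while find_containing and the sample rows are made up for illustration.

```python
import sqlite3

def find_containing(conn, stored_var):
    """Return rows whose description contains stored_var as a substring."""
    # Escape LIKE wildcards so the posted text is matched literally.
    escaped = (stored_var.replace("\\", "\\\\")
                         .replace("%", "\\%")
                         .replace("_", "\\_"))
    sql = ("SELECT id, description FROM db_table "
           "WHERE description LIKE ? ESCAPE '\\' ORDER BY id")
    return conn.execute(sql, ("%" + escaped + "%",)).fetchall()

# Hypothetical sample data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_table (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO db_table (description) VALUES (?)",
    [("Mayor opens new bridge downtown",),
     ("Local team wins the cup",),
     ("New bridge toll announced",)],
)

rows = find_containing(conn, "new bridge")
for row in rows:
    print(row)
```

Note that this only finds descriptions containing the whole stored text as one substring, so it works for short phrases but not for whole paragraphs.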

7:15 pm on Oct 29, 2007 (gmt 0)

5+ Year Member



Well, how does that find related stories? It will only find stories whose description contains the exact text of storedVar? That seems limited... I want stories that are related by topics, names, events, etc. Thanks!
8:30 pm on Oct 29, 2007 (gmt 0)

5+ Year Member



This is kind of slow, but you can try splitting your description paragraph into an array of words with explode() or split(), filtering out prepositions ('on', 'in', 'up', etc.) and conjunctions ('and', 'for', 'nor', 'yet', etc.). Loop through this array and compare each word against the db with the LIKE '%...%' approach that phparion suggested. You're bound to find a lot of results. You can improve the results by coding it so that the records with the most matches go on top.
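
A rough sketch of that idea, in Python with SQLite so it can be run as-is (it should port to PHP and MySQL without much change). The stories table, the stop-word list, and the function names are assumptions for illustration, not anything from a real schema. It runs one LIKE query per keyword, which is the slow part mentioned above.

```python
import re
import sqlite3
from collections import Counter

# A tiny, assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "on", "in", "up", "for", "nor",
              "yet", "of", "to", "as", "at", "by", "but", "or", "is",
              "was", "will"}

def keywords(text):
    """Split text into lowercase words and drop stop words and very short words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS and len(w) > 2}

def rank_related(conn, description, limit=5):
    """Count how many keywords each story matches and return the best first."""
    hits = Counter()
    for word in keywords(description):
        rows = conn.execute(
            "SELECT id FROM stories WHERE description LIKE ?",
            ("%" + word + "%",),
        )
        for (story_id,) in rows:
            hits[story_id] += 1
    # Most matches first; ties broken by id so the order is stable.
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO stories (description) VALUES (?)",
    [("Mayor opens new bridge downtown after long delay",),
     ("Local team wins the cup in overtime",),
     ("Bridge toll rises as mayor defends budget",)],
)

ranked = rank_related(conn, "The mayor said the bridge will close for repairs")
print(ranked)  # list of (story id, number of matching keywords)
```

Because LIKE does substring matching, a keyword such as "close" would also match "closed" or "closest", which may or may not be what you want.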
9:03 pm on Oct 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use similarity metrics.

Essentially, you compute a score for each pair, such as a count of the number of edits needed to change each database paragraph into the target paragraph. The paragraphs with the lowest counts are "most similar".
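
As an illustration of the edit-count idea, here is a minimal sketch of the classic Levenshtein distance (insertions, deletions, substitutions) in Python. The function names are made up for this example, and it is not the code referred to in this post. PHP also has a built-in levenshtein() function, so the port may be short.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def most_similar(target, candidates):
    """Sort candidates so the one needing the fewest edits comes first."""
    return sorted(candidates, key=lambda text: edit_distance(text, target))

print(edit_distance("kitten", "sitting"))
print(most_similar("bridge closed", ["team wins cup", "bridge closes"]))
```

For whole paragraphs it may make more sense to pass lists of words (for example text.split()) rather than raw strings, so that one edit means one word changed.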

You may need to calculate using several different metrics and combine the scores in a weighted way that meets your application's quirks.

It's what I do, and it works very well.

Search Wikipedia for Jaccard Index to get yourself started.
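
For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal Python sketch over word sets follows; the tokenizing rule and function names are assumptions for illustration. Unlike an edit count, a higher Jaccard score means more similar.

```python
import re

def word_set(text):
    """Lowercase the text and return its distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Jaccard index of the word sets of two texts, from 0.0 to 1.0."""
    set_a, set_b = word_set(a), word_set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)

print(jaccard("the new bridge", "the old bridge"))
print(jaccard("mayor opens bridge", "team wins cup"))
```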

6:38 am on Oct 30, 2007 (gmt 0)

5+ Year Member



Thanks for the replies, guys. Do you think you could share some possible code for it? Does anyone out there have any? Thanks!
4:38 pm on Oct 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I generally use REBOL rather than PHP for server-side scripting, so the similarity metrics code I use is in that language.

I am not aware of any canned code in PHP. To find the code I use, search Google for
rebol simetrics

You may be able to recast the algorithms into PHP given the above reference implementation.

5:50 pm on Oct 31, 2007 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



There is something on SourceForge that may help -

[sourceforge.net...]

1:29 pm on Nov 1, 2007 (gmt 0)

5+ Year Member



I think fulltext search would help.

[dev.mysql.com...]

It will match even if some words are missing or rearranged. The result set can be sorted by relevance. It works much faster than LIKE, but uses more disk space than regular indices.
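
A minimal sketch of what such a query could look like, assuming a hypothetical stories table with a FULLTEXT index on its description column (created with something like ALTER TABLE stories ADD FULLTEXT (description)). The Python helper below only builds the SQL and its parameters so it can be run without a server; the %s placeholders follow the style used by common Python MySQL drivers, and the function name is made up.

```python
def related_stories_query(description, limit=10):
    """Build a MySQL natural-language fulltext query and its bound parameters."""
    sql = (
        "SELECT id, description, "
        "MATCH(description) AGAINST (%s) AS relevance "
        "FROM stories "                           # assumed table name
        "WHERE MATCH(description) AGAINST (%s) "  # keeps only matching rows
        "ORDER BY relevance DESC "                # best matches first
        "LIMIT %s"
    )
    # The search text is passed twice: once for the score, once for the filter.
    return sql, (description, description, int(limit))

sql, params = related_stories_query("mayor bridge repairs", limit=5)
print(sql)
print(params)
```

With a real connection you would hand both values to the driver, for example cursor.execute(sql, params), and read the rows back in relevance order.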