Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

    
Related stories/text algorithm/code?
brandon0401




msg:3487893
 12:10 am on Oct 26, 2007 (gmt 0)

Hey guys, I haven't been able to figure this out. Does anyone have code that compares, say, a paragraph of description text to the other descriptions in the db and selects the records whose descriptions are related?

I have been thinking about stripping out certain words so that only subject-type words are left, but I haven't been able to get close.

Any help would be greatly appreciated, thanks!

 

phparion




msg:3488106
 7:47 am on Oct 26, 2007 (gmt 0)

1 - store the posted description in a variable
2 - use SELECT * FROM db_table WHERE description LIKE '%storedVar%'

if you want to do it yourself then follow the above steps. If you want us to do it for you then keep waiting ;)
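
The two steps above can be sketched roughly as follows. This is only an illustration, assuming a PDO connection, the `db_table` / `description` names from the post, and an `id` column that is purely hypothetical. The `escapeLike()` and `findByPhrase()` helpers are made-up names. Binding the value as a parameter (instead of pasting it into the SQL string) avoids SQL injection, and escaping `%` and `_` keeps the phrase from being read as wildcards.

```php
<?php
/**
 * Escape LIKE wildcards so the phrase is matched literally.
 * '!' is the escape character, declared with ESCAPE '!' in the query.
 */
function escapeLike(string $phrase): string
{
    return strtr($phrase, ['!' => '!!', '%' => '!%', '_' => '!_']);
}

/**
 * Return the rows whose description contains $phrase as a substring.
 * `db_table` and `description` come from the post; `id` is an assumed column.
 */
function findByPhrase(PDO $pdo, string $phrase): array
{
    $stmt = $pdo->prepare(
        "SELECT id, description FROM db_table
         WHERE description LIKE ? ESCAPE '!'"
    );
    $stmt->execute(['%' . escapeLike($phrase) . '%']);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Note that this only finds rows containing the whole phrase, so pasting in a full paragraph will rarely match anything other than a near-identical copy.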

brandon0401




msg:3490805
 7:15 pm on Oct 29, 2007 (gmt 0)

Well, how does that find related stories? It will only find stories whose description contains the exact text in storedVar, which seems limited. I want to find stories related by topics, names, events, etc. Thanks!

d40sithui




msg:3490874
 8:30 pm on Oct 29, 2007 (gmt 0)

This is kind of slow, but you can try splitting your description paragraph into an array of words with explode() or split(), filtering out prepositions ('on', 'in', 'up', etc.) and conjunctions ('and', 'for', 'nor', 'yet', etc.). Loop through this array and compare each value to the db with the LIKE '%...%' approach phparion suggested. You're bound to find a lot of results. You can improve them by coding it so that the records with the most matches go on top.
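
A rough sketch of that idea follows. To stay self-contained it compares against descriptions already loaded into a PHP array (keyed by record id) rather than running one LIKE query per word, so treat it as an illustration of the scoring, not as the poster's exact method. The stop-word list is a tiny made-up sample, and `extractKeywords()` and `rankByKeywordMatches()` are hypothetical helper names. Since split() was removed in PHP 7, the sketch uses preg_split().

```php
<?php
// Small illustrative stop-word list; a real one would be much longer.
const STOP_WORDS = ['a', 'an', 'the', 'and', 'or', 'nor', 'but', 'yet', 'for',
                    'on', 'in', 'up', 'of', 'to', 'at', 'by', 'is', 'was'];

/** Lower-case the text, split on non-alphanumerics, drop stop words and duplicates. */
function extractKeywords(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_diff($words, STOP_WORDS)));
}

/**
 * Score each description by how many target keywords it shares.
 * Returns [id => matchCount], best matches first; records with no match are omitted.
 */
function rankByKeywordMatches(string $target, array $descriptions): array
{
    $targetWords = extractKeywords($target);
    $scores = [];
    foreach ($descriptions as $id => $description) {
        $shared = array_intersect($targetWords, extractKeywords($description));
        if (count($shared) > 0) {
            $scores[$id] = count($shared);
        }
    }
    arsort($scores);   // highest match count first, keys (ids) preserved
    return $scores;
}
```

Usage would look something like `rankByKeywordMatches($postedDescription, $rowsById)`, where `$rowsById` maps each record id to its description text.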

victor




msg:3490898
 9:03 pm on Oct 29, 2007 (gmt 0)

Use similarity metrics.

Essentially, you count the number of edits needed to change each database paragraph into the target paragraph. The paragraphs with the lowest counts are the "most similar".

You may need to calculate using several different metrics and combine the scores in a weighted way that meets your application's quirks.

It's what I do, and it works very well.

Search Wikipedia for Jaccard Index to get yourself started.
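
As a hedged illustration of what such metrics might look like in PHP, the sketch below computes a Jaccard index over the two word sets and a word-level edit distance, then blends them into one score. It is not the poster's code. The function names are hypothetical and the 0.7 / 0.3 weights are arbitrary placeholders to tune for your own data. The edit distance is hand-rolled over words because PHP's built-in levenshtein() works on characters, which suits short strings better than whole paragraphs.

```php
<?php
/** Split text into lower-case word tokens. */
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Jaccard index: |A ∩ B| / |A ∪ B| over the distinct words of each side. */
function jaccardIndex(array $a, array $b): float
{
    $a = array_unique($a);
    $b = array_unique($b);
    $union = count(array_unique(array_merge($a, $b)));
    // Two empty sets are treated as identical (a convention chosen here).
    return $union === 0 ? 1.0 : count(array_intersect($a, $b)) / $union;
}

/** Word-level Levenshtein distance: inserts, deletes and substitutions of whole words. */
function wordEditDistance(array $a, array $b): int
{
    $a = array_values($a);
    $b = array_values($b);
    $prev = range(0, count($b));          // distance from an empty prefix of $a
    foreach ($a as $i => $wordA) {
        $curr = [$i + 1];
        foreach ($b as $j => $wordB) {
            $cost = ($wordA === $wordB) ? 0 : 1;
            $curr[] = min($prev[$j + 1] + 1, $curr[$j] + 1, $prev[$j] + $cost);
        }
        $prev = $curr;
    }
    return $prev[count($b)];
}

/** Weighted blend of both metrics, each scaled to 0..1 where 1 means identical. */
function similarity(string $x, string $y, float $wJaccard = 0.7, float $wEdit = 0.3): float
{
    $a = tokenize($x);
    $b = tokenize($y);
    $longest = max(count($a), count($b));
    $editSimilarity = $longest === 0 ? 1.0 : 1 - wordEditDistance($a, $b) / $longest;
    return $wJaccard * jaccardIndex($a, $b) + $wEdit * $editSimilarity;
}
```

To rank records, compute `similarity($target, $row['description'])` for each candidate and sort descending. The word-level distance is quadratic in paragraph length, so it is best applied to a pre-filtered candidate set, not to the whole table.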

brandon0401




msg:3491195
 6:38 am on Oct 30, 2007 (gmt 0)

Thanks for the replies, guys. Do you think you could share some possible code for it? Does anyone out there have any? Thanks!

victor




msg:3492751
 4:38 pm on Oct 31, 2007 (gmt 0)

I generally use REBOL rather than PHP for server side scripting, so the similarity metrics code I use is in that language.

I am not aware of any canned code in PHP. To find the code I use, search with Google for
rebol simetrics

You may be able to recast the algorithms into PHP given the above reference implementation.

PHP_Chimp




msg:3492856
 5:50 pm on Oct 31, 2007 (gmt 0)

There is something on SourceForge that may help -

[sourceforge.net...]

joelgreen




msg:3493532
 1:29 pm on Nov 1, 2007 (gmt 0)

I think fulltext search would help.

[dev.mysql.com...]

It will match even if some words are missing or rearranged, and the result set can be sorted by relevance. It works much faster than LIKE, but uses more disk space than regular indices.
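
A minimal sketch of what such a query might look like from PHP, assuming MySQL, the `db_table` / `description` names used earlier in the thread, an `id` column (an assumption), and a FULLTEXT index on `description`. The `buildRelatedQuery()` helper is a hypothetical name. It only builds the SQL and its parameters, so it can be inspected without a database.

```php
<?php
/**
 * Build a natural-language FULLTEXT query for MySQL.
 * Assumes a FULLTEXT index already exists, created for example with:
 *   ALTER TABLE db_table ADD FULLTEXT INDEX ft_description (description);
 * Returns [sql, params], ready for PDO::prepare() and execute().
 */
function buildRelatedQuery(string $text, int $limit = 10): array
{
    $limit = max(1, $limit);   // forced to a positive integer, so safe to inline
    $sql = "SELECT id, description,
                   MATCH(description) AGAINST (? IN NATURAL LANGUAGE MODE) AS relevance
            FROM db_table
            WHERE MATCH(description) AGAINST (? IN NATURAL LANGUAGE MODE)
            ORDER BY relevance DESC
            LIMIT $limit";
    // The search text is bound twice: once for the score, once for the filter.
    return [$sql, [$text, $text]];
}

// Usage with an existing PDO connection ($pdo is assumed):
//   [$sql, $params] = buildRelatedQuery($postedDescription);
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
//   $related = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Passing the whole posted description as the search text lets MySQL do the word splitting and relevance ranking itself.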

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved