
PHP Server Side Scripting Forum

    
Anyone know how Wikipedia does string matches on all its terms?
jamie

Msg#: 4392861 posted 10:06 am on Dec 1, 2011 (gmt 0)

Hi,

I have to implement a string match involving people's names. We have a database of 5,000 names, and these have to be detected in short paragraphs.

Obviously I'd rather not loop through each line comparing 5,000 entries ;)

Does anyone have any experience with this, or know how Wikipedia does it?

Many thanks
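One way to avoid running 5,000 separate comparisons is to invert the problem: put the names into a hash table (a PHP array keyed by name) once, then walk the paragraph and look up each run of one, two, ... words. The cost per paragraph then depends on the paragraph's length and the longest name, not on how many names there are. This is a minimal sketch under assumptions of its own, not something Wikipedia is known to do: matching is on whole words, case-insensitive for ASCII letters only, input is UTF-8, and the helpers `tokenize()`, `build_name_index()` and `find_names()` are hypothetical.

```php
<?php
// Hypothetical helpers: index the names once, then scan each paragraph with
// hash lookups. Assumes whole-word, ASCII case-insensitive matching on UTF-8 text.

function tokenize(string $s): array
{
    // Words = runs of letters, digits, apostrophes and hyphens
    preg_match_all("/[\\p{L}\\p{N}'-]+/u", strtolower($s), $m);
    return $m[0];
}

function build_name_index(array $names): array
{
    $index = [];
    $maxWords = 1;
    foreach ($names as $name) {
        $words = tokenize($name);
        if (!$words) {
            continue;
        }
        $index[implode(' ', $words)] = $name; // normalised key => original spelling
        $maxWords = max($maxWords, count($words));
    }
    return [$index, $maxWords];
}

function find_names(string $text, array $index, int $maxWords): array
{
    $tokens = tokenize($text);
    $n = count($tokens);
    $found = [];
    for ($i = 0; $i < $n; $i++) {
        // Try each phrase of 1..$maxWords words starting at token $i
        for ($len = 1; $len <= $maxWords && $i + $len <= $n; $len++) {
            $key = implode(' ', array_slice($tokens, $i, $len));
            if (isset($index[$key])) {
                $found[$index[$key]] = true; // keyed by name, so duplicates collapse
            }
        }
    }
    return array_keys($found);
}
```

The index only needs rebuilding when the names table changes; every paragraph after that is just a handful of array lookups per word.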

 

jamie

Msg#: 4392861 posted 5:55 pm on Dec 1, 2011 (gmt 0)

Well, I thought, why not try it anyway... and it only takes microseconds to run preg_match() against all 5000 names:


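$m = array(); // collects every name found in the submitted text
$text = isset($_POST['str']) ? $_POST['str'] : '';

foreach ($arr_names as $ID => $name)
{
    // preg_quote() escapes regex metacharacters in the name (and the '/' delimiter);
    // \b on both sides stops "Ann" matching inside "Anne" or "Joanna"
    if (preg_match('/\b(' . preg_quote($name, '/') . ')\b/i', $text, $matches))
    {
        $m[] = $matches[1];
    }
}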


(Please don't try this with a Wikipedia database of 10 million terms, lol.)
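If you stay with regular expressions, a common middle ground is to join the names into a few large alternation patterns, so PCRE scans the text once per chunk instead of once per name. This is a sketch with a hypothetical `find_names_alternation()` function; the chunk size of 500 names per pattern is an assumed value meant to keep each compiled pattern well inside PCRE's size limit, not a measured one, so tune it for your data.

```php
<?php
// Hypothetical sketch: one preg_match_all() per chunk of names rather than
// one preg_match() per name. $chunkSize is an assumption; tune it.

function find_names_alternation(string $text, array $names, int $chunkSize = 500): array
{
    // PCRE tries alternatives left to right, so put longer names first:
    // within a chunk "John Smith" then wins over "John" at the same position.
    // Across chunks, overlapping names can both be reported, as in the loop above.
    usort($names, function ($a, $b) { return strlen($b) - strlen($a); });

    $found = [];
    foreach (array_chunk($names, $chunkSize) as $chunk) {
        $alts = array_map(function ($n) { return preg_quote($n, '/'); }, $chunk);
        $pattern = '/\b(?:' . implode('|', $alts) . ')\b/i';
        if (preg_match_all($pattern, $text, $m)) {
            foreach ($m[0] as $hit) {
                $found[strtolower($hit)] = $hit; // de-duplicate case-insensitively
            }
        }
    }
    return array_values($found);
}
```

Like the loop above, this returns each name as it is spelled in the text, not as it is spelled in the names table.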

eelixduppy

Msg#: 4392861 posted 6:30 pm on Dec 1, 2011 (gmt 0)

If I had to guess, I'd say Wikipedia limits the set of terms it selects from by subject matter (or some other metadata) for that particular article. I'm also fairly sure they have some kind of caching mechanism built in.
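Narrowing the candidate list by metadata can be as simple as filtering the names before any matching runs. In this sketch the `categories` field on each name row, the article's category list and the `candidate_names()` function are all hypothetical, standing in for whatever subject tags your data actually has; nothing here reflects how Wikipedia stores its data.

```php
<?php
// Hypothetical sketch: keep only names that share at least one category
// with the article, then run whichever matcher you use on the smaller list.

function candidate_names(array $nameRows, array $articleCategories): array
{
    $wanted = array_flip($articleCategories); // category => index, for isset() lookups
    $out = [];
    foreach ($nameRows as $row) {
        foreach ($row['categories'] as $cat) {
            if (isset($wanted[$cat])) {
                $out[] = $row['name'];
                break; // one shared category is enough
            }
        }
    }
    return $out;
}
```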

You should only need to do this find/replace when either the text or the set of names changes, and in both cases, if you've already searched the text before, you only need to search what has changed (the new names, or the new text). Timestamps on the "names" table should help with this.
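The incremental approach described above could look something like this minimal sketch. It assumes a hypothetical `added_at` Unix timestamp on each name row and a per-text cache entry holding a hash of the text, the time of the last scan and the names found; `match_names()` and `refresh_matches()` are illustrative names, and deleted or edited names are not handled.

```php
<?php
// Hypothetical sketch of cached matching that only re-scans what changed.

function match_names(string $text, array $names): array
{
    $hits = [];
    foreach ($names as $name) {
        if (preg_match('/\b' . preg_quote($name, '/') . '\b/i', $text)) {
            $hits[] = $name;
        }
    }
    return $hits;
}

function refresh_matches(string $text, ?array $cache, array $nameRows, int $now): array
{
    $hash = md5($text);

    if ($cache === null || $cache['hash'] !== $hash) {
        // Text is new or has changed: scan it against every name
        $matches = match_names($text, array_column($nameRows, 'name'));
    } else {
        // Text unchanged: only names added since the last scan need checking.
        // >= so a name added in the same second as that scan is not missed.
        $newNames = [];
        foreach ($nameRows as $row) {
            if ($row['added_at'] >= $cache['scanned_at']) {
                $newNames[] = $row['name'];
            }
        }
        $matches = array_values(array_unique(
            array_merge($cache['matches'], match_names($text, $newNames))
        ));
    }

    return ['hash' => $hash, 'scanned_at' => $now, 'matches' => $matches];
}
```

Store the returned array alongside the text (for example in its own table row) and pass it back in on the next request.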

© Webmaster World 1996-2014 all rights reserved