Something like this should be close:
$WordsToFind='student^school^teacher^classroom^class room';
$WordsToFind=explode('^',$WordsToFind);
for($i=0;$i<count($WordsToFind);$i++) {
// stripos() returns 0 for a match at the very start of $text, so compare against false
if(stripos($text,$WordsToFind[$i])!==false) { $category='education'; break; }
}
BTW: Welcome to WebmasterWorld!
EDITED: Forgot for a minute that stripos() is even less memory-intensive than stristr().
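A minimal sketch of why the explicit comparison matters: stripos() returns the match position, so a needle at the very start of the text comes back as 0, which if() treats as false. The helper name containsAny() is hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: true if any needle occurs anywhere in $text (case-insensitive).
function containsAny(string $text, array $needles): bool
{
    foreach ($needles as $needle) {
        // stripos() returns int|false; 0 means "found at position 0", so test against false.
        if (stripos($text, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$needles = explode('^', 'student^school^teacher^classroom^class room');

var_dump((bool) stripos('Students rally downtown', 'student')); // bool(false): position 0 looks falsy
var_dump(containsAny('Students rally downtown', $needles));      // bool(true)
```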
It's really going to depend on context. If these words come from user input - even if it's entered only once by an administrator - there's always the potential for something you didn't expect, "Class Room" vs. "classroom", for example. Then there's also the potential for changes and additions. Regexps provide the most flexibility.
I suggest a little of both:
// Collect an array; it doesn't matter whether it's hard-coded at
// the top of the script, loaded from a database, or taken from user input.
// Note: * means zero or more, so 'students*' matches student or students.
$category=NULL;
$edu = Array(
'students*',
'schools*',
'teachers*',
'class\s*room\s*',
'uni',
'university*i*e*s*'
);
foreach ($edu as $wd) {
// Double quotes so $wd is interpolated; '/$wd/i' is the literal pattern $wd, which can never match.
if(preg_match("/$wd/i",$text)) {
$category='education';
break;
}
}
if ($category===NULL) { echo "Whoops, no category found"; }
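One possible refinement, not from the thread: join the patterns into a single alternation so preg_match() runs once per text instead of once per pattern. This sketch assumes the array holds regex fragments (as above), so they are deliberately not passed through preg_quote(). 'uni' is left out here because on its own it matches inside many unrelated words.

```php
<?php
$edu = ['students*', 'schools*', 'teachers*', 'class\s*room\s*', 'university*i*e*s*'];

// Build one pattern of the form /(?:students*|schools*|...)/i
$pattern = '/(?:' . implode('|', $edu) . ')/i';

$text = 'The new Class Room opens next week';
$category = null;
if (preg_match($pattern, $text) === 1) {
    $category = 'education';
}
var_dump($category); // string(9) "education"
```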
Please note the 'i' between str and pos in the code I suggested.
And, of course, this might work too:
$WordsToFind='student^school^teacher^classroom^class room';
$WordsToFind=explode('^',$WordsToFind);
$lowerText=strtolower($text); // lowercase once, outside the loop
for($i=0;$i<count($WordsToFind);$i++) {
if(strpos($lowerText,$WordsToFind[$i])!==false) { $category='education'; break; }
}
* It's also unnecessary to check for the plural with stripos. We are not looking for \sword\s; we are checking whether the word (string) is contained anywhere in the text, so if the text contains "schools", stripos will report a match for the needle "school".
** According to [phpbench.com...], a foreach loop performs very poorly compared with either a for or a while loop.
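A quick check of the plural point in the first footnote, assuming plain substring matching with stripos(): the singular needle already finds the plural, but the same rule also matches inside unrelated words (here 'vote', from the political list later in the thread, inside "devoted").

```php
<?php
$text = 'Local schools reopen Monday';

// 'school' is a substring of 'schools', so no separate plural needle is needed.
var_dump(stripos($text, 'school') !== false); // bool(true)

// The trade-off: substring matching ignores word boundaries.
var_dump(stripos('A devoted fan', 'vote') !== false); // bool(true)
```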
Also:
From the documentation:
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster. [us3.php.net...]
From the documentation:
Note: If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead. [us3.php.net...]
One final note:
$edu = Array(
'students*',
'schools*',
'teachers*',
'class\s*room\s*',
'uni', /* The preceding 'uni' will match union, uniform, unicode, punitive, and anything else containing uni, including university and universities, because the pattern sets no boundary anywhere. It's completely unanchored and will behave *very* unexpectedly. */
'university*i*e*s*'
);
foreach ($edu as $wd) {
if(preg_match("/$wd/i",$text)) {
$category='education';
break;
}
}
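One way to address the anchoring problem described in that comment, sketched here as an assumption rather than the poster's code: wrap each pattern in \b word boundaries so 'uni' only matches the standalone word. The helper eduCategory() is hypothetical.

```php
<?php
// Patterns from the thread; each one is wrapped in \b so it must stand as a whole word.
$edu = ['students*', 'schools*', 'teachers*', 'class\s*room\s*', 'uni', 'university*i*e*s*'];

// Hypothetical helper: returns 'education' if any bounded pattern matches, else null.
function eduCategory(string $text, array $patterns): ?string
{
    foreach ($patterns as $p) {
        // Concatenation inserts $p; \b stays literal inside single quotes, as PCRE expects.
        if (preg_match('/\b(?:' . $p . ')\b/i', $text) === 1) {
            return 'education';
        }
    }
    return null;
}

var_dump(eduCategory('The union voted on new uniforms', $edu)); // NULL
var_dump(eduCategory('She studies at uni in Leeds', $edu));     // string(9) "education"
```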
With this new approach, I now have a whole page of category filters, and I'm wondering if there's another step I should take.
$needle=array("student","school","teacher","classroom","class room");
for($i=0;$i<count($needle);$i++) if(stripos($haystack,$needle[$i])!==false){ $category.="159,";break; }
$needle=array("marriage","civil union","domestic partner","wedding");
for($i=0;$i<count($needle);$i++) if(stripos($haystack,$needle[$i])!==false){ $category.="59,";break; }
$needle=array("military","army","navy","air force","marine");
for($i=0;$i<count($needle);$i++) if(stripos($haystack,$needle[$i])!==false){ $category.="256,";break; }
$needle=array("vote","Rep.","Sen.","republican","democrat","elected","mayor","government","political");
for($i=0;$i<count($needle);$i++) if(stripos($haystack,$needle[$i])!==false){ $category.="347,";break; }
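Since every block in that page repeats the same loop with a different needle list and category ID, one possible tidy-up (a sketch, not a benchmarked recommendation) is to keep the needles in one array keyed by category ID and run a single loop. The IDs and words come from the post; the function name categorize() is hypothetical.

```php
<?php
// Category ID => needles, taken from the post above.
$filters = [
    159 => ['student', 'school', 'teacher', 'classroom', 'class room'],
    59  => ['marriage', 'civil union', 'domestic partner', 'wedding'],
    256 => ['military', 'army', 'navy', 'air force', 'marine'],
    347 => ['vote', 'Rep.', 'Sen.', 'republican', 'democrat', 'elected', 'mayor', 'government', 'political'],
];

// Hypothetical helper: returns matching IDs as a comma-terminated string, like the original $category.
function categorize(string $haystack, array $filters): string
{
    $category = '';
    foreach ($filters as $id => $needles) {
        foreach ($needles as $needle) {
            if (stripos($haystack, $needle) !== false) {
                $category .= $id . ',';
                break; // one hit is enough for this category; move on to the next one
            }
        }
    }
    return $category;
}

echo categorize('The mayor visited a school and an army base', $filters); // 159,256,347,
```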
I looked at this around noon and again about an hour ago, and nothing came to mind right away either time... The only thing I would consider (and it should be benchmarked) is:
ADDED / EDITED:
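$newHaystack=strtolower($haystack); // one case for everything
$newHaystack=explode(" ",$newHaystack); // split into words
$newHaystack=array_unique($newHaystack); // drop repeated words (first occurrence kept)
$newHaystack=implode(" ",$newHaystack); // rejoin into a single string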
ADDED / EDITED:
Run strtolower() first and switch from stripos() to strpos()...
You'll need to keep all your $needles in lowercase, or strtolower() them too, but IMO, by working in a single case you should compare faster overall, since you are comparing so many times. If you are matching A or a, there are two possibilities for every a... switch everything to the same case and there's only one. It cuts down the possible matches.
$needle=array("military","army","navy","air force","marine");
for($i=0;$i<count($needle);$i++) if(strpos($newHaystack,$needle[$i])!==false){ $category.="256,";break; }
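A minimal sketch of the single-case idea: lowercase the haystack once, lowercase the needles once with array_map(), then use plain strpos() in the loop. Whether this beats stripos() on your data is an assumption you'd want to benchmark. Note that strtolower() only folds ASCII letters reliably; mb_strtolower() is the multibyte-aware alternative for non-ASCII text.

```php
<?php
$haystack = 'Veterans Day parade honors the Navy and Air Force';
$needles  = ['Military', 'Army', 'Navy', 'Air Force', 'Marine'];

// Fold both sides to lowercase one time, outside the loop.
$lowerHaystack = strtolower($haystack);
$lowerNeedles  = array_map('strtolower', $needles);

$category = '';
foreach ($lowerNeedles as $needle) {
    // Case already matches, so the case-sensitive strpos() is enough.
    if (strpos($lowerHaystack, $needle) !== false) {
        $category .= '256,';
        break;
    }
}
echo $category; // 256,
```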
IMO it'll really depend on which is faster, stripos or array_unique, and on your exact application. The length of your $haystack will probably be a factor. The reason I think it might be an option is that as soon as a word begins with a different letter or otherwise doesn't match, array_unique should stop comparing that piece of the array and move on to the next one, whereas stripos is going to compare every letter against every needle... IOW, array_unique stops comparing a word as soon as a mismatch is found, while stripos keeps going through the whole haystack, and since you are running stripos multiple times, eliminating duplicates may work to your advantage.
If you were only running stripos once through a block of text I wouldn't worry about it, but given how many times you're running stripos on each string, eliminating duplicates might show you some gains.
* Make sure you keep the original intact, so you can put it in the appropriate place. ;)
E.g., if "air force" matches 3/4 of the way through, but "air" also appears by itself in the first 1/4, you could lose the combination "air force": array_unique keeps the first "air" it finds, so "force" will no longer appear right after it but will still sit 3/4 of the way through.
Personally, I might look for a way around this, perhaps by using strpos for the two-word combinations before the duplicates are removed from the haystack, and then, if there is no two-word match, eliminating duplicates and checking for the single words.
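A sketch of that idea, assuming needles containing a space are the "two word combinations": check those against the full lowercased text first, and only check single-word needles against the de-duplicated text. Splitting on runs of whitespace with preg_split() is a choice made for this sketch, and matchesAny() is a hypothetical name.

```php
<?php
// Hypothetical helper: multi-word needles are checked on the full text,
// single-word needles on the de-duplicated text.
function matchesAny(string $haystack, array $needles): bool
{
    $lower = strtolower($haystack);

    $phrases = [];
    $words = [];
    foreach ($needles as $n) {
        $n = strtolower($n);
        if (strpos($n, ' ') !== false) {
            $phrases[] = $n;
        } else {
            $words[] = $n;
        }
    }

    // 1) Two-word needles need the original word order, so test them before removing duplicates.
    foreach ($phrases as $p) {
        if (strpos($lower, $p) !== false) {
            return true;
        }
    }

    // 2) Drop repeated words, then test the single-word needles on the shorter string.
    $deduped = implode(' ', array_unique(preg_split('/\s+/', $lower, -1, PREG_SPLIT_NO_EMPTY)));
    foreach ($words as $w) {
        if (strpos($deduped, $w) !== false) {
            return true;
        }
    }
    return false;
}

$text = 'Air traffic rose. The air show featured the Air Force team.';
var_dump(matchesAny($text, ['air force', 'army', 'navy'])); // bool(true)
```

De-duplicating this sample first would leave "air ... force" apart, which is why the phrase check runs on the untouched text.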
Sorry I didn't think of this earlier.
I still think it might be faster to use strtolower and strpos, even if you don't end up pulling it apart and removing duplicates, just because of the number of comparisons you will be making.
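If you want to test that, a rough micro-benchmark like this one compares stripos() against one strtolower() plus strpos(). It's only a sketch: the sample text, needle list, and iteration count are made up, hrtime() needs PHP 7.3 or later, and results will vary with PHP version and text length, so treat the numbers as indicative only.

```php
<?php
$needles  = ['military', 'army', 'navy', 'air force', 'marine'];
$haystack = str_repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 200) . 'Marine';

function withStripos(string $h, array $needles): bool
{
    foreach ($needles as $n) {
        if (stripos($h, $n) !== false) {
            return true;
        }
    }
    return false;
}

function withStrpos(string $h, array $needles): bool
{
    $h = strtolower($h);       // fold case once per haystack
    foreach ($needles as $n) { // needles assumed to be lowercase already
        if (strpos($h, $n) !== false) {
            return true;
        }
    }
    return false;
}

foreach (['withStripos', 'withStrpos'] as $fn) {
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < 2000; $i++) {
        $fn($haystack, $needles);
    }
    printf("%s: %.2f ms\n", $fn, (hrtime(true) - $start) / 1e6);
}
```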
Also, make sure you put the most frequently matched categories at the top, so you don't run as many 'no match' cycles.