Forum Moderators: coopster
The script works in a foreach loop: each page goes through the loop with its words, and all the noise words should be stripped out inside that loop. When there's only one page (so the loop runs only once), it seems fine and the noise words are taken out. As soon as there's more than one page, I get this:
Warning: preg_replace(): Unknown modifier 'a' in XXX
So then I tried escaping all the 'a's, but that's no good because now the noise words aren't found and removed!
Am I doing something wrong? I can attach the code, but it's a bit long!
[edited by: coopster at 7:49 pm (utc) on Sep. 26, 2005]
[edit reason] no email sigs please and thanks :-) [/edit]
// get rid of noise words
foreach ($noisewords as $k => $v) {
    $noisewords[$k] = "/ ".strtolower($v)." /i";
}
foreach ($words as $kk => $v) {
    $words[$kk] = " ".strtolower($v)." ";
}
// do find and replace
foreach ($words as $kkk => $v) {
    $words[$kkk] = preg_replace($noisewords,"",$v);
}
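A possible cause, assuming the snippet above sits inside the per-page loop: the first foreach rewrites $noisewords in place, so on the second page every entry is wrapped again and becomes something like "/ / about /i /i". PCRE would then read "/ /" as the whole pattern and treat the rest as modifiers, which fits the "Unknown modifier 'a'" warning. Below is a minimal sketch of one way around it, building the patterns once into a separate array. The names $patterns, $pages, $cleaned and build_noise_patterns() are hypothetical, and the short lists are stand-ins for the real data.

```php
<?php
// Hypothetical helper: turn plain noise words into regex patterns ONCE.
function build_noise_patterns(array $noisewords): array
{
    $patterns = [];
    foreach ($noisewords as $word) {
        // preg_quote() escapes regex metacharacters and the "/" delimiter,
        // so entries such as "?" or "/" need no manual backslashes.
        $patterns[] = '/ ' . preg_quote(strtolower($word), '/') . ' /i';
    }
    return $patterns;
}

// Assumed sample data standing in for the real list and the indexed pages.
$noisewords = ['about', 'and', 'the', '?'];
$pages = [
    ['Welcome', 'to', 'the', 'media'],
    ['Random', 'and', 'offline', '?'],
];

// Built outside the page loop; $noisewords itself is never modified.
$patterns = build_noise_patterns($noisewords);

$cleaned = [];
foreach ($pages as $words) {
    $kept = [];
    foreach ($words as $word) {
        // Pad with spaces so only whole words match, then strip noise words.
        $result = preg_replace($patterns, '', ' ' . strtolower($word) . ' ');
        if (trim($result) !== '') {
            $kept[] = trim($result);
        }
    }
    $cleaned[] = $kept;
}
```

With this layout the plain noise word list can stay unescaped (no "\?" or "\." entries), since preg_quote() handles the escaping.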
And here is the noise word list (feel free to copy and use!):
<?php
// noise words
$noisewords = array(
"about","after","all","also","an","and","another","any","are","as","at","be","because","been","before","being",
"between","both","but","by","came","can","come","could","did","do","each","for","from","get","got","has","had","he",
"have","her","here","him","himself","his","how","if","in","into","is","it","like","make","many","me","might","more",
"most","much","must","my","never","now","of","on","only","or","other","our","out","over","said","same","see","should",
"since","some","still","such","take","than","that","the","their","them","then","there","these","they","this","those",
"through","to","too","under","up","very","was","way","we","well","were","what","where","which","while","who","with",
"would","you","your","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
"0","1","2","3","4","5","6","7","8","9","‍"," "," "," ","`","´","˜",
"\^","¯","‾","¨","¨","¸","_","­","-","–","—",
";",":","!","¡","\?","¿","\.","…","·","'","‘","’","‚","‹",
"›",""","“","”","„","«","»","\(","\)","\[","\]","\{","\}","§","¶",
"©","®","@","\*","\/","⁄"","\\","&","#","%","‰","†","‡","•","′",
"″","ˆ","°","←","→","↑","↓","↔","↵","←","↑","→","↓",
"↔","∀","∂","∃","∅","∇","∈","∉","∋","∏","∑","+","±",
"÷","×","¬","\¦","¦","~","−","∗","√","∝","∞","∠","∧",
"∨","∩","∪","∫","∴","∼","≅","≈","≡","≤","≥","⊂","⊄","⊃",
"⊆","⊇","⊕","⊗","⊥","⋅","◊","♠","♣","♥","♦","¤","¢",
"\$","£","¥","€","℘","¹","½","¼","²","³","¾","ª",
"á","á","à","à","à","â","â","å","å","ä","ä",
"ã","ã","æ","&aElig;",
"Ç","ç","ç","ð","Ð",
"É","é","È","è","Ê","ê","ë","Ë",
"ƒ","ƒ","ℑ","í",
"Í","Ì","ì","î","Î","Ï","ï",
"Ñ","ñ","º",
"Ó","ó","ò","Ò","ô","Ô","Ö","ö","õ","Õ","œ","Œ",
"où","ø","Ø","qu’",
"ℜ","Š","š","ß","™",
"ú","Ú","Ù","ù","û","Û","ü","Ü"," ",
"Ý","ý","Ÿ","ÿ","þ","Þ","α","α","β","Β","γ","Γ",
"δ","Δ","ε","Ε","Ζ","ζ","Η","η","Θ","θ","ι","Ι",
"κ","Κ","λ","Λ","μ","Μ","µ","ν","Ν","Ξ","ξ","Ο","ο",
"Π","π","Ρ","ρ","σ","Σ","ς","Τ","τ","υ","Υ","Φ","φ",
"χ","Χ","Ψ","ψ","ω","Ω","ℵ"
);
?>
Sorry it's a bit long!
BTW, you should try to use str_replace when you don't really need regular expressions, i.e. in simple replace operations as required here. It's faster and less demanding for the server.
BTW 2, this way HTML entities will not be stripped from inside words, so a word like arriv&eacute; will remain as it is. (I get the impression that this is not what you want.)
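The str_replace() suggestion could look roughly like this. It is only a sketch: the short word list and the names $needles, $kept and strip_noise() are assumptions for illustration. Because no regular expression is involved, entries such as "?" or "." need no escaping.

```php
<?php
// Assumed sample list; the real one is much longer.
$noisewords = ['and', 'the', 'to', '?', '.'];

// Space-padded copies, built once and kept separate from $noisewords.
$needles = [];
foreach ($noisewords as $word) {
    $needles[] = ' ' . strtolower($word) . ' ';
}

// Hypothetical helper: returns '' for a noise word,
// otherwise the lowercased word.
function strip_noise(string $word, array $needles): string
{
    $padded = ' ' . strtolower($word) . ' ';
    return trim(str_replace($needles, '', $padded));
}

$words = ['Welcome', 'to', 'Random', 'and', 'media', '.'];
$kept = [];
foreach ($words as $word) {
    $clean = strip_noise($word, $needles);
    if ($clean !== '') {
        $kept[] = $clean;
    }
}
```

Here "random" is kept even though it contains "and", because the padded needle " and " only matches a standalone word.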
welcome
to
firestarter
media
...
and
offline
contactfirestarter
(downsized!)
-- END --
If you have a look at the foreach loops before the str_replace:
foreach ($noisewords as $k => $v) {
    $noisewords[$k] = " ".strtolower($v)." ";
}
foreach ($words as $kk => $v) {
    $words[$kk] = " ".strtolower($v)." ";
}
I've added spaces (i.e. " ".strtolower($v)." ";) so that it will only replace full, standalone words and not get rid of 'and' in 'random'.
Does that make sense? But it's still not working! :(
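Since each element of $words already seems to hold a single word, another option is to skip the space padding and compare whole words directly. A rough sketch under that assumption ($noiseLookup and remove_noise_words() are hypothetical names, and the data is made up):

```php
<?php
// Hypothetical helper: drop every word that appears in the noise list.
function remove_noise_words(array $words, array $noisewords): array
{
    // Lowercased noise words as array keys give a cheap exact-match lookup.
    $noiseLookup = array_flip(array_map('strtolower', $noisewords));

    $kept = [];
    foreach ($words as $word) {
        $lower = strtolower($word);
        if (!isset($noiseLookup[$lower])) {
            $kept[] = $lower;
        }
    }
    return $kept;
}

// Assumed sample data.
$noisewords = ['and', 'to', 'the'];
$words = ['Welcome', 'to', 'Random', 'and', 'offline'];
$kept = remove_noise_words($words, $noisewords);
```

Only exact whole-word matches are dropped, so "random" survives, and neither input array is modified, so calling the function once per page behaves the same every time.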
[edited by: ahmedtheking at 9:03 am (utc) on Sep. 27, 2005]
131
70
Indexed http://www.example.com/main.php?goto=index
259
259
Indexed http://www.example.com/main.php?goto=ab_index
167
167
Indexed http://www.example.com/main.php?goto=cn_index
key:
1st number is the number of words
2nd number is the number of words after the noise words have been taken away
URL
If I change a URL to a different one, still only the first page has its words replaced!
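The counts above would fit one particular explanation, though the full script isn't shown, so treat this as a guess: if the padding loop runs inside the per-page loop, $noisewords is padded again on every page. After the second pass an entry reads "  and  " (two spaces on each side), which never occurs in " and ", so nothing is removed from page two onward. The sketch below reproduces that effect with made-up data ($pages and $removedPerPage are hypothetical names):

```php
<?php
// Made-up data; stands in for the real noise list and indexed pages.
$noisewords = ['and', 'to'];
$pages = [
    ['welcome', 'to', 'media'],
    ['online', 'and', 'offline'],
];

$removedPerPage = [];
foreach ($pages as $words) {
    // Suspected bug: the shared list is padded in place on EVERY page.
    foreach ($noisewords as $k => $v) {
        $noisewords[$k] = ' ' . strtolower($v) . ' ';
    }

    $removed = 0;
    foreach ($words as $word) {
        $padded = ' ' . strtolower($word) . ' ';
        if (trim(str_replace($noisewords, '', $padded)) === '') {
            $removed++;
        }
    }
    $removedPerPage[] = $removed;
}
// Page 1 loses "to"; page 2 keeps "and" because the needle is now "  and  ".
```

If that is what is happening, padding into a separate array before the page loop, or reloading the plain list for each page, should make every page behave like the first one.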
[edited by: coopster at 1:04 am (utc) on Oct. 4, 2005]
[edit reason] generalized ulr per TOS [webmasterworld.com] [/edit]