Linking Dynamically Throughout Site To Dictionary Terms

How to create dynamic links whenever a dictionary term appears in text

         

geckofuel

2:23 pm on Dec 18, 2002 (gmt 0)

10+ Year Member



I've searched and searched on this topic but keep coming up with noise.

Does anyone know how, using PHP and mySQL, one would go about searching the text on a page and dynamically creating a link to a dictionary entry whenever the particular term appears in the page text?

I'm sure there is information out there on the web on this topic but searches keep bringing up information on nasty SEO tactics;-) "Dynamic generation of urls" "generation of urls based keywords"

Any help is appreciated.

jatar_k

4:26 pm on Dec 18, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Do you mean searching for a word while the page is being generated?

Do you mean you have a list of words and you want to link them dynamically?

I need a little more info, I don't exactly understand what you want to do.

lorax

4:39 pm on Dec 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think the better approach would be to create the links at the time the content is created.

geckofuel

5:28 pm on Dec 18, 2002 (gmt 0)

10+ Year Member



I've already got a site with about 500 pages. For example, what if every time the word "widget" appears on one of these pages, I'd like it to link to my definition page for "widget"?

I run an educational science website, and we have a database with terms and definitions. We'd like to set up a script that will automatically take the text on these pages, look for any of the terms in the database, and then automatically create hyperlinks to them.

Think of it sort of like a cross-linked multimedia encyclopedia. We've got the text, we've got the terms, how do we automate the task of creating this cross-linked multimedia encyclopedia?

andreasfriedrich

5:37 pm on Dec 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



lorax is quite right when he suggests that this ought to be done when creating the content, not every time a UA requests a page. You can run a simple Perl script over your existing files that uses regular expressions to find the words and replace them with links. Use Jeffrey Friedl's trick to speed up the RE substitution.

Andreas

lorax

5:38 pm on Dec 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ah.. now I understand.

So essentially, you need a way to loop through all of the terms (each has its own record?) in the definitions table and then look for that term on all of the pages within the site. If a match is found, convert the word to a link to a PHP page with the variable for querying the db for the full definition of that term.
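A minimal sketch of that loop, written in Python rather than the PHP or Perl discussed in this thread. The term list, the `definition.php?term=` URL pattern, and the function name `link_terms` are all hypothetical; in practice the terms would be read from the definitions table. It works on plain text only and makes no attempt to skip terms that already sit inside HTML tags or existing links.

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical terms; in practice these would be read from the definitions table.
TERMS = ["widget", "photosynthesis", "cell wall"]

def link_terms(text, terms, url_pattern="definition.php?term={}"):
    """Wrap each whole-word occurrence of a term in a link to its definition."""
    if not terms:
        return text
    # Longest terms first so "cell wall" wins over a shorter term such as "cell".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )

    def make_link(match):
        word = match.group(1)  # keep the capitalisation used in the page text
        href = url_pattern.format(quote(word.lower()))
        return '<a href="{}">{}</a>'.format(escape(href), word)

    return pattern.sub(make_link, text)

if __name__ == "__main__":
    print(link_terms("A Widget has a cell wall.", TERMS))
```

Running this once over each file and saving the result would give the one-time conversion; calling it while a page is generated would give the on-request variant.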

Andreas, I think he just wants to use the script to make the conversion once and save the files with the links in them.

Right?

amoore

5:46 pm on Dec 18, 2002 (gmt 0)

10+ Year Member



If it were up to me, I'd write a mod_perl handler that ran after all of the other handlers. It would look at the page as it was being served (so that it could be generated by SSI, PHP, Perl, filesystem, or whatever) and find the words and replace them with links. This really doesn't help you if you don't have mod_perl, though.

I'm not sure how I would do it with a normal scripting language since you don't really know how the pages are being created and it's tough to stack a CGI on the end of arbitrary page generation stuff.

I wrote a similar one to highlight search terms from the referring URL when serving pages. It can be done in other ways, but I liked this way.

geckofuel

2:38 am on Dec 19, 2002 (gmt 0)

10+ Year Member




Could you explain this further?

"Use Jeffrey Friedl's trick to speed up the RE substitution."

andreasfriedrich

5:00 pm on Dec 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Suppose you had an external style sheet file which looks like this:

h1.aaron { 
color:firebrick;
background:olivedrab;
border-top:1px solid linen;
}

Now this is not valid CSS, since CSS allows only the 16 VGA colors to be specified by name. Netscape's named colors are not legal values. However, you'd like to use them anyway since they are easier to remember than hex or rgb codes.

To solve this problem, you have a Perl script that substitutes the color names with their respective hex values. In this script we have a hash which maps color names to hex values.

%names2vals = ( 
aliceblue => "#F0F8FF",
antiquewhite => "#FAEBD7",
aqua => "#00FFFF",
aquamarine => "#7FFFD4",
azure => "#F0FFFF",
beige => "#F5F5DC",
bisque => "#FFE4C4",
black => "#000000",
blanchedalmond => "#FFEBCD",
blue => "#0000FF",
blueviolet => "#8A2BE2",
brown => "#A52A2A",
burlywood => "#DEB887",
cadetblue => "#5F9EA0",
chartreuse => "#7FFF00",
chocolate => "#D2691E",
coral => "#FF7F50",
cornflowerblue => "#6495ED",
cornsilk => "#FFF8DC",
crimson => "#DC143C",
cyan => "#00FFFF",
darkblue => "#00008B",
darkcyan => "#008B8B",
darkgoldenrod => "#B8860B",
darkgray => "#A9A9A9",
darkgreen => "#006400",
darkkhaki => "#BDB76B",
darkmagenta => "#8B008B",
darkolivegreen => "#556B2F",
darkorange => "#FF8C00",
darkorchid => "#9932CC",
darkred => "#8B0000",
darksalmon => "#E9967A",
darkseagreen => "#8FBC8F",
darkslateblue => "#483D8B",
darkslategray => "#2F4F4F",
darkturquoise => "#00CED1",
darkviolet => "#9400D3",
deeppink => "#FF1493",
deepskyblue => "#00BFFF",
dimgray => "#696969",
dodgerblue => "#1E90FF",
firebrick => "#B22222",
floralwhite => "#FFFAF0",
forestgreen => "#228B22",
fuchsia => "#FF00FF",
gainsboro => "#DCDCDC",
ghostwhite => "#F8F8FF",
gold => "#FFD700",
goldenrod => "#DAA520",
gray => "#808080",
green => "#008000",
greenyellow => "#ADFF2F",
honeydew => "#F0FFF0",
hotpink => "#FF69B4",
indianred => "#CD5C5C",
indigo => "#4B0082",
ivory => "#FFFFF0",
khaki => "#F0E68C",
lavender => "#E6E6FA",
lavenderblush => "#FFF0F5",
lawngreen => "#7CFC00",
lemonchiffon => "#FFFACD",
lightblue => "#ADD8E6",
lightcoral => "#F08080",
lightcyan => "#E0FFFF",
lightgoldenrodyellow => "#FAFAD2",
lightgreen => "#90EE90",
lightgrey => "#D3D3D3",
lightpink => "#FFB6C1",
lightsalmon => "#FFA07A",
lightseagreen => "#20B2AA",
lightskyblue => "#87CEFA",
lightslategray => "#778899",
lightsteelblue => "#B0C4DE",
lightyellow => "#FFFFE0",
lime => "#00FF00",
limegreen => "#32CD32",
linen => "#FAF0E6",
magenta => "#FF00FF",
maroon => "#800000",
mediumaquamarine => "#66CDAA",
mediumblue => "#0000CD",
mediumorchid => "#BA55D3",
mediumpurple => "#9370DB",
mediumseagreen => "#3CB371",
mediumslateblue => "#7B68EE",
mediumspringgreen => "#00FA9A",
mediumturquoise => "#48D1CC",
mediumvioletred => "#C71585",
midnightblue => "#191970",
mintcream => "#F5FFFA",
mistyrose => "#FFE4E1",
moccasin => "#FFE4B5",
navajowhite => "#FFDEAD",
navy => "#000080",
oldlace => "#FDF5E6",
olive => "#808000",
olivedrab => "#6B8E23",
orange => "#FFA500",
orangered => "#FF4500",
orchid => "#DA70D6",
palegoldenrod => "#EEE8AA",
palegreen => "#98FB98",
paleturquoise => "#AFEEEE",
palevioletred => "#DB7093",
papayawhip => "#FFEFD5",
peachpuff => "#FFDAB9",
peru => "#CD853F",
pink => "#FFC0CB",
plum => "#DDA0DD",
powderblue => "#B0E0E6",
purple => "#800080",
red => "#FF0000",
rosybrown => "#BC8F8F",
royalblue => "#4169E1",
saddlebrown => "#8B4513",
salmon => "#FA8072",
sandybrown => "#F4A460",
seagreen => "#2E8B57",
seashell => "#FFF5EE",
sienna => "#A0522D",
silver => "#C0C0C0",
skyblue => "#87CEEB",
slateblue => "#6A5ACD",
slategray => "#708090",
snow => "#FFFAFA",
springgreen => "#00FF7F",
steelblue => "#4682B4",
tan => "#D2B48C",
teal => "#008080",
thistle => "#D8BFD8",
tomato => "#FF6347",
turquoise => "#40E0D0",
violet => "#EE82EE",
wheat => "#F5DEB3",
white => "#FFFFFF",
whitesmoke => "#F5F5F5",
yellow => "#FFFF00",
yellowgreen => "#9ACD32",
);

To replace the color name you could use this obvious method which loops through all lines and for each line loops through all entries in the %names2vals hash.

while (<>) { 
for $name (keys %names2vals) {
s/\b$name\b/$names2vals{$name}/g;
}
print;
}

However, this approach is really slow. Jeffrey Friedl came up with a solution: create an anonymous subroutine that caches the compiled patterns in the closure it creates. Recipe 6.10 in the Perl Cookbook contains an example using the regex match operator.

Applying this method with the substitution operator, you end up with code like the following. The original intent was to cache the compiled patterns to speed up matching; that is not the effect we are after, and we do not rely on it here.

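# Build one chain of substitutions: s/\bname\b/value/g || s/.../.../g || ...
# Because of ||, the chain stops at the first substitution that succeeds on a line.
my $re = join('||',
    map { "s/\\b$_\\b/$names2vals{$_}/g" }
    keys %names2vals);
# Compile the whole chain once into an anonymous subroutine that works on $_.
my $subst = eval "sub{$re}";
while (<>) {
    $subst->();   # apply the chain to the current line in $_
    print;
}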

The first line builds a string that looks like this:

s/\bmediumslateblue\b/#7B68EE/g
||
s/\bhoneydew\b/#F0FFF0/g
||
s/\bgreen\b/#008000/g
||
s/\bdarkred\b/#8B0000/g
||
s/\bviolet\b/#EE82EE/g

In the second line we build an anonymous subroutine using this string. This subroutine is then called for each line. It applies the regular expressions to $_ which contains a single line.

While the former approach took over 3 seconds to process a larger style sheet file, the latter script needed only 0.3 seconds.
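For comparison, here is a rough sketch of the same compile-once idea in Python instead of Perl. It is a different technique from the eval-built subroutine above: all names are combined into a single alternation, and the replacement is looked up in a callback. `NAMES_TO_VALS` is a small subset of the color table, and the function names `replace_naive` and `build_replacer` are made up for this illustration. No timings are claimed for it.

```python
import re

# A small subset of the color table above, for illustration only.
NAMES_TO_VALS = {
    "firebrick": "#B22222",
    "olivedrab": "#6B8E23",
    "linen": "#FAF0E6",
    "olive": "#808000",
}

def replace_naive(line, table):
    """One substitution pass per color name, as in the obvious loop."""
    for name, value in table.items():
        line = re.sub(r"\b" + re.escape(name) + r"\b", value, line)
    return line

def build_replacer(table):
    """Compile one alternation up front and return a function that reuses it."""
    if not table:
        return lambda line: line
    # Longest names first so "olivedrab" is tried before "olive".
    names = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return lambda line: pattern.sub(lambda m: table[m.group(0)], line)

if __name__ == "__main__":
    replace_colors = build_replacer(NAMES_TO_VALS)
    for css_line in ["color:firebrick;", "background:olivedrab;",
                     "border-top:1px solid linen;"]:
        print(replace_colors(css_line))
```

Unlike the `||` chain, the single alternation replaces every color name on a line in one pass, so a line holding two names is handled as well.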

Hope this helps.

Andreas

Note: The WebmasterWorld posting software deletes spaces preceding the exclamation point "!" character. It also replaces a solid vertical pipe symbol with a broken vertical pipe "¦" symbol. Both of these changes will need to be undone in any code you copy from WebmasterWorld. Make sure to include a space preceding the "!" in mod_rewrite code, and always replace "¦" with a solid vertical pipe.