I need to extract the (different) description meta tag for a series of pages, keep the original meta tag in place, and put the copied description into the main body of the page, preferably wrapped in <P> tags!
I couldn't see how to use Word to help me either...
:-(
A typical call would be
awk -f myscript.awk *.html > descriptions.html
where myscript.awk contains something like
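/name="description"/ {
    # Split the line at each "="; the third piece holds the description text.
    split( $0, terms, "=" );
    gsub( "^\"", "", terms[3] );    # remove the opening quote
    gsub( "\">$", "", terms[3] );   # remove the closing quote and ">"
    printf( "\nDescription from %s:\n", FILENAME );
    printf( "%s\n\n", terms[3] );
}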
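So what does this do? The first line tells awk to act only on lines in your HTML files that contain the string 'name="description"'. Such a line is then split into three parts, with '=' as the separator. For a tag written as <meta name="description" content="some text">, the third part is "some text"> (this assumes the whole tag sits on one line and the text itself contains no '=').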
The first gsub strips the opening " from that third part and the second one strips the "> from its end. So terms[3] now contains the clean text.
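To see what the split and the two gsub calls leave behind, here is a small sketch that runs the same steps on a single made-up line. The sample tag and the function name show_terms are my own assumptions for illustration; real pages may order or quote the attributes differently.

```sh
# show_terms: hypothetical helper that prints each piece produced by split,
# followed by the cleaned-up third piece.
show_terms() {
    printf '%s\n' "$1" | awk '
    /name="description"/ {
        n = split($0, terms, "=")
        for (i = 1; i <= n; i++)
            printf("terms[%d] = %s\n", i, terms[i])

        gsub("^\"", "", terms[3])      # drop the opening quote
        gsub("\">$", "", terms[3])     # drop the closing quote and >
        printf("cleaned = %s\n", terms[3])
    }'
}

show_terms '<meta name="description" content="Blue widgets for sale">'
# terms[1] = <meta name
# terms[2] = "description" content
# terms[3] = "Blue widgets for sale">
# cleaned = Blue widgets for sale
```

If a description contains an '=' of its own, split produces more than three pieces and terms[3] holds only part of the text, so it is worth checking a few pages by eye first.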
The two printf statements output the filename, followed by the description text. The printf statement in AWK uses the normal C format codes (if that language is familiar to you), so you can output all kinds of text, including HTML tags.
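Because printf can emit HTML too, the same idea could probably be stretched to cover the original request: leave the meta tag where it is and repeat its text inside the body. This is only a sketch under assumptions. The function name add_description is made up, the meta tag is assumed to sit on one line somewhere before the <body> tag, and the description is assumed to contain no '=' character.

```sh
# add_description: hypothetical helper that copies one HTML page to stdout
# and adds the description, wrapped in <p> tags, right after the <body> line.
add_description() {
    awk '
    /name="description"/ {
        split($0, terms, "=")
        desc = terms[3]
        gsub("^\"", "", desc)          # drop the opening quote
        gsub("\">$", "", desc)         # drop the closing quote and >
    }
    { print }                          # every original line is kept, meta tag included
    tolower($0) ~ /<body/ && desc != "" {
        printf("<p>%s</p>\n", desc)    # the copy lands on the line after <body>
    }' "$1"
}
```

Something like add_description page.html > page.new.html writes a changed copy. Redirecting straight back onto page.html would empty the file before awk reads it, so write to a new name first and rename afterwards.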
I really do not know of any simple-to-use program under modern graphical operating systems which comes even close to the power of ancient *nix scripting languages.
Normal editors do not show them on the screen. AWK doesn't know about this special file encoding and prints an "invalid char in expression" message instead.
If you save the awk script file in ANSI format with your editor, this problem should disappear.
I really do not know of any simple-to-use program under modern graphical operating systems which comes even close to the power of ancient *nix scripting languages.
You're right, and often we forget that when we get so used to having a mouse and buttons to click on.
I haven't used AWK in many years; your post reminded me just how damn useful it can be.
TJ