Selecting and extracting text between two defined markers

Which text-processing tool or script should I use?

9:45 am on Sep 1, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:June 30, 2005
posts:149
votes: 0


I have hundreds of full HTML pages but want to extract only the content, which is clearly marked with comments (i.e. <!-- Content begins here --> and <!-- Content ends here -->). Rather than cutting and pasting into separate files by hand, how would you approach it? Is there some console cleverness that can be used?
5:29 pm on Sept 1, 2006 (gmt 0)

Administrator

coopster (WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member)

joined:July 31, 2003
posts:12533
votes: 0


Shell, Perl, or some other form of server-side scripting would be ideal here. Loop through the files in the directory, locate the string between the two comments in each file, and write it out to a file in a new directory.
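
For anyone who wants something concrete, here is a minimal sketch of that loop in Python rather than shell or Perl. It is only one way to do it, and it rests on assumptions: the marker strings, the `*.html` pattern, and the function names `extract_between` and `extract_directory` are all made up for illustration, so adjust them to whatever your pages actually contain.

```python
from pathlib import Path

# Assumed marker text; change these to match the comments in your own pages.
BEGIN = "<!-- Content begins here -->"
END = "<!-- Content ends here -->"


def extract_between(text, begin=BEGIN, end=END):
    """Return the text between the first begin marker and the next end marker.

    Returns None if either marker is missing.
    """
    start = text.find(begin)
    if start == -1:
        return None
    start += len(begin)
    stop = text.find(end, start)
    if stop == -1:
        return None
    return text[start:stop].strip()


def extract_directory(src_dir, dst_dir, pattern="*.html"):
    """Write the marked content of each matching file in src_dir to dst_dir.

    Output files keep the original file names. Returns the names of files
    skipped because a marker was not found.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    skipped = []
    for page in sorted(src.glob(pattern)):
        text = page.read_text(encoding="utf-8", errors="replace")
        content = extract_between(text)
        if content is None:
            skipped.append(page.name)
            continue
        (dst / page.name).write_text(content + "\n", encoding="utf-8")
    return skipped
```

Calling `extract_directory("pages", "extracted")` (both directory names hypothetical) would leave the originals untouched and return a list of any files where a marker was missing, so you can check those by hand.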
8:17 pm on Sept 1, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:June 30, 2005
posts:149
votes: 0


Hey coopster - both sed and awk do the trick on a single file. I'm just not sure how to apply the command to a whole directory. Anyhow, trial and error etc. :)