I have hundreds of full HTML pages but want to extract only the content, which is clearly marked with comments (i.e. <!-- Content begins/ends here -->). Rather than cutting and pasting into separate files by hand, how would you approach it? Is there some console cleverness that can be used?
Shell, Perl, or some other form of server-side scripting would be ideal here. Loop through the files in the directory, locate the text between the two comments in each file, and write it out to a file of the same name in a new directory.
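For what it's worth, here is a rough sketch of that loop in Python (picked only because it is easy to run anywhere; a Perl or sed one-liner would do the same job). The exact marker strings, the `*.html` glob, and the function names `extract_content` and `extract_directory` are my assumptions, so adjust them to match your pages. It takes only the first begin/end pair in each file and reports any file where the markers were not found.

```python
import re
import sys
from pathlib import Path

# Assumed marker text; change these to match the comments in the real pages.
BEGIN = "<!-- Content begins here -->"
END = "<!-- Content ends here -->"


def extract_content(html, begin=BEGIN, end=END):
    """Return the text between the first begin/end marker pair, or None."""
    # re.escape treats the markers as literal text; DOTALL lets "." span newlines.
    pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, html, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def extract_directory(src_dir, dst_dir):
    """Write the marked content of each *.html file in src_dir into dst_dir.

    Returns the names of files that had no complete marker pair.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    skipped = []
    for page in sorted(src.glob("*.html")):
        html = page.read_text(encoding="utf-8", errors="replace")
        content = extract_content(html)
        if content is None:
            skipped.append(page.name)
            continue
        # Keep the original file name so output files map back to their source.
        (dst / page.name).write_text(content + "\n", encoding="utf-8")
    return skipped


if __name__ == "__main__" and len(sys.argv) == 3:
    for name in extract_directory(sys.argv[1], sys.argv[2]):
        print(f"no markers found in {name}", file=sys.stderr)
```

Saved as, say, `extract_content.py` (a made-up name), it would be run as `python extract_content.py pages/ extracted/`. The originals are left untouched, so you can rerun it after tweaking the markers.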