Moving into 2013 from 1999


4serendipity - 8:55 pm on Feb 12, 2013 (gmt 0)


I suspect it would be fairly easy to strip out the content. Using a combination of curl and regular expressions to extract the content would be a good place to start.
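To make that concrete, here is a rough sketch in Python that shells out to curl and pulls the content out with a regular expression. The "content start" / "content end" comment markers and the function names are made up for illustration; the real pages would need a pattern matched to whatever consistently surrounds their body copy.

```python
import re
import subprocess

# Hypothetical markers: assumes each legacy page wraps its body copy
# between these two HTML comments. Adjust to what the real pages contain.
CONTENT_RE = re.compile(
    r"<!--\s*content start\s*-->(.*?)<!--\s*content end\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def fetch(url):
    """Download a page with curl and return its body as text."""
    result = subprocess.run(
        ["curl", "--silent", "--fail", "--location", url],
        capture_output=True,
        check=True,  # raise if curl reports an HTTP or network error
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_content(html):
    """Return the markup between the markers, or None if they are missing."""
    match = CONTENT_RE.search(html)
    return match.group(1).strip() if match else None
```

Returning None when the pattern does not match makes it easy to collect the pages that need a different pattern or a manual look.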

The difficulty of the automated process would increase with the level of inconsistency in the site's pages. However, I suspect that even having to tweak a script a good bit would be preferable to copying and pasting ~900 pages. Also, with the automated approach, you could run the extracted HTML through tidy to help bring outdated markup up to snuff.
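As a rough sketch of the tidy step, the following Python pipes a fragment through the HTML Tidy command-line tool. It assumes tidy is installed and on the PATH; the option set and the function name are just one plausible choice, not a recommendation for every site.

```python
import subprocess

# Assumes the HTML Tidy command-line tool is available on PATH as "tidy".
TIDY_CMD = [
    "tidy",
    "-quiet",                   # suppress the summary banner
    "-asxhtml",                 # emit well-formed XHTML
    "-utf8",                    # treat input and output as UTF-8
    "--show-warnings", "no",    # keep stderr limited to real errors
    "--show-body-only", "yes",  # output the fragment without html/head wrappers
]


def tidy_fragment(html):
    """Run an HTML fragment through tidy and return the cleaned markup."""
    proc = subprocess.run(
        TIDY_CMD,
        input=html.encode("utf-8"),
        capture_output=True,
    )
    # Tidy exits with 0 on success, 1 on warnings, and 2 on errors.
    if proc.returncode >= 2:
        raise ValueError(proc.stderr.decode("utf-8", errors="replace"))
    return proc.stdout.decode("utf-8", errors="replace")
```

Pages where tidy reports errors raise an exception, so a batch run can log those for hand editing rather than silently writing broken output.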


Thread source: http://www.webmasterworld.com/html/4544822.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com