So my question is:
Is there an automated way to add the <html> tags etc. to the beginning and end of each file?
Because, as you can imagine, opening each of the 20,000 files individually and adding the code might take a while. :)
Good luck.
Another method is to use the copy command in DOS to append a header and footer file to your text files via a batch file.
Google for how-tos or syntax.
Any way you slice it, you have a lot of work ahead: it will take time to collate that many pages into a useful index, unless you consider importing all those text files into a database.
Just a thought
I'm surprised there's no real way to do this!
You'd think there'd be some programs out there that can "add to beginning" and "add to end" of a large number of files.
The only problem with using the search-and-replace idea for this is that the text in every file is different; there's no one line of text that's the same in all of them, so that wouldn't work.
If there were some word that was the same at the beginning and end of each file, I could replace the word with the HTML tags and whatnot, then re-add the word, and that would be problem solved, but unfortunately there isn't. :(
So I'm not really sure what to do now... Is there any program that just adds text, rather than needing something to search for and replace?
$file = preg_replace('/\z/', "whatever you want added", $file);
This matches the very end of the text held in $file and puts "whatever you want added" there, though you could probably do this without PHP if you don't have it installed.
'course you might want to make a backup copy before you go testing
1. Write a batch file that makes a list of the files (something like dir /b > filelist.txt)
2. Write another that modifies the files.
Good resource: [robvanderwoude.com...]
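For anyone without a batch-file habit, the same two-step idea can be sketched in Python. This is only a rough sketch under my own assumptions: the function names, the .bak backup suffix and the UTF-8 encoding are all made up for illustration and are not from the posts above.

```python
import shutil
from pathlib import Path


def write_file_list(folder, list_path):
    """Step 1: save the names of all .txt files in folder, one per line."""
    list_file = Path(list_path).resolve()
    names = sorted(
        p.name for p in Path(folder).glob("*.txt") if p.resolve() != list_file
    )
    Path(list_path).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


def wrap_listed_files(folder, list_path, opening, closing):
    """Step 2: back up each listed file, then add text before and after it."""
    for name in Path(list_path).read_text(encoding="utf-8").splitlines():
        if not name:
            continue  # skip blank lines in the list
        target = Path(folder) / name
        shutil.copy2(target, str(target) + ".bak")  # backup copy (assumed suffix)
        body = target.read_text(encoding="utf-8")  # assumes UTF-8 text files
        target.write_text(opening + body + closing, encoding="utf-8")
```

You would call write_file_list first, eyeball the list, and only then run wrap_listed_files with something like "<html>\n<body>\n" and "</body>\n</html>\n". Try it on a handful of copies before pointing it at all 20,000 files.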
The files still seem to work without the closing </html> and other tags, though...
So, how important are closing tags?
I know it's not a good idea to leave tags unclosed, but will it still work? Will search engines, all versions of browsers, etc. care?
If it's not a big deal, I'll just add 'em to the beginning.
The only tags that wouldn't be closed are:
</body>
</html>
P.S. Thanks for the link photon, I'll check it out.
If you're on a Unix box you could try awk. It's not pretty though.
You can use "cygwin" to run this program (and many other common Unix ones) under Windows. Incidentally, there is also a Unix tool called sed (Stream EDitor), which performs similar tasks to awk and is often mentioned in similar contexts (it's pretty much equally "not pretty"). Sed is generally used for strings and files that lack organization; awk is generally used for more organized data, but both tools are robust enough to be adapted to many common tasks. I would assume that you could do something similar within DOS but don't know for certain.
The Advanced Bash Scripting Guide [tldp.org] has a chapter [tldp.org] dealing specifically with these two tools and many examples that would guide you in the right direction.
That said, in a Unix shell you could avoid both these tools with something like '{ echo "<html>"; cat "$old_file"; echo "</html>"; } > "$new_file"' where old_file and new_file are set by a suitable for loop. This would save you the considerable trouble of learning regular expressions or purposely creating bad HTML files.
For Windows I recommend downloading UltraEdit, then going to 'Search' -> 'Replace in File' and replacing <body> with "<html>^p<body>"
For closing tags:
unix: find . -name "*.txt" -exec perl -pi -e 's{</body>}{</body>\n</html>}g' {} \;
ultraedit: replace </body> with "</body>^p</html>"
A more efficient way probably exists in unix by using sed/awk but find & perl will get the job done.
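If neither Perl nor UltraEdit is handy, the closing-tag fix needs no regular expressions at all. Here is a minimal Python sketch of the same replacement; the function names, the "*.txt" default, the UTF-8 encoding and the skip-if-already-closed check are my own assumptions rather than anything from the commands above.

```python
from pathlib import Path


def add_closing_html(text):
    """Put a newline and </html> after each </body>, unless </html> exists."""
    if "</html>" in text:
        return text  # already closed; leave untouched so reruns are harmless
    return text.replace("</body>", "</body>\n</html>")


def fix_folder(folder, pattern="*.txt"):
    """Apply add_closing_html to every matching file below folder."""
    changed = 0
    for path in Path(folder).rglob(pattern):  # recursive, like find
        if not path.is_file():
            continue
        old = path.read_text(encoding="utf-8")  # assumes UTF-8 text files
        new = add_closing_html(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed += 1
    return changed
```

fix_folder(".") walks the current folder and returns how many files it rewrote. Files with no </body> at all are left alone, so it only helps once the body tags are in place.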
The Regex Coach is a free app which is a fantastic regular expressions sandbox, allowing you to learn how to do regular expression matching without breaking anything.
Take a look at both of these, find a ten minute basic regular expressions tutorial on the web and you'll have the job done very quickly indeed.
This would be my approach:
First use one of the editors mentioned to add <p> before each para and </p> after it. I'd search for two consecutive paragraph markers to do it automatically.
This will tag all contents as paragraph text.
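The paragraph-tagging step could also be scripted instead of done in an editor. A rough Python sketch follows, assuming paragraphs are separated by one or more blank lines; the function name is made up, and the escaping of &, < and > is an extra I added because raw text can contain those characters.

```python
import html
import re


def text_to_paragraphs(text):
    """Wrap each blank-line-separated block of text in <p>...</p>."""
    blocks = re.split(r"\n\s*\n", text.strip())  # split on blank lines
    return "\n".join(
        "<p>" + html.escape(block.strip(), quote=False) + "</p>"
        for block in blocks
        if block.strip()
    )
```

Calling text_to_paragraphs("one\n\ntwo & three\n") gives two lines, <p>one</p> and <p>two &amp; three</p>. Single line breaks inside a paragraph are kept as they are.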
Create header and footer files containing the HTML for top and bottom.
Next use batch files to concatenate to a new extension. I haven't checked the exact syntax, but in a Unix shell it is roughly:
for f in foo/*.txt; do cat header.txt "$f" footer.txt > "${f%.txt}.htm"; done
You can almost slice bread with batch files, more so in Unix.
I found some examples to illustrate batch files, not necessarily the perfect solution:
[fireflysoftware.com...]
[lc.yi.org...]
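for %a in (*.html) do copy /b header.txt + "%a" + footer.txt "out\%a"
The 'for' command searches the current directory for files with the extension '.html', then calls the copy command, substituting %a with the name of the file currently being processed. The /b switch makes copy treat the files as binary so that it does not append an end-of-file character, and the out folder must already exist. Inside a batch file, write %%a instead of %a.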
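The same header + file + footer idea can be sketched in Python, which behaves the same on Windows and Unix. This is only an illustration under my own assumptions: build_pages and its parameters are invented names, the "*.txt" pattern and the .htm output extension are guesses at what the original poster wants, and the files are joined as raw bytes, much like copy /b.

```python
from pathlib import Path


def build_pages(src_dir, out_dir, header_path, footer_path, pattern="*.txt"):
    """Write header + file + footer for each matching file into out_dir."""
    header_file = Path(header_path).resolve()
    footer_file = Path(footer_path).resolve()
    header = header_file.read_bytes()
    footer = footer_file.read_bytes()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    written = []
    for src in sorted(Path(src_dir).glob(pattern)):
        if src.resolve() in (header_file, footer_file):
            continue  # the header and footer files themselves are not pages
        dest = out / (src.stem + ".htm")  # same name, new extension
        dest.write_bytes(header + src.read_bytes() + footer)
        written.append(dest.name)
    return written
```

Something like build_pages(".", "out", "header.txt", "footer.txt") leaves the original text files untouched and puts the finished pages in the out folder, so there is nothing to undo if the header turns out to be wrong.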
[netvedam.com ]