Editing a large amount of files at once

How to do it?

     
5:19 am on May 27, 2004 (gmt 0)

10+ Year Member



I have about 20,000 HTML files on my computer, but they contain only raw text, with no HTML formatting.

So my question is:

Is there an automated way to add the <html> tags etc. to the beginning and end of each file?

Because, as you can imagine, opening each of the 20,000 files individually and adding the code might take a while. :)

12:09 pm on May 27, 2004 (gmt 0)

10+ Year Member



TextPad is very easy to use, and inserting the tags with it is simple.
12:26 pm on May 27, 2004 (gmt 0)

10+ Year Member



I haven't tried to do anything on that scale, but there are any number of text editors out there that you could try that might do the trick - ones that open up 'projects' or 'collections' of files automatically. They range from freeware to dollarware. A simple G search [google.com] should get you started. You may have to look at some patterns in the files, or perhaps bone up on regular expressions, to get the tags at the beginning and end tho'.

Good luck.

12:36 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dreamweaver has the ability to do a find and replace in a whole folder. Don't know if you could use that in this case, but you might, so I am posting it.
12:42 pm on May 27, 2004 (gmt 0)

10+ Year Member



UltraEdit is quite handy for find-and-replace across folders.
1:34 pm on May 27, 2004 (gmt 0)

10+ Year Member



Just throwing some thoughts out.
You could use SSI (server-side includes) or a PHP include template to read the text files.
If you go the edit/append route, make a backup of all the files first. You'll need it. :)

Another method is to use the DOS copy command in a batch file to add a header file and a footer file to each of your text files.

Google for how-tos or syntax.

Any way you slice it, you have a lot of work ahead: collating that many pages into a useful index will take time, unless you consider importing all those text files into a database.

2:04 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I posted some vb code that did something similar, where I had to plug in include files which included the <html> tags...
[webmasterworld.com...]
2:06 pm on May 27, 2004 (gmt 0)

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



There are also a few threads in the WYSIWYG forum about it; most of the "good" text editors have some sort of regex support to do the job.
2:32 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ever thought about dumping the text into a database? You could separate the pages with a comma, make an Access database of them, create a page template, and have an ASP page suck the text in. You could wrap the text in tags styled with CSS to format it. Our programmer does something similar for us on a regular basis, on a scale of 400-500 pages at a time.

Just a thought
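
For anyone curious what the database idea might look like in practice, here is a rough sketch in Python with SQLite standing in for Access and ASP. The table name `pages`, the `PAGE` template, and both function names are made up for illustration. It escapes the text and puts it in a `<pre>` block, which is an assumption about how you'd want raw text displayed.

```python
import sqlite3
from html import escape
from pathlib import Path
from string import Template

# Hypothetical page template; the real markup is up to you.
PAGE = Template(
    "<html>\n<head><title>$title</title></head>\n"
    "<body>\n<pre>$body</pre>\n</body>\n</html>\n"
)

def load_pages(db, folder, pattern="*.html"):
    """Store every matching file in `folder` as one row: (name, body)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages (name TEXT PRIMARY KEY, body TEXT)"
    )
    for path in sorted(Path(folder).glob(pattern)):
        # Assumes the files are UTF-8; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (path.name, text))
    db.commit()

def render_page(db, name):
    """Pull one page's text out of the database and pour it into the template."""
    row = db.execute("SELECT body FROM pages WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return PAGE.substitute(title=escape(name), body=escape(row[0]))
```

You would call `load_pages(sqlite3.connect("pages.db"), "some_folder")` once, then `render_page(db, "somefile.html")` whenever a page is requested. The original files are never modified.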

6:46 pm on May 27, 2004 (gmt 0)

10+ Year Member



Thanks for all the replies, just woke up. :)

I'm surprised there's no real way to do this!

You'd think there'd be some programs out there that can "add to beginning" and "add to end" of a large number of files.

The only problem with using the search and replace idea for this is that the text in every file is different. There's no one line of text that's the same in all of them, so that wouldn't work.

If there was some word that was the same at the beginning and end of each file, I could replace the word with the HTML tags and whatnot, then re-add the word, and the problem would be solved, but unfortunately there isn't. :(

So I'm not really sure what to do now... Is there any program that just adds text, rather than needing something to search for and replace?

7:05 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member



You need to make two files, begin.txt and end.txt. Have a batch file create a new file and dump begin.txt into it, then append your original.txt file to the new file, and then append end.txt to that. Repeat for each file. A Perl program could do it really easily. I will send you a sticky with the name of a guy who can do it really quickly.
7:08 pm on May 27, 2004 (gmt 0)

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You could do something like this in PHP

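$file = preg_replace('/\z/', "whatever you want added", $file);

This matches the very end of the contents held in $file and inserts "whatever you want added" there. (\z matches only at the end of the string, whereas $ with the m modifier would match at the end of every line.) You could probably do this without PHP if you don't have it installed.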

'course you might want to make a backup copy before you go testing
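
The end-of-string versus end-of-line distinction trips up a lot of people, so here is a small illustration. It is written in Python rather than PHP, and the function names are made up for the example. Treat it as a sketch of how the two anchors behave, not as the only way to append text.

```python
import re

def append_at_end(text, addition):
    # \Z matches only at the very end of the string, so exactly one
    # insertion happens no matter how many lines the text has.
    # A function is used as the replacement so that backslashes in
    # `addition` are inserted literally.
    return re.sub(r"\Z", lambda m: addition, text)

def append_to_every_line(text, addition):
    # With re.MULTILINE, $ also matches just before each newline, so the
    # addition lands at the end of every line. Usually not what you want
    # when adding a closing tag.
    return re.sub(r"$", lambda m: addition, text, flags=re.MULTILINE)
```

In practice plain string concatenation (`text + addition`) does the same job as the first function; the regex form only matters if your tool of choice insists on a search pattern.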

7:28 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're on a Unix box you could try awk. It's not pretty though.
7:44 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1. Write a batch file that gets a list of the files.

(something like dir /b > filelist.txt, where /b lists bare file names only)

2. Write another that modifies the files.

Good resource: [robvanderwoude.com...]
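
The same two-step idea (build a list, then work through it) can be sketched in Python as well. The function names and the default `filelist.txt` are just illustrative choices, and the sketch assumes file names that can be written as UTF-8 text.

```python
from pathlib import Path

def write_file_list(folder, list_path="filelist.txt", pattern="*.html"):
    """Step 1: save the names of matching files, one per line."""
    names = sorted(p.name for p in Path(folder).glob(pattern) if p.is_file())
    content = "\n".join(names) + ("\n" if names else "")
    Path(list_path).write_text(content, encoding="utf-8")
    return names

def read_file_list(list_path="filelist.txt"):
    """Step 2 starts here: read the list back, skipping blank lines."""
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.strip()]
```

Keeping the list in a file means you can look it over (or trim it down to a handful of test files) before anything gets modified.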

7:58 pm on May 27, 2004 (gmt 0)

10+ Year Member



Unfortunately, I'm not too familiar with PHP, don't have Unix, and don't even really know what a batch file is... lol

I guess I was hoping there was a program out there that could do this.

Thanks though. :)

Guess I should also start doing some reading...

8:43 pm on May 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One last try...

Do a search for "BK ReplaceEm". It's been some time since I looked at it, but I do know that it can deal with multiple files.

And it's free!

9:39 pm on May 27, 2004 (gmt 0)

10+ Year Member



OK, I just found a way to add the <html> tags etc. to the BEGINNING of each file, but not the end...

The files still seem to work without the closing </html> and other tags though...

So, how important are closing tags?

I know it's not a good idea to leave your tags unclosed, but will it still work? Will search engines, all versions of browsers etc. care?

If it's not a big deal, I'll just add 'em to the beginning.

The only tags that wouldn't be closed are:

</body>
</html>

P.S. Thanks for the link photon, I'll check it out.

9:57 pm on May 27, 2004 (gmt 0)

10+ Year Member




If you're on a Unix box you could try awk. It's not pretty though.

You can use "cygwin" to run this program (and many other common Unix ones) under Windows. Incidentally, there is also a Unix tool called sed (Stream EDitor) which performs tasks similar to awk's and is often mentioned in similar contexts (it's pretty much equally "not pretty"). Sed is generally used for strings and files that lack organization, and awk is generally used for more organized data, but both tools are robust enough to be adapted to many common tasks. I would assume that you could do something similar within DOS, but I don't know for certain.

The Advanced Bash Scripting Guide [tldp.org] has a chapter [tldp.org] dealing specifically with these two tools and many examples that would guide you in the right direction.

That said, in a Unix shell you could avoid both of these tools with something like '{ echo "<html>"; cat "$old_file"; echo "</html>"; } > "$new_file"', where old_file and new_file are set by a suitable for loop. (Running cat on its own, rather than putting `cat $old_file` inside the echo, keeps the file's original line breaks.) This would save you the considerable trouble of learning regular expressions or purposely creating bad HTML files.

10:13 pm on May 27, 2004 (gmt 0)

10+ Year Member



For Unix:
find . -name '*.html' -exec perl -pi -e 's/<body>/<html>\n<body>/g' {} \;
This recursively searches for all files named *.html, then replaces the string <body> with <html>, a newline, and <body>. For this to work you must find a unique string at the beginning of the file, because perl is going to do the search/replace on every line in your file.

For Windows I recommend downloading UltraEdit, then going to 'Search' -> 'Replace in File' and replacing <body> with "<html>^p<body>"

For closing tags:
Unix: find . -name '*.html' -exec perl -pi -e 's{</body>}{</body>\n</html>}g' {} \;
UltraEdit: replace </body> with "</body>^p</html>"

A more efficient way probably exists in unix by using sed/awk but find & perl will get the job done.
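
If Perl isn't available, roughly the same find-and-replace can be sketched in Python. This is only an illustration: the function names are invented, it assumes the files are UTF-8 and already contain `<body>` and `</body>`, and unlike the perl one-liners it touches only the first `<body>` and the last `</body>` in each file. It also leaves a `.bak` copy next to every file it changes.

```python
import shutil
from pathlib import Path

def add_html_tags(text):
    """Insert <html> before the first <body> and </html> after the last </body>."""
    text = text.replace("<body>", "<html>\n<body>", 1)
    head, sep, tail = text.rpartition("</body>")
    if sep:  # only rewrite when a closing </body> was actually found
        text = head + "</body>\n</html>" + tail
    return text

def process_tree(root, pattern="*.html"):
    """Rewrite every matching file under root, leaving a .bak copy beside it."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        original = path.read_text(encoding="utf-8", errors="replace")
        updated = add_html_tags(original)
        if updated != original:
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Note that running it twice on the same folder would add the tags twice, so try it on a copy of a few files first.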

10:21 pm on May 27, 2004 (gmt 0)

10+ Year Member



TextPipe Pro is a Windows application that'll do what you want, and there are time-limited trials available so you can have a play before buying. Lots of presets, but regular expressions based.

The Regex Coach is a free app which is a fantastic regular expressions sandbox, allowing you to learn how to do regular expression matching without breaking anything.

Take a look at both of these, find a ten minute basic regular expressions tutorial on the web and you'll have the job done very quickly indeed.

11:00 pm on May 27, 2004 (gmt 0)

10+ Year Member



If you're using a Mac, BBEdit is pretty good at doing stuff to a directory of files. Saved me hours of work
4:01 am on May 28, 2004 (gmt 0)

WebmasterWorld Administrator anallawalla is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Look up a batch file tutorial. A batch file is a text file of MS-DOS commands saved with a .bat extension.

This would be my approach:

First use one of the editors mentioned to add <p> before each paragraph and </p> after it. I'd search for two consecutive paragraph markers to do it automatically.

This will tag all contents as paragraph text.
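
As a rough illustration of the paragraph step, here is how it might look in Python instead of an editor's search and replace. The function name is made up, a "paragraph" is assumed to be any run of text separated by one or more blank lines, and escaping of &, < and > is an extra step not mentioned above (it keeps stray symbols in the raw text from being read as markup).

```python
import re
from html import escape

def paragraphs_to_html(text):
    """Wrap each blank-line-separated block of text in <p>...</p>."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two consecutive line breaks (possibly with spaces between) end a paragraph.
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        f"<p>{escape(block.strip(), quote=False)}</p>"
        for block in blocks
        if block.strip()
    )
```

Single line breaks inside a paragraph are left alone, since browsers treat them as ordinary whitespace.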

Create header and footer files containing the HTML for top and bottom.

Next use batch files to concatenate to a new extension. I can't be bothered to look it up, but it is conceptually:

For * in folder foo, cat header.txt+*+footer.txt > newfile.htm

You can almost slice bread with batch files, more so in Unix.

I found some examples to illustrate batch files, not necessarily the perfect solution:

[fireflysoftware.com...]
[lc.yi.org...]

11:05 am on May 28, 2004 (gmt 0)

10+ Year Member



Under Windows you don't need to resort to batch files. You can just use the command line to do what you want. For example:

for %a in (*.html) do copy /b header.txt + "%a" + footer.txt "out\%a"

For each file in the current directory, this joins header.txt, the file, and footer.txt and puts the result in a subdirectory called out. A directory called 'out' must already exist for this to work. The /b switch makes copy treat the files as binary so that no end-of-file character is added, and the quotes keep file names with spaces intact.

The 'for' command searches the current directory for files with the extension '.html'. It then calls the copy command, substituting the name of the file currently being processed for %a. (Inside a batch file you would write %%a instead of %a.)
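
For comparison, here is a minimal Python sketch of the same loop, in case the command line isn't an option. The function name and its defaults are illustrative only. It works on raw bytes so that the files' encoding doesn't matter, and unlike the copy command it creates the out directory if it is missing.

```python
from pathlib import Path

def wrap_folder(folder, header_file="header.txt", footer_file="footer.txt",
                out_dir="out", pattern="*.html"):
    """Write header + file + footer for each matching file into out_dir."""
    folder = Path(folder)
    header = (folder / header_file).read_bytes()
    footer = (folder / footer_file).read_bytes()
    target = folder / out_dir
    target.mkdir(exist_ok=True)  # create the output directory if needed
    written = []
    # glob() is not recursive, so files already in out_dir are not picked up.
    for path in sorted(folder.glob(pattern)):
        if path.is_file():
            (target / path.name).write_bytes(header + path.read_bytes() + footer)
            written.append(path.name)
    return written
```

Calling `wrap_folder(".")` from the folder holding the files mirrors the one-liner: the originals stay untouched and the wrapped copies land in out.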

11:45 am on May 28, 2004 (gmt 0)



try dreamweaver

[netvedam.com ]

12:02 pm on May 28, 2004 (gmt 0)

10+ Year Member



wdsriram, would you mind not posting one-line messages with your website appended at the bottom?
1:05 pm on May 28, 2004 (gmt 0)

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



oh and also
[quote]http://www.webmasterworld.com/red.cgi?f=100&d=11&url=http://www.netvedam.com[/quote}
multi thread url drop spamming ...

my formatting error is ( for once ) intentional : )

admins ...please?...

 
