Home / Forums Index / WebmasterWorld / Webmaster General

Webmaster General Forum

    
Editing a large number of files at once
How to do it?
PFOnline




msg:379903
 5:19 am on May 27, 2004 (gmt 0)

I have about 20,000 HTML files on my computer, but they just have raw text in them, no HTML formatting.

So my question is:

Is there an automated way to add the <html> tags, etc. to the beginning and end of each file?

Because, as you can imagine, opening each of the 20,000 files individually and adding the code might take a while. :)

 

Bigozzie




msg:379904
 12:09 pm on May 27, 2004 (gmt 0)

TextPad is very easy to use and makes it simple to insert them.

2oddSox




msg:379905
 12:26 pm on May 27, 2004 (gmt 0)

I haven't tried to do anything on that scale, but there are any number of text editors out there that you could try that might do the trick - ones that open up 'projects' or 'collections' of files automatically. They range from freeware to dollarware. A simple G search [google.com] should get you started. You may have to look at some patterns in the files, or perhaps bone up on regular expressions, to get the tags at the beginning and end tho'.

Good luck.

hannamyluv




msg:379906
 12:36 pm on May 27, 2004 (gmt 0)

Dreamweaver has the ability to do a find and replace in a whole folder. Don't know if you could use that in this case, but you might, so I am posting it.

mcavill




msg:379907
 12:42 pm on May 27, 2004 (gmt 0)

UltraEdit is quite handy for find-and-replace across whole folders.

grifter51




msg:379908
 1:34 pm on May 27, 2004 (gmt 0)

Just throwing some thoughts out.
You could use SSI (server-side includes) or a PHP include template to read the text files.
If you go the edit/append text method, make a backup of all the files first; you'll need it. :)
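Here is a minimal Python sketch of that backup step. It assumes all the pages sit in one folder. The function name and the timestamped folder name are illustrative choices, not part of any tool mentioned in this thread.

```python
import shutil
import time
from pathlib import Path

def backup_folder(src: str) -> Path:
    """Copy the whole folder to a timestamped sibling folder before any bulk edit."""
    src_path = Path(src).resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # e.g. pages -> pages-backup-20040527-140400 (the name format is arbitrary)
    dest = src_path.with_name(f"{src_path.name}-backup-{stamp}")
    # copytree will not overwrite: it raises FileExistsError if dest already exists
    shutil.copytree(src_path, dest)
    return dest
```

Calling backup_folder("pages") once, before running any of the bulk edits below, gives you a folder you can copy back if a find-and-replace goes wrong.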

Another method is to use the copy command in DOS to append a header and footer file to your text files via a batch file.

Google for how to's or syntax.

Any way you slice it, you have a lot of work ahead: collating that many pages into a useful index will take time. Unless, that is, you consider importing all those text files into a database.
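Here is a rough sketch of the database idea. Python's built-in sqlite3 module stands in for whatever database you actually use. The "pages" table, its columns, and the "pages.db" file name are all hypothetical.

```python
import sqlite3
from pathlib import Path

def import_pages(folder: str, db_path: str = "pages.db") -> int:
    """Store each .html file's name and raw text as one row in a 'pages' table."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits the transaction if the block finishes without error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS pages (name TEXT PRIMARY KEY, body TEXT)"
            )
            count = 0
            for path in sorted(Path(folder).glob("*.html")):
                text = path.read_text(encoding="utf-8", errors="replace")
                conn.execute(
                    "INSERT OR REPLACE INTO pages (name, body) VALUES (?, ?)",
                    (path.name, text),
                )
                count += 1
    finally:
        conn.close()
    return count
```

The function returns the number of files stored. A page template can then look up "body" by "name" instead of reading 20,000 separate files.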

Easy_Coder




msg:379909
 2:04 pm on May 27, 2004 (gmt 0)

I posted some VB code that did something similar, where I had to plug in include files that contained the <html> tags...
[webmasterworld.com...]

brotherhood of LAN




msg:379910
 2:06 pm on May 27, 2004 (gmt 0)

There are also a few threads about this in the WYSIWYG forum; most of the "good" text editors have some sort of regex support to do the job.

SEOMike




msg:379911
 2:32 pm on May 27, 2004 (gmt 0)

Ever thought about dumping the text into a database? You could separate the pages with commas, make an Access database of them, create a page template, and have an ASP page pull the text in. You could wrap the text in CSS-styled tags to format it. Our programmer does something similar for us regularly, 400-500 pages at a time.

Just a thought
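For illustration, here is the template half of this idea sketched in Python rather than ASP. The template text, the "content" CSS class, and "style.css" are placeholders. Escaping the raw text is a deliberate choice: it keeps stray < or & characters in the files from breaking the markup.

```python
import html
from string import Template

# Hypothetical page template: $title and $body are filled in per page.
PAGE = Template("""<html>
<head><title>$title</title>
<link rel="stylesheet" href="style.css"></head>
<body>
<div class="content">
$body
</div>
</body>
</html>
""")

def render_page(title: str, text: str) -> str:
    """Fill the template with one page's text, escaped so it can't break the HTML."""
    return PAGE.substitute(title=html.escape(title), body=html.escape(text))
```

Each stored page goes through the same template, so changing the layout later means editing one template rather than 20,000 files.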

PFOnline




msg:379912
 6:46 pm on May 27, 2004 (gmt 0)

Thanks for all the replies, just woke up. :)

I'm surprised there's no real way to do this!

You'd think there'd be some programs out there that can "add to beginning" and "add to end" of a large amount of files.

The only problem with the search-and-replace idea is that the text in every file is different; there's no single line of text that's the same in all of them, so that wouldn't work.

If there was some word that was the same at the beginning and end of each file, I could replace the word with the html tags and what not, then re-add the word, and that would be problem solved, but unfortunately there isn't. :(

So I'm not really sure what to do now... Is there any program that just adds text, rather than needing something to search for and replace?

ogletree




msg:379913
 7:05 pm on May 27, 2004 (gmt 0)

You need to make two files, begin.txt and end.txt. Have a batch file create a new file and dump begin.txt into it, then append your original.txt file to the new file, and then append end.txt to that. Repeat for every file. A Perl program could do it really easily. I'll send you a sticky with a guy who can do it really quickly.
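The begin.txt / original / end.txt steps could be sketched in Python like this. The sketch assumes every page is a .html file in one folder. It also assumes the wrapped copies go to a separate output folder, so the originals stay untouched.

```python
from pathlib import Path

def wrap_files(src_dir: str, out_dir: str,
               begin_file: str = "begin.txt", end_file: str = "end.txt") -> int:
    """Write begin + original + end for every .html file in src_dir into out_dir."""
    # Read as bytes so each file's original encoding and line endings are kept.
    begin = Path(begin_file).read_bytes()
    end = Path(end_file).read_bytes()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    count = 0
    for path in sorted(Path(src_dir).glob("*.html")):
        (out / path.name).write_bytes(begin + path.read_bytes() + end)
        count += 1
    return count
```

For example, wrap_files("pages", "out") writes one wrapped copy per page into out/ and returns how many it wrote.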

brotherhood of LAN




msg:379914
 7:08 pm on May 27, 2004 (gmt 0)

You could do something like this in PHP

preg_replace('/\z/', "whatever you want added", $file);

This matches the very end of the file's contents (\z) and inserts "whatever you want added" there. You could probably do this without PHP if you don't have it installed.

'course you might want to make a backup copy before you go testing
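Anchors like this also get around the "no common word" problem. A pattern that matches only the very start or the very end of the file needs no shared text. Below is a minimal Python sketch; the HEADER and FOOTER strings are placeholders. Check your editor's documentation to see whether its regex flavor has the same anchors.

```python
import re

HEADER = "<html>\n<head><title>Untitled</title></head>\n<body>\n"  # placeholder
FOOTER = "\n</body>\n</html>\n"  # placeholder

def wrap_text(text: str) -> str:
    # \A matches only at the start of the string. In Python, \Z matches only at the
    # very end (unlike Perl's \Z, it does not also match before a trailing newline).
    # A function as the replacement means backslashes in HEADER/FOOTER are not treated
    # as escapes.
    text = re.sub(r"\A", lambda m: HEADER, text, count=1)
    return re.sub(r"\Z", lambda m: FOOTER, text, count=1)
```

Plain HEADER + text + FOOTER gives the same result. The anchored form matters mainly when a tool offers only search-and-replace.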

photon




msg:379915
 7:28 pm on May 27, 2004 (gmt 0)

If you're on a Unix box you could try awk. It's not pretty though.

HughMungus




msg:379916
 7:44 pm on May 27, 2004 (gmt 0)

1. Write a batch file that gets a list of the files.

(something like dir>filelist.txt)

2. Write another that modifies the files.

Good resource: [robvanderwoude.com...]
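The two steps might look like this in Python, used here in place of a batch file. The name filelist.txt and the header/footer byte strings are illustrative assumptions. The second step rewrites the files in place, so back them up first.

```python
from pathlib import Path

def write_file_list(folder: str, list_file: str) -> int:
    """Step 1: save the names of all .html files, one per line (much like dir /b > filelist.txt)."""
    names = sorted(p.name for p in Path(folder).glob("*.html"))
    Path(list_file).write_text("\n".join(names) + "\n", encoding="utf-8")
    return len(names)

def process_file_list(folder: str, list_file: str, header: bytes, footer: bytes) -> int:
    """Step 2: read the list back and wrap each named file in place."""
    count = 0
    for name in Path(list_file).read_text(encoding="utf-8").splitlines():
        if not name:
            continue  # skip blank lines in the list
        path = Path(folder) / name
        path.write_bytes(header + path.read_bytes() + footer)
        count += 1
    return count
```

Keeping the list as a separate file lets you check (or trim) it before the second step changes anything.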

PFOnline




msg:379917
 7:58 pm on May 27, 2004 (gmt 0)

Unfortunately, I'm not too familiar with PHP, don't have Unix, and don't even really know what a batch file is... lol

I guess I was hoping there was a program out there that could do this.

Thanks though. :)

Guess I should also start doing some reading...

photon




msg:379918
 8:43 pm on May 27, 2004 (gmt 0)

One last try...

Do a search for "BK ReplaceEm". It's been some time since I looked at it, but I do know that it can deal with multiple files.

And it's free!

PFOnline




msg:379919
 9:39 pm on May 27, 2004 (gmt 0)

OK, I just found a way to add the <html> tags, etc. to the BEGINNING of each file, but not the end...

The files still seem to work without the closing </html> and other tags, though...

So, how important are closing tags?

I know it's not a good idea to leave tags unclosed, but will it still work? Will search engines, all browser versions, etc. care?

If it's not a big deal, I'll just add 'em to the beginning.

The only tags that wouldn't be closed are:

</body>
</html>

P.S. Thanks for the link photon, I'll check it out.

nalin




msg:379920
 9:57 pm on May 27, 2004 (gmt 0)


If you're on a Unix box you could try awk. It's not pretty though.

You can use Cygwin to run this program (and many other common Unix ones) under Windows. Incidentally, there is also a Unix tool called sed (Stream EDitor) which performs similar tasks to awk and is often mentioned in similar contexts (it's pretty equally "not pretty"). Sed is generally used for strings and files that lack organization, and awk for more organized data, but both tools are robust enough to adapt to many common tasks. I would assume you could do something similar within DOS, but I don't know for certain.

The Advanced Bash Scripting Guide [tldp.org] has a chapter [tldp.org] dealing specifically with these two tools and many examples that would guide you in the right direction.

That said, in a Unix shell you could avoid both these tools with something like '{ echo "<html>"; cat "$old_file"; echo "</html>"; } > "$new_file"', where old_file and new_file are set by a suitable for loop. Quoting "$old_file" and using cat directly keeps the file's line breaks intact. This would save you the considerable trouble of learning regular expressions or purposely creating bad HTML files.

iblaine




msg:379921
 10:13 pm on May 27, 2004 (gmt 0)

for unix
find . -name '*.html' -exec perl -pi -e 's/<body>/<html>\n<body>/g' {} \;
This recursively searches for all files named *.html, then replaces the string <body> with <html>, a newline, and <body>. For this to work you must find a string that is unique to the beginning of the file, because perl -p does the search/replace on every line of your file.

For Windows, I recommend downloading UltraEdit, then going to 'Search' -> 'Replace in File' and replacing <body> with "<html>^p<body>"

For closing tags.
unix: find . -name '*.html' -exec perl -pi -e 's{</body>}{</body>\n</html>}g' {} \;
ultraedit: replace </body> with "</body>^p</html>"

A more efficient way probably exists in unix by using sed/awk but find & perl will get the job done.
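Because perl -p runs the substitution on every line, a file with more than one <body> would get more than one <html>. Below is a Python sketch of an alternative. It reads the whole file and touches only the first <body> and the last </body>. It assumes lowercase tags and skips files that lack either tag (such as the original raw-text files).

```python
from pathlib import Path

def add_html_tags(path: Path) -> bool:
    """Put <html> before the first <body> and </html> after the last </body>.

    Returns False and leaves the file unchanged if either tag is missing.
    Works on bytes so the file's encoding and line endings are preserved.
    """
    data = path.read_bytes()
    if b"<body>" not in data or b"</body>" not in data:
        return False
    data = data.replace(b"<body>", b"<html>\n<body>", 1)  # first occurrence only
    head, sep, tail = data.rpartition(b"</body>")          # split at the last </body>
    path.write_bytes(head + sep + b"\n</html>" + tail)
    return True
```

To cover a whole folder, loop over Path("pages").glob("*.html") and call add_html_tags on each path.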

jetboy_70




msg:379922
 10:21 pm on May 27, 2004 (gmt 0)

TextPipe Pro is a Windows application that'll do what you want, and time-limited trials are available so you can have a play before buying. It has lots of presets, but it's based on regular expressions.

The Regex Coach is a free app which is a fantastic regular expressions sandbox, allowing you to learn how to do regular expression matching without breaking anything.

Take a look at both of these, find a ten minute basic regular expressions tutorial on the web and you'll have the job done very quickly indeed.

basenotes




msg:379923
 11:00 pm on May 27, 2004 (gmt 0)

If you're using a Mac, BBEdit is pretty good at doing stuff to a directory of files. Saved me hours of work

anallawalla




msg:379924
 4:01 am on May 28, 2004 (gmt 0)

Look up a batch file tutorial. They are MS-DOS commands in a text file saved with a .bat extension.

This would be my approach:

First, use one of the editors mentioned to add <p> before each paragraph and </p> after it. I'd search for two consecutive paragraph markers (the blank line between paragraphs) to do it automatically.

This will tag all contents as paragraph text.
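The paragraph-tagging step could be sketched in Python like this, assuming a blank line separates paragraphs. Escaping < and & is an added precaution, not part of the approach described above.

```python
import html
import re

def to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of text in <p>...</p>."""
    # \n\s*\n matches a blank line, including Windows \r\n\r\n line endings.
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        f"<p>{html.escape(b.strip(), quote=False)}</p>" for b in blocks if b.strip()
    )
```

The result can then go between the header and footer files in the next step.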

Create header and footer files containing the HTML for top and bottom.

Next, use batch files to concatenate everything into new files with a new extension. I can't be bothered to look it up, but conceptually it is:

For * in folder foo, cat header.txt+*+footer.txt > newfile.htm

You can almost slice bread with batch files, more so in Unix.

I found some examples to illustrate batch files, not necessarily the perfect solution:

[fireflysoftware.com...]
[lc.yi.org...]

racer_x




msg:379925
 11:05 am on May 28, 2004 (gmt 0)

Under Windows you don't need to resort to batch files. You can just use the command line to do what you want. For example:

for %a in (*.html) do copy /b header.txt + %a + footer.txt out\%a

For each file in the current directory, this adds header.txt before the file and footer.txt after it, then puts the result in a subdirectory called out. A directory called 'out' must already exist for this to work. The /b switch makes copy join the files byte for byte. Without it, copy treats combined files as ASCII and may add an end-of-file (Ctrl+Z) character to the result.

The 'for' command searches the current directory for files with the extension '.html', then calls the copy command, substituting the name of the file currently being processed for %a. Inside a .bat file, write %%a instead of %a.

wdsriram




msg:379926
 11:45 am on May 28, 2004 (gmt 0)

try dreamweaver

[netvedam.com ]

2oddSox




msg:379927
 12:02 pm on May 28, 2004 (gmt 0)

wdsriram, would you mind not posting one-line messages with your website appended at the bottom?

Leosghost




msg:379928
 1:05 pm on May 28, 2004 (gmt 0)

oh and also
[quote]http://www.webmasterworld.com/red.cgi?f=100&d=11&url=http://www.netvedam.com[/quote}
multi thread url drop spamming ...

my formatting error is ( for once ) intentional : )

admins ...please?...

© Webmaster World 1996-2014 all rights reserved