Home / Forums Index / WebmasterWorld / Webmaster General

Webmaster General Forum

Need to edit the HTML of about 20,000 files - How to do it?
Surely there must be an easier way than 1 at a time! :)

 9:05 pm on Mar 22, 2004 (gmt 0)

I've accumulated about 20,000 raw guitar tab files (hope that's not too site specific) and now I need to turn them all into HTML files that include my logo, links, title tags and what not.

Are there any tools that help you do this?

I've heard of something called "Search and Replace" which can mass replace parts of files, but I don't need to replace anything, just add stuff.

Right now these files just contain the tablature and nothing else, so I'd need to include the <html> tags for every file and what not.

Also, they are all .crd and .tab files and will also need to eventually be changed to .html... Are there any ways to "mass change file extensions"?

How would you guys go about doing these things? Any ideas?



 9:22 pm on Mar 22, 2004 (gmt 0)

hehe.. I have just the thing for you.. sticky on the way as soon as I can find the URL


 9:26 pm on Mar 22, 2004 (gmt 0)

I was looking for something similar. I need to append text to the end of a bunch of files. I use BK Replace em (or whatever it's called), but because I have no string that is present in all the files, I don't think it will do in this instance.

Please sticky me that url too if it might help me deejay :)
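For the append-only case, no search string is needed at all. Here is a minimal Python sketch, assuming the files sit in one folder and are plain text; the function name, folder, and extensions are made up for illustration, so try it on a copy first.

```python
from pathlib import Path


def append_to_files(folder, suffixes, extra_text):
    """Append extra_text to every file in folder whose extension is in suffixes.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Mode "a" always writes at the end of the file,
            # so nothing has to be searched for or replaced.
            with path.open("a", encoding="utf-8") as handle:
                handle.write(extra_text)
            changed += 1
    return changed
```

Something like append_to_files("tabs", {".crd", ".tab"}, "\nFooter text\n") would then add the same footer to every tab file in a (hypothetical) tabs folder.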


 9:27 pm on Mar 22, 2004 (gmt 0)

That really sounds like something more conducive to a database driven system where you inject the tabs into html templates...
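Even without a database, the "inject the tabs into a template" idea can be tried offline. The following is only a rough Python sketch: the template, the function name, and the folder names are all assumptions, and the page title is simply taken from the file name. It also writes each page with an .html extension, which covers the extension question at the same time.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; swap in your own logo, links and title markup.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<pre>$tab</pre>
</body>
</html>
""")


def build_pages(source_dir, output_dir, suffixes=(".crd", ".tab")):
    """Wrap every raw tab file in the template and write it out as .html."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file() or src.suffix.lower() not in suffixes:
            continue
        raw = src.read_text(encoding="utf-8", errors="replace")
        # Escape <, > and & so the tablature displays literally inside <pre>.
        page = PAGE.substitute(title=html.escape(src.stem), tab=html.escape(raw))
        # Note: song.crd and song.tab would both map to song.html.
        target = out / (src.stem + ".html")
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

The original .crd and .tab files are left untouched, so the whole set can be regenerated whenever the template changes.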


 9:30 pm on Mar 22, 2004 (gmt 0)

ha. so excited about the first question I forgot to read the rest of your post. Got a solution for the file extensions too. comin atcha.


 9:51 pm on Mar 22, 2004 (gmt 0)

database driven system where you inject the tabs into html templates...

This sounds intriguing. I've never taken on a project like this, have always had just small 20 page or so websites.

I really have no idea what a database driven system is. Are there any sites out there that I could read that might help?

Does doing it that way require a special type of webhosting?


 10:05 pm on Mar 22, 2004 (gmt 0)

deejay, the only problem with the search and replace utility is that it needs something to replace, but with these tabs, there is only stuff to add (the html tags and stuff.) So there's no way to really add it, if I'm not mistaken.


 10:22 pm on Mar 22, 2004 (gmt 0)

You can still use search and replace, it's just a matter of unique code.

Let's say you want to add a new line to the same place in every document.

In each document, seek out a consistent phrase of content, string of code, etc., that appears only in the same area of the code/page where you want to insert the new line. Let's say that currently existing phrase, on each page, is
yadda yadda yadda

You can run a search and replace on
yadda yadda yadda

and replace it with
yadda yadda yadda plus the new line you want to add

You take a bit of unique content/code, and replace it with itself and with whatever you want to add.

Tip - if you want to go this route, first figure out exactly how many replaces need to be made. For example, if you have 20,000 pages and want the change made 20,000 times, then you should first take the unique bit of code and find and replace it with itself. If you come up with 20,000 results, you should be fine. Test a group of sample pages before you run the whole site.

A bit of a workaround, but it's done the job for me loads of times.
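The count-first, then replace-with-itself-plus-addition routine can be sketched in Python roughly like this. It is only an illustration under the assumption that the files are plain text; the function names and the expected_total parameter are made up here.

```python
from pathlib import Path


def count_marker(paths, marker):
    """Count how many times marker occurs across all the given files."""
    return sum(Path(p).read_text(encoding="utf-8").count(marker) for p in paths)


def insert_after_marker(paths, marker, addition, expected_total):
    """Replace marker with marker + addition, but only if the count checks out."""
    found = count_marker(paths, marker)
    if found != expected_total:
        # Stop before touching anything, as with the dry run described above.
        raise ValueError(f"expected {expected_total} matches, found {found}")
    for p in paths:
        path = Path(p)
        text = path.read_text(encoding="utf-8")
        path.write_text(text.replace(marker, marker + addition), encoding="utf-8")
    return found
```

Running it on a handful of sample pages first, with expected_total set to the size of the sample, mirrors the "test a group before you run the whole site" advice.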


 10:27 pm on Mar 22, 2004 (gmt 0)

I would leave the files as they are and use SSI to include them and the required html code into the .html files which I would then auto-generate.

It should take a couple of hours at most to do the entire site.


 10:28 pm on Mar 22, 2004 (gmt 0)

antsaint, beautiful idea, but these guitar tabs are all different, there's no one piece of text that's the same in each guitar tab... Wish there was... :)


 10:31 pm on Mar 22, 2004 (gmt 0)

quotations, that actually sounds more like the route I want to take now. Only thing is, I don't know how to do it, lol...

Is that basically the same thing oilman said?

Is that a database driven system?

I WILL figure out how to do this... ;)


 11:07 pm on Mar 22, 2004 (gmt 0)

Not really a database per se, more like a listing of files names.

Each html file would look something like this:

<!--#include file="1.txt" -->widget bit of text -
<!--#include file="2.txt" -->bit of widget text -
<!--#include file="3.txt" -->widgets, text, bit, of, widget,
<!--#include file="4.txt" --><h1>Widget!</h1><h2>Bit of text about Widgets</h2>
<!--#include file="5.txt" -->
<!--#include file="content_file_1.tab" -->
<!--#include file="6.txt" -->Some text which is unique to the concept and theme of the particular widget tablature file -
<!--#include file="7.txt" -->

1.txt through 7.txt would be your html code content and content_file_1.tab through content_file_20000.tab would be the names of your tablature files.
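One way the auto-generation step could look is a small script that writes one stub page per tab file. This is a Python sketch under assumptions: header.txt and footer.txt are hypothetical shared fragments, the stubs are written next to the tab files so that include file= can find them, and the server has to be set up to parse the chosen page extension for includes.

```python
from pathlib import Path

# Hypothetical stub layout: shared header, the raw tab, shared footer.
STUB = (
    '<!--#include file="header.txt" -->\n'
    '<pre><!--#include file="{tab_name}" --></pre>\n'
    '<!--#include file="footer.txt" -->\n'
)


def write_ssi_stubs(folder, suffixes=(".crd", ".tab"), page_ext=".html"):
    """Write one SSI stub page for every tab file found in folder."""
    written = []
    for src in sorted(Path(folder).iterdir()):
        if not src.is_file() or src.suffix.lower() not in suffixes:
            continue
        # The stub only references the tab file; the tab itself is not modified.
        stub_path = src.with_suffix(page_ext)
        stub_path.write_text(STUB.format(tab_name=src.name), encoding="utf-8")
        written.append(stub_path)
    return written
```

With 20,000 tab files this produces 20,000 small pages in one run, and a change to the header or footer fragment shows up on every page without regenerating anything.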


 11:13 pm on Mar 22, 2004 (gmt 0)

Very interesting, let me see if I have this correct...

Then you would have to call all the files .shtml or .asp, right? Because SSI must have one of those extensions?

(I forgot, I do know a little about SSI, lol... Use it to update the news on my site, but just forgot it was called SSI...)

Or, you just do that to GENERATE the regular html pages?


 12:10 pm on Mar 23, 2004 (gmt 0)

How about perl?



 12:44 pm on Mar 23, 2004 (gmt 0)


are you going to post the url here?



 10:20 pm on Mar 24, 2004 (gmt 0)

quotations, I figured out how to leave the tabs as they are and inject them into the regular html files, using SSI.

BUT, this would still mean making each page one at a time... and adding the <!--#include virtual="sample_tab_1.tab" --> to 20,000 different pages.

Or what did you mean by auto-generate?

Hopefully it's what I'm looking for! :)


 10:57 pm on Mar 24, 2004 (gmt 0)

Ok, here's how I would break it down. I'm assuming you're working on a unix server, have access to mod_rewrite and are almost as lazy as I am.

Upload all your tab files to your server, in a folder you call /tabs/

You could also plug everything into a database, but that will take more work and server resources.

Now use mod_rewrite to send all requests for a file /tabs/nameOfSong.html to a PHP script, with nameOfSong as a parameter

In the PHP script, format the data, add the text you need before or after, and include the file that was requested.

There, a very lazy solution. Does that work, or is it still missing a few things?
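The script side of this approach might look roughly like the following. It is written in Python as a stand-in for the PHP script, purely to show the logic; the function name, the allowed-character rule, and the markup are assumptions, and the mod_rewrite rule itself is not shown.

```python
import html
import re
from pathlib import Path

# Only plain song names are accepted, so a request cannot reach outside /tabs/.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def render_song(song_name, tabs_dir, suffixes=(".tab", ".crd")):
    """Return an HTML page for song_name, or None if no matching tab exists."""
    if not SAFE_NAME.fullmatch(song_name):
        return None
    for suffix in suffixes:
        source = Path(tabs_dir) / (song_name + suffix)
        if source.is_file():
            raw = source.read_text(encoding="utf-8", errors="replace")
            title = html.escape(song_name)
            # Text before and after the tab goes here (logo, links and so on).
            return (
                f"<html><head><title>{title}</title></head>"
                f"<body><pre>{html.escape(raw)}</pre></body></html>"
            )
    return None
```

A None result would map to a 404 response. Checking the song name before building a file path matters, because that parameter comes straight from the requested URL.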


 2:51 pm on Mar 25, 2004 (gmt 0)

Then you would have to call all the files .shtml or .asp, right? Because SSI must have one of those extensions?

Actually, you can use includes in .html files as well. You just have to tell the server to look for includes in these files.

Just add the following line to an .htaccess file (provided, of course, that the host allows you to use .htaccess files):

AddHandler server-parsed .html

(Possible downside -- the server will now check every .html file for includes. That's fine if most of these pages have them, but if most do not it's an unnecessary load on the server and will slow down the serving of your pages.)


 11:20 pm on Mar 25, 2004 (gmt 0)

bruhaha, is it possible to direct the server to parse the files in certain directories only, or must it affect the whole site?

add> or if the instruction is to parse .html files, it won't parse .htm files?


 11:40 pm on Mar 25, 2004 (gmt 0)


If you can't find a solution, I may be able to do it for you. I have homegrown code that I use to do this type of thing all the time.

any chance we could barter?

sticky me if you are interested.


 3:11 pm on Mar 26, 2004 (gmt 0)


You can certainly have it parse .htm files the same way. You just have to specify it, by substituting ".htm" for ".html" in the code. If you want to do both at the same time (though I don't recommend alternating between .htm and .html on the same site!) you should be able to do so by simply expanding the line to:

AddHandler server-parsed .html .htm

is it possible to direct the server to parse the files in certain directories only, or must it affect the whole site?

Yes, by placing the .htaccess file with these instructions in the proper place.

Here's how it works:
1) an .htaccess file affects the directory it resides in and any nested subdirectories, and
2) an .htaccess file further down in the directory structure overrides .htaccess files higher up (rather like the "cascading" of style sheets: the more "local" directions override "global" ones)

Thus, by adding an .htaccess in a particular directory, you could tell the server to look for includes only in the .htm(l) files found in that directory.


 7:10 am on Mar 27, 2004 (gmt 0)

bruhaha, what can I say?

Succinct and very helpful.

Thank you!
