|Need to edit the HTML of about 20,000 files - How to do it?|
Surely there must be an easier way than 1 at a time! :)
I've accumulated about 20,000 raw guitar tab files (hope that's not too site specific) and now I need to turn them all into HTML files that include my logo, links, title tags and what not.
Are there any tools that help you do this?
I've heard of something called "Search and Replace" which can mass replace parts of files, but I don't need to replace anything, just add stuff.
Right now these files just contain the tablature and nothing else, so I'd need to include the <html> tags for every file and what not.
Also, they are all .crd and .tab files and will also need to eventually be changed to .html... Are there any ways to "mass change file extensions"?
How would you guys go about doing these things? Any ideas?
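For reference, both tasks in this post (wrapping each raw file in HTML and ending up with .html extensions) could be handled by one small script. This is only a minimal sketch, assuming Python is available on the machine holding the files. The template contents and the function name `convert_tabs` are hypothetical placeholders, not anything from this thread.

```python
import html
from pathlib import Path

# Hypothetical template: the real logo, links and title text would go here.
TEMPLATE = """<html>
<head><title>{title}</title></head>
<body>
<!-- logo and links would go here -->
<pre>
{tab}
</pre>
</body>
</html>
"""

def convert_tabs(src_dir, dest_dir, extensions=(".crd", ".tab")):
    """Wrap every raw tab file in src_dir in TEMPLATE and save it as .html in dest_dir."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue  # leave anything that is not a tab file alone
        raw = path.read_text(encoding="utf-8", errors="replace")
        # Escape <, > and & so the tablature displays literally inside <pre>.
        page = TEMPLATE.format(title=html.escape(path.stem),
                               tab=html.escape(raw, quote=False))
        # Note: song.crd and song.tab would both map to song.html.
        out = dest / (path.stem + ".html")
        out.write_text(page, encoding="utf-8")
        written.append(out)
    return written
```

The originals are left untouched and the .html copies go to a separate folder, so a bad template can be fixed and the script simply run again.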
hehe.. I have just the thing for you.. sticky on the way as soon as I can find the URL
I was looking for something similar. I need to append text to the end of a bunch of files. I use BK Replace em (or whatever it's called) but because I have no string that is present in all the files I don't think it will do in this instance.
Please sticky me that url too if it might help me deejay :)
that really sounds like something more conducive to a database driven system where you inject the tabs into html templates...
ha. so excited about the first question I forgot to read the rest of your post. Got a solution for the file extensions too. comin atcha.
|database driven system where you inject the tabs into html templates... |
This sounds intriguing. I've never taken on a project like this, have always had just small 20 page or so websites.
I don't really know what a database driven system is. Are there any sites out there that I could read that might help?
Does doing it that way require a special type of webhosting?
deejay, the only problem with the search and replace utility is that it needs something to replace, but with these tabs, there is only stuff to add (the html tags and stuff.) So there's no way to really add it, if I'm not mistaken.
You can still use search and replace, it's just a matter of unique code.
Let's say you want to add a new line of code to the same place in every document.
In each document, seek out a consistent phrase of content, string of code, etc., that appears only in the same area of the code/page where you want to insert the new line. Let's say that currently existing phrase, on each page, is
yadda yadda yadda
You can run a search and replace on
yadda yadda yadda
and replace it with
yadda yadda yadda
followed by the new line you want to add.
You take a bit of unique content/code, and replace it with itself and with whatever you want to add.
Tip - if you want to go this route, first figure out exactly how many replaces need to be made. E.g., if you have 20,000 pages and want the change made 20,000 times, then you should first take the unique bit of code and find and replace it with itself. If you come up with 20,000 results, you should be fine. If the count is higher or lower, the phrase is repeated or missing on some pages. Test a group of sample pages before you run the whole site.
A bit of a workaround, but it's done the job for me loads of times.
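The count-first check and the "replace it with itself plus the addition" trick can be sketched in a few lines of script. This is an illustration only, assuming the pages sit in one folder. The function names `count_matches` and `insert_after_marker` are made up for the example.

```python
from pathlib import Path

def count_matches(folder, marker, pattern="*.html"):
    """Return {filename: number of times marker occurs} for every matching file."""
    return {p.name: p.read_text(encoding="utf-8").count(marker)
            for p in sorted(Path(folder).glob(pattern))}

def insert_after_marker(folder, marker, addition, pattern="*.html"):
    """Replace marker with marker + addition in every file.

    Nothing is written unless every file contains the marker exactly once,
    which is the dry-run count described in the post above.
    """
    counts = count_matches(folder, marker, pattern)
    bad = [name for name, n in counts.items() if n != 1]
    if bad:
        raise ValueError(f"marker not found exactly once in: {bad}")
    for p in sorted(Path(folder).glob(pattern)):
        text = p.read_text(encoding="utf-8")
        p.write_text(text.replace(marker, marker + addition), encoding="utf-8")
    return len(counts)
```

Running `count_matches` on its own first is the safe way to see which pages would be skipped or hit twice before anything is changed.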
I would leave the files as they are and use SSI to include them and the required html code into the .html files which I would then auto-generate.
It should take a couple of hours at most to do the entire site.
antsaint, beautiful idea, but these guitar tabs are all different, there's no one piece of text that's the same in each guitar tab... Wish there was... :)
quotations, that actually sounds more like the route I want to take now, only thing is I don't know how to do it, lol...
Is that basically the same thing oilman said?
Is that a database driven system?
I WILL figure out how to do this... ;)
Not really a database per se, more like a listing of files names.
Each html file would look something like this:
<!--#include file="1.txt"-->widget bit of text -
<!--#include file="2.txt"-->bit of widget text -
<!--#include file="3.txt"-->widgets, text, bit, of, widget,
<!--#include file="4.txt"--><h1>Widget!</h1><h2>Bit of text about Widgets</h2>
<!--#include file="5.txt"--><!--#include file="content_file_1.tab"-->
<!--#include file="6.txt"-->Some text which is unique to the concept and theme of the particular widget tablature file -
<!--#include file="7.txt"-->
1.txt through 7.txt would be your html code content and content_file_1.tab through content_file_20000.tab would be the names of your tablature files.
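As a rough illustration of what auto-generating those pages could look like, here is a minimal sketch that writes one small SSI page per tab file. It is an assumption about the approach, not the poster's actual tool. The shared fragments `header.txt` and `footer.txt` and the function name `generate_pages` are hypothetical, and the page layout is cut down to two shared includes for brevity.

```python
import html
from pathlib import Path

# Hypothetical page skeleton: header.txt and footer.txt stand in for the
# shared html fragments, and the tab file itself is pulled in by SSI.
PAGE = (
    '<!--#include file="header.txt" -->\n'
    "<h1>{title}</h1>\n"
    '<pre><!--#include file="{tab_name}" --></pre>\n'
    '<!--#include file="footer.txt" -->\n'
)

def generate_pages(tab_dir, extensions=(".crd", ".tab"), page_ext=".html"):
    """Write one SSI page next to each tab file and return the page names."""
    folder = Path(tab_dir)
    created = []
    # sorted() takes a snapshot, so pages written below are not re-scanned.
    for tab in sorted(folder.iterdir()):
        if tab.is_file() and tab.suffix.lower() in extensions:
            title = html.escape(tab.stem.replace("_", " "))
            page = folder / (tab.stem + page_ext)
            page.write_text(PAGE.format(title=title, tab_name=tab.name),
                            encoding="utf-8")
            created.append(page.name)
    return created
```

The pages are written into the same folder as the tab files because `include file` paths are resolved relative to the page doing the including. The tab text is inserted by the server as-is, so any `<` characters in a tab would be read as markup.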
Very interesting, let me see if I have this correct...
Then you would have to call all the files .shtml or .asp, right? Because SSI must have one of those extensions?
(I forgot, I do know a little about SSI, lol... Use it to update the news on my site, but just forgot it was called SSI...)
Or, you just do that to GENERATE the regular html pages?
How about perl?
are you going to post the url here?
quotations, I figured out how to leave the tabs as they are and inject them into the regular html files, using SSI.
BUT, this would still mean making each page 1 at a time... and adding the <!--#include virtual="sample_tab_1.tab" --> to 20,000 different pages.
Or what did you mean by auto-generate?
Hopefully it's what I'm looking for! :)
Ok, here's how I would break it down. I'm assuming you're working on a unix server, have access to mod_rewrite and are almost as lazy as I am.
Upload all your tab files to your server, in a folder you call /tabs/
You could also plug everything into a database, but that will take more work and server resources.
Now use mod_rewrite to send all requests for a file /tabs/nameOfSong.html to a PHP script, with nameOfSong as a parameter
In the PHP script, format the data, add the text you need before or after, and include the file that was requested.
There, a very lazy solution. Does that work, or is it still missing a few things?
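The post describes a PHP script. Purely to illustrate the logic such a script needs, here is a sketch of the same step written in Python as a plain function: take the song name from the rewritten request, check it, and wrap the matching tab file in HTML. The function name `render_tab`, the allowed-character rule and the page markup are all assumptions for the example.

```python
import html
import re
from pathlib import Path

# Only letters, digits, underscore and hyphen are accepted, so a request
# cannot use "../" or slashes to reach files outside the tabs folder.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")

def render_tab(name, tabs_dir):
    """Return an HTML page for the named song, or None if it does not exist."""
    if not SAFE_NAME.fullmatch(name):
        return None
    for ext in (".tab", ".crd"):
        path = Path(tabs_dir) / (name + ext)
        if path.is_file():
            raw = path.read_text(encoding="utf-8", errors="replace")
            return (
                "<html><head><title>" + html.escape(name) + "</title></head>\n"
                "<body>\n<pre>" + html.escape(raw, quote=False) + "</pre>\n"
                "</body></html>\n"
            )
    return None  # the caller would answer with a 404 here
```

Whatever language the real script is in, validating the name before building a file path is the part worth keeping, since the parameter comes straight from the URL.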
|Then you would have to call all the files .shtml or .asp, right? Because SSI must have one of those extensions? |
Actually, you can use includes in .html files as well. You just have to tell the server to look for includes in these files.
Just add the following line to an .htaccess file (provided, of course, that the host allows you to use .htaccess files):
AddHandler server-parsed .html
(Possible downside -- the server will now check every .html file for includes. That's fine if most of these pages have them, but if most do not it's an unnecessary load on the server and will slow down the serving of your pages.)
bruhaha, is it possible to direct the server to parse the files in certain directories only, or must it affect the whole site?
add> or if the instruction is to parse .html files, it won't parse .htm files?
if you can't find a solution I may be able to do it for you. I have home grown code that I use to do this type of thing all the time.
any chance we could barter?
sticky me if you are interested.
You can certainly have it parse .htm files the same way. You just have to specify it, by substituting ".htm" for ".html" in the code. If you want to do both at the same time (though I don't recommend alternating between .htm and .html on the same site!) you should be able to do so by simply expanding the line to:
AddHandler server-parsed .html .htm
|is it possible to direct the server to parse the files in certain directories only, or must it affect the whole site? |
Yes, by placing the .htaccess file with these instructions in the proper place.
Here's how it works:
1) an .htaccess file affects the directory it resides in and any nested subdirectories, and
2) an .htaccess file further down in the directory structure overrides .htaccess files higher up (rather like the "cascading" of style sheets --the more "local" directions override "global" ones)
Thus, by adding an .htaccess in a particular directory, you could tell the server to look for includes only in the .htm(l) files found in that directory.
bruhaha, what can I say?
Succinct and very helpful.