
php to output static html


ByronM

2:01 am on Jan 18, 2004 (gmt 0)

I'm interested in using MySQL to store a vast catalog of information that I want to dump out into hundreds of web pages nightly.

Is there already such a tool? Has anyone attempted anything like this and found whether it's better to use an XML system rather than MySQL?

The reason the database is involved at all is that there will be some querying and comparisons, but I want static HTML for performance and ease of getting spidered.

thanks

dmorison

6:55 pm on Jan 18, 2004 (gmt 0)

I use wget to do things like this, using its "mirror" functionality. Set it going on your dynamic website, and half an hour later you have a massive "static" website built from your database content.
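Something along these lines is a starting point (GNU wget assumed; the hostname and output directory are placeholders):

    wget --mirror --no-parent --directory-prefix=/home/site/static http://dynamic.website.com/

--mirror turns on recursion with timestamping, and --no-parent stops the crawl from climbing above the starting directory.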

jamie

7:55 pm on Jan 18, 2004 (gmt 0)

hi dmorison - thanks for the tip. I'm also interested in this - how do you go about updating the content when it changes?

Would you have to write a script that checks the last update time of each page (querying the database, etc.), compares it to the creation time of the static page, and, if necessary, re-runs wget on each page that needs updating?

Does that sound along the right lines?

thanks

<added> sorry, I got my knickers in a twist - instead of re-running wget on each page, you'd have to generate the HTML page again from a script... hmm, I'm confused... ;-)

jamie

9:04 pm on Jan 18, 2004 (gmt 0)

dmorison,

I just realised we don't need to do that, because we generate all of our pages from one script anyway. It will be a simple matter to change the content management system to generate a static page and upload it automatically every time something is changed. This is brilliant! It means I can have all the code and DB queries on a test server, and serve plain vanilla HTML on the live server.
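A minimal sketch of that publish-on-save step, assuming a hypothetical render_page() that already produces the page markup (all names and paths here are made up for illustration):

    <?php
    // Hypothetical CMS hook: rebuild the static file whenever an item is saved.
    function publish_static_page($page_id)
    {
        $html = render_page($page_id);   // the existing dynamic rendering (assumed)
        $file = "/home/site/static.website.com/page{$page_id}.html";

        file_put_contents($file . ".tmp", $html);
        rename($file . ".tmp", $file);   // atomic swap: readers never see a half-written page
    }
    ?>

If the live server is a separate machine, the same hook would then push the file across (rsync, FTP, whatever the setup uses).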

(sorry about the hijacking, ByronM ;-)

ByronM

2:18 pm on Jan 19, 2004 (gmt 0)

hehe, it's cool. I'm just researching different ways of doing this.

I may look at modding Movable Type to suit my needs, or use it as a starting base, since it is a template-based system that lets you add content and dump it to static pages.

Is there a PHP system with a similar interface?

jamie

6:38 pm on Jan 19, 2004 (gmt 0)

ByronM,

I've just been reading up on wget and it has so many options, it sounds like it could do all you need.

As long as you have some way (any CMS) of producing pages, wget can then rename them, rename the links accordingly, and mirror sites by just checking last-modified dates...

I'm sure dmorison can give you a clearer idea, as I've only just started looking into it, but it sounds really good.

(e.g. I'm going to get it running periodically over my dynamic pages, looking for updates and mirroring those to the live server)

good luck

dmorison

7:00 pm on Jan 19, 2004 (gmt 0)

I'm not running a site at the moment that works in this way, but what I have done in the past is first of all configure Apache so that .html files are processed by PHP. This simply means you don't have to mess about with wget's filename-changing capabilities, or worry about the internal linking of your site.
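On the Apache/PHP setups of the day that was typically a single line in httpd.conf or .htaccess (the exact handler name depends on how PHP is installed, so treat this as a sketch):

    AddType application/x-httpd-php .html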

Then, in a folder I create two directories:

dynamic.website.com
static.website.com

You then build your PHP/database application, using files with a .html extension, in the directory dynamic.website.com.

To serve the website, you point Apache at a soft link as the web root rather than an actual directory. This soft link can point at either dynamic.website.com or static.website.com.

So, to serve the dynamic version of your site:

/var/www -> dynamic.website.com

Now, you can use the mirror functionality of wget to retrieve the entire website into the directory static.website.com. Once you have built the static version, you can then swap your soft link over:

/var/www -> static.website.com
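In shell terms the swap is something like this (paths are examples):

    ln -sfn /home/site/static.website.com /var/www

The -n (--no-dereference) flag makes ln replace the link itself instead of creating a new link inside the directory it currently points to.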

Study the man page for wget to learn how to use the mirror functionality. The option you want to look out for is the one that keeps it onsite - you don't want to go off retrieving sites that you link to!
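For reference, the relevant GNU wget options - recursive retrieval already stays on the starting host unless you explicitly allow it to wander:

    -np, --no-parent       don't ascend to the parent directory
    -H,  --span-hosts      go to foreign hosts when recursive (leave this off)
    -D,  --domains=LIST    comma-separated list of accepted domains (only applies with -H)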

I know this isn't a step-by-step how-to, but it should point you in the right direction.

dmorison

7:08 pm on Jan 19, 2004 (gmt 0)

"how do you go about updating the content when it changes?"

The last time I built such a site, I actually went a step further and could access both the static and dynamic versions using different URLs (this is not hard to set up using Apache's dynamic virtual host support).
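A minimal sketch using Apache's mod_vhost_alias (the directory layout is an example): %0 expands to the full requested hostname, so each name maps straight to its own directory.

    UseCanonicalName Off
    VirtualDocumentRoot /home/site/%0

With that, requests for dynamic.website.com are served from /home/site/dynamic.website.com and requests for static.website.com from /home/site/static.website.com.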

The dynamic site had various people doing stuff, and their actions might cause an update of one or more tables.

A cron job running every night then looked directly at the last-modified time of the MySQL table files, which on a simple MySQL setup (MyISAM tables) are called:

/var/lib/mysql/[database_name]/[table_name].MYI

The cron job (a Perl hack) then knew which portions of the site had to be rebuilt if a given table had been modified, and called wget to retrieve that portion of the site into the static web root.
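The same idea in a few lines of PHP rather than Perl (table names, URLs and the stamp file are placeholders):

    <?php
    // Rebuild a site section only if its backing table has changed since the last run.
    $stamp = "/var/tmp/last_build";
    $table = "/var/lib/mysql/mydb/products.MYI";   // MyISAM index file for the table

    if (!file_exists($stamp) || filemtime($table) > filemtime($stamp)) {
        // re-mirror just the affected portion of the dynamic site
        system("wget --mirror --no-parent -P /home/site/static.website.com"
             . " http://dynamic.website.com/products/");
    }
    touch($stamp);
    ?>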

jamie

8:49 pm on Jan 19, 2004 (gmt 0)

great stuff dmorison :-)

Thanks for explaining - I was thinking along the right lines, but the trick with the MySQL last-update location is a nice one!

As I said above, I wanted to get wget running on a cron job to check all the last-modified times - but on our larger site wget would be running the whole time. I much prefer your suggestion of telling it which bits to update.

cheers

g1smd

12:23 am on Jan 20, 2004 (gmt 0)

Make sure that the dynamic website can't be spidered by Google, otherwise those pages might start turning up in the SERPs (and the static site get delisted).

You could keep the dynamic site above the web root, or have it visible and rely on a robots.txt file to keep bots out, or put a meta robots noindex tag on every page of the dynamic site only, or use .htaccess to keep them out.
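For the robots.txt route, the file at the dynamic site's root is just:

    User-agent: *
    Disallow: /

and the per-page alternative is a <meta name="robots" content="noindex,nofollow"> tag in the head of every dynamic page.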

dmorison

5:52 am on Jan 20, 2004 (gmt 0)

"Make sure that the dynamic website can't be spidered by Google otherwise those pages might start turning up in the SERPs (and the static site delisted)."

Good point - in the scenario described above, the dynamic site was protected by HTTP authentication, so it could never have been crawled.
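For anyone setting up the same protection, a typical .htaccess Basic-auth block looks like this (paths are placeholders; the password file is created with the htpasswd utility):

    AuthType Basic
    AuthName "Staging site"
    AuthUserFile /home/site/.htpasswd
    Require valid-user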

Note however that in the basic setup (using a soft link to point at either the dynamic or static site, so that only one version is ever visible to the web) this isn't a problem.

g1smd

8:07 pm on Jan 20, 2004 (gmt 0)

You'd be amazed at where bots can start poking around in your folders unless you take steps to explicitly keep them out.

You had a good scheme. My note was a general comment for future surfers.