Right. I simply want to generate .html/.htm pages. I'm not interested in URL rewriting, since robots.txt already denies search engines access to all the cgi-bin pages of this marketplace.
I just want to generate static versions of the CGI pages to reduce the incredible CPU load (up to 99%) that some of these pages cause.
Here's the meat from a stripped down example:
For this example I am loading the variables with some strings; in a working example you would loop through a database or other source to pull TITLE, META, and CONTENT and format them as HTML.
NOTE: be sure to test the paths you use. Most likely they will need to be specified from the root directory for the domain where the pages are to be published.
#!/usr/local/bin/perl
#
$page_title = "This is the title";
$meta_description = "This page is about stuff";
$meta_keywords = "html, static, page, generator";
$page_content = "<p><b>HELLO WORLD</b> I once was dynamic, but now am static content.</p>\n";
#
#==================
# START HTML OUTPUT
#==================
#
$HTMLfilespec = '/user99/dummy/test/etc/testfile.htm';
open (HTM,">$HTMLfilespec") or die "can't write $HTMLfilespec: $!";
&StaticHeader;
print (HTM "$page_content");
&StaticFooter;
close (HTM);
#
exit;
#=================
sub StaticHeader {
#=================
#
print (HTM "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n\n");
print (HTM "<html>\n");
print (HTM "<head>\n");
print (HTM "<title>$page_title</title>\n");
print (HTM "<meta HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=ISO-8859-1\">\n");
print (HTM "<meta name=\"description\" content=\"$meta_description\">\n");
print (HTM "<meta name=\"keywords\" content=\"$meta_keywords\">\n");
print (HTM "</head>\n");
print (HTM "<body>\n");
}
#=================
sub StaticFooter {
#=================
#
print (HTM "</body>\n");
print (HTM "</html>\n");
#
}
Next, you need a place to store your data, titles, and output file names; I'm assuming you already have this in place. Likely candidates are a database or (ack) a text file database. If you use text, make the DB point to the plain-text content pages, as in
id^title^output^source
Where source only points to the actual page content, as in "mainpage.txt". The point is not to store your entire content in a plain text database; doing so creates more work for you in swapping out newlines.
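A minimal sketch of reading one such record, assuming the four caret-separated fields shown above. The sub name `parse_record` and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Split one record of the form id^title^output^source into a hash reference.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    # The caret is a regex metacharacter, so it has to be escaped for split.
    my ($id, $title, $output, $source) = split /\^/, $line, 4;
    return { id => $id, title => $title, output => $output, source => $source };
}

# Hypothetical sample record; a real script would read these from the DB file.
my $rec = parse_record("1^Main Page^/full/path/to/index.htm^mainpage.txt\n");
print "$rec->{title} -> $rec->{output} (content in $rec->{source})\n";
```

The page body itself would then be read from the file named in the source field, which keeps newlines out of the record file.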
Last, a template with substitution markers. The markers can be any character that is not normally used in text, but it must be unique so it can't accidentally come up in your content.
<html><head><title><PAGETITLE></title></head>
<body>
<h1><PAGETITLE></h1>
<PAGECONTENT>
</body>
</html>
So when you output your pages, first store the entire page in a scalar. Assume that $title, $content, and $output_file are already populated. $output_file is in the format /full/virtual/path/to/file, not a URL:
open (TEMPLATE, "$template") or &error("can't open template $template $!");
# (Always always always have an error trap for every action)
while ($line = <TEMPLATE>) {
if ($line =~ /\<PAGETITLE\>/) { $line =~ s/\<PAGETITLE\>/$title/g; }
if ($line =~ /\<PAGECONTENT\>/) { $line =~ s/\<PAGECONTENT\>/$content/g; }
$final .= $line;
}
close TEMPLATE;
Now that your page is ready, write it:
open (FILE, ">$output_file") or &error("can't write file $output_file $!");
print FILE $final;
close FILE;
Doners. :-) If you store this in a sub, you can call it once per page in a loop and write out an entire website.
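Here is a rough sketch of what wrapping this in subs might look like. The sub names `fill_template` and `write_page`, the page list, and the scratch directory are all assumptions for the demo; in a real script the template would come from a file and the pages from your database.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Replace the markers in a template string and return the finished page.
sub fill_template {
    my ($template, $title, $content) = @_;
    $template =~ s/<PAGETITLE>/$title/g;
    $template =~ s/<PAGECONTENT>/$content/g;
    return $template;
}

# Write a finished page to disk, with an error trap on every action.
sub write_page {
    my ($output_file, $html) = @_;
    open(my $fh, '>', $output_file) or die "can't write file $output_file: $!";
    print {$fh} $html;
    close($fh) or die "can't close $output_file: $!";
}

my $template = "<html><head><title><PAGETITLE></title></head>\n"
             . "<body>\n<h1><PAGETITLE></h1>\n<PAGECONTENT>\n</body>\n</html>\n";

my $dir = tempdir(CLEANUP => 1);   # scratch directory, demo only
my @pages = (
    { title => 'Home',  content => '<p>Welcome.</p>',  output => "$dir/index.htm" },
    { title => 'About', content => '<p>About us.</p>', output => "$dir/about.htm" },
);

# One call per page writes out the whole (tiny) site.
for my $page (@pages) {
    write_page($page->{output},
               fill_template($template, $page->{title}, $page->{content}));
    print "wrote $page->{output}\n";
}
```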
[edited by: phranque at 11:18 pm (utc) on May 12, 2008]
[edit reason] disabled smileys ;) [/edit]
Report for task WebmasterWorld News and Discussion for the Web Professional
[webmasterworld.com...]
Exploration depth: 1
The task is stopped. 118 files are queued.
60 files were retrieved. Total size of retrieved files: 793,563
0 files were not retrieved due to network connection errors.
3 files were not retrieved due to server errors (e.g. Not Found).
2 files were not accepted.
The task was created on 10/25/2006 1:16:56 PM
The task has finished on 10/25/2006 1:19:05 PM
You could just let it run until it slurps up the whole website. Might take a while for a big website.
It seems a big task for me to come up with a solution, since there are two more problems to take into consideration:
1- The member user session id
2- The main CPU-consuming file retrieves both the categories and the items from the database at the same time, and only the categories part should be made static, since the items are added by users and expire at any time.
In either case, the editor will just see it as an invalid tag, but the real question is who uses a WYSIWYG editor? :-) In reality I use a pipe for my markers, but this message board breaks them, it was easier to exemplify with < and >.
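For what it's worth, a pipe-delimited version might look something like this sketch. The sub name `fill_pipe_markers` and the marker name are illustrative only. Note that the pipe has to be escaped in the regex, and that the match is case-sensitive (no /i).

```perl
use strict;
use warnings;

# Replace |MARKER| tokens with values from a hash; unknown markers are left alone.
sub fill_pipe_markers {
    my ($template, %values) = @_;
    # \| matches a literal pipe; an unescaped | would mean "or" in a regex.
    $template =~ s/\|([A-Z]+)\|/exists $values{$1} ? $values{$1} : "|$1|"/ge;
    return $template;
}

print fill_pipe_markers("<h1>|PAGETITLE|</h1>\n", PAGETITLE => 'Hello');
```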
Maybe rocknbil will have a suggestion.
Only with some program alterations. Did you write this program? Presuming you did:
for my dynamic cgi-bin script
Instead of using a sessionid in the query string set a cookie and read the cookie for sessionid values. You can do this for dynamic or static pages, but you have to use Javascript for static pages. But if you hope to generate these pages as HTML, I can't imagine why you need a sessionid once they're output as HTML.
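A bare-bones sketch of the cookie side in a CGI script, without any modules. The cookie name `sessionid` and the fallback value are assumptions for the demo; a real script would generate and validate its own ids.

```perl
use strict;
use warnings;

# Parse a raw Cookie header ("a=1; b=2") into a name => value hash.
sub parse_cookies {
    my ($header) = @_;
    my %cookie;
    for my $pair (split /;\s*/, defined $header ? $header : '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $cookie{$name} = defined $value ? $value : '';
    }
    return %cookie;
}

# Under CGI the browser's Cookie header arrives in $ENV{HTTP_COOKIE}.
# The fallback string only exists so the sketch runs from the command line.
my %cookie = parse_cookies($ENV{HTTP_COOKIE} || 'sessionid=abc123; theme=dark');
my $sessionid = exists $cookie{sessionid} ? $cookie{sessionid} : '';

# To set the cookie, send this header before the blank line that ends the headers.
print "Set-Cookie: sessionid=$sessionid; Path=/\n";
```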
IMO big sites use them because 1) business is so good they don't care about these SEO issues, 2) they use other methods for search engine friendliness, or 3) the specific pages using the sessionids aren't important for search indexing, such as search results pages as a user shops.
If your dynamic pages generate content that you hope to be indexed, there are a number of methods you can use to enhance the digestibility for the URL, but that's not what you're asking.
^ ^ I think he/she doesn't mean the head tag, but the <HEADING> markers. Also note my use of capitals and the non-use of /i; you want it as unique as possible.
Ahh... That makes sense. But since you substitute the <MARKERS> for real content they never get seen by the browser anyway. So I think he/she maybe misunderstood what you are doing.
What if you want to make static only a part of a dynamic page that takes too much CPU retrieving too much data from the database? Is this possible using Javascript too?
It seems like you are trying to do something that is really not possible. Either the programs you use are not well written and are using too many server resources, or you have too much traffic to your site. Trying to reduce the dynamic content isn't a bad thing, but I don't think you can reduce it enough to make any difference, and as you can see the suggestions get more and more complicated.
Try to figure out what is taking up the most server resources and see if you can fix that. If not, it might be time to upgrade to a dedicated server, add more servers if you're already on a dedicated one, or hire a programmer to help you out (not me).
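One rough way to see where the time goes is to wrap each step in a timer. This is only a sketch: the sub name `time_step` is made up, the two loops are stand-ins for the real category and item queries, and it measures wall-clock time, not CPU time.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference, log how long it took, and return the elapsed seconds.
sub time_step {
    my ($label, $code) = @_;
    my $start = [gettimeofday];
    $code->();
    my $elapsed = tv_interval($start);   # seconds since $start
    printf STDERR "%-12s %.4f s\n", $label, $elapsed;
    return $elapsed;
}

# Stand-ins for the real database work (hypothetical).
time_step('categories', sub { my $x = 0; $x += $_ for 1 .. 100_000 });
time_step('items',      sub { my $x = 0; $x += $_ for 1 .. 10_000 });
```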
Can you say what the name of the program is you are using? Maybe it has some well known problems.
The most important reason is that the item listings are displayed together with the sub-category list (similar to eBay).
Retrieving both sets of data from the database takes too much CPU, and making just the category list static will help reduce it.
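A possible way to do that inside the CGI script itself, sketched under assumptions: keep the category list as an HTML fragment in a file, rebuild it only when it is older than some limit, and leave the item query dynamic. The sub name `cached_fragment`, the 600-second limit, the file name, and the stand-in query are all made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the cached HTML if the file is younger than $max_age seconds;
# otherwise call $build (the expensive query), store its result, and return it.
sub cached_fragment {
    my ($cache_file, $max_age, $build) = @_;
    if (-e $cache_file && time() - (stat($cache_file))[9] < $max_age) {
        open(my $in, '<', $cache_file) or die "can't read $cache_file: $!";
        local $/;                        # slurp the whole file at once
        my $html = <$in>;
        close($in);
        return $html;
    }
    my $html = $build->();
    open(my $out, '>', $cache_file) or die "can't write $cache_file: $!";
    print {$out} $html;
    close($out) or die "can't close $cache_file: $!";
    return $html;
}

my $dir   = tempdir(CLEANUP => 1);       # scratch directory, demo only
my $calls = 0;
# Stand-in for the real category query (hypothetical).
my $build = sub { $calls++; return "<ul><li>Books</li><li>Music</li></ul>\n" };

my $first  = cached_fragment("$dir/categories.html", 600, $build);  # runs the query
my $second = cached_fragment("$dir/categories.html", 600, $build);  # read from the file
print "category query ran $calls time(s)\n";
```

The item listing would still be queried on every request; only the category part comes from the file.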