I have recently had a couple of requests from users on one of my sites for a downloadable, printable version of the site.
The trouble is that, while the site is predominantly text (over 90%), it has so many pages that copying and pasting out of every one would take an age.
Does anyone know of a quick-and-nifty way to do this, or an application that will do it for me?
Yours lazily
Cy :)
Thanks for the tips. I think I'll use PDF, but allow printing, as that's why people want it in ebook format. I'll use the text protection feature so people can't just copy it.
As you say, that still leaves me with the crushingly boring task of converting all that HTML to text. Any ideas, anyone?
Cy
filename.txt.
#!/usr/bin/perl -w
use strict;
use HTML::Parser ();

# Once <body> is seen, start printing text to OUT and stop at </body>.
sub start_handler {
    return if shift ne "body";
    my $self = shift;
    $self->handler(text => sub { print OUT shift }, "dtext");
    $self->handler(end  => sub { shift->eof if shift eq "body"; },
                   "tagname,self");
}

# Read one HTML filename per line from standard input.
while (<>) {
    chomp;                  # drop the trailing newline from the filename
    next unless length;     # skip blank lines
    # Fresh parser per file, so handlers installed for the previous
    # file do not carry over.
    my $p = HTML::Parser->new(api_version => 3);
    $p->handler(start => \&start_handler, "tagname,self");
    open OUT, ">$_.txt" or die "Can't open $_.txt: $!\n";
    $p->parse_file($_) || die "Can't parse $_: $!\n";
    close OUT;
}
Run the script like so:
script < filename_list.txt
HTH
Andreas
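For anyone wondering how to build filename_list.txt in the first place, here is a rough sketch. It assumes a Unix-like shell with find available and that the pages end in .html or .htm. The name make_file_list is just a hypothetical helper for illustration, not part of the script above.

```shell
# Hypothetical helper: print every .html/.htm file under a directory,
# one path per line, in sorted order. Defaults to the current directory.
make_file_list() {
    find "${1:-.}" -type f \( -name '*.html' -o -name '*.htm' \) | sort
}

# Write the list for the current directory tree.
make_file_list . > filename_list.txt
```

The resulting file can then be fed to the script as shown above. Paths containing newlines would break the one-name-per-line format, so this is only suitable for ordinary filenames.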