I personally don't like PDFs very much -- the interface alone feels troublesome. Just like Photoshop, Adobe developed Acrobat for print and then adapted it to the web -- but I don't think the transplant is nearly as comfortable so far. Still, PDFs do serve a purpose.
PLUS
Portable over various OS
Good market penetration
File will print properly
Can minimize big files
Can be secured against changes/printing
Opens in the browser (is this really a plus?)
Internal links
MINUS
Many failed downloads (why is this?)
Long wait when visitors may expect speed
Awkward feel to the interface
Newbie confusion
No. When the PDF is used as content for your website, then a big NO. PDF is not suited to on-screen reading, and navigation is particularly bad. Convert it to HTML instead. If you must use it, it would be better to let the PDF open in a separate window if possible.
"Many failed downloads (why is this?)"
Do they show up in the stats as failed downloads? Maybe it's because people expect a download dialog when they click on the hyperlink to the PDF. Instead they must wait for the PDF file to load. Because they don't want to wait, they press the back button on the browser and the download is cancelled. I would like a special attribute for an anchor tag so I can specify whether the file should be browsed or explicitly downloaded.
PDF was designed to be able to exchange documents between different platforms and with preservation of the exact layout. If you use it for that purpose, PDF is OK!
(edited by: wardbekker at 11:30 pm (utc) on Feb. 15, 2002)
In this particular market, viewers love to print these files and save them for their records, so I guess it could be considered a necessity. I don't know if newbies have more trouble with the new windows opening for print jobs than if they printed it from straight HTML or not. Could be a coin toss.
I believe the use of PDFs as a trend or internal production shortcut happens more often than the use of PDFs as the 'best' solution for online publishing. I still side with HTML as the best current content publishing solution, with or without db integration.
BTW - I could fix the fundamental errors of client-generated PDF files, but that would take away the fun their office has in churning these out and all the money they save doing it themselves. I file this under "oh well".
That said, I would have to say for anything but "Here, you'll probably want to print this out at some point," type materials, I don't see that PDF is any good at all.
I don't think of a PDF as website content, I think of it as a download file.
That would be nice! I'm so tired of writing descriptions about right clicks and trying to cover all the browser and OS differences. It would also help with .doc files, since Word is now integrated into the new IE browser (shudder!).
How about using an FTP URL? Does that open up security problems or compatibility issues? It's not an area I'm familiar with.
That's the thread where we were talking about Word's integration in IE6. As I see it, the way we need to go when there is a reason to offer downloadable files (and there are many) is to give right click instructions.
Mark, your article was on the money - people should NOT be adding extra technology into their web pages unless HTML can't do the job properly. If anyone wants to read Mark's article, drop him a Sticky Mail for the URL. He writes a very good rant.
(edited by: tedster at 12:27 am (utc) on Feb. 16, 2002)
I tend to agree but would prefer to force the issue. PDF files are fine from your hard disk, and the reader is fine (even excellent) for accessing PDF files at hard disk read speeds, but in a browser IMHO it sucks.
I would prefer (though I have not done it) to bundle, say, 10 PDF pages:
"instructions_on unblocking_your_dog.pdf"
into
"instructions_on unblocking_your_dog.zip"
then the browser forces (currently) a download of a smaller file, and when the user reads the extracted PDF file it loads faster off the HD and the dog gets relief quicker :-)
Of course this is all complete tosh because a simple html page could have told the user .. with quick loading text and images :
* don't let your domestic pet eat glue *
* Here is why!
and thus saved the user from having to download the unblocking instructions in the first place .. and the dog from undignified acts :-)
Wiring diagrams I would not expect to OCR but, as you mention, scan / trace or photo - GIF / JPG to HTML page ...
I agree with your first post:
"I don't think of a PDF as website content, I think of it as a download file."
trouble imho is:
1. many sites use normal html links to pdf files so users have no warning.
2. can't force a download if the reader is installed and the browser recognises it, unless you zip the PDF.
3. Google went and started indexing textual contents and creating awful HTML versions, which reinforces clients' views that PDF documents are web friendly....
We have all flash sites .. how soon will some plonker try to make an all pdf site?
HTML is for displaying content nicely in many different resolutions and screen sizes and formats with hyperlinks.
Adobe's pitch to make them like Web documents was really just a spin. Web hyperlinks and in-document links work but the whole thing is so clunky on-line.
Agreed, including advice to right click is important, as is making clear that it is a PDF document (the latter we do). We also give the actual file name, something.pdf, so we almost do the first too.
What is curious to see is the amount of stuff in PDF format on the Web that could much more simply be published as HTML - 1 or 2 pager stuff. Maybe another case of let's use it because it is there.
So the right click instructions are avoided, but then we need unzip instructions, and a plug for Aladdin or WinZip or whatever.
Since the only PDFs my clients use are definitely download files, I'm sticking with right click instructions and clearly identified links. So far it works pretty well - but I would love a neater solution.
Just tell the browser that even though it has a .PDF extension, it's not a PDF file. From looking in my script archive, the following is a PHP script by Kris Hedstrom:
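function download($path)
{
    // $_SERVER replaces the old register_globals variable $HTTP_USER_AGENT
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $file = basename($path);
    $size = filesize($path);
    // The second Content-Type call replaces the first; both are kept from the original script
    header("Content-Type: application/octet-stream");
    header("Content-Type: application/force-download");
    header("Content-Length: $size");
    // IE5.5 just downloads index.php if we don't do this
    if (preg_match("/MSIE 5\.5/", $agent)) {
        header("Content-Disposition: filename=$file");
    } else {
        header("Content-Disposition: attachment; filename=$file");
    }
    header("Content-Transfer-Encoding: binary");
    // Open in binary mode and stream the file to the browser
    $fh = fopen($path, "rb");
    fpassthru($fh);
    fclose($fh);
}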
I'm pretty sure ASP/CF/JSP/Perl etc. would have a similar solution - change the header content to that of a binary file - this forces the browser to download it as a binary/unknown format.
And on the PDF subject - most definitely use PDFs to offer downloadable content if needed. Otherwise, use an alternative HTML/graphics option for "quick viewing".
Spot on. How much you use them, once again, seems to depend on your market. For our website we are often targeting CIOs etc. who want to read the full 'bumph' but don't have time to do it at their desk. Offering PDF brochures (inline with our web content) is an ideal solution as they print beautifully and are ideal for the commute home;). Just one click and the whole lot is there, whereas with HTML they (or more likely their PA) will have to print out each web page (including your logo, header and footer) for each item.
and you can use:
Header("Content-Disposition: filename=downloaded.pdf");
to "suggest" to the browser what filename to use; however, it does have its problems on a Mac...
I personally still opt for the right-click (or "click and hold" for Mac) solution - but first tell the user they have the option to download or view the PDF. If the user wants to see it, or if they want to download it, it's up to them, because they're the ones who want the information in the first place.
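For anyone who wants to offer both choices from the server side, here is a minimal sketch of how the Content-Disposition value could be switched between viewing and saving. The function name content_disposition() and the "mode" query parameter in the usage comment are made up for illustration, and browsers of the day may not all honour the header the same way.

```php
<?php
// Hypothetical helper (not from the script above): builds a Content-Disposition header line.
// $download = true asks the browser to save the file; false asks it to display it inline.
function content_disposition($filename, $download)
{
    // Keep only the base name and replace characters that could break the header.
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', basename($filename));
    $type = $download ? 'attachment' : 'inline';
    return 'Content-Disposition: ' . $type . '; filename="' . $safe . '"';
}

// Usage sketch, assuming a made-up query parameter named "mode":
// header(content_disposition('report.pdf', isset($_GET['mode']) && $_GET['mode'] === 'save'));
```

Two plain links, one per mode, would then let the visitor pick, with the right-click instructions kept as a fallback.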
(oh and the script above is somewhat poor - it was just the first "force download" script I found on my backups)
I'm going to put all the dupe downloads in a separate folder with robots.txt protection right now. I'll let you know if there's any boost after a few months.
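For reference, a robots.txt entry for that kind of setup might look roughly like the sketch below. The folder name /downloads/ is only an assumed example, not the actual path.

```
# Assumed folder name; substitute the real location of the duplicate download files
User-agent: *
Disallow: /downloads/
```

This only asks well-behaved crawlers to stay out of the folder; it does not stop visitors from fetching the files.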
By the way, that other thread has developed some good discussion about partial PDF downloads [webmasterworld.com].
Opens in the browser (is this really a plus?)
Hmm... If I am at a web site and I see something in PDF, the next question I ask myself is: do I really (I mean really) want to read it? If I am interested enough, I will click on it. Next, if I see that it won't let me read it in my browser and asks me to download, I will not download or read it unless it is of high interest to me. I guess I just don't have time to (or don't want to?) save it and then open it.
So I will say "Opens in the browser (is this really a plus?)" is a big plus if someone cares about users like me. [I think there might be at least a few others like me in the world, right? :)]
Our users don't want to read these reports on the screen, and a hyperlinked document isn't of much use to them -- they just want to print them out, put them in their briefcase, and read them later on the train/plane/bus... So, as a means of making it easy for our users to obtain a printed version to read, PDF is clearly the way to go. Otherwise, we'd have a huge FedEx bill. But, I also use PDF only for documents/versions that are only intended to be printed.
Nevertheless, creating PDFs is often a maddening process, because I'm usually converting from MS Word docs, and there are often various objects embedded in the Word doc that are far larger (in kbytes) than they need to be. Example: I just got a report e-mailed to me for conversion that was a 42-page, 5 MB Word document. I looked through it and saw 5 graphics: a logo on the cover, and inside, two pie charts and two bar charts. Sure enough, I discovered that the graphics and fonts were the problem. The logo was an embedded Corel graphic in millions of colors that only actually used two shades of blue -- measuring 2 MB -- and the charts were embedded Excel workbooks with several pages of data behind them. There were also over seven different fonts in the document even though only two were actually used in the text. The blank lines were in various fonts that weren't used elsewhere -- and if converted to PDF they would be embedded in the PDF and unnecessarily swell its size.
After banging on my desk and shouting "No! No! No! No!" I spent an hour and a half exporting these items, getting rid of all the unused extraneous colors and data in other applications, then replacing them in the Word document with my new versions. Then searched/replaced the unused fonts on blank lines with Times (which doesn't get embedded in PDFs since it's universal) and printed the thing to Adobe Distiller.
The Word doc was cut down from 5 MB to 130 KB, and the resulting PDF was 220 KB (still a bit large for my taste, but the best I could do).
Perhaps some of you know some other tricks for keeping PDFs down to a reasonable size? Meanwhile, I'm training my folks how to import ONLY the charts from Excel...
It had bemused me before, not being an intensive pdf user, when someone sent updated pdfs for their site with more pages / paragraphs added but which were smaller than the earlier versions.
What you wrote makes perfect sense. Why should I have failed to realise that optimising Word or PDF pages is almost the same as for HTML pages?
BTW, for being the only one called "nonprof webguy" I come across in forums, you sound pretty proficient to me :-)
I am thinking about using pdf more in a new project as a result of these threads .. and expected tables with loads of extended character set items.
I was recommended Copernic Summarizer today for creating text summaries of loads of various docs to stick in HTML pages, leaving the extended character technical data etc. in the PDF for users to download. Any views?