PDF on websites - the plus and minus - HTML forum at WebmasterWorld



tedster

10:58 pm on Feb 15, 2002 (gmt 0)

On another thread some side comments were made about not liking PDF files. On the other hand, there are some definite advantages. At tax time those IRS forms in PDF sure are handy compared to schlepping to the nearest physical supply.

I personally don't like PDFs very much -- the interface alone feels troublesome. Just like Photoshop, Adobe developed Acrobat for print and then adapted it to the web -- but I don't think the transplant is nearly as comfortable so far. Still, PDFs do serve a purpose.

PLUS
Portable over various OS
Good market penetration
File will print properly
Can minimize big files
Can be secured against changes/printing
Opens in the browser (is this really a plus?)
Internal links

MINUS
Many failed downloads (why is this?)
Long wait when visitors may expect speed
Awkward feel to the interface
Newbie confusion

wardbekker

11:24 pm on Feb 15, 2002 (gmt 0)

"Opens in the browser (is this really a plus?)"

No. When the PDF is used as content for your website, then a big NO. PDF is not suited to on-screen reading, and navigation is particularly bad. Convert it to HTML instead. Failing that, let the PDF open in a separate window if possible.

"Many failed downloads (why is this?)"

Do they show up in the stats as failed downloads? Maybe it's because people expect a download dialog when they click on the hyperlink to the PDF. Instead they must wait for the PDF file to load. Because they don't want to wait, they press the back button on the browser and the download is cancelled. I would like a special attribute for an anchor tag so I can specify if the file should be browsed or explicitly downloaded.

PDF was designed to be able to exchange documents between different platforms and with preservation of the exact layout. If you use it for that purpose, PDF is OK!

(edited by: wardbekker at 11:30 pm (utc) on Feb. 15, 2002)

Mark_A

11:28 pm on Feb 15, 2002 (gmt 0)

Tedster, no offence, but is your PDF post a statement or a discussion proposal?

tedster

11:41 pm on Feb 15, 2002 (gmt 0)

It's a discussion proposal. I've been reading side comments on some other threads around WmW, and I thought we might sort it out a bit here in a more direct manner -- maybe even find some fixes for the more problematic aspects.

idiotgirl

11:43 pm on Feb 15, 2002 (gmt 0)

I have a client who loves PDFs. Has a ton of them which their office produced. I see lots of errors in display and proprietary fonts that show as error screens when the PDF loads into a new window. These are 'hard-wired' problems rather than generic PDF problems simply from viewing the file as a hyperlink (in a new Acrobat window).

In this particular market, viewers love to print these files and save them for their records, so I guess it could be considered a necessity. I don't know if newbies have more trouble with the new windows opening for print jobs than if they printed it from straight HTML or not. Could be a coin toss.

I believe the use of PDFs as a trend or internal production shortcut happens more often than the use of PDFs as the 'best' solution for online publishing. I still side with HTML as the best current content publishing solution, with or without db integration.

BTW - I could fix the fundamental errors of client-generated PDF files, but that would take away the fun their office has in churning these out and all the money they save doing it themselves. I file this under "oh well".

mivox

11:50 pm on Feb 15, 2002 (gmt 0)

PDFs beat anything for providing downloadable product manuals and technical spec sheets, IMO. If you produce the PDF correctly, there are no worries whatsoever about cross-platform/browser issues, they print like a dream, and if the only copy of a piece of material is in hardcopy, I'll spend some time with my scanner and save it to PDF before I even CONSIDER re-typing a 20 page document into HTML.

That said, I would have to say for anything but "Here, you'll probably want to print this out at some point," type materials, I don't see that PDF is any good at all.

I don't think of a PDF as website content, I think of it as a download file.

tedster

11:55 pm on Feb 15, 2002 (gmt 0)

>> I would like a special attribute for an anchor tag so I can specify if the file should be browsed or explicitly downloaded.

That would be nice! I'm so tired of writing descriptions about right clicks and trying to cover all the browser and OS differences. It would also help with .doc files, since Word is now integrated into the new IE browser (shudder!).

How about using an FTP URL? Does that open up security problems or compatibility issues? It's not an area I'm familiar with.

Mark_A

12:04 am on Feb 16, 2002 (gmt 0)

Mivox >I'll spend some time with my scanner and save it to PDF before I even CONSIDER re-typing a 20 page document into HTML

.. get something like WinFax on your pc .. fax the 20 pages in, OCR the text into your editor job done.

physics

12:06 am on Feb 16, 2002 (gmt 0)

And if you want to create math/physics documents they can't be beat. I mean, look at how successful [lanl.arXiv.org] is! ;)

Mark_A

12:07 am on Feb 16, 2002 (gmt 0)

Tedster >It would also help with .doc files, since Word is now integrated into the new IE browser (shudder!).

Double shudder .. versions no please?
Can I refer you to my very recent stickymail to you with a url of an article on just this subject area you are getting into :-)

tedster

12:22 am on Feb 16, 2002 (gmt 0)

[webmasterworld.com...]

That's the thread where we were talking about Word's integration in IE6. As I see it, the way we need to go when there is a reason to offer downloadable files (and there are many) is to give right click instructions.

Mark, your article was on the money - people should NOT be adding extra technology into their web pages unless HTML can't do the job properly. If anyone wants to read Mark's article, drop him a Sticky Mail for the URL. He writes a very good rant.

(edited by: tedster at 12:27 am (utc) on Feb. 16, 2002)

mivox

12:26 am on Feb 16, 2002 (gmt 0)

OCR the text into your editor job done

Um, not for a technical manual with a lot of wiring diagrams, strange symbols in the text and text-wrap style formatting.

I've tried OCR. The proofreading and reformatting necessary are not worth my time in any way shape or form.

Mark_A

12:34 am on Feb 16, 2002 (gmt 0)

Tedster >As I see it, the way we need to go when there is a reason to offer downloadable files (and there are many) is to give right click instructions.

I tend to agree but would prefer to force the issue, pdf files are fine from your hard disk and the reader is fine (even excellent) for accessing pdf files at hard disk read speeds but in a browser imho it sucks.

I would prefer (but have not yet tried) to bundle, say, 10 PDF pages:

"instructions_on unblocking_your_dog.pdf"

into

"instructions_on unblocking_your_dog.zip"

then the browser (currently) forces a download of a smaller file, and when the user reads the extracted PDF files they load faster off the hd and the dog gets relief quicker :-)

Of course this is all complete tosh because a simple html page could have told the user .. with quick loading text and images :

* don't let your domestic pet eat glue *
* Here is why!

and thus saved the user from having to download the unblocking instructions in the first place .. and the dog from undignified acts :-)
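
For anyone who does want to try the zip bundling route, here is a minimal sketch in PHP. It assumes PHP's zip extension (the `ZipArchive` class) is available; the function name `bundle_pdfs()` and any file names passed to it are hypothetical, not something from this thread.

```php
<?php
// Bundle several PDF files into one zip archive so that the browser
// offers a download instead of handing the file to the reader plug-in.
// bundle_pdfs() is a hypothetical helper; it needs PHP's zip extension.
function bundle_pdfs(array $pdfPaths, string $zipPath): int
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }

    $added = 0;
    foreach ($pdfPaths as $path) {
        // Store each file under its base name, without server directories.
        // Paths that do not exist are skipped rather than treated as errors.
        if (is_file($path) && $zip->addFile($path, basename($path))) {
            $added++;
        }
    }

    $zip->close();

    // Number of files that actually went into the archive.
    return $added;
}
```

The return value makes it easy to check that nothing was silently skipped before you link to the archive. It does not solve the problem of visitors who have no unzip utility.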

Mark_A

12:45 am on Feb 16, 2002 (gmt 0)

Mivox I sit corrected - for technical manuals with strange symbols you are quite right, OCR is a "chocolate teapot".

wiring diags I would not expect to OCR but as you mention scan / trace or photo - gif / jpg to html pg ...

I agree with your first post:

"I don't think of a PDF as website content, I think of it as a download file."

trouble imho is:
1. many sites use normal html links to pdf files so users have no warning.
2. can't force a download if the reader is installed and the browser recognises it, unless you zip the pdf.
3. google went and started indexing textual contents and creating awful html versions which reinforces clients views that pdf documents are web friendly....

We have all flash sites .. how soon will some plonker try to make an all pdf site?

chiyo

12:57 am on Feb 16, 2002 (gmt 0)

We use PDF files a lot. They were made for printing - pure and simple. The layout is exactly as the author or publisher wants, including, as many say, intricate diagrams and mathematical symbols. There are still such things as page numbers in a printed-out PDF document. Print out an HTML document and there are NO page numbers; pagination depends on your paper size etc. So it is harder (or impossible) to cite or reference by page number.

HTML is for displaying content nicely in many different resolutions and screen sizes and formats with hyperlinks.

Adobe's pitch to make them like Web documents was really just a spin. Web hyperlinks and in-document links work but the whole thing is so clunky on-line.

Agreed: including advice to right click is important, and so is making clear that it is a PDF document. We do the latter, and we show the actual file name (something.pdf), so we almost do the former too.

What is curious to see is the amount of stuff in PDF format on the Web that could much more simply be published as HTML - 1 or 2 pager stuff. Maybe another case of let's use it because it is there.

tedster

3:51 am on Feb 16, 2002 (gmt 0)

I like the fact that zipping a pdf file forces the browser to download. But it also runs the risk of users with no zip utility and no understanding of how to get and use one (definitely a bit more complex than Acrobat Reader.)

So the right click instructions are avoided, but then we need unzip instructions, and a plug for Aladdin or WinZip or whatever.

Since the only PDFs my clients use are definitely download files, I'm sticking with right click instructions and clearly identified links. So far it works pretty well - but I would love a neater solution.

jammy

11:46 am on Feb 16, 2002 (gmt 0)

"Hello Browser, here's a Binary file to download, it most certainly isn't a PDF."

Just tell the browser that even though it has a .PDF extension, it's not a PDF file. From looking in my script archive, the following is a PHP script by Kris Hedstrom:

function download($path)
{
    // $_SERVER replaces the old $HTTP_USER_AGENT global, which modern PHP no longer sets.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $file = basename($path);
    $size = filesize($path);
    // A generic binary type keeps the browser from handing the file to the PDF plug-in.
    header("Content-Type: application/octet-stream");
    header("Content-Length: $size");
    // IE5.5 just downloads index.php if we don't do this
    if (preg_match("/MSIE 5\.5/", $agent)) {
        header("Content-Disposition: filename=\"$file\"");
    } else {
        header("Content-Disposition: attachment; filename=\"$file\"");
    }
    header("Content-Transfer-Encoding: binary");
    // Open in binary mode and stream the whole file to the client.
    $fh = fopen($path, "rb");
    fpassthru($fh);
    fclose($fh);
}

I'm pretty sure ASP/CF/JSP/Perl etc. would have a similar solution - change the header content to that of a Binary file - this forces the browser to download as a Binary/Unknown format.

And on the PDF subject - most definitely use PDFs to offer downloadable content if needed. Otherwise, use an alternative HTML/graphics option for "quick viewing".
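
The header-swapping idea can also be sketched as a function that only builds the header lines, which makes it easy to see exactly what gets sent. This is a minimal sketch, not a replacement for the script above; `force_download_headers()` is a hypothetical name, and the path in the usage comment is an assumption.

```php
<?php
// Build the response headers that make a browser treat a file as a
// generic binary download. The function returns strings instead of
// calling header() itself so the result can be inspected or logged.
// force_download_headers() is a hypothetical helper, not a PHP built-in.
function force_download_headers(string $filename, int $size): array
{
    return [
        // Generic binary type: the browser has no viewer to hand it to.
        'Content-Type: application/octet-stream',
        'Content-Length: ' . $size,
        // "attachment" asks for a save dialog; filename is only a suggestion.
        'Content-Disposition: attachment; filename="' . $filename . '"',
    ];
}

// Possible use inside a download script (the path is an assumption):
// $path = '/var/www/files/manual.pdf';
// foreach (force_download_headers(basename($path), filesize($path)) as $line) {
//     header($line);
// }
// readfile($path);
```

Keeping the header list separate from the file streaming also means the same function can serve .doc or any other file type you want saved rather than opened.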

joshie76

2:03 pm on Feb 16, 2002 (gmt 0)

>> PDFs beat anything for providing downloadable product manuals and technical spec sheets

Spot on. How much you use them, once again, seems to depend on your market. For our website we are often targeting CIOs etc. who want to read the full 'bumph', but don't have time to do it. Offering PDF brochures (in line with our web content) is an ideal solution as they print beautifully and are ideal for the commute home ;). Just one click and the whole lot is there, whereas with HTML they (or more likely their PA) will have to print out each web page (including your logo, header and footer) for each item.

wardbekker

4:47 pm on Feb 16, 2002 (gmt 0)

Great solution Jammy!, BUT ;-)

What about the extension of the file when saved? As far as I can see it's the name of the script, and because the extension doesn't match the file on a Windows system, people may wonder why the PDF file won't open. Not a foolproof solution.

jammy

7:14 pm on Feb 16, 2002 (gmt 0)

nothing is ever a foolproof solution... I did only say it was "a" solution ;-)

and you can use:
Header("Content-Disposition: filename=downloaded.pdf");

to "suggest" to the browser what filename to use, however it does have its problems on a Mac...

I personally still opt for the Right-click (or "click and hold" for Mac) solution - but tell the user beforehand that they have the option to download or view the PDF. If the user wants to see it, or if they want to download it, it's up to them, because they're the ones who want the information in the first place.

(oh and the script above is somewhat poor - it was just the first "force download" script I found on my backups)
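
On the saved-file-extension worry, one option is to derive the suggested name from the real file and always end it in .pdf, whatever the script itself is called. A minimal sketch, assuming PHP 7 or later; `suggested_pdf_name()` is a hypothetical helper, and the browser is still free to ignore the suggestion.

```php
<?php
// Derive a download file name that always ends in ".pdf", so the saved
// file opens in the PDF reader even when the URL points at a script.
// suggested_pdf_name() is a hypothetical helper, not a PHP built-in.
function suggested_pdf_name(string $path): string
{
    // Base name without directories and without its last extension.
    $name = pathinfo($path, PATHINFO_FILENAME);

    // Replace spaces and other awkward characters with underscores.
    $name = preg_replace('/[^A-Za-z0-9._-]+/', '_', $name);

    // Fall back to a neutral name if nothing usable is left.
    if ($name === null || $name === '') {
        $name = 'download';
    }

    return $name . '.pdf';
}

// Possible use with the header from the post above:
// header('Content-Disposition: attachment; filename="'
//     . suggested_pdf_name($path) . '"');
```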

tedster

7:44 pm on Feb 16, 2002 (gmt 0)

Stating the file size is an important courtesy. One thing that ticks me off as a user is having no idea what kind of time my click might involve when I'm on a dial-up.
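
A link label with the size and a rough dial-up time can be generated rather than typed by hand. This is a minimal sketch: `download_label()` is a hypothetical helper, and the 56 kbit/s default is an assumption for a dial-up modem (real throughput is usually lower, so treat the time as a best case).

```php
<?php
// Build a link label that states the file type, its size and a rough
// download time. download_label() is a hypothetical helper; the
// 56 kbit/s default is an assumed best-case dial-up speed.
function download_label(string $title, int $bytes, int $bitsPerSecond = 56000): string
{
    // Round both figures up so the label never understates the wait.
    $kb = (int) ceil($bytes / 1024);
    $seconds = (int) ceil($bytes * 8 / $bitsPerSecond);

    return sprintf(
        '%s (PDF, %d KB, about %d s at %d kbit/s)',
        $title,
        $kb,
        $seconds,
        intdiv($bitsPerSecond, 1000)
    );
}

// Prints: Annual report (PDF, 220 KB, about 33 s at 56 kbit/s)
echo download_label('Annual report', 220 * 1024), "\n";
```

In a real page the byte count would come from `filesize()` on the actual file, so the label stays correct when the PDF is replaced.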

physics

8:05 pm on Feb 16, 2002 (gmt 0)

Here's a question:
If I offer files in multiple formats, say PDF, .doc, HTML, and TEXT, will I get in trouble with the SEs for having dupe content? Or will their spiders not notice because they're in different formats?

tedster

8:23 pm on Feb 16, 2002 (gmt 0)

That's a good question, and I never considered it. I have one site with exactly that situation.

I'm going to put all the dupe downloads in a separate folder with robots.txt protection right now. I'll let you know if there's any boost after a few months.
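
A robots.txt rule for that kind of folder might look like the fragment below. The folder name /downloads/ is an assumption for illustration; use whatever directory actually holds the duplicate files.

```text
User-agent: *
Disallow: /downloads/
```

Keep in mind that robots.txt only asks well-behaved spiders to stay out. It is not access control, so visitors can still follow links into the folder.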

By the way, that other thread has developed some good discussion about partial PDF downloads [webmasterworld.com].

New_Alex

9:49 pm on Feb 16, 2002 (gmt 0)

I am not an experienced web designer but I'm keen to learn!!!

Very interesting topic

Purple Martin

1:46 am on Feb 18, 2002 (gmt 0)

Accessibility! That's why PDFs should never be used for content.

Sure, use them for printable documents (for example a printable form that is meant to be posted via snail mail), but never use a PDF for content. Never ever.

Vishal

2:21 am on Feb 18, 2002 (gmt 0)

Opens in the browser (is this really a plus?)

Humm... If I am at a web site and I see something in PDF, the next question I ask myself is: do I really (I mean really) want to read it? If I am interested enough, I will click on it. Next, if I see that it won't let me read it in my browser and asks me to download, I will not download/read it unless it is of high interest to me. I guess I just don't have time to (or don't want to?) save it and then open it.

So I will say Opens in the browser (is this really a plus?) is a big plus if someone cares about users like me. [I think there might be at least a few others like me in the world, right? :)]

rogerd

1:35 pm on Feb 18, 2002 (gmt 0)

PDF files aren't very friendly for the non-technical user. One site we work with was targeted mostly at elderly folks, and the previous designer put a PDF form on it. Needless to say, it didn't get used much. These users are mystified by boxes that pop up and say, "Open file from this location or Download"... forget about clicking on a link to Adobe to install Acrobat...

celerityfm

2:20 pm on Feb 18, 2002 (gmt 0)

PDF files have their place. Oftentimes they work much better than HTMLizing a Word document. I prefer "printing to PDF" (with full Acrobat) from a Word document rather than posting the actual Word document.

nonprof webguy

10:47 am on Feb 20, 2002 (gmt 0)

We use PDFs for distributing reports that are 20 to 200 pages in length, most of which have several charts (bar charts, pie charts) but otherwise few graphics.

Our users don't want to read these reports on the screen, and a hyperlinked document isn't of much use to them -- they just want to print them out, put them in their briefcase, and read them later on the train/plane/bus... So, as a means of making it easy for our users to obtain a printed version to read, PDF is clearly the way to go. Otherwise, we'd have a huge FedEx bill. But, I also use PDF only for documents/versions that are only intended to be printed.

Nevertheless, creating PDFs is often a maddening process, because I'm usually converting from MS Word docs, and there are often various objects embedded in the Word doc that are far larger (in kbytes) than they need to be. Example: I just got a report e-mailed to me for conversion that was a 42-page, 5 MB Word document. I looked through it and saw 5 graphics: a logo on the cover and, inside, two pie charts and two bar charts. Sure enough, I discovered that the graphics and fonts were the problem. The logo was an embedded Corel graphic in millions of colors that actually used only two shades of blue -- measuring 2 MB -- and the charts were embedded Excel workbooks with several pages of data behind them. There were also more than seven different fonts in the document even though only two were actually used in the text. The blank lines were in various fonts that weren't used elsewhere -- and if converted to PDF these would be embedded in the PDF and unnecessarily swell its size.

After banging on my desk and shouting "No! No! No! No!" I spent an hour and a half exporting these items, getting rid of all the unused extraneous colors and data in other applications, then replacing them in the Word document with my new versions. Then searched/replaced the unused fonts on blank lines with Times (which doesn't get embedded in PDFs since it's universal) and printed the thing to Adobe Distiller.

The Word doc was cut down from 5 MB to 130 KB, and the resulting PDF was 220 KB (still a bit large for my taste, but the best I could do).

Perhaps some of you know some other tricks for keeping PDFs down to a reasonable size? Meanwhile, I'm training my folks how to import ONLY the charts from Excel...

Mark_A

11:01 am on Feb 20, 2002 (gmt 0)

nonprof webguy - interesting to read your post on pdf file sizes.

It had bemused me before, not being an intensive pdf user, when someone sent updated pdfs for their site with more pages / paragraphs added but which were smaller than the earlier versions.

What you wrote makes perfect sense, why should I have failed to realise that optimising word or pdf pages is almost the same as for html pages.

BTW for being the only one called "nonprof webguy" I come across in forums you sound pretty proficient to me :-)

I am thinking about using pdf more in a new project as a result of these threads .. and expected tables with loads of extended character set items.

I was recommended Copernic Summarizer today for creating text summaries of loads of various docs, to stick in HTML pages, leaving the extended character technical data etc. in the PDF for users to download. Any views?
