I think the link isn't displayed when Google actually has a hard time converting the PDF document to HTML, not because the webmaster instructed them to hide it.
To really disable direct access to it, block the PDF doc in your robots.txt file.
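For example, assuming the file lives at /docs/ebook.pdf (a made-up path for illustration), the robots.txt entry might look like this:

```
User-agent: *
Disallow: /docs/ebook.pdf
```

Googlebot also understands wildcard patterns, so a rule like Disallow: /*.pdf$ should keep all PDFs on the site out of the index, though wildcards aren't part of the original robots.txt standard and other bots may ignore them.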
I suppose you're not interested in hearing that many people such as myself routinely choose "view as HTML" because having to deal with PDF documents is so God-awful.
If I don't see an html link I usually move on to the next result. I hate pdf's, mainly because the developers at Adobe have built such a f..dup piece of junk for the browser plugin. If I ever run into one of those developers I'd probably beat them within an inch of their life.
If you want to publish documents on the web, put them in a web format.
The problem is that every 'update' of Acrobat is to ensure Adobe protect their copyright - they have little interest in making it quicker, slicker or more user friendly - they just want to make it hard for people to milk their cash cow!
Which means you can bet that adobe are trying to find ways to do exactly what the OP asks; just a matter of time.
Meanwhile, the only hope for the rest of us is that Google buys Adobe and makes it free - before M$ buys Adobe and succeeds in making it 100% restricted!
When I see a pdf, I click the view as html, or not at all.
If the view as html looks bad, but the content seems like it is going to be useful and would benefit my looking at it as a pdf I will probably view it in adobe, but it's so slow to load, that it's a rare day when that happens.
About the only thing I use pdfs for is emailing certain documents (invoices, resumes, contracts, etc.)
My Adobe wants to update every single time I fire it up, so I'm in the "view as HTML or not at all" crowd too.
Adobe Reader Install = 28MB
1MB Reader Functionality
Maybe now that Google have an online documents service, they could 'invent' a new pdf-style service. That's if they haven't committed so much to wireless services that they cannot afford to buy adobe this month (they could always sell on flash and dreamweaver ...)
That would be a major contribution to information distribution on the Internet!
Going back to the OP's question...
I'm not that good with PDF's myself... but what about securing it with "No Content Copying or Extraction" when you create the PDF?
I never click on the PDF doc link either, if there is no html link, I pass. Very slow to load and overall bloated plug-in. I wish it wasn't cluttering the Google SERPs.
Same with all flash web sites, especially those god awful movie productions sites.
PDF is, essentially, a print format. They should never have released a reader - just a print driver. So those above who use PDF for invoices, letters, etc. are using it correctly.
Thank you all for your valuable feedback. I appreciate it.
Bones, I'm going to try that.
A little background for my question. The idea is to offer an eBook for free on the website. It's got more than 50 pages, therefore I thought a pdf might be more appropriate than an html document (ok, I might split it into 50 html documents, but who will read 50 html pages? IMO potential readers will want to print it).
Whole story about this ebook is that it's supposed to act as a linkbait (well, reading your comments I might have to reconsider this ;).
The site owner is ok with giving this ebook away for free, but as a reward he wants to get traffic to his website. So my idea was to allow search engines to index the pdf, but redirect any human visitor who tries to download the pdf from outside the website, especially from the Google SERPs, to the page of the website that offers the download (that's kind of cloaking, but I think it's no issue since it's not deceptive for the user).
That's also why I thought it could be a good idea to kill the "View as HTML" link in the Google SERPs, which would be a killer for the website's traffic. Am I totally wrong?
If that is the motivation, then I would suggest providing it as HTML pages (50 of them if necessary) and having a link (appropriately excluded from robots) to the PDF for printing purposes.
Agreed. I have a B2B client who has used this strategy very successfully for 7 years. He puts professional papers online in HTML, with a complete print version PDF linked from every HTML page of the article. The HTML pages are a major search engine draw, and they bring in new traffic all the time. The print version PDFs are downloaded quite often, but we actually disallow them in the robots.txt file.
Incorporating a media="print" css file [w3schools.com] will allow you to control the layout of the printed output down to the last point - even if your screen css uses pixel values
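A minimal sketch of that approach (file names and element ids here are hypothetical): link one stylesheet for the screen and one for print.

```html
<!-- screen.css handles on-screen layout; print.css takes over on paper -->
<link rel="stylesheet" media="screen" href="screen.css">
<link rel="stylesheet" media="print" href="print.css">
```

Then print.css can use point sizes and hide the navigation chrome:

```css
/* print.css: paper-friendly type, no site navigation */
body { font: 11pt/1.5 Georgia, "Times New Roman", serif; }
#nav, #sidebar, #footer { display: none; }
a:after { content: " (" attr(href) ")"; } /* show link URLs on paper */
```

The visitor just uses the browser's normal Print command; the browser picks the media="print" stylesheet automatically.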
Thank you Vince. I think this is the way we will go.
I'm in the view as html crowd...
|If that is the motivation, then I would suggest providing it as HTML pages (50 of them if necessary) and having a link (appropriately excluded from robots) to the PDF for printing purposes. |
It looks like you have already decided (and it seems a viable solution), but I would also consider providing a page or two of html as a 'teaser' and allowing full PDF to be downloaded to read / print the remaining portion, because I'm also a card carrying member of the 'not a chance I'm going to read 50 html pages crowd'.
The biggest shock I'm finding in this thread is people still use printers!
I don't have one at home...well, not one that's hooked up. And we have one at work, that is only used by the boss. I can safely say, I haven't printed ANYTHING in 5 months.
But back on topic, I'd use a good print css style sheet, and let users print from their browser rather than maintain two files (one html, one pdf)
Gibble, I personally like to read in bed, but no chance my wife will let me bring a laptop into bed (she's got shares in a printer company ;)
jd01, I like the teaser idea. Moreover, after thinking twice, a big drawback I see in the "50 html pages" idea is that Google will send traffic to any page of the document (unless I cloak), even though it has to be read in the right order to make sense.
On the other hand, 50 html pages is probably good for the site's SE ranking.
I'm a bit confused. I should probably look for the best in the "make sites for humans not for search engines" philosophy. But which is it?
Which is better?
That's the million dollar question...
You have a few options I can see:
1. Put two (or the desired number of) PDF pages on a single, longer html page and allow visitors to download the PDF from there.
2. Go ahead with the 50 page idea and 'noindex,follow' all pages, except the first one.
3. Go ahead with the 50 page idea and use 'header tags' to indicate the 'starting point' document in the collection of documents to search engines.
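For option 2, a minimal sketch (page names are made up): every page after the first carries a robots meta tag like this.

```html
<!-- on page-02.html through page-50.html, but not on page-01.html -->
<meta name="robots" content="noindex,follow">
</meta-example>
```

Strip the closing marker above if you paste this; the meta tag itself is a void element. That way spiders still follow your internal links through the whole document, but only the first page shows up in the index.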
|...it has to be read in the right order to make sense. |
You're on the right track, but you need to loosen up your thinking a bit more. The above statement is just not true. Not for any work of nonfiction, anyway. Maybe it's true for a suspense novel where you don't want the reader picking up the story just as the name of the murderer is revealed.
All you need is a clear navigational structure so that a user can arrive at any page and have an intuitive sense of what's going on and where they are. And if there are certain HTML pages you don't want indexed, just use a NOINDEX meta tag. Cloaking is not required.
|On the other hand, 50 html pages is probably good for the site's SE ranking. |
More pages are not necessarily better. More pages help if they make your site more granular and get you inbound links for phrases people search for.
For SEO purposes, I'd break the content up if it logically breaks into topical areas that you can target for search... say, as chapters in a book. If done properly, this also helps by providing readable chunks for your users.