
Google SEO News and Discussion Forum

    
Forbid Google to offer the "View as HTML" link for pdf
How can I forbid Google to offer the "View as HTML" link for pdf documents?
yves1




msg:3401422
 7:18 pm on Jul 22, 2007 (gmt 0)

Hi,
Google SERPs usually display a "View as HTML" link beneath the title of the pdf documents.

I have noticed that for some PDFs this link is not displayed, though. How can I instruct Google to leave it out for my PDF documents?

 

koan




msg:3401454
 9:24 pm on Jul 22, 2007 (gmt 0)

I think it may not be displayed when Google actually has a hard time converting the PDF document to HTML, not because the webmaster instructed them to.

To really disable direct access to it, block the PDF doc in your robots.txt file.
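As a rough illustration of the robots.txt approach, the rule can be checked with Python's standard `urllib.robotparser` before it goes live. This is only a sketch: the `/downloads/` folder, the `www.example.com` host and the file names are made-up placeholders, not anything from this thread. Keep in mind that robots.txt asks crawlers not to fetch the file; it does not restrict access for human visitors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the folder holding the PDFs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The PDF sits in the disallowed folder; the HTML page does not.
pdf_allowed = is_crawlable(
    ROBOTS_TXT, "Googlebot", "http://www.example.com/downloads/ebook.pdf"
)
page_allowed = is_crawlable(
    ROBOTS_TXT, "Googlebot", "http://www.example.com/ebook.html"
)
print(pdf_allowed, page_allowed)
```

The standard-library parser matches plain path prefixes, which is why the sketch disallows a folder rather than a file-extension pattern.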

jomaxx




msg:3401798
 8:26 am on Jul 23, 2007 (gmt 0)

I suppose you're not interested in hearing that many people such as myself routinely choose "view as HTML" because having to deal with PDF documents is so God-awful.

Drag_Racer




msg:3401883
 10:39 am on Jul 23, 2007 (gmt 0)

If I don't see an html link I usually go on to the next result. I hate pdf's. Mainly because the developers at Adobe have built such a f..dup piece of junk for the browser plugin. If I ever run into one of those developers I'd probably beat them within an inch of their life.

If you want to publish documents on the web, put them in a web format.

Quadrille




msg:3402034
 1:54 pm on Jul 23, 2007 (gmt 0)

The problem is that every 'update' of Acrobat is to ensure Adobe protect their copyright - they have little interest in making it quicker, slicker or more user friendly - they just want to make it hard for people to milk their cash cow!

Which means you can bet that Adobe are trying to find ways to do exactly what the OP asks; just a matter of time.

Meanwhile, the only hope for the rest of us is that Google buys Adobe and makes it free - before M$ buys Adobe and succeeds in making it 100% restricted!

[edited by: Quadrille at 1:54 pm (utc) on July 23, 2007]

Gibble




msg:3402057
 2:11 pm on Jul 23, 2007 (gmt 0)

When I see a pdf, I click the view as html, or not at all.

If the view as html looks bad, but the content seems like it is going to be useful and would benefit my looking at it as a pdf I will probably view it in adobe, but it's so slow to load, that it's a rare day when that happens.

About the only thing I use pdfs for is emailing certain documents (invoices, resumes, contracts, etc.)

[edited by: Gibble at 2:12 pm (utc) on July 23, 2007]

netmeg




msg:3402148
 3:50 pm on Jul 23, 2007 (gmt 0)

My Adobe wants to update every single time I fire it up, so I'm in the "view as HTML or not at all" crowd too.

bcolflesh




msg:3402150
 3:55 pm on Jul 23, 2007 (gmt 0)

Adobe Reader Install = 28MB

1MB Reader Functionality
27MB DRM

Quadrille




msg:3402229
 5:08 pm on Jul 23, 2007 (gmt 0)

Maybe now Google have an online documents service, they could 'invent' a new pdf-style service. That's if they've committed so much to wireless services they cannot afford to buy adobe this month (they could always sell-on flash and dreamweaver ...)

That would be a major contribution to information distribution on the Internet!

Bones




msg:3402389
 8:29 pm on Jul 23, 2007 (gmt 0)

Going back to the OP's question...

I'm not that good with PDF's myself... but what about securing it with "No Content Copying or Extraction" when you create the PDF?

koan




msg:3402802
 6:40 am on Jul 24, 2007 (gmt 0)

I never click on the PDF doc link either, if there is no html link, I pass. Very slow to load and overall bloated plug-in. I wish it wasn't cluttering the Google SERPs.

Same with all flash web sites, especially those god awful movie productions sites.

vincevincevince




msg:3402804
 6:43 am on Jul 24, 2007 (gmt 0)

PDF is, essentially, a print format. They should never have released a reader - just a print driver. So those above who use PDF for invoices, letters, etc. are using it correctly.

yves1




msg:3403517
 8:23 pm on Jul 24, 2007 (gmt 0)

Thank you all for your valuable feedback. I appreciate it.
Bones, I'm going to try that.

A little background for my question. The idea is to offer an eBook for free on the website. It's got more than 50 pages, therefore I thought a pdf might be more appropriate than an html document (ok, I might split it into 50 html documents, but who will read 50 html pages? IMO potential readers will want to print it).

The whole story about this ebook is that it's supposed to act as linkbait (well, reading your comments I might have to reconsider this ;).

The site owner is OK with giving this ebook away for free, but as a reward he wants to get traffic to his website. So my idea was to allow search engines to index the pdf, but to redirect any human visitor who tries to download it from outside the website, especially from the Google SERPs, to the page of the website that offers the download (well, that's kind of cloaking, but I think it's no issue since it's not deceptive for the user).
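The redirect idea described above could be sketched roughly as follows. Everything named here is an assumption for illustration: the host `www.example.com`, the landing page `/free-ebook.html`, the crawler tokens, and the function `pdf_response` itself. It only shows the decision logic, not a real server setup, and it relies on the Referer and User-Agent headers, which browsers may omit and anyone can spoof.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

SITE_HOST = "www.example.com"        # hypothetical site host
LANDING_PAGE = "/free-ebook.html"    # hypothetical page offering the download
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")  # illustrative, not exhaustive


def pdf_response(referer: Optional[str], user_agent: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a request to the PDF."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        # Search engine crawlers get the file so it can be indexed.
        return 200, None
    host = urlparse(referer).hostname if referer else None
    if host == SITE_HOST:
        # The visitor clicked the download link on the site itself.
        return 200, None
    # Anyone arriving from elsewhere (or with no referer) goes to the landing page.
    return 302, LANDING_PAGE
```

For example, `pdf_response("http://www.google.com/search?q=ebook", "Mozilla/5.0")` would send the visitor to the landing page, while a request whose referer is a page on `www.example.com` would receive the PDF directly.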

That's also why I thought it could be a good idea to kill the "View as HTML" link in Google SERPs, which would be a killer for the website's traffic. Am I totally wrong?

vincevincevince




msg:3403787
 3:50 am on Jul 25, 2007 (gmt 0)

If that is the motivation, then I would suggest providing it as HTML pages (50 of them if necessary) and having a link (appropriately excluded from robots) to the PDF for printing purposes.

tedster




msg:3403806
 4:40 am on Jul 25, 2007 (gmt 0)

Agreed. I have a B2B client who has used this strategy very successfully for 7 years. He puts professional papers online in HTML, with a complete print version PDF linked from every HTML page of the article. The HTML pages are a major search engine draw, and they bring in new traffic all the time. The print version PDFs are downloaded quite often, but we actually disallow them in the robots.txt file.

lavazza




msg:3403816
 5:23 am on Jul 25, 2007 (gmt 0)

Incorporating a media="print" css file [w3schools.com] will allow you to control the layout of the printed output down to the last point - even if your screen css uses pixel values

yves1




msg:3403986
 11:56 am on Jul 25, 2007 (gmt 0)

Thank you Vince. I think this is the way we will go.

jd01




msg:3403996
 12:15 pm on Jul 25, 2007 (gmt 0)

I'm in the view as html crowd...

If that is the motivation, then I would suggest providing it as HTML pages (50 of them if necessary) and having a link (appropriately excluded from robots) to the PDF for printing purposes.

It looks like you have already decided (and it seems a viable solution), but I would also consider providing a page or two of html as a 'teaser' and allowing full PDF to be downloaded to read / print the remaining portion, because I'm also a card carrying member of the 'not a chance I'm going to read 50 html pages crowd'.

Justin

Gibble




msg:3404107
 2:15 pm on Jul 25, 2007 (gmt 0)

The biggest shock I'm finding in this thread is people still use printers!

I don't have one at home...well, not one that's hooked up. And we have one at work, that is only used by the boss. I can safely say, I haven't printed ANYTHING in 5 months.

But back on topic, I'd use a good print css style sheet, and let users print from their browser rather than maintain two files (one html, one pdf)

[edited by: Gibble at 2:16 pm (utc) on July 25, 2007]

yves1




msg:3404447
 6:31 pm on Jul 25, 2007 (gmt 0)

Gibble, I personally like to read in bed, but no chance my wife will let me bring a laptop to bed (she's got shares in a printer company ;)

jd01, I like the teaser idea. Moreover, after thinking twice, a big drawback I see in the "50 html pages" idea is that Google will send traffic to any page of the document (unless I cloak), whereas it has to be read in the right order to make sense.

On the other hand, 50 html pages is probably good for the site's SE ranking.

I'm a bit confused. I should probably look for the best in the "make sites for humans not for search engines" philosophy. But which is it?

jd01




msg:3404464
 7:02 pm on Jul 25, 2007 (gmt 0)

Which is better?

That's the million dollar question...
You have a few options I can see:
1. Put two (or the desired number of) PDF pages on a single longer html page and allow visitors to download the PDF from there.
2. Go ahead with the 50 page idea and 'noindex,follow' all pages, except the first one.
3. Go ahead with the 50 page idea and use 'header tags' to indicate the 'starting point' document in the collection of documents to search engines.

[w3.org...]
[w3.org...]
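Options 2 and 3 could be sketched like this. The sketch assumes that "header tags" means HTML link types such as `rel="start"`, `rel="prev"` and `rel="next"` placed in each page's head; the thread does not spell that out. The function name `head_tags` and the URL pattern are hypothetical, and whether a given search engine acts on `rel="start"` is not something this example can promise.

```python
from typing import List


def head_tags(page: int, total: int,
              url_pattern: str = "/ebook/page-{n}.html") -> List[str]:
    """Build <head> tags for page `page` (1-based) of a `total`-page document."""
    if not 1 <= page <= total:
        raise ValueError("page out of range")
    tags = []
    if page > 1:
        # Option 2: only the first page is indexed; links on the others are still followed.
        tags.append('<meta name="robots" content="noindex,follow">')
        tags.append('<link rel="prev" href="%s">' % url_pattern.format(n=page - 1))
    # Option 3: every page points at the start of the collection.
    tags.append('<link rel="start" href="%s">' % url_pattern.format(n=1))
    if page < total:
        tags.append('<link rel="next" href="%s">' % url_pattern.format(n=page + 1))
    return tags


for tag in head_tags(2, 50):
    print(tag)
```

The two options are combined here only to keep the example short; either can be used without the other.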

Justin

jomaxx




msg:3404509
 7:56 pm on Jul 25, 2007 (gmt 0)

...it has to be read in the right order to make sense.

You're on the right track, but you need to loosen up your thinking a bit more. The above statement is just not true. Not for any work of nonfiction, anyway. Maybe it's true for a suspense novel where you don't want the reader picking up the story just as the name of the murderer is revealed.

All you need is a clear navigational structure so that a user can arrive at any page and have an intuitive sense of what's going on and where they are. And if there are certain HTML pages you don't want indexed, just use a NOINDEX meta tag. Cloaking is not required.
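One possible shape for such a navigational structure is a line on every page that says where the reader is and links to the neighbouring pages. This is a minimal sketch under assumed names: the function `nav_bar`, the `chapter-N.html` file naming and `index.html` as the contents page are all hypothetical.

```python
import html
from typing import List


def nav_bar(titles: List[str], current: int) -> str:
    """Return an HTML navigation line for chapter `current` (0-based index)."""
    parts = ['<a href="index.html">Contents</a>']
    if current > 0:
        # Files are numbered from 1, so the previous chapter's number equals `current`.
        parts.append('<a href="chapter-%d.html">Previous: %s</a>'
                     % (current, html.escape(titles[current - 1])))
    parts.append('Chapter %d of %d: %s'
                 % (current + 1, len(titles), html.escape(titles[current])))
    if current < len(titles) - 1:
        parts.append('<a href="chapter-%d.html">Next: %s</a>'
                     % (current + 2, html.escape(titles[current + 1])))
    return " | ".join(parts)


print(nav_bar(["Intro", "Basics", "Advanced"], 1))
```

A visitor landing on any page from a search result then sees the chapter title, its position in the whole, and a way back to the contents.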

Robert Charlton




msg:3404781
 1:23 am on Jul 26, 2007 (gmt 0)

On the other hand, 50 html pages is probably good for the site's SE ranking.

More pages are not necessarily better. More pages help if they make your site more granular and get you inbound links for phrases people search for.

For SEO purposes, I'd break the content up if it logically breaks into topical areas that you can target for search... say, as chapters in a book. If done properly, this also helps by providing readable chunks for your users.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved