Google Adds "Quick View" of PDFs to SERPs

Forum Moderators: Robert Charlton & goodroi



engine

9:02 am on Oct 8, 2009 (gmt 0)

Google Adds "Quick View" of PDFs to SERPs [googleblog.blogspot.com]

Today, we've added new links to "Quick View" PDFs in your browser with the formatting intact. The new links are based on the same technology that's available in Google Docs and Gmail, as well as to webmasters through the Google Docs viewer. We've been rolling this technology out to the search results page since July, and as of today we've added "Quick View" links to more than 50% of the PDFs in our index. The new links appear at the end of the second line of the result, right underneath the title.

tedster

3:46 pm on Oct 8, 2009 (gmt 0)

There is some good for the end user in this new feature. But I'm wondering if each "Quick View" will still require a visit to the site's server - it sounds like it won't. For instance, I don't think the html view required a visit either, it's just served up by Google.

So it looks like one more way that Google Search can distribute a site's content without requiring a direct visit to the site itself - and in this case, it's an entire document, not just a snippet. And the intention is to roll this out for other file format types, too.

engine

5:33 pm on Oct 8, 2009 (gmt 0)

Yes, a great tool, however, I can confirm that it does not appear in the server logs. That concerns me as we'll never know the stats.

tedster

5:45 pm on Oct 8, 2009 (gmt 0)

Time to get serious about the X-robots tag, I guess.

maximillianos

1:31 pm on Oct 10, 2009 (gmt 0)

That or start finding ways to embed ads into your documents...

badbadmonkey

3:04 pm on Oct 10, 2009 (gmt 0)

Yes, a great tool, however, I can confirm that it does not appear in the server logs. That concerns me as we'll never know the stats.

Are you sure, isn't it effectively just an online PDF viewer?

Gomvents

4:16 pm on Oct 10, 2009 (gmt 0)

badbadmonkey, they cache and re-serve. At best they are acting like a proxy so it'll only look like a hit from Googlebot, not a user :(
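One way to check this against your own logs is to count who actually requested your PDFs. The following is a minimal sketch, assuming the server writes the common "combined" access log format. The sample lines, the paths, and the `pdf_hits_by_client` helper are made up for illustration, and a user-agent string can be spoofed, so treat the result as a rough indication only.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a
# "combined" format access log line (an assumption about the log layout).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines, for illustration only.
SAMPLE = [
    '66.249.66.1 - - [10/Oct/2009:16:16:00 +0000] "GET /docs/report.pdf HTTP/1.1" '
    '200 51234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2009:16:20:00 +0000] "GET /docs/report.pdf HTTP/1.1" '
    '200 51234 "http://www.google.com/search?q=report" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
    '203.0.113.7 - - [10/Oct/2009:16:21:00 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]


def pdf_hits_by_client(lines):
    """Count PDF requests, split into Googlebot and everything else."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = match.group("path").split("?", 1)[0]
        if not path.lower().endswith(".pdf"):
            continue  # only PDF requests are of interest here
        kind = "googlebot" if "Googlebot" in match.group("agent") else "other"
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    print(pdf_hits_by_client(SAMPLE))
```

If "Quick View" clicks really are served from Google's copy, they would add nothing to either count, which matches what engine reports seeing.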

boplicity

6:08 pm on Oct 10, 2009 (gmt 0)

Is anyone else concerned by this? Should Google have the right to host and distribute your intellectual property without your explicit permission?

This isn't just a "PDF viewer", it is part of Google's pattern of appropriating, hosting, and redistributing the intellectual property of others for their own gain, without the permission of the copyright owner.

First the cache, then Google books, now this. What's next?

kksite123

8:30 pm on Oct 10, 2009 (gmt 0)

Sad news and again confirmation Google is trying to keep the user at the Google domain longer and longer.

Eventually Google will be able to more or less answer any question, since every other website is embedded in the Google domain.

They're definitely playing it smart, but IMHO it's about time to put a hold on this nonsense.

You don't want to be recognised in Street View when you're doing something embarrassing? How could we know, just file a complaint. You don't want your website indexed? Just modify your robots.txt. You don't want us to steal your PDF? ....How could we know you didn't want that?!

They're pushing it more and more and I don't like it.

JAB Creations

4:12 am on Oct 11, 2009 (gmt 0)

Should Google have the right to host and distribute your intellectual property without your explicit permission?

It was old even before the first time someone said anything like this.

If you do NOT exclude something on your site via robots.txt then you have NO RIGHT to complain. It's no different from driving around not knowing what speed limits are and then complaining about getting a ticket for speeding. If you don't know how to operate a website then you don't have any justification for complaining about how others operate theirs.

- John

tedster

4:40 am on Oct 11, 2009 (gmt 0)

Allowing Google to spider and index does NOT imply a legal right to reproduce the entire document for the public from their server. This feels like being a frog in the cooking pot who doesn't notice the water temperature keeps going up.

zett

5:07 am on Oct 11, 2009 (gmt 0)

John, with all due respect, but

If you do NOT exclude something on your site via robots.txt then you have NO RIGHT to complain.

is not how the copyright laws have been constructed. The copyright laws have been constructed as "opt-in", i.e. by default no one (not even Google) has the right to use material without acquiring the rights to it.

However, Google -over the past few years- has made clear that they don't care about this and made their search service "opt out", i.e. they WILL take what material they can get (whether they have the rights or not) and will only stop if you as a rights holder stop them. But just because they do, it does NOT mean they have the legal right to do so.

(For search they could get away because they claim to use just snippets and thumbnails which are covered by "fair use". I don't think they can as easily claim "fair use" when displaying entire PDF files.)

Now off to creating PDF files...!

badbadmonkey

6:26 am on Oct 11, 2009 (gmt 0)

Indexing/caching HTML is one thing and understandable - sorry, but get over it - the "cached" link also has a semantic meaning and the visitor understands what they are looking at.

Ripping documents like PDFs off sites and presenting them in a Google branded viewer however is something quite different, and obviously blatant copyright infringement.

I think this is just a fancy viewer. The problem is the webmaster will be seeing stats credited to the Google server, but that's the only real issue. I await to be proven wrong.

graeme_p

11:24 am on Oct 11, 2009 (gmt 0)

All the major search engines have offered view as HTML or view cache functionality for a long time. This provides better rendering than the old functionality (or Yahoo's), but so does Bing.

The question I have is whether there is an easy way to tell Google (and Bing) not to cache a PDF document or show it in the viewer. Yahoo will follow a noarchive HTTP header; do the others?

engine

2:48 pm on Oct 12, 2009 (gmt 0)

Are you sure, isn't it effectively just an online PDF viewer?

As I said, I can confirm there was no entry in the server logs for the PDF I checked. If I hadn't tried it myself, I would have doubted it, too.

phranque

4:56 am on Oct 14, 2009 (gmt 0)

an easy way to tell Google (and Bing) not to cache a PDF document or show it in the viewer

as ted mentioned, the X-robots tag [googleblog.blogspot.com] is probably the way to go.
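For anyone who wants to try it, here is a minimal sketch of sending that header for PDFs only, written as a small WSGI wrapper in Python. The `noarchive_pdfs`, `demo_app` and `headers_for` names are hypothetical, `demo_app` is only a stand-in for whatever really serves the files, and whether `noarchive` also suppresses the "Quick View" link is an assumption that nobody in this thread has confirmed.

```python
def noarchive_pdfs(app):
    """Wrap a WSGI app so responses for .pdf paths carry X-Robots-Tag."""
    def wrapped(environ, start_response):
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def patched_start(status, headers, exc_info=None):
            if is_pdf:
                # Add the header without mutating the caller's list.
                headers = list(headers) + [("X-Robots-Tag", "noarchive")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    # Hypothetical stand-in for the real file-serving application.
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"placeholder body"]


def headers_for(path):
    """Run the wrapped demo app for one path and return the headers sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    # Consume the response body so the app runs to completion.
    list(noarchive_pdfs(demo_app)({"PATH_INFO": path}, start_response))
    return dict(captured["headers"])


if __name__ == "__main__":
    print(headers_for("/docs/report.pdf"))
    print(headers_for("/index.html"))
```

The same header can usually be set in the web server's own configuration instead, which avoids touching application code; the wrapper above is just the easiest form to test.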

hughie

8:16 am on Oct 15, 2009 (gmt 0)

talk about a lawsuit waiting to happen, very dangerous ground.

What's quite amusing is that if I type in "PDF" here in Australia, result no. 4 is a "quick view" of a PDF about plagiarism....

driller41

11:03 am on Oct 15, 2009 (gmt 0)

I think we will see more of this plagiarism happening over time. Eventually they will try to scrape content from a number of sites/pages, present it within one single document on their own server, and call it an "authoritative source".

Initially they will use snippets and then expand on those snippets until the viewer does not need to go to the target site at all.