Increase in PDF exposure

         

JonnyWales

9:22 am on Aug 11, 2003 (gmt 0)

10+ Year Member



Do people really want to see PDF documents when searching? I've just checked my keywords and PDFs are all over the place: in positions 1 and 2, with 3-4 on each page ... I got to page 5 and gave up. I'm going to try FAST when I want something next time.

What is going on? Jakob is right about PDF being unfit!

Marcia

9:37 am on Aug 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Do people really want to see PDF documents when searching?

People do, at least I do! NOT the files themselves, but the content that's in them. Big difference.

However, I really don't care to wait for dog-slow files to load and half the time crash my browser to boot. What I see as a plus is that Google will show an HTML version of those pages, which alleviates my suffering quite a bit but it's not exactly something we can link to if we'd like to.

No, PDF files are not fit for human consumption; and without Google they're more than useless a good part of the time. Just my experience and my two centavos. They're as bad as JavaScript, if not worse; JavaScript crashes me at least a half dozen times a day and is permanently on my list of what I most love to hate.

[edited by: Marcia at 9:43 am (utc) on Aug. 11, 2003]

KevinC

9:39 am on Aug 11, 2003 (gmt 0)

10+ Year Member



I totally agree, I think PDFs suck! They load horribly slowly... but if your SERPs are full of them, why not create a couple of your own?

mil2k

8:24 pm on Aug 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



PDFs are taking over!

Yes, I am also observing this, along with the Amazon and other listings which are now dominating the SERPs. It will be interesting to wait till GoogleGuy comes up with his answers in this thread: More eBay, Amazon, Dealtime, Epinions, etc. in Results [webmasterworld.com]

jonknee

8:33 pm on Aug 14, 2003 (gmt 0)

10+ Year Member



PDFs work really nicely with the MacOS. Actually, a lot of the OS is made with PDFs (the Dock, for example). In any application that you can print from, you can "print" to a PDF. Very handy.

But yes, I'm much more likely to hit up the HTML version that is made by Google.

mack

8:40 pm on Aug 14, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think Google giving the option to view PDF as HTML is a great tool. Perhaps this is why we are seeing a lot more PDFs within the index.

Although I think personally I would be reluctant to click on a PDF link.

Mack.

rfgdxm1

8:41 pm on Aug 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Free hint Marcia: try upgrading to the latest version of the PDF reader from Adobe. Far fewer crashes with the latest software.

Duckula

8:41 pm on Aug 14, 2003 (gmt 0)

10+ Year Member



PDFs may be awful for web content, but they are good for other kinds of needs.

I have some on my site to publish some math concepts a friend asked me for. They are more popular than the heavily indexed free software subsite.

The PDF is not for reading "on the spot" but is quite good for keeping on your machine and reading later.

JonnyWales

8:57 pm on Aug 14, 2003 (gmt 0)

10+ Year Member



OK, the content of PDFs might be fine and pertinent to the search keywords used, but isn't the whole point of them to produce an easy-to-print formatted document? They are not (usually) designed as web pages, and I would rather see them differentiated (i.e. separated) in some way from standard SERPs.

werty

3:09 am on Aug 15, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



These are on the same topic:
[webmasterworld.com...]

[webmasterworld.com...]
Out of the first 100 results 58 are PDF or DOC

58% DOC/PDF in the first 100 results makes me think that PDFO (PDF optimization? ha) may be valuable... does anyone have pointers on this?

I am hoping they do add a separate tab on the Google home page to handle DOCs/PDFs in the future. Personally I hate PDFs. I have never had a machine that can handle them without some lag.

I do like the view as .html option but would prefer not to see them at all.

2 "normal folks" have also noticed the amount of PDFs and DOCs in the SERPs and asked me about it. The major complaint is not the PDFs but the way their computers handle them.

Perhaps google should swap the view as .html with a view original .pdf, and make the title and url link to the .html version. I know this would take away some of the:
"this might be what I was looking for"
*click* ............
*10 seconds go by* .............
*click* *click* *click*
"random swearing"
*ctrl + alt + del*

I have become much more of a "cautious clicker" since the SERPs have been flooded with these PDFs. Also, I add -.pdf to many of my queries now.
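
For anyone who wants to script the exclusion trick above, here is a minimal sketch. It assumes the engine honours a negated `filetype:` operator (Google documents one); the `-.pdf` form mentioned in the post is the poster's own variant. The function name `exclude_filetypes` is hypothetical.

```python
def exclude_filetypes(query: str, extensions: list[str]) -> str:
    """Append one exclusion term per file extension to a search query.

    Assumes the target engine supports a negated `filetype:` operator.
    """
    # Normalise ".PDF" / "pdf" / ".pdf" to a bare lowercase extension.
    terms = [f"-filetype:{ext.lstrip('.').lower()}" for ext in extensions]
    return " ".join([query.strip(), *terms])


if __name__ == "__main__":
    print(exclude_filetypes("web database applications", ["pdf", ".DOC"]))
```

Keeping the exclusions in a list makes it easy to drop PPT or TXT as well without retyping the query.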

dauction

3:20 am on Aug 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would like Google to simply provide a PDF database.
...or at least provide it as a search option, in a similar manner to whether or not you want adult material included in your results.

MonkeeSage

3:41 am on Aug 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just upgraded to Acrobat Reader version 6 (was using v. 5), and I am duly impressed. Ghostview was good, but had no browser integration; now w/ Reader 6, I'm sure I'll be viewing PDFs much more often. :)

Ps. An interesting aside that we can learn from this situation is that the number of links on a page really has nothing to do with its placement in the SERPs -- PDFs usually have no links.

Jordan

bilalak

5:39 am on Aug 15, 2003 (gmt 0)

10+ Year Member



So why not optimize for this new feature? For my part, I would try to use a PHP library to make a PDF copy of most of the articles on my site.
This would give a better chance of higher PR, and a copy of the text in PDF would not count as cloaking.

Nice Gogol

jonknee

6:17 am on Aug 15, 2003 (gmt 0)

10+ Year Member



I've used PHP to make PDFs on the fly for a "print page" button. Makes it nice and easy to control the elements of the page (font/size/images). Users seem to like it as well.
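
To give a feel for what generating a PDF on the fly involves, here is a rough sketch in Python rather than PHP. It is not the poster's code: it hand-writes a one-page, one-line PDF with the standard library only, and the function name `make_pdf` and the page layout are assumptions for illustration. A real site would more likely use an existing PDF library.

```python
def make_pdf(text: str) -> bytes:
    """Build a minimal single-page PDF showing one line of text.

    Sketch only: Latin-1 text, built-in Helvetica, US Letter page.
    """
    # Escape the characters that are special inside a PDF string literal.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    stream = f"BT /F1 12 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free-list head entry
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    with open("print_page.pdf", "wb") as handle:
        handle.write(make_pdf("Printable copy of this article"))
```

The fiddly part is the cross-reference table: each entry must hold the exact byte offset of its object, which is why the offsets are recorded while the file is being assembled.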

BlueSky

6:41 am on Aug 15, 2003 (gmt 0)

10+ Year Member



Except for those who plan to read the info later on or print it out, I think most people would probably prefer HTML webpages. As long as it's relevant though, I don't think the alternative formats are that much of a detractor. Getting served PDF or DOC files may be very new to many right now, but people will get used to it. On some searches, I even hunt them down first. I always check the HTML format to be sure it's something I want to read because the PDF does take longer to load.

I like Google, but I honestly think its results have deteriorated quite a bit. As a result, I'm having to jump more and more over the top five or 10 and go down three, four, or more pages to find the info I need. If PDF and DOC files which contain relevant info float above all the spammy and mirror sites, I say more power to them.

Hopefully some day their PhDs will wake up and realize you cannot completely remove the human factor and depend solely on computer algorithms. Hopefully, it's before another SE has caught up and then bypassed them.

Dave_Hawley

6:55 am on Aug 15, 2003 (gmt 0)



>>It will be interesting to wait till GoogleGuy comes up with his answers in this thread: More eBay, Amazon, Dealtime, Epinions, etc. in Results

Unfortunately the thread has been locked. I can see no reason given but I can guess :o)

Here is part of what GG said in his last post there:

>>We're going to take whatever actions need to happen to make sure that our search is where we want it to be

I fear that *may* well mean that nothing will change on either the PDF or the Amazon problem. It's very hard to tell with political-type speak.

It would also appear that many threads are being locked or deleted simply for speaking one's mind. In particular, mention anything bad about DMOZ and the thread is history.

Dave

dmorison

7:51 am on Aug 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Webmasters should not overlook the value of providing a PDF document with details of their product / service, and even forcing download so that your message is saved onto hard disks, shared around, even printed out, forgotten about and left on the Laserjet in the corner for everyone else in the office to stumble across... ;)
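
One common way to "force download" is to serve the file with a `Content-Disposition: attachment` response header, so the browser saves it instead of opening it inline. A minimal sketch of the headers involved follows; the function name `download_headers` and the file name are hypothetical, and it assumes a plain ASCII file name with no quote characters.

```python
def download_headers(filename: str, size_bytes: int) -> dict[str, str]:
    """Return response headers that ask the browser to save a PDF to disk.

    Assumes `filename` is plain ASCII without quotes; anything else would
    need proper escaping or the extended `filename*` form.
    """
    return {
        "Content-Type": "application/pdf",
        # "attachment" requests a save dialog rather than inline display.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(size_bytes),
    }


if __name__ == "__main__":
    for name, value in download_headers("product-brochure.pdf", 48213).items():
        print(f"{name}: {value}")
```

Whether the browser actually honours the hint is up to the client, so treat it as a request rather than a guarantee.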

GoogleGuy

3:39 pm on Aug 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey Dave_Hawley, I'm not sure why that thread was locked either. Arguments on PDF pros and cons aside, I have gone to bother a few engineers about this question. We're going to revisit the issue, but first we need to collect some more data--that'll take several days, up to a week or so. When I have more info, I'll be happy to post here about PDF-ish stuff, what I've found, and whether anything is changing. It will take a little while yet to collect more data, but I don't consider this issue closed out yet.

Hope that helps,
GoogleGuy

Dave_Hawley

2:10 am on Aug 16, 2003 (gmt 0)



Hi GoogleGuy

Great! How about all the mention of Amazon results?

I don't particularly like it when a thread is locked/deleted without any reason given. People will always assume the worst unless told otherwise (or is it just my suspicious mind :o))

Dave

kevinpate

2:54 am on Aug 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> People will always assume the worst

Hmmm, Iffin only you had started the sentence with the qualifier "Some", I'd been a tad more inclined to have said yeppers. However, I can't. I know far too many peeps who take delight in making lemons into lemonade, no matter how many lemons they come across (and yeah, some of them probably actually invite the lemons, up to and including wearing signs that say "will squeeze for juice". ;^)

Dave_Hawley

3:57 am on Aug 16, 2003 (gmt 0)



Ok then

Most people......

Will that get a "yeppers" from you, or does it have to be "some"?

Dave

netguy

2:57 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




It may be a fluke, but one of my keywords that had been bombarded with all the PDF and .GOV garbage has now been cleaned up on CW.

I don't know if they are testing on the one data center, but hopefully it moves across to the rest soon!

lk125

5:21 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



GG, I just wanted to check back in with you regarding this issue. I haven't really seen much of an improvement, and some examples I sent you have actually gotten worse. Here are the top 10 for a 1.6 million key phrase search:

1 - doc
2 - pdf
3 - ppt
4 - pdf
5 - ppt
6 - html
7 - html (same company as 6)
8 - html
9 - pdf
10 - asp (does not mention key phrase)
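
The listing above is easy to tally. The short sketch below counts the formats exactly as posted; treating DOC, PDF and PPT as the "document" formats is an assumption about what the complaint covers.

```python
from collections import Counter

# Top-10 result types, copied in order from the listing above.
top_ten = ["doc", "pdf", "ppt", "pdf", "ppt", "html", "html", "html", "pdf", "asp"]

# Assumption: these are the non-web-page formats being complained about.
DOCUMENT_FORMATS = {"doc", "pdf", "ppt"}

counts = Counter(top_ten)
document_hits = sum(counts[fmt] for fmt in DOCUMENT_FORMATS)
document_share = document_hits / len(top_ten)

if __name__ == "__main__":
    print(dict(counts))
    print(f"{document_hits} of {len(top_ten)} results are document files ({document_share:.0%})")
```

On these numbers, document files take six of the ten slots, three of them PDFs.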

netguy

8:58 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



lk125, I've also been waiting for GG's response. I sent him a detailed email over a week ago on the pdf, doc, gov problem, but I've heard nothing since on the subject.

In the meantime, Google keeps juggling new projects and experimenting with their algo.

Unfortunately, GoogleGuy has been a little too busy putting out other fires, rather than worry about poor user results.

GoogleGuy

12:33 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Still measuring on this one. Ping me in about a week and a half and I should have more data.

Net_Wizard

1:01 pm on Aug 26, 2003 (gmt 0)



Anxiously waiting to see what Google is going to do about it. Positive or negative, we want to know the answer. I'll be tracking this thread as well, because on the locked thread the example 'web database applications' now has 6 Amazon listings out of the top 10.

How ridiculous can that be? Meanwhile the adwords are getting more attractive to click through.

1. Is this an intentional design to muddle the regular SERP a little so as to make the AdWords 'stand out'?

2. Is there any behind-the-door agreement between Google and these corporations (Amazon, Dealtime, etc.)?

3. Is this simply an engineering problem?

Whatever it is, we want to know.

ALbino

8:00 pm on Aug 26, 2003 (gmt 0)

10+ Year Member



I haven't encountered this problem myself until today. I thought everybody was just overreacting, but I was wrong. I've done some searches today and it's been way more prevalent, or perhaps the specifics of my search are more prone to that type of return. Regardless, one of my searches today returned only 7 HTML/PHP pages out of the first 40 results. The other 33 were PDF, DOC and TXT. I didn't even bother to go any further as none of the results were even close to relevant. I think maybe they should add a checkbox if you want to include PDF/DOC/TXT.

Let's be honest, PDF is arguably the worst format to become 'popular' that was ever created. Most technically oriented people hate it, and most non-computer-literate people don't know what the heck to do with it. IMHO it's wasting everybody's time to include it in the index.

If my grandma is searching from her AOL account for a special kind of sewing method, is she better off receiving 10 PDF documents leading to instructions for sewing machines and DOC files for contracts with companies that do mass stitching and sewing, or is she better off receiving 10 HTML pages showing various sewing methods and maybe pages from messageboards where people asked similar questions? I think it's obvious the latter is the more appropriate response. If somebody wants 50-page PDF files on how to operate some brand of sewing machine, then I'm sure they're going to make their search string more specific to that. Or just go to their brand of sewing machine's website.

Anyway, that's just my two cents.

dougs

10:16 pm on Aug 26, 2003 (gmt 0)

10+ Year Member



PDFs... ugh

Net_Wizard

2:37 pm on Aug 27, 2003 (gmt 0)



Has anybody read Chris Sherman's article 'Search Engines Uncover Compromising Documents' at Searchenginewatch?

Basically, 'doc' files can possibly compromise some sensitive information according to the article.

lk125

12:56 pm on Sep 8, 2003 (gmt 0)

10+ Year Member



Hey GG, just wanted to check back with you on the PDF/DOC issue. I've seen the results I've monitored get better, so I'm assuming it has been fixed.