Optimizing non-HTML file formats

.pdf, .doc, .ppt, etc.

         

visibot

6:28 pm on Jul 14, 2002 (gmt 0)

10+ Year Member



I've seen this mentioned in passing in some other threads, but I'm now dealing with a site that links to many of these file types, so I'm researching it in earnest.

1. The company is contemplating converting all of its current .doc files to .pdf. My own experience and instincts suggest that PDF is the better format for online document delivery. Even so, I'm planning some research and testing to determine which format is more "indexable" by search engines. Any thoughts out there? Assume that the primary goal is simply to make these documents findable by people seeking good information on the topics in question, and that optimization for marketing purposes is secondary. (Btw, some of the .docs will be converted to HTML for usability reasons, but much of the collection could go either way.)

2. In any case, whether it's for .pdf or for MS Office formats (.doc, .ppt, .xls), I'm curious how much weight search engines place on the document "Properties" in the MS Office formats and on the "General Info" fields in PDF documents.

Here are the fields in question:

PDF - General Info:
- Title
- Subject
- Author
- Keywords

MS Office Properties:
- Title
- Subject
- Author
- Manager
- Company
- Category
- Keywords
- Comments
- Hyperlink base

I've seen some indications that companies use these fields to organize documents for searchability and retrieval within their intranets, so it seems the same logic would extend to public engines, but I just haven't seen much discussion of this out there. If the company's management is willing to invest the time and resources in populating these fields for their entire document collection, I'd like to be able to assure them that it's more than just "busy work"!

;-)

victor

7:19 pm on Jul 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know one events-listing site that has a PDF of forthcoming events.

It's there simply to offer an easy way for someone to print what is otherwise spread across several webpages.

The PDF ranks 2 positions higher in Google than the website itself (when searching on the most common terms for that site). Clearly Google likes it better.

I never did find out why, so I'll be following your research with interest.

g1smd

9:47 pm on Jul 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




PDFs seem to index well, especially on Google. However, make sure you fill in the document properties on each file. There are many entries on Google with a Title of 'Template' and a blank description, no Author information, and no date.
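
For anyone wanting to check a whole collection for the leftover 'Template' titles and blank fields described above, here is a rough sketch in Python. It assumes the properties have already been pulled out of each file by some other tool; the `DocProperties` record, the `audit` function, and the list of placeholder titles are all made up for illustration, and none of this says anything about how a search engine actually weighs these fields.

```python
from dataclasses import dataclass

# Assumed list of titles that authoring tools tend to leave behind.
# Extend it to match whatever shows up in your own collection.
PLACEHOLDER_TITLES = {"template", "untitled", "document1"}


@dataclass
class DocProperties:
    """Hypothetical record of properties already extracted from one file."""
    filename: str
    title: str = ""
    subject: str = ""
    author: str = ""
    keywords: str = ""


def audit(props):
    """Return a list of human-readable problems found in one document."""
    problems = []
    title = props.title.strip()
    if not title:
        problems.append("missing Title")
    elif title.lower() in PLACEHOLDER_TITLES:
        problems.append(f"placeholder Title: {title!r}")
    # The remaining fields only need to be non-blank.
    for field in ("subject", "author", "keywords"):
        if not getattr(props, field).strip():
            problems.append(f"missing {field.capitalize()}")
    return problems


if __name__ == "__main__":
    # Invented sample data, not taken from any real site.
    docs = [
        DocProperties("events.pdf", title="Template"),
        DocProperties("guide.pdf", title="Widget Guide", subject="Widgets",
                      author="Docs Team", keywords="widgets, guide"),
    ]
    for doc in docs:
        issues = audit(doc)
        print(doc.filename, "->", "; ".join(issues) if issues else "ok")
```

Keeping the check separate from the extraction step means the same audit could be run over properties from PDF or MS Office files, whichever tool is used to read them.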