
Handling duplicate content in downloadable MS Word docs



4:44 am on May 12, 2013 (gmt 0)

I have MS Word and PowerPoint documents that are offered as downloadable templates. Readers use these templates to create certain types of documents.
I don't want search engines to crawl and index the documents (about 100 docs) because their content is duplicated: the same sentences are repeated and used as filler.
The only way I see right now to avoid indexing these files is to zip them so that search engines can't read them.
Is there any other solution?

Martin Ice Web

8:11 am on May 12, 2013 (gmt 0)



Disallow the download path in robots.txt.

As far as I know, Google fetches the files anyway, and I have also seen some disallowed files in the SERPs, so I prefer to set a

Deny from IP

in .htaccess for the download folder. That will work.
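For illustration only, here is a minimal sketch of what such a rule might look like in the download folder's .htaccess. It assumes Apache 2.2 style access control; the range 192.0.2.0/24 is a placeholder from the documentation address space, not a real crawler range, so the addresses to block are something you would have to supply yourself.

```apache
# Apache 2.2 syntax (mod_authz_host): allow everyone except the listed range.
# 192.0.2.0/24 is a placeholder range, not a real crawler address.
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24

# Rough Apache 2.4 equivalent (mod_authz_core), to be used instead of the lines above:
# <RequireAll>
#     Require all granted
#     Require not ip 192.0.2.0/24
# </RequireAll>
```

Note that crawler IP ranges change over time, so a list like this needs maintenance to stay effective.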


9:22 am on May 12, 2013 (gmt 0)

lucy24

Any other solution?

# Requires mod_headers
<FilesMatch "\.(doc|zip)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>

But zipping downloadable files is a decent idea anyway, even if things don't get mangled in transit as much as they used to.
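Since the original question also mentions PowerPoint templates, a slightly broader version of the same idea might look like the sketch below. The extension list (docx, ppt, pptx) is an assumption about which file types are actually in the download folder, and the IfModule wrapper is only there so the site does not break if mod_headers is not loaded.

```apache
# Send a noindex header with every Word, PowerPoint, and zip download.
# The extension list is an assumption; adjust it to the files actually served.
<IfModule mod_headers.c>
    <FilesMatch "\.(docx?|pptx?|zip)$">
        Header set X-Robots-Tag "noindex"
    </FilesMatch>
</IfModule>
```

One caveat: a crawler only sees this header if it is allowed to fetch the file, so combining it with a robots.txt Disallow for the same path would likely keep the noindex from ever being read.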


4:08 pm on May 12, 2013 (gmt 0)

Thanks, lucy24 and MIW.
You gave me two good ideas.
