
Forum Moderators: martinibuster


How do I stop Y from cracking open PDF files and publishing them as HTML?

     

Lokutus

3:23 am on Sep 14, 2004 (gmt 0)

10+ Year Member



These are private files on my site for members and customers--not for the general public. I use fully locked PDFs for a good reason--so that no one can extract the content.

Yet Y does it.

I have avoided using a robots.txt file because Google doesn't like them.

Any way to stop Y from doing this?

Warren

4:55 am on Sep 14, 2004 (gmt 0)

10+ Year Member



A robots.txt will do it. Make sure it is a valid file, and then Google will have no problem with it.
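For instance, a minimal robots.txt sketch that targets Yahoo's crawler (its user-agent is Slurp) while leaving other bots alone — the /members/ directory here is a placeholder, so adjust it to wherever your PDFs actually live:

```
User-agent: Slurp
Disallow: /members/
```

Put the file at the root of your site (e.g. example.com/robots.txt). To block all well-behaved crawlers rather than just Yahoo's, use `User-agent: *` instead.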

Your other option is to require the user to log in and store them on a secure part of your site.

Tim

5:17 am on Sep 14, 2004 (gmt 0)

10+ Year Member



The "view as" functionality is part of the page cache, so you can use the NOARCHIVE meta tag to prevent it. I will also have someone investigate why we are cracking open these files if you send me some examples.
Thanks,
Tim

How do I keep my page from being cached in Yahoo! Search?
Our search engine contains "snapshots" of the majority of pages discovered during our crawl of the Web and caches them. This enables us to highlight the search terms on text-heavy pages so you can find relevant information quickly. And if the site's server temporarily fails, you can still see the page.
If you run a web site and do not want your content to be accessible through the cache, you can use the NOARCHIVE meta-tag. Place this in the <HEAD> section of your documents:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

This tag will tell robots not to archive the page. Our crawler will continue to index and follow links from the page, but it will not display a cached page in search results.

Please note that the change will occur the next time the search engine crawls the page containing the NOARCHIVE tag (typically at least once per month).

Also, the NOARCHIVE tag controls only whether the cached page is shown. To prevent the page from being indexed, use the NOINDEX tag. To prevent the crawler from following links, use the NOFOLLOW tag.
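Putting those together, a page that should be neither indexed nor have its links followed would carry, for example:

```html
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
```

Note that this only helps where you control the page's `<HEAD>` — a PDF file itself has no place for a META tag, so for the PDFs in question robots.txt or access control remain the practical options.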

TheDoctor

3:12 pm on Sep 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



AFAIK, if you reference them via "ftp" links rather than "http", this will also prevent them from being spidered.

Someone will no doubt correct me if I'm wrong, but it seems to have worked for me.

roitracker

3:18 pm on Sep 14, 2004 (gmt 0)

10+ Year Member



You could also put the PDFs in a directory and use .htaccess to require a username/password.
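A sketch of that setup for Apache, assuming the server allows authentication overrides for the directory; the .htpasswd path is a placeholder (keep the real file outside your web root), and you would create it with the `htpasswd` utility:

```apache
# .htaccess in the directory holding the PDFs
AuthType Basic
AuthName "Members Only"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Since crawlers can't supply credentials, this keeps the files out of every search engine, not just Yahoo's, without relying on robots.txt being honored.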
 
