Block PDF from being spidered within weblink. Possible?

martinbsp

7:52 pm on May 16, 2005 (gmt 0)

Is it possible to put something in a weblink to stop it being spidered?

For example, I have a webpage that has the synopsis of a technical paper on it (this is HTML). On this page is a link to the full PDF. Is it possible to put something into the link to the PDF to stop it being spidered, e.g.

nofollow:http://wwwipaper17.pdf

or do I actually have to write something into the HTML itself?

Thanks

Martin

P.S. No programming knowledge at all

Robin_reala

7:57 pm on May 16, 2005 (gmt 0)

Well, for Google at least you could put:

<a href="http://wwwipaper17.pdf" rel="nofollow">link text</a>

No idea if other search engines have adopted that.

martinbsp

8:08 pm on May 16, 2005 (gmt 0)

OK, bear with me here. When I use the editor at work (a WYSIWYG editor), I highlight the text I want to turn into a link and then type the link address in. The text is then turned into a link.

Are you saying I just type that text string in there, or would I have to type that directly into the HTML code?

Sorry, IT is not my strong point.

Thanks

girish

8:15 pm on May 16, 2005 (gmt 0)

block it in the robots.txt
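For example, a minimal robots.txt, assuming the PDF sits at a path like /papers/paper17.pdf on your site (that path is hypothetical; adjust it to wherever your file actually lives). The file must be named robots.txt and sit at the root of your domain:

```text
# Hypothetical example: block one PDF from all well-behaved crawlers
User-agent: *
Disallow: /papers/paper17.pdf
```

Disallow matches by path prefix, so Disallow: /papers/ would block everything in that directory.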

Lord Majestic

8:17 pm on May 16, 2005 (gmt 0)

<a href="http://wwwipaper17.pdf" rel="nofollow">link text</a>

No idea if other search engines have adopted that.

Nooooooooooooooooooooooooooo -- nofollow only means the link won't pass weight; it does not mean the link won't be followed. Search engines take on no obligation not to follow it: that value is meant for ranking purposes.

Robin_reala

12:34 pm on May 17, 2005 (gmt 0)

Uh.

Sorry, I really should have remembered that :(

TheDoctor

4:41 pm on May 30, 2005 (gmt 0)

Apologies for the late intervention here, but you could use

<a href="ftp://wwwipaper17.pdf">link text</a>

i.e. type "ftp" where you would normally type "http" in the link. Search engine bots will generally not follow an ftp link, since ftp is not, technically speaking, a web protocol (so don't use it for web pages, only for things like pdf files that should be downloaded rather than viewed).

However, using ftp will not do you much good unless you find out where the server you use expects to find ftp files. This can vary from host to host. One that I use allows me to put them in the same directories as my http files, while another demands I put them in a different place. Yours may do either of these, so you'll have to find out. Just ask technical support where to put an ftp file. You will not get a rude answer.

One downside if you do this is that your web logs will not record how many times the paper is downloaded, since they only cover web activity (i.e. http and https).

krt1

12:09 am on May 31, 2005 (gmt 0)

Just put

User-agent: *
Disallow: /*.pdf$

in robots.txt. (Note: the * and $ wildcards in Disallow are a Googlebot extension, not part of the original robots.txt standard; for other crawlers, put your PDFs in one directory and disallow that directory instead.)