
Forum Moderators: goodroi


Using robots.txt to Block Spidering of Links Pages

Is this a common underhanded practice?

   
2:28 pm on Apr 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It seems that if someone blocks the spidering of their "links" page directory in the robots.txt file, it would have the effect of turning reciprocal links into one-way links.

Should I be looking at the robots.txt file of all my linking partners? Has anyone ever discovered that one of their link partners was doing this?

2:40 pm on Apr 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It depends on whether you wanted the link for users or for search engines.

If for users, then it's still a reciprocal link.

If it's for the SEs, there are easier ways of stopping an SE spidering that page (but always peek inside the robots.txt just in case).

DaveN

4:55 am on Apr 29, 2003 (gmt 0)

10+ Year Member



I have a handful of "partners" that block their links pages one way or another. I keep the links because I have them for my users, but I'm sorry to lose out on whatever small SE benefit their pages might have given.

Anybody else think creating "one-way" reciprocal links by blocking is a bit rude?
4:58 am on Apr 29, 2003 (gmt 0)



I think it is very important to be honest in any partnerships we make. If you decide on this method, it's a good idea to post your policy on your site, just to avoid offending anyone.

Welcome clueless, to Webmasterworld. Happy posting.

11:59 am on Apr 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was thinking a page that was blocked from Google spidering would not get PR; however, now that I think about it, the page would still show up as having PR. I've noticed that as soon as I upload a new page to my site, it gets "PR" from the main domain long before Google stops by.

So unless a link is made purely for its value to visitors, it is wise to make sure your reciprocal link partners are not blocking the search engines from spidering their outbound links.

1:47 pm on May 4, 2003 (gmt 0)

10+ Year Member



Hi guys,

Can you tell me...

How do you check if a website is blocking search engines from spidering their links pages? How do you check the robots.txt file?

Thanks.

1:55 pm on May 4, 2003 (gmt 0)

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>Any body else think creating "one way" reciprocal links by blocking is a bit rude?

Only if the site explicitly stated in their agreement with you that your links would be spiderable by SEs, in which case it's not only rude but lying.

Don't assume that other sites realise why you wanted the link! Many wouldn't have a clue what PR is and probably just thought you were being neighbourly and social!

We rarely do reciprocal links, but when we do, we assume that people asking for links want direct referrals from people reading our pages, unless they specifically state that they want the "link popularity benefit" or that the linked page needs to be spiderable, in which case we usually don't bother to go any further.

[edited by: chiyo at 1:59 pm (utc) on May 4, 2003]

1:57 pm on May 4, 2003 (gmt 0)

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



flintops..

Type their domain name followed by /robots.txt into your browser. That will tell you which directories they are requesting SEs to ignore.

e.g.:

http://www.suspectedtrickyguy.com/robots.txt
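If you have several partners to check, you can also test a robots.txt file programmatically. A minimal sketch using Python's standard-library parser; the domain, paths, and file contents below are made-up examples, not taken from any real partner site:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body like one a "tricky" partner might serve.
# In real use you would call rp.set_url(".../robots.txt") and rp.read()
# to fetch it over the network instead of parsing a local string.
robots_body = """\
User-agent: *
Disallow: /links/
"""

rp = RobotFileParser()
rp.parse(robots_body.splitlines())

# Ask whether a crawler is allowed to fetch each page
print(rp.can_fetch("Googlebot", "http://example.com/links/partners.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))           # True
```

If the links page comes back as blocked, you know the "reciprocal" link is invisible to the search engines.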

2:08 pm on May 4, 2003 (gmt 0)

10+ Year Member



Thanks Chiyo,

Simple when you know how!

3:27 pm on May 4, 2003 (gmt 0)

10+ Year Member



Hi Chiyo,

Wondered if you can help...

If a site has 'Disallow: /root/' in the file, will that stop the WHOLE site from being spidered?

Thanks.

3:48 pm on May 4, 2003 (gmt 0)

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



flintops,

No: 'Disallow: /root/' only blocks URLs under the /root/ directory. 'Disallow: /' (just the slash) is what would disallow crawling of the whole site.

Check out robotstxt.org [robotstxt.org]; there are only a few pages of light reading, all about robots.txt and how to use it.
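For reference, here is what the two cases look like in a robots.txt file (the directory name is an example):

```
# Blocks only the /links/ directory
User-agent: *
Disallow: /links/
```

versus:

```
# Blocks the whole site
User-agent: *
Disallow: /
```

Note that paths match by prefix, so 'Disallow: /links/' also covers everything beneath that directory.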

4:26 pm on May 8, 2003 (gmt 0)

10+ Year Member



I think it's kind of rude to have robots.txt eliminate the value of a links page.

Also, all you'd need is for one of your link partners to get wise and email the others, and it's game over.