Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Using robots.txt to Block Spidering of Link Pages
Is this a common underhanded practice?

 2:28 pm on Apr 28, 2003 (gmt 0)

It seems that if someone blocks spidering of their “links” page directory in their robots.txt file, their reciprocal links would effectively turn into one-way links.

Should I be looking at the robots.txt file of each of my linking partners? Has anyone ever discovered that one of their link partners was doing this?



 2:40 pm on Apr 28, 2003 (gmt 0)

It depends on whether you wanted the link for users or for search engines.

If for users, then it's still a reciprocal link.

If for SEs, there are easier ways of stopping an SE from spidering that page (but always peek inside the robots.txt just in case).
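The post does not say which "easier ways" it has in mind. One well-known option is a robots meta tag in the page's head, so a partner's links page can be checked for it too. The following is a minimal sketch of such a check, assuming the page HTML has already been downloaded; the class name, the function name, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.append(part)


def robots_meta_directives(html):
    """Return the robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical links page that asks engines not to index it or follow its links.
page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body><a href='http://example.com/'>partner</a></body></html>"
)
print(robots_meta_directives(page))  # ['noindex', 'nofollow']
```

A result containing "nofollow" or "noindex" suggests the page is asking search engines not to follow its links or not to index it, even when robots.txt itself blocks nothing.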



 4:55 am on Apr 29, 2003 (gmt 0)

I have a handful of "partners" that block their links pages one way or another. I keep the links because I have them for my users, but I am sorry to lose out on whatever small SE benefit their page may have given.
Anybody else think creating "one way" reciprocal links by blocking is a bit rude?


 4:58 am on Apr 29, 2003 (gmt 0)

I think it is very important to be honest with any partnerships we make. It's a good idea to post your policy on your site, if you decide on this method, just to keep from offending.

Welcome to WebmasterWorld, clueless. Happy posting.


 11:59 am on Apr 29, 2003 (gmt 0)

I was thinking a page that was blocked from Google spidering would not get PR. However, now that I think about it, the page would still show up as having PR. I have noticed that as soon as I upload a new page to my site, it gets "PR" from the main domain long before Google stops by.

So unless a link is made purely because of value to visitors, it is wise to make sure your reciprocal link partners are not blocking the search engines from spidering their outbound links.


 1:47 pm on May 4, 2003 (gmt 0)

Hi guys,

Can you tell me...

How do you check if a website is blocking search engines from spidering their links pages? How do you check the robots.txt file?



 1:55 pm on May 4, 2003 (gmt 0)

>>Anybody else think creating "one way" reciprocal links by blocking is a bit rude?

Only if the site explicitly stated in their agreement with you that your links should be spiderable by SEs, in which case it's not only rude but lying.

Don't assume that other sites realise why you wanted the link! Many wouldn't have a clue what PR is and probably thought you were just being neighbourly and social!

We rarely do reciprocal links, but when we do, we assume that people asking for links are looking to receive direct referrals from people reading our pages, unless they specifically state they want the "link popularity benefit" or that the linked pages need to be spiderable, in which case we usually don't bother to go any further.

[edited by: chiyo at 1:59 pm (utc) on May 4, 2003]


 1:57 pm on May 4, 2003 (gmt 0)


Type their domain name followed by /robots.txt. That will tell you which directories they are asking SEs to ignore.
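For anyone with a long list of partners, the same check can be scripted. Below is a rough sketch using Python's standard urllib.robotparser module. The helper name blocked_paths, the sample rules, and the paths are made up for illustration, and the rules are passed in as text so the example runs without a network connection.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the paths that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt of a link partner that hides its links directory.
sample = """\
User-agent: *
Disallow: /links/
"""

print(blocked_paths(sample, ["/links/partners.html", "/index.html"]))
# ['/links/partners.html']
```

To check a live site instead, the parser can fetch the file itself with parser.set_url("http://example.com/robots.txt") followed by parser.read(), where example.com is a placeholder for the partner's domain.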




 2:08 pm on May 4, 2003 (gmt 0)

Thanks Chiyo,

Simple when you know how!


 3:27 pm on May 4, 2003 (gmt 0)

Hi Chiyo,

Wondered if you can help...

If a site has 'Disallow: /root/' in the file, will that stop the WHOLE site being spidered?


brotherhood of LAN

 3:48 pm on May 4, 2003 (gmt 0)


'Disallow: /' would disallow crawling of the whole site.

Check out robotstxt.org. There are only a few pages of light reading, all about robots.txt and how to use it.
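The difference between the two rules can be seen by running both through Python's standard urllib.robotparser. This is only an illustrative sketch: the helper name allowed and the test paths are invented, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, user_agent="*"):
    """Report whether user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Blocks only URLs whose path starts with /root/.
only_root_dir = "User-agent: *\nDisallow: /root/\n"
# Blocks every URL on the site.
whole_site = "User-agent: *\nDisallow: /\n"

for path in ["/", "/links.html", "/root/page.html"]:
    print(path, allowed(only_root_dir, path), allowed(whole_site, path))
# / True False
# /links.html True False
# /root/page.html False False
```

So 'Disallow: /root/' only stops spidering of a directory that is literally named "root", while the rest of the site stays crawlable.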


 4:26 pm on May 8, 2003 (gmt 0)

I think it's kind of rude to have robots.txt eliminate the value of a link page.

Also, all you'd need is for one of your link partners to get wise and email the others, and it's game over.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved