
Sitemaps, Meta Data, and robots.txt Forum

    
Using robots.txt to Block Spidering of Links Pages
Is this a common underhanded practice?
jdancing




msg:1525801
 2:28 pm on Apr 28, 2003 (gmt 0)

It seems that if someone blocks spidering of their “links” page directory in their robots.txt file, it would effectively turn reciprocal links into one-way links.

Should I be looking at the robots.txt file of all my linking partners? Has anyone ever discovered that one of their link partners was doing this?

 

DaveN




msg:1525802
 2:40 pm on Apr 28, 2003 (gmt 0)

It depends on whether you wanted the link for users or for search engines.

If for users, then it's still a reciprocal link.

If it's for search engines, there are easier ways of stopping a search engine from spidering that page (but always peek inside the robots.txt just in case).

DaveN
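DaveN doesn't name the "easier ways", so the following is an assumption: one common page-level mechanism is the robots meta tag (for example `<meta name="robots" content="noindex, nofollow">`), which a partner can put on a links page without touching robots.txt. A minimal sketch using Python's standard `html.parser` that collects any such directives from a page's HTML; the function and class names are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Split "noindex, nofollow" into individual lowercase directives.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_meta_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


page = '<html><head><META NAME="Robots" CONTENT="noindex, nofollow"></head></html>'
print(sorted(robots_meta_directives(page)))  # ['nofollow', 'noindex']
```

If the result contains `nofollow`, search engines that honour the tag are being asked not to follow the links on that page, even though robots.txt may look clean.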

clueless




msg:1525803
 4:55 am on Apr 29, 2003 (gmt 0)

I have a handful of "partners" that block their links pages one way or another. I keep the links because I have them for my users, but I am sorry to lose out on whatever small SE benefit their pages may have given.
Anybody else think creating "one way" reciprocal links by blocking is a bit rude?

paynt




msg:1525804
 4:58 am on Apr 29, 2003 (gmt 0)

I think it is very important to be honest with any partnerships we make. It's a good idea to post your policy on your site, if you decide on this method, just to keep from offending.

Welcome to WebmasterWorld, clueless. Happy posting.

jdancing




msg:1525805
 11:59 am on Apr 29, 2003 (gmt 0)

I was thinking a page that was blocked from Google spidering would not get PR; however, now that I think about it, the page would still show up as having PR. I have noticed that as soon as I upload a new page to my site, it gets "PR" from the main domain long before Google stops by.

So unless a link is made purely for its value to visitors, it is wise to make sure your reciprocal link partners are not blocking search engines from spidering their outbound links.

flintops




msg:1525806
 1:47 pm on May 4, 2003 (gmt 0)

Hi guys,

Can you tell me...

How do you check if a website is blocking search engines from spidering their links pages? How do you check the robots.txt file?

Thanks.

chiyo




msg:1525807
 1:55 pm on May 4, 2003 (gmt 0)

>>Anybody else think creating "one way" reciprocal links by blocking is a bit rude?

Only if the site explicitly stated in their agreement with you that your links should be spiderable by SEs, in which case it's not only rude but lying.

Don't assume that other sites realise why you wanted the link! Many wouldn't have a clue what PR is and probably just thought you were being neighbourly and social!

We rarely do reciprocal links, but when we do, we assume that people asking for links are looking to receive direct referrals from people reading our pages, unless they specifically state they want the "link popularity benefit" or that the linked pages need to be spiderable, in which case we usually don't bother to go any further.

[edited by: chiyo at 1:59 pm (utc) on May 4, 2003]

chiyo




msg:1525808
 1:57 pm on May 4, 2003 (gmt 0)

flintops..

Type their domain name followed by /robots.txt. That file will tell you which directories they are asking SEs to ignore.

eg:

http://www.suspectedtrickyguy.com/robots.txt
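Once you have the contents of a partner's robots.txt, you can check a specific links page against it instead of reading the rules by eye. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt lines, the `example.com` URLs and the `Googlebot` user agent are assumptions for illustration, and `is_crawlable` / `fetch_partner_robots` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_lines, url, agent="Googlebot"):
    """Return True if `agent` may crawl `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse text we already have; no network access
    return parser.can_fetch(agent, url)


def fetch_partner_robots(site_root):
    """Download and parse a live robots.txt (requires network access)."""
    parser = RobotFileParser(site_root.rstrip("/") + "/robots.txt")
    parser.read()
    return parser


# Hypothetical robots.txt of a hypothetical partner site.
partner_robots = [
    "User-agent: *",
    "Disallow: /links/",
]

print(is_crawlable(partner_robots, "http://www.example.com/links/partners.html"))  # False
print(is_crawlable(partner_robots, "http://www.example.com/index.html"))  # True
```

Against a live site you would call `fetch_partner_robots("http://www.example.com")` and then `.can_fetch("Googlebot", links_page_url)` on the returned parser. Note that robots.txt only covers crawling; a page-level robots meta tag is a separate check.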

flintops




msg:1525809
 2:08 pm on May 4, 2003 (gmt 0)

Thanks Chiyo,

Simple when you know how!

flintops




msg:1525810
 3:27 pm on May 4, 2003 (gmt 0)

Hi Chiyo,

Wondered if you can help...

If a site has 'Disallow: /root/' in the file, will that stop the WHOLE site being spidered?

Thanks.

brotherhood of LAN




msg:1525811
 3:48 pm on May 4, 2003 (gmt 0)

flintops,

'Disallow: /' would disallow crawling of the whole site. 'Disallow: /root/' only blocks URLs whose path starts with /root/.

Check out robotstxt.org; there are only a few pages of light reading, all about robots.txt and how to use it.
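The difference between `Disallow: /` and `Disallow: /root/` comes down to prefix matching: a rule blocks every URL path that starts with the given string. A minimal sketch using Python's standard `urllib.robotparser` that compares the two rules on a few sample paths; the paths and the `blocked_paths` helper are hypothetical, for illustration only.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the subset of `paths` that `agent` is told not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


whole_site = "User-agent: *\nDisallow: /\n"
root_dir_only = "User-agent: *\nDisallow: /root/\n"
sample_paths = ["/", "/links.html", "/root/page.html"]

print(blocked_paths(whole_site, sample_paths))     # ['/', '/links.html', '/root/page.html']
print(blocked_paths(root_dir_only, sample_paths))  # ['/root/page.html']
```

Every path starts with `/`, so the first file blocks everything; only paths under `/root/` match the second.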

rintrah




msg:1525812
 4:26 pm on May 8, 2003 (gmt 0)

I think it's kind of rude to use robots.txt to eliminate the value of a links page.

Also, all it would take is for one of your link partners to get wise and email the others, and it's game over.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved