Forum Moderators: goodroi

Excluding domain in robots.txt

triumph

2:02 am on Oct 11, 2005 (gmt 0)

10+ Year Member



Can I exclude a domain in robots.txt instead of using "nofollow"?

For example, I have a site called widgets.com, and I link often to example.com. Can I block all bots from following links to example.com, or do I have to put nofollow in all of the links?

Hope that made sense..

Lord Majestic

2:04 am on Oct 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Under the robots.txt standard, every hostname is governed by its own robots.txt file (this includes subdomains, i.e. www.example.com is treated separately from example.com). To exclude ALL URLs on a given host, its robots.txt can contain the following:

User-agent: *
Disallow: /

If you put this robots.txt file on all of your domains, all well-behaved robots should avoid crawling any of their URLs.
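As a rough way to see what those two lines mean to a compliant crawler, here is a minimal Python sketch that feeds the rules to the standard library's `urllib.robotparser`. The user agent name `ExampleBot` and the helper `is_allowed` are made up for illustration, and the sketch only shows how a well-behaved parser reads the rules; it says nothing about how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post above, as a list of lines.
ROBOTS_TXT = ["User-agent: *", "Disallow: /"]


def is_allowed(robots_lines, user_agent, url):
    """Return True if a crawler honouring these rules may fetch the URL.

    Note: the parser only looks at the path of the URL. It is up to the
    caller to apply each host's robots.txt to that host's URLs only.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name.
allowed = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/jump.php?affiliate")
print("may fetch:", allowed)
```

Because the rules are matched against the path only, the same file has to be served from every hostname you want excluded.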

cws3di

3:11 am on Oct 11, 2005 (gmt 0)

10+ Year Member



Wow, Lord Majestic, if we could only convince all spam sites, scrapers and hijackers to put that in their robots.txt files, we could all breathe easier. :-)

Not only would the "good bots" not follow their links, but we would also get better SERP placement!

Hmmm, somehow I don't quite think that is what triumph is trying to accomplish.

Triumph, your question is a bit confusing - what exactly are you trying to do? Why put a link on your site if you don't want it to be followed? Why not just delete those links out to example.com?

triumph

4:31 am on Oct 11, 2005 (gmt 0)

10+ Year Member



I own the site I'm linking to - example.com.

Example.com hosts a PHP script that I use to redirect affiliate links. The problem is that widgets.com's host (a popular weblog host) doesn't support PHP, so I have to use example.com's host. Therefore, I was hoping to use robots.txt to avoid getting penalized by Google for what would look like heavy interlinking between both sites. Every affiliate link on my site would look like this: "example.com/jump.php?affiliate"

Widgets.com currently enjoys nice organic results, so I don't want to compromise that in any way.

cws3di

6:20 am on Oct 11, 2005 (gmt 0)

10+ Year Member



OOHHH, I get it.

What you are trying to do cannot be accomplished in the robots.txt file.

You can use this meta tag inside your <head></head> on each page of your blog:

<meta name="robots" content="index,nofollow">

The robots will index your blog page, but not follow ANY of the links on the page. That is, if you have other links besides the ones you discussed, they will not be followed either.
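To make the page-wide effect of that tag concrete, here is a small Python sketch of how a crawler might read it, using only the standard library's `html.parser`. The class `RobotsMetaParser`, the helper `page_directives`, and the sample page are invented for illustration; real crawlers may treat the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "index,nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def page_directives(html):
    """Return the set of robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical blog page carrying the tag from the post above.
PAGE = '<html><head><meta name="robots" content="index,nofollow"></head><body></body></html>'
directives = page_directives(PAGE)
print("follow links on this page:", "nofollow" not in directives)
```

The directive applies to the page as a whole, which is why it cannot single out only the links that point to example.com.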

Another alternative is to use the type of javascript links that are well known to not be "spiderable".

Sorry, I can't give you the code for those because I have never used them; I just know about them as a thing most webmasters want to avoid :-)

triumph

2:15 pm on Oct 11, 2005 (gmt 0)

10+ Year Member



Thanks for your help CWS3di. I guess I'll just insert the nofollow link for every redirect link I insert.

Lord Majestic

2:26 pm on Oct 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"I guess I'll just insert the nofollow link for every redirect link I insert."

This is not what he said. You are confusing the "nofollow" attribute on an individual link with "nofollow" in META tags. The former does NOT oblige a crawler to skip the URL. It's a hint to the indexer that the link should not be credited towards PR (or its equivalent), but it's NOT a guarantee that the URL won't be followed.

Even META tags are suspect as to whether they guarantee that the URLs won't be followed, so robots.txt is really the best way to ensure data won't get CRAWLED.
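For comparison with the page-wide META tag, the per-link attribute lives on each individual anchor. The following Python sketch, with the hypothetical names `LinkCollector` and `collect_links` and a made-up HTML snippet, only reports which links carry `rel="nofollow"`; as noted above, whether a crawler then fetches the URL is its own decision.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record each <a href> together with whether it carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, has_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel holds space-separated tokens, e.g. rel="nofollow external".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def collect_links(html):
    """Return (href, has_nofollow) for every link found in the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical snippet: one tagged affiliate link, one ordinary link.
SNIPPET = (
    '<a href="http://example.com/jump.php?affiliate" rel="nofollow">Buy</a>'
    '<a href="/about.html">About</a>'
)
for href, has_nofollow in collect_links(SNIPPET):
    print(href, "nofollow" if has_nofollow else "no rel hint")
```

Unlike the META tag, this marks only the links you choose, so other links on the same page are unaffected.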

Angelis

2:29 pm on Oct 11, 2005 (gmt 0)

10+ Year Member



Can't you just link in another way, e.g. using JavaScript or Flash?

triumph

7:11 pm on Oct 11, 2005 (gmt 0)

10+ Year Member



If I have control of the domain I'm linking to, and I'm not using it for any other purpose except this redirect, I can just use robots.txt. Wouldn't that be the easiest route?

cws3di

7:25 pm on Oct 11, 2005 (gmt 0)

10+ Year Member



Yes, it is pretty simple to block all "well-behaved" bots in a robots.txt for example.com (use the code provided by Lord Majestic in post #2 above).

However, you then still have the concern about all of those outbound links on widgets.com, which are seen by the spiders.

We all have concerns about what Google may judge as odd or strange. I think that was the reason for your original post, that you were concerned that your high-traffic site at widgets.com might incur some sort of penalty for excess outbound linking to the same example.com site.

If you just have a simple blocking robots.txt on example.com, I doubt that Google takes that into account when judging your pages on widgets.com.

Maybe the safest thing would be to inquire in the JavaScript forum about linking code that can't be followed by spiders (then use that type of code on widgets.com to call your script on example.com).