
Sitemaps, Meta Data, and robots.txt Forum

    
Blocking domain specific links
seasalt




msg:1527540
 1:21 am on Dec 20, 2005 (gmt 0)

For the sake of argument, I have two sites - www.siteA.com and www.siteB.com.

siteA has some links to siteB. siteB does not link to siteA at all.

I want to restrict search engines from following all links to siteB that are on siteA.

Will something like this in siteA's robots.txt work to accomplish that?:

Disallow: siteB.com/

If this is correct, does it matter if the www. is used or not (i.e. www.siteB.com/ vs. siteB.com/)?

 

Pfui




msg:1527541
 3:46 am on Dec 21, 2005 (gmt 0)

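Sorry, but your example won't work. A Disallow line takes a URL path on the site that serves the robots.txt file, not a domain name, so a robot won't read siteB.com/ as "links to another site". Here are some things you can do instead: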

1.) You can restrict or disallow robots from directories where you have pages you don't want crawled/spidered:

User-agent: *
Disallow: /example/

2.) Many robots will also let you exclude single pages:

User-agent: *
Disallow: /example/private.html

I say 'many' because not all robots, even the big ones, will follow all of your instructions all of the time, and the bad ones will ignore your robots.txt file altogether.
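If you want to check rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name allowed() and the page URLs are made up for illustration, and the result only shows how this one parser reads the rules; a real robot may behave differently. It tries examples 1 and 2 above, plus the siteB.com/ rule from the original question (with a User-agent line added so the parser accepts it).

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Example 1: block a whole directory on the site that serves the file.
DIRECTORY_RULE = "User-agent: *\nDisallow: /example/\n"
# Example 2: block a single page.
PAGE_RULE = "User-agent: *\nDisallow: /example/private.html\n"
# The rule from the original question: a hostname where a path is expected.
HOSTNAME_RULE = "User-agent: *\nDisallow: siteB.com/\n"

print(allowed(DIRECTORY_RULE, "http://www.siteA.com/example/page.html"))  # False
print(allowed(PAGE_RULE, "http://www.siteA.com/example/private.html"))    # False
print(allowed(PAGE_RULE, "http://www.siteA.com/example/public.html"))     # True
# The hostname rule never matches a path, so nothing is blocked.
print(allowed(HOSTNAME_RULE, "http://www.siteB.com/page.html"))           # True
```

The parser compares each rule against the path part of the URL only, which is why the siteB.com/ rule has no effect, with or without the www.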

3.) You can also put HTML tags 'in' the pages you don't want crawled. But again, some robots will heed them, and others won't. (See the "HTML Author's Guide" reference, below.)
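Assuming the tags meant in item 3 are robots META tags such as <meta name="robots" content="noindex, nofollow">, here is a rough sketch of how a cooperative robot might read them, using Python's standard html.parser. The class name, function name and sample page are hypothetical, and nothing here forces any robot to obey.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots META directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page that asks robots not to index it or follow its links.
PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body></body></html>"
)
print(sorted(robots_directives(PAGE)))  # ['nofollow', 'noindex']
```

Note that a page-level nofollow applies to every link on that page, not just the ones pointing to one particular site.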

You'll find loads of info about how to write your robots.txt file(s) here:

The Web Robots Pages
[robotstxt.org...]

And be sure to check out these two sections for specific info:

* Web Server Administrator's Guide to the Robots Exclusion Protocol
* HTML Author's Guide to the Robots Exclusion Protocol

When you're all set, upload your file and run it through SEW's:

Robots.txt Validator
[searchengineworld.com...]

Good luck!

Dijkgraaf




msg:1527542
 11:33 pm on Dec 21, 2005 (gmt 0)

Put rel="nofollow" in all the links on SiteA that point to SiteB.
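To find the links that still need the attribute, something like the following sketch could help. It uses Python's standard html.parser and urllib.parse; the class name, function name and sample markup are invented for illustration. It treats siteB.com and any subdomain such as www.siteB.com as the same target, which is an assumption about what you want.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAuditor(HTMLParser):
    """Record links to a target domain that lack rel="nofollow"."""

    def __init__(self, target_domain: str):
        super().__init__()
        self.target = target_domain.lower()
        self.unmarked = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        # Match the bare domain and any subdomain (www. or otherwise).
        if host != self.target and not host.endswith("." + self.target):
            return
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" not in rel_tokens:
            self.unmarked.append(href)


def links_missing_nofollow(html: str, target_domain: str) -> list:
    """Return hrefs pointing at `target_domain` without rel="nofollow"."""
    auditor = LinkAuditor(target_domain)
    auditor.feed(html)
    auditor.close()
    return auditor.unmarked


# Hypothetical fragment of a page on siteA.
SAMPLE = (
    '<a href="http://www.siteB.com/a">needs fixing</a>'
    '<a href="http://siteB.com/b" rel="nofollow">already marked</a>'
    '<a href="/local">internal link</a>'
)
print(links_missing_nofollow(SAMPLE, "siteB.com"))  # ['http://www.siteB.com/a']
```

As with robots.txt, rel="nofollow" is a request, and only robots that choose to honor it will skip those links.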
