
Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt
Disallow other URLs for one site.
spinweb




msg:1526058
 4:29 pm on Apr 11, 2003 (gmt 0)

I have a site with several domain names pointing to it. I am going to submit it to the search engines under the main URL, but I want to make sure that if crawlers find the site through links to the other domain names, I won't get banned for "flooding" the search engines with many URLs for one site.

I am going to write a robots.txt exclusion file to address this issue. I believe the right format should be as follows (the URLs listed are not the real ones, just examples):

User-agent: *
Disallow: [exampleURL1.com...]
Disallow: [exampleURL2.com...]
Disallow: [exampleURL3.com...]

Would this be the correct syntax to stop crawlers from crawling the site under the example URLs listed?

Thanks for your help.

 

le_gber




msg:1526059
 6:02 pm on Apr 11, 2003 (gmt 0)

Hi,

I'm not sure about that; otherwise I would disallow all my competitors in my robots.txt file ;)

What happens if you put a robots.txt on the other sites' servers telling the search engine spiders not to index them?

leo

[edited by: le_gber at 6:16 pm (utc) on April 11, 2003]

jdMorgan




msg:1526060
 6:09 pm on Apr 11, 2003 (gmt 0)

spinweb,

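That is not valid robots.txt syntax. A Disallow line takes a URL path on the host that served the robots.txt (such as /cgi-bin/), not a full URL or another domain name.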

What you need to do is split those domains out and give each one its own robots.txt. This can be done even though they all point to the same hosting account. Then simply disallow all robots on the domains you don't want indexed.
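For reference, a robots.txt that shuts out every compliant crawler is just `User-agent: *` followed by `Disallow: /`. The Python sketch below uses the standard library's `urllib.robotparser` to compare that file with an empty `Disallow:` (which blocks nothing) and with the full-URL style from the question. The hostname `www.example-alias.com` is a hypothetical stand-in, and Python's parser is only one implementation, so individual search engine crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# robots.txt for the secondary (alias) domains: "Disallow: /" blocks every path.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# robots.txt for the main domain: an empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# The style from the question: a full URL instead of a path (hypothetical host).
URL_STYLE = "User-agent: *\nDisallow: http://www.example-alias.com/\n"


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


page = "http://www.example-alias.com/page.html"
print(parser_for(BLOCK_ALL).can_fetch("*", page))  # False: every path is blocked
print(parser_for(ALLOW_ALL).can_fetch("*", page))  # True: nothing is blocked
print(parser_for(URL_STYLE).can_fetch("*", page))  # True: the URL never matches a path
```

The third case shows why the original file would not help: the parser compares Disallow values against the path of the requested URL, so a line holding a whole URL matches nothing and the alias domain stays crawlable.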

The method used to do this will depend on your server, e.g. Apache or IIS. It's fairly easy with Apache if you can use mod_rewrite.
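The per-host split can also be sketched outside Apache. The following minimal Python WSGI app is a swapped-in illustration, not the mod_rewrite approach mentioned above: on requests for /robots.txt it reads the Host header and returns the permissive file for the main domain and the blocking file for every other hostname. `MAIN_HOST` and `www.example.com` are hypothetical placeholders for your real main domain.

```python
# Hypothetical main hostname: the only one that should be indexed.
MAIN_HOST = "www.example.com"

BLOCK_ALL = b"User-agent: *\nDisallow: /\n"
ALLOW_ALL = b"User-agent: *\nDisallow:\n"


def app(environ, start_response):
    """Minimal WSGI app: serve a different /robots.txt depending on the Host header."""
    if environ.get("PATH_INFO") == "/robots.txt":
        # Drop any ":port" suffix and lowercase the name before comparing.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        # Any host other than MAIN_HOST (including a bare "example.com") is blocked.
        body = ALLOW_ALL if host == MAIN_HOST else BLOCK_ALL
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    # Placeholder: the real site would handle every other path.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found\n"]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it; requests sent with different Host headers should then receive different robots.txt files, which is the same decision a mod_rewrite rule keyed on the host name would make.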

Jim

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved