robots.txt

Disallow other URLs for one site.

   
4:29 pm on Apr 11, 2003 (gmt 0)

10+ Year Member



I have a site with several domain names pointing to it. I am going to submit it to the search engines under the main URL, but I want to make sure that if crawlers find the site via links to the other domain names it is registered under, I won't get banned for "flooding" search engines with many URLs for one site.

I am going to write a robots.txt exclusion file to address this issue. I believe the right format should be as follows (the URLs listed are not the real ones, just examples):

User-agent: *
Disallow: [exampleURL1.com...]
Disallow: [exampleURL2.com...]
Disallow: [exampleURL3.com...]

Would this be the correct syntax in order to have the crawlers not crawl the site under the example URLs listed?

Thanks for your help.

6:02 pm on Apr 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

I'm not sure that would work; otherwise I could disallow all my competitors from my own robots.txt file ;)

What happens if you put a robots.txt on each of the other sites' servers telling the search engine spiders not to index them?

leo

6:09 pm on Apr 11, 2003 (gmt 0)

jdmorgan (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)



spinweb,

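That is not valid robots.txt syntax. A Disallow line takes a URL path on the host that served the file, not a domain name, so a robots.txt file can only control crawling of its own host.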

What you need to do is split those domains out and give each one its own robots.txt. This can be done even though they all point to the same hosting account. Then simply disallow all robots on the domains you don't want indexed.

The method used to do this will depend on your server, e.g. Apache or IIS. It's fairly easy with Apache if you can use mod_rewrite.
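As a rough illustration of the Apache approach, here is a minimal sketch for an .htaccess file in the shared document root. It assumes mod_rewrite is enabled and that .htaccess overrides are allowed. The names are placeholders, not taken from the original post: example.com stands in for the main domain, and robots_disallow.txt is a hypothetical second file holding the "block everything" rules shown in the comments.

```apache
# Hypothetical file /robots_disallow.txt would contain only these two lines:
#   User-agent: *
#   Disallow: /
# The regular /robots.txt stays as it is and is served for the main domain.

RewriteEngine On

# If the request did NOT arrive under the main domain (with or without www,
# with or without an explicit port)...
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com(:[0-9]+)?$ [NC]

# ...answer requests for robots.txt with the blocking file instead.
RewriteRule ^robots\.txt$ /robots_disallow.txt [L]
```

With something like this in place, a crawler requesting robots.txt under any of the secondary domain names would be told to stay out, while the main domain keeps its normal robots.txt. The details on IIS or other servers would differ.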

Jim