Disallow via robots.txt - but with indexed incoming links

     
7:26 pm on Oct 16, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 5, 2001
posts:392
votes: 0


I have a website that offers a service to other webmasters. They put JavaScript on their pages that pops up a window pointing to

www.mysite.com/clients/clientnumbers/stuff

Now that Google is indexing these JS links, mysite.com is quickly filling Google with pages I don't want indexed. Since the 'stuff' is the same for each client, it looks like duplicate content to Google.

My question is: if I set robots.txt to disallow /clients/, will Google respect this and deindex the files? Or will it ignore robots.txt because there are links coming into the specific pages?

I've also read about cases of Google ignoring noindex on the pages themselves, but that may be a better option?

8:03 pm on Oct 16, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


Google will respect /robots.txt exclusion and will not fetch excluded URLs.

If there are incoming links then you should expect to see 'URL-only' listings: no title, snippet, size or cache.

/robots.txt

User-agent: *
Disallow: /foo

Google result:

www.example.com/foo/bar.html
Similar pages
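To check which URLs a rule like the one above would exclude, here is a minimal sketch using Python's standard urllib.robotparser. It only illustrates how a compliant parser reads the rule; it is not Google's crawler logic, and the helper name is_fetch_allowed and the second URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Same rule as in the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foo
"""


def is_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The excluded URL may not be fetched; a URL outside /foo may be.
blocked = is_fetch_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/foo/bar.html")
open_url = is_fetch_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html")
print(blocked, open_url)
```

Note that this only answers "may the bot fetch it?". It says nothing about whether the URL can still show up as a URL-only listing.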

As far as I know, using noindex in a robots META tag or returning HTTP status 404 will remove the result, but only if you remove the /robots.txt exclusion so that the bot can fetch the URL and see it.
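For anyone wanting to verify that their pages really carry the tag, here is a rough sketch with Python's standard html.parser that looks for noindex in a robots META tag. The class and function names and the sample page are hypothetical, and it checks only the tag itself, not how any search engine acts on it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex" or "noindex,nofollow".
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page has a robots META tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample page carrying the tag.
PAGE = '<html><head><meta name="robots" content="noindex"><title>stuff</title></head><body></body></html>'
print(has_noindex(PAGE))
```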

8:23 pm on Oct 16, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 5, 2001
posts:392
votes: 0


OK. I would much prefer no listings at all, so my best solution is to allow the bot to crawl, but put a noindex meta tag on every page.

Thanks a bunch, ciml.

 
