I only want to block Google from spidering the top-level index.html in the root directory of the old site. Since the move will take a while, I'd still like Google to spider the rest of the site.
Although I've been doing web stuff since '95, I'm old, muddled, dazed and confused, and never bothered with a robots.txt before. I've got no secrets, but it has become an issue and I don't want to screw this up for the rest of the 6,000+ pages I have on the site.
Would this be what I need?
User-agent: google
Disallow: /index.html
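You basically have two choices. The first is to force a redirect to http://example.com/index.html when http://example.com/ is called, then exclude /index.html in robots.txt. The redirect matters because robots.txt rules match by URL path prefix, so a Disallow for /index.html does not cover the bare / URL that most links to the home page use.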
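If you want to see the prefix matching for yourself, here is a rough sketch using Python's standard urllib.robotparser. It is not Google's parser, so treat it as an approximation. The Googlebot token (the name Google documents for its crawler) and the /some/page.html path are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the old site. "Googlebot" is an assumption:
# it is the crawler token Google documents, not something from this thread.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /index.html
"""


def allowed_paths(robots_txt, agent, urls):
    """Return {url: bool} saying whether `agent` may fetch each URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


URLS = [
    "http://example.com/index.html",      # the page named in the rule
    "http://example.com/",                # same page, reached via the bare root URL
    "http://example.com/some/page.html",  # hypothetical inner page
]

results = allowed_paths(ROBOTS_TXT, "Googlebot", URLS)
for url, ok in results.items():
    print(url, "allowed" if ok else "blocked")
```

With this rule only /index.html is reported as blocked. The bare / URL and the inner page stay crawlable, which is why the redirect is part of the first option.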
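The second and much easier method is to put a robots meta element with noindex in the <head> of that page. Don't also block the page in robots.txt, because the crawler has to be able to fetch the page in order to see the tag: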
<meta name="robots" content="noindex">
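Once the tag is in place, a quick way to confirm it actually made it into the served page is to parse the HTML and look for it. This is only a minimal sketch using Python's standard html.parser. The class name, function name and sample page are made up for the example, and it checks the robots meta element only, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # attribute names arrive lowercased
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html):
    """True if the page carries a robots meta element containing noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page source; in practice you would pass in the saved HTML.
PAGE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(PAGE))
```

Run it against the saved source of the old index.html. A page without the element, or with a typo in the name attribute, will come back False.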