Be careful about disallowing search engines from crawling your pages. Using the robots.txt protocol on your site can stop Google from crawling your pages, but it may not always prevent them from being indexed. For example, Google may index your page if we discover it by following a link from someone else's site. To display it in search results, Google will need to display a title of some kind and because we won't have access to any of your page content, we will rely on off-page content such as anchor text from other sites. (To truly block a URL from being indexed, you can use meta tags.)
[support.google.com...]
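The distinction in the quoted passage can be modelled with a small sketch. It assumes a simplified crawler: robots.txt decides whether a page is fetched at all, and a `<meta name="robots" content="noindex">` tag only takes effect if the page is actually fetched. The robots.txt rules, the URLs and the helper names `can_crawl`, `has_noindex` and `indexing_outcome` are hypothetical illustrations; only `urllib.robotparser` and `html.parser` are real standard-library modules.

```python
import urllib.robotparser
from html.parser import HTMLParser

# Hypothetical robots.txt for an example site: it blocks crawling of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the robots.txt rules allow `agent` to fetch `url`."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


def indexing_outcome(url, html):
    """Simplified model of what happens to a URL (hypothetical helper)."""
    if not can_crawl(url):
        # The page is never fetched, so a noindex tag on it is never seen;
        # the URL can still be indexed from links and anchor text elsewhere.
        return "not crawled; URL may still be indexed from external links"
    if has_noindex(html):
        return "crawled; kept out of the index by meta robots noindex"
    return "crawled; eligible for indexing"
```

The practical consequence: a page you want kept out of the index should not also be disallowed in robots.txt, otherwise the crawler never gets to read its noindex tag.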
To keep a non-HTML file (such as a Word document) out of the index altogether, the best bet is the X-Robots-Tag HTTP header [code.google.com]
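Because a Word document or PDF has no HTML `<head>` to hold a meta tag, the directive has to travel in the HTTP response instead. Below is a minimal sketch of one way to do that, assuming a Python WSGI application; the wrapper name `add_x_robots_tag` and the list of extensions are hypothetical, and many sites set the same header in their web server configuration instead.

```python
import os

# Assumption: these are the non-HTML file types to keep out of the index.
NOINDEX_EXTENSIONS = {".doc", ".docx", ".pdf"}


def add_x_robots_tag(app, extensions=NOINDEX_EXTENSIONS, value="noindex"):
    """Wrap a WSGI app so responses for matching file types carry X-Robots-Tag."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        ext = os.path.splitext(path)[1].lower()

        def patched_start_response(status, headers, exc_info=None):
            # Append the header only for the file types listed above.
            if ext in extensions:
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

As with the meta tag, the header is only seen if the file itself is not blocked in robots.txt.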
Top sites like Amazon, OLX and even Google don't return their root (home page) as the first result in a site: search. Does that mean their SEO is messed up?
The home page is the second result.