shaunm - 10:18 am on Sep 6, 2012 (gmt 0)
Thank you all for your insightful answers! I very much appreciate all your help. I like this forum much more than any other forum out there.
To wrap up this ever-growing thread: I now understand the purpose of Sitemaps and robots.txt. I have written up my understanding of the two below. If anything I think is right turns out to be wrong, please tell me so :) I would appreciate the correction.
1. The primary purpose of any robots.txt file is to stop crawlers from crawling a page's CONTENT/HEADERS.
2. Through robots.txt I can prevent a page/directory/file from being crawled - no content and no headers will be fetched (see the example robots.txt rule after this list). Because of that, the page will not rank for any search query, but it can still appear as a SNIPPET-ONLY listing, and even then not for normal queries but only when I use the site: command. That snippet-only version shows up because internal links and anchor text point to the page from elsewhere; if no referring URLs exist, even the snippet won't come up in the SERPs.
3. If I put a disallow rule for a page in robots.txt but that same page is listed in the SITEMAP (like the sample entry below), Google and other crawlers will prefer the SITEMAP details over the ROBOTS.TXT rules. Thus they will crawl that particular page and index it in the SERPs just like they display any other page (title, description, URL).
4. If I use a NOINDEX meta tag but keep that particular page in the SITEMAP, it will end up indexed anyway, the NOINDEX directive being ignored because the URL is in the SITEMAP.
5. Finally, robots.txt is USELESS and a WASTE OF TIME when you put it in place to block a particular page/portion of your website from appearing in the SERPs. What you should do instead is make sure none of those URLs are in your SITEMAP and put a NOINDEX meta tag on EACH and EVERY such page (example tag below)?
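
In case my point 2 is unclear, this is the kind of robots.txt rule I mean - /private/ is just a made-up path for illustration, not a real directory on my site:

    User-agent: *
    Disallow: /private/

As far as I understand, that only stops crawling of anything under /private/; it doesn't by itself keep those URLs out of the index.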
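
And this is the sort of sitemap entry I mean in point 3 - the example.com URL is obviously just a placeholder:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/private/page.html</loc>
      </url>
    </urlset>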
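
Finally, this is the NOINDEX tag from points 4 and 5; it would go in the <head> of each page I want kept out of the SERPs:

    <meta name="robots" content="noindex">

If I've understood the earlier replies correctly, such a page also has to stay crawlable (i.e. NOT blocked in robots.txt), otherwise the crawler never gets to see the tag.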
Many thanks guys!