I presume the format is correct.
Any feedback in this regard will be very helpful.
Thanks a lot for replying.
They will get crawled - but they will not be indexed.
Google has interpreted robots.txt to only apply to indexing and not spidering.
Actually I could not quite follow that. My interpretation is: if I disallow a particular file or folder using robots.txt, it gets crawled, but the content is not indexed in the SE's database.
But what I am observing is this: I have disallowed a few folders in my robots.txt file, but in spite of that, the respective URLs are still being displayed in the SERPs for a particular keyword, without any title or description.
Do I have to use an .htaccess file to stop this?
I shall be highly obliged if you help me out in this regard.
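If you do end up wanting the .htaccess route, here is one sketch (assuming Apache with mod_rewrite enabled; /private/ is just a placeholder for your disallowed folder, not your actual path) that answers requests for the blocked folder with 410 Gone, which tells crawlers the content is permanently removed:

```apache
# Hypothetical example: serve "410 Gone" for everything under /private/
RewriteEngine On
RewriteRule ^private/ - [G]
```

Note that 410 only helps once the crawler can actually request the page; it is a different mechanism from robots.txt, which stops the request from being made at all.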
What I've seen is that if there are external links to the pages you have disallowed, they still appear as URL-only listings. They will either disappear in time or stay URL-only. If you really want them removed from the index, submit your robots.txt file to Google and they will remove any pages that are indexed. But be warned: make sure your robots.txt is correct syntax-wise (use a robots.txt validator), and these pages will return after 180 days; I think it mentions that on the page somewhere.
I have seen googlebot perfectly respecting the /robots.txt - that is: NO crawling of Disallowed: stuff and therefore NO indexing.
If the files are old and you have put up the robots.txt only recently, perhaps you have to give it some more time to settle.
The format of your file looks good.
Make sure you have the robots.txt in the domain's document root and that it is accessible (file permissions). Check in the logs that it got accessed by googlebot without error.
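One way to do that log check (a sketch only: the log location and the sample line below are made-up assumptions, and your server's log path and format will differ) is to grep the access log for the robots.txt request and look at the status code:

```shell
# Create a made-up sample line in Apache combined log format, standing in
# for a real access log entry from a Googlebot fetch of robots.txt
cat > access_log.sample <<'EOF'
66.249.66.1 - - [01/Jan/2006:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
EOF
# A 200 status on the robots.txt request means it was served without error;
# this counts the matching entries
grep "GET /robots.txt" access_log.sample | grep -c " 200 "   # prints 1
```

A 404 or 403 on that line would mean Googlebot never got your rules, which would explain disallowed pages still being crawled.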
If you only use robots.txt then Google will always show the page as a URL-only listing, and will (probably) show it forevermore.
If you disallow something using robots.txt, something that is already indexed, then Google will not remove it on its own. You can submit the URL of the robots.txt file to the Removal Tool on the Google URL Console and that will remove it for 180 days (sometimes only 90) but then it will be relisted, even if it is still disallowed in the robots.txt file. Use the meta tag for full and permanent removal.
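For the meta tag route, the tag goes in the head of each page you want gone; note that the page must remain crawlable (i.e. not disallowed in robots.txt), otherwise Googlebot can never fetch the page and see the tag:

```html
<!-- Place inside the <head> of each page to be permanently removed -->
<meta name="robots" content="noindex">
```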
"I have seen googlebot perfectly respecting the /robots.txt - that is: NO crawling of Disallowed: stuff and therefore NO indexing.
If the files are old and you have put up the robots.txt only recently, perhaps you have to give it some more time to settle."
I'm having the same problem. My ecommerce software generates a horrible site map: it's too big, has far too many links, etc.
So I set out my robots.txt like this:
Is this OK?
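(The poster's actual file isn't shown here, but for anyone comparing: a minimal robots.txt that blocks crawlers from a generated site map might look something like this, with the paths being hypothetical placeholders:)

```
# Hypothetical example: keep all crawlers out of a generated site map
User-agent: *
Disallow: /sitemap/
Disallow: /sitemap.html
```

As noted above, this stops crawling but won't by itself remove already-indexed URLs; URL-only listings can linger unless you use the removal tool or a noindex meta tag.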