Forum Moderators: Robert Charlton & goodroi
How hard is it for Google engineers to enter a little bit of code that says "robots.txt blocked = don't list it in the search results"?
serve a custom 403 Forbidden page
NoIndex means don't show it in the results. Robots.txt means don't access it. They're two totally different things IMO.
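To make the distinction concrete, here's a rough sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs and the crawl_allowed helper are all made up for illustration. The point is that robots.txt only answers "may I fetch this URL?" and has nothing to say about whether the URL may be listed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler never fetches the blocked page, so it can never
# read a noindex <meta> tag or X-Robots-Tag header that page might carry.
blocked = crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/admin/login")
allowed = crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/about")
```

Here blocked comes out False and allowed comes out True. Nothing in the parser's answer covers indexing, which is why a blocked URL can still be listed based on inbound links alone.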
Yeah, but I'm not sure I get your point. Are you saying that "don't access it" means "but please index the URL and list it in the results"?
...so they have no way of knowing whether it's a content-rich page people expect to find when they search, and if the inbound links indicate it's the correct result, they show it.
My users' admin areas are blocked in the robots.txt file. Of course, several of the user pages are linked to from the website and the users enter their own personal login page manually ... obviously using a Google Toolbar too because Googlebot keeps trying to reach them.
Also note that it's been reported that Bing apparently doesn't obey the meta robots noindex tag, and it's likely that password protection is the best solution.
<FilesMatch "(appropriate regexp)">
Header set X-Robots-Tag "noindex"
</FilesMatch>
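If you want to confirm what a crawler would read from that header once it's being sent, something along these lines could do it. This is only a sketch: header_says_noindex is a hypothetical helper, it takes the raw header value as a string, and it assumes the simple comma-separated form (it does not handle a value prefixed with a user-agent name).

```python
def header_says_noindex(x_robots_tag: str) -> bool:
    """Return True if an X-Robots-Tag header value asks not to be indexed."""
    # Directives are comma-separated and case-insensitive.
    directives = {part.strip().lower() for part in x_robots_tag.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


from_config = header_says_noindex("noindex")            # value set by the config above
with_nofollow = header_says_noindex("NoIndex, nofollow")
only_nofollow = header_says_noindex("nofollow")
```

Keep in mind the header is only seen if the URL is crawlable. If the same URL is also blocked in robots.txt, the request is never made and the header never gets read.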
...but it occurred to me that there might be an advantage in this case in omitting the "nofollow" instruction initially. I was just thinking that in this situation, you want Googlebot to see those "noindex" <meta> tags ASAP, and by allowing the links to be crawled, you might well speed up the process....
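As a rough illustration of that idea, the sketch below reads the robots <meta> tag out of a page with Python's standard html.parser and reports whether links may still be followed. The RobotsMetaParser class, the two helper functions and the sample markup are all invented for this example, and real crawlers may well interpret edge cases differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def links_may_be_followed(html: str) -> bool:
    """True unless the page says nofollow (or none, which implies it)."""
    found = robots_directives(html)
    return "nofollow" not in found and "none" not in found


# Hypothetical pages: the first keeps its links crawlable, the second does not.
noindex_only = '<html><head><meta name="robots" content="noindex"></head></html>'
noindex_nofollow = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
```

With noindex alone, links_may_be_followed returns True, so the crawler can keep moving through the linked pages and pick up their noindex tags too. With "noindex, nofollow" it returns False.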