Andy_Langton - 8:06 pm on Apr 12, 2012 (gmt 0)
But it doesn't apply when the bot gets redirected or finds a link to that page from another site, in which case it may access it directly.
Not quite - it won't apply if spiders have not retrieved the robots.txt file at all, or if the copy they hold is out of date. But how the URL was discovered makes no difference: a spider should check every URL it intends to request against the corresponding robots.txt file. I believe Google always asks for robots.txt before requesting any URLs from a new site.
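To illustrate the point, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the "ExampleBot" user agent, the example.com URLs and the may_fetch helper are all made up for illustration; this is not how any particular search engine implements the check, only a way to show that the discovery method plays no part in the decision.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str,
              discovered_via: str) -> bool:
    """Return True if the URL may be requested.

    discovered_via is accepted only to make the point: it is deliberately
    ignored, because the check depends on the URL and robots.txt alone.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    blocked = "http://example.com/private/page.html"
    for source in ("internal link", "external link", "redirect"):
        print(source, may_fetch(parser, "ExampleBot", blocked, source))
```

Whatever value is passed for discovered_via, the answer for a given URL is the same.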
Final comment - spiders from the major search engines normally "queue" URLs for spidering as they discover them, rather than spidering them immediately, whether they came from a redirect or otherwise. So, once URLs are in the queue, they're handled alongside any other URLs for that site, and again the discovery mechanism makes no difference.
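The queueing behaviour can be sketched roughly as follows. This is an assumed, simplified model and not a description of any real search engine's crawler: the CrawlQueue class, its method names, the "ExampleBot" user agent and the example.com URLs are all hypothetical. Discovered URLs go into one queue, and the robots.txt check happens when the queue is processed, so a URL found via a redirect is treated exactly like one found via a link.

```python
from collections import deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class CrawlQueue:
    """Toy crawl queue: URLs are queued on discovery, checked when processed."""

    def __init__(self, user_agent, robots_by_host):
        self.user_agent = user_agent
        # Maps host name -> RobotFileParser already loaded for that host.
        self.robots_by_host = robots_by_host
        self.pending = deque()

    def discover(self, url, source):
        # The source is stored for reference only; it never affects the check.
        self.pending.append((url, source))

    def drain(self):
        """Process the queue; return (allowed, skipped) URL lists."""
        allowed, skipped = [], []
        while self.pending:
            url, _source = self.pending.popleft()
            parser = self.robots_by_host.get(urlsplit(url).netloc)
            # In this sketch a host with no loaded robots.txt is skipped;
            # a real crawler would request robots.txt first.
            if parser is not None and parser.can_fetch(self.user_agent, url):
                allowed.append(url)
            else:
                skipped.append(url)
        return allowed, skipped


def load_robots(robots_txt):
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    robots = load_robots("User-agent: *\nDisallow: /private/\n")
    queue = CrawlQueue("ExampleBot", {"example.com": robots})
    queue.discover("http://example.com/index.html", "internal link")
    queue.discover("http://example.com/private/a.html", "redirect")
    queue.discover("http://example.com/private/b.html", "external link")
    print(queue.drain())
```

In this model the only way a disallowed URL gets requested is if the robots.txt copy held for the host is missing or stale, which matches the exceptions described above.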