tedster - 6:38 pm on May 30, 2010 (gmt 0)
I need more literal definitions of crawling, indexing, parsing.
I'd say the entire webmaster community needs more precision in this vocabulary - it's way too easy for us to be casual with these terms. In this case, I think our disconnect was between crawling the robots.txt file itself and crawling the URIs that it disallows.
By the way, I have never seen Google create a URI-only entry just because that URI was listed in robots.txt. It's certainly not routine, at any rate.
So where do they come from? Somewhere on the web there's a link to that URI - that's the most common way. I suspect other forms of URI discovery may also play a part, such as Google toolbar data.
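To make the distinction concrete, here's a minimal sketch using Python's standard urllib.robotparser. It is only an illustration, not how Google actually works: the robots.txt content, the example.com URIs, the "ExampleBot" user agent and the classify_discovered_uris helper are all made up for the example. The robots.txt file itself gets read and parsed, while the URIs come from somewhere else (links, in this case) and are only checked against the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would fetch this file over
# HTTP; that fetch is "crawling the robots.txt file itself".
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def classify_discovered_uris(robots_txt, discovered_uris, user_agent="ExampleBot"):
    """Split already-discovered URIs into fetchable ones and blocked ones.

    The URIs are assumed to come from links found elsewhere on the web,
    not from the Disallow lines in robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules, no network access

    fetchable, uri_only = [], []
    for uri in discovered_uris:
        if parser.can_fetch(user_agent, uri):
            fetchable.append(uri)   # the content may be crawled
        else:
            uri_only.append(uri)    # the URI is known, the content is never fetched
    return fetchable, uri_only

# Hypothetical URIs found through links on other pages.
DISCOVERED = [
    "http://example.com/public/page.html",
    "http://example.com/private/report.html",
]

fetchable, uri_only = classify_discovered_uris(ROBOTS_TXT, DISCOVERED)
print("may be crawled:", fetchable)
print("known by URI only:", uri_only)
```

Note that "/private/" in the Disallow line never shows up as a discovered URI. Only the linked /private/report.html does, and because the rules block it, all that is known about it is the URI itself.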