GoogleGuy - 4:25 am on Apr 11, 2003 (gmt 0)
bakedjake, just to clarify based on your post, let's use a concrete example. www.aclin.org, the Colorado Virtual Library, has a robots.txt that prevents spiders from crawling it. Google abides by that robots.txt file--we *do not* crawl www.aclin.org.

However, we might be able to find external evidence on the web that www.aclin.org is a good match for the query "colorado virtual library." Maybe we found an entry in the Open Directory Project, Yahoo, or another directory. Maybe we saw references to it; it could have really good PageRank, which means that it's a reputable site--there are lots of ways. Truthfully, this is just one of those tiny little things that we do that improves Google and that most people never even notice.

So when you type colorado virtual library, we return the best result we can (www.aclin.org) without ever having crawled that page. For example, you'll notice that there isn't a link to see the cached page, because we never crawled it. We don't really know what's on that page. Yet we can return it as a valid result for a query.

Let's bring things back to SafeSearch. With SafeSearch on, we think that www.aclin.org is a good match for colorado virtual library, but we don't actually know the content of the page--we aren't allowed to crawl it. Because we can't be sure whether the page is safe or not, we have to be conservative, so we can't return it.

You could look at it two ways. You could criticize Google for "dropping" www.aclin.org in SafeSearch because we never crawled the page and so can't vouch for its content. The other way is to be happy that with SafeSearch off, Google is smart enough to return a page we never crawled as a relevant match. I prefer the second way, but that's just me. :)
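
For anyone who wants to see the logic spelled out, here is a rough Python sketch of the behavior described in this post. It is only an illustration, not Google's actual code: the robots.txt contents, the `Result` record, and the `may_crawl` and `filter_results` helpers are all made up for the example. It uses the standard library's `urllib.robotparser` to show that a blanket `Disallow: /` blocks crawling, and then shows an uncrawled URL being returned with SafeSearch off but held back with SafeSearch on.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler. What the real
# www.aclin.org file contained is an assumption made for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


@dataclass
class Result:
    """Hypothetical search result record (not a real Google structure)."""
    url: str
    crawled: bool                # False if robots.txt kept the spider out
    known_safe: Optional[bool]   # None when the content was never seen


def filter_results(results: List[Result], safesearch: bool) -> List[Result]:
    """With SafeSearch off, keep everything, including uncrawled URLs.

    With SafeSearch on, be conservative: keep only pages that were
    crawled and whose content was judged safe.
    """
    if not safesearch:
        return list(results)
    return [r for r in results if r.crawled and r.known_safe is True]


url = "http://www.aclin.org/"
crawl_allowed = may_crawl(ROBOTS_TXT, "Googlebot", url)

# The page can still be a candidate result based on external evidence
# (directory entries, references), even though it was never crawled.
candidates = [Result(url=url, crawled=crawl_allowed, known_safe=None)]

print([r.url for r in filter_results(candidates, safesearch=False)])
print([r.url for r in filter_results(candidates, safesearch=True)])
```

The point of the sketch is the `known_safe=None` case: a page that was never crawled is neither safe nor unsafe, just unknown, and a conservative filter has to treat unknown as "don't show."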