If they are using the same technology that Google Voice uses to transcribe my phone messages, this will be all but useless. At best, I can often grasp general context from a phone message. It tends to translate names into other words. So "This is Tommy" was transcribed as "This is call me". It does fairly well on stop words and numbers, but tends overwhelmingly to miss the words of significance.
It's fun to guess though. I actually guessed based on area code and the few correct words of significance that "call me" was "Tommy".
Still, there's a long way to go for this to be a significant help to the deaf. Professional audio from a professional announcer who doesn't say "um" a lot (a killer on Google Voice transcriptions) might help though.
Assuming they roll it out more broadly, it raises an interesting question. Four, actually:
1. Is Google using captions in the index currently?
2. Is this also the first step to enabling search of audio and video content beyond title, tags and surrounding text?
3. Will they be able to implement duplicate content filters eventually? I know there are people who use the same video many times, with different tags, titles and surrounding text. My understanding is that currently, if you change the length by a second or two, it will not be seen as the same video.
4. Does this make it more important to add voiceover to your videos, instead of just posting something like my dramatic video of birds diving underwater?