Monday, June 21, 2010

Interactive Transcripts and Automatic Captions for Developer Videos

Did you notice the new Interactive Transcript feature that lets you scan quickly through the full text of any owner-captioned video that you’re watching on YouTube? For videos from I/O, that means you can quickly scan through a 60-minute talk to find just the part that you need to see. Or use your browser search with the Interactive Transcript to find a mention of an API call, and then click on a word in the transcript to jump straight to that part of the video.
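
If you have a timed transcript of your own talk, the same "search, then jump" idea can be sketched in a few lines. This is only an illustration, not how YouTube implements the feature: the cue layout (start time in seconds plus text), the helper names, and the sample lines are all made up for the example.

```python
from typing import List, Tuple

# A cue is (start time in seconds, caption text). This layout is an assumption.
Cue = Tuple[float, str]


def find_mentions(cues: List[Cue], term: str) -> List[Cue]:
    """Return every cue whose text contains `term`, ignoring case."""
    needle = term.lower()
    return [(start, text) for start, text in cues if needle in text.lower()]


def format_timestamp(seconds: float) -> str:
    """Render a start time as M:SS, dropping any fractional second."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical transcript lines, invented for illustration.
cues = [
    (12.0, "Welcome to the session."),
    (754.5, "Next we call getElementById to find the player."),
    (1810.0, "Questions?"),
]

for start, text in find_mentions(cues, "getelementbyid"):
    print(format_timestamp(start), text)
```

The start time of each match is the point you would seek to in the video.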

Because developers don’t all speak English (and because some developers speak really fast when presenting), we caption every video that we post. Most of the year, that’s a pretty easy thing to keep up with. But last year, when we posted all the videos for Google I/O 2009, it took us months to get everything done.

This year, we captioned everything within 24 hours of the videos going live. I’m excited about that, because it wouldn’t have been possible without the new auto-caption and auto-timing features in YouTube. We also did something a little nerdy -- we used four different methods of captioning.

If you use YouTube to share talks from your own developer events, you might find this summary useful.

The two fastest options for producing and cleaning up our captions used auto-timing. We uploaded a transcript and had YouTube’s speech recognition calculate the timecodes for us.

The two auto-timing methods were:

  • CART live real-time transcript + auto-timing
    Because we had professional real-time transcriptionists at I/O, we could instantly caption anything that had a live session transcript. That’s how we got the keynotes captioned on the day of the event. We also used this method for the Android talks.

  • Professional transcription + auto-timing
    This was less expensive than CART, and faster than full captions with timecodes, but slower than real-time transcription because we had to get video files to the transcribers.

Although these methods were fastest, auto-timing turned out not to be perfect for all videos. When mic quality varied, or we had too many speaker changes in a short period of time (e.g., panel discussions or fireside chats), the timing sometimes slipped out of sync. You can still use the Interactive Transcript to see what was said, but it’s not ideal.
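
One way to spot this kind of slippage on your own videos is to hand-check a few caption start times and compare them with the auto-timed ones. The sketch below assumes you have both lists in seconds; the function name, the one-second tolerance, and the sample numbers are illustrative choices, not anything YouTube provides.

```python
from typing import List, Tuple


def drifting_cues(
    auto: List[float], reference: List[float], tolerance: float = 1.0
) -> List[Tuple[int, float]]:
    """Return (cue index, offset in seconds) for cues that are out of sync.

    `auto` holds auto-timed start times, `reference` holds hand-checked ones.
    A positive offset means the auto-timed caption appears late.
    """
    flagged = []
    for index, (auto_start, ref_start) in enumerate(zip(auto, reference)):
        offset = auto_start - ref_start
        if abs(offset) > tolerance:
            flagged.append((index, offset))
    return flagged


# Hypothetical start times: the last cue has slipped by a few seconds.
auto_times = [0.0, 4.2, 9.0, 16.5]
checked_times = [0.0, 4.0, 8.5, 13.0]

print(drifting_cues(auto_times, checked_times))
```

A video with many flagged cues is a candidate for one of the slower methods described next.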

The two slower methods that we used were:

  • Pure 'traditional' captioning
    This is what we did last year for Google I/O 2009 videos. It’s slower, and more expensive, because you have to transcribe and set all the timecodes correctly. But the end result is 100% accurately timed. We did this to fix a video that the auto-timing had a lot of difficulty with.

  • Speech recognition (auto-captions) with human cleanup and editing
    This gave us perfect timecodes, just like traditional captions, and took less time than traditional captioning. It took slightly longer than auto-timing alone because we had to download the machine-generated auto-captions from YouTube to do the edits.

    Automatic captions are fantastic if you don't have time or budget to put any work into your captioning. But for I/O, we wanted our captions to be perfect on technical terms, so fully automatic captions weren't the best fit.
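
To make the difference between a plain transcript and a fully timed caption track concrete, here is a small sketch that writes cues in the SubRip (.srt) layout, where every caption carries explicit start and end timecodes. SubRip is used only as a familiar example of a timed format. The helper names and sample cues are made up, and this is not necessarily the format or tooling we used.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timecode used in SubRip files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: List[Tuple[float, float, str]]) -> str:
    """Turn (start, end, text) cues into SubRip text, numbering from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical cues, invented for illustration.
print(to_srt([(0.0, 2.5, "Welcome to Google I/O."), (2.5, 5.0, "Let's get started.")]))
```

With traditional captioning, a person sets every one of those timecodes by hand. With auto-timing, you supply only the text and the timecodes are calculated for you.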

Not all of these methods are equal in terms of quality, but it’s interesting to compare. To see which method was used on a video, look for the track name in the caption menu. To compare owner-uploaded captions with pure machine-generated auto-captions, you can always choose ‘Transcribe Audio’ from the caption menu for our videos.

If you’d like to help improve caption quality, please watch a video and fill out our caption survey to tell us what you think of these captions! We know some of them are going to be a little off -- if you report issues, we’ll fix them.


  1. Where videos are broadcast live, you can also make use of Twitter backchannel comments as captions. And if it's not broadcast live, and you still want to tweet along to an emergent backchannel, that works too...

    See Martin Hawksey's Twitter captioner and uTitle anytime captioner for more details.

  3. YouTube/Google folk, it would be nice if the interactive transcription could eventually be also embedded in a web page.

    Interactive transcripts are an amazing feature, thanks for adding this (embedded or not)!

  4. If you want to embed the interactive transcripts, take a look at


  5. As mentioned above, I too was hoping to embed that feature in my website. As for 3rd party solutions, CaptionBox by SpeakerText is Free:
    Along with a WordPress plugin.

    But I was hoping for YouTube's to also help automate the process. I couldn't find a free one to do that as well. :/