Tuesday, October 18, 2011

New Text-to-Speech API for Chrome extensions

Author Photo
By Dominic Mazzoni, Software Engineer

Interested in making your Chrome Extension (or packaged app) talk using synthesized speech? Chrome now includes a Text-to-Speech (TTS) API that’s simple to use, powerful, and flexible for users.

Let’s start with the "simple to use" part. A few clever apps and extensions figured out how to talk before this API was available – typically by sending text to a remote server that returns an MP3 file that can be played using HTML5 audio. With the new API, you just need to add "tts" to your permissions and then write:
chrome.tts.speak('Hello, world!');
It’s also very easy to change the rate, pitch, and volume. Here’s an example that speaks more slowly:
chrome.tts.speak('Can you understand me now?', {rate: 0.6});
How about powerful? To get even fancier and synchronize speech with your application, you can register to receive callbacks when the speech starts and finishes. When a TTS engine supports it, you can get callbacks for individual words too. You can also get a list of possible voices and ask for a particular voice – more on this below. All the details can be found in the TTS API docs, and we provide complete example code on the samples page.

In fact, the API is powerful enough that ChromeVox, the Chrome OS screen reader for visually impaired users, is built using this API.

Here are three examples you can try now:

    TTS Demo (app)
    Talking Alarm Clock (extension)
    SpeakIt (extension)

Finally, let's talk about flexibility for users. One of the most important things we wanted to do with this API was to make sure that users have a great selection of voices to choose from. So we've opened that up to developers, too.

The TTS Engine API enables you to implement a speech engine as an extension for Chrome. Essentially, you provide some information about your voice in the extension manifest and then register a JavaScript function that gets called when the client calls chrome.tts.speak. Your extension then takes care of synthesizing and outputting the speech – using any web technology you like, including HTML5 Audio, the new Web Audio API, or Native Client.

Here are two voices implemented using the TTS Engine API that you can install now:

    Lois TTS - US English
    Flite SLT Female TTS - US English

These voices both use Native Client to synthesize speech. The experience is very easy for end users: just click and install one of those voices, and immediately any talking app or extension has the ability to speak using that voice.

If a user doesn't have any voices installed, Chrome automatically speaks using the native speech capabilities of your Windows or Mac operating system, if possible. Chrome OS comes with a built-in speech engine, too. For now, there's unfortunately no default voice support on Linux – but TTS is fully supported once users first install a voice from the Chrome Web Store.

Now it's your turn: add speech capability to your app or extension today! We can't wait to hear what you come up with, and if you talk about it, please add the hashtag #chrometts so we can join the conversation. If you have any feedback, direct it to the Chromium-extensions group.


Dominic Mazzoni is a Software Engineer working on Chrome accessibility. He's the original author of Audacity, the free audio editor.

Posted by Scott Knaster, Editor

6 comments:

  1. Why not use CSS 2 aural stylesheets?

    ReplyDelete
  2. How about an API to record from the Mic?

    The x-webkit-speed is usfeul, but there's no programmatic access to it.
    Also it would be nice to at times skip the voice recognition and just get a wave for to use with the audio api.

    ReplyDelete
  3. Do you plan to support SSML [1] or IPA?

    [1] http://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language

    ReplyDelete
  4. Doesn't work for any language other than English.

    ReplyDelete
  5. Now a days every body have started using chrome an text to speech extension for chrome amazing one!!!

    _________________________

    Voice to text

    ReplyDelete
  6. no way to define the language to be used, so the SAPI layer in windows could use the information for non-english purpose ?

    ReplyDelete