The Consortium is developing speech-recognition-based captioning systems that automatically convert spontaneous speech into accurate, readable text, displayed as captions to improve accessibility.

Using speech recognition to caption live speech is a complex challenge that differs significantly from tasks such as dictation. The spontaneous speech of a lecture is acoustically, linguistically, and structurally different from read speech or speech used to create written documents.

The Consortium’s blueprint for a classroom speech recognition captioning interface includes the following guiding principles:

  1. Minimal training/enrollment requirements, to minimize the impact on instructors
  2. As a spoken lecture is digitized, transcribed text needs to be displayed simultaneously with minimal lag
  3. Displayed text needs to be readable
  4. Speech-recognition-generated text needs to be accurate
  5. Learners should be able to customize how text is accessed
  6. Editing tools are needed to allow third-party correction of recognition errors
  7. Transcribed text should be synchronized with the source audio, allowing dissemination using standards for accessible media (see the sketch after this list)
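
As an illustration of principle 7, the short TypeScript sketch below converts hypothetical timestamped transcript segments into a WebVTT caption track, one common standard for accessible media. The segment structure and function names are illustrative assumptions, not part of any Consortium tool.

```typescript
// Illustrative sketch only: turn hypothetical timestamped transcript segments
// into a WebVTT caption track so the text stays synchronized with the audio.

interface TranscriptSegment {
  startSec: number; // segment start, in seconds from the start of the audio
  endSec: number;   // segment end, in seconds
  text: string;     // recognized (and possibly edited) caption text
}

// Format a time in seconds as a WebVTT timestamp: HH:MM:SS.mmm
function toTimestamp(totalSeconds: number): string {
  const pad = (n: number) => String(Math.floor(n)).padStart(2, "0");
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const millis = String(Math.round((seconds % 1) * 1000)).padStart(3, "0");
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}.${millis}`;
}

// Build a WebVTT document: header, then one cue per segment.
function toWebVTT(segments: TranscriptSegment[]): string {
  const cues = segments.map(
    (s, i) =>
      `${i + 1}\n${toTimestamp(s.startSec)} --> ${toTimestamp(s.endSec)}\n${s.text}`
  );
  return "WEBVTT\n\n" + cues.join("\n\n") + "\n";
}

// Example usage with two short lecture segments:
const vtt = toWebVTT([
  { startSec: 0.0, endSec: 3.2, text: "Welcome to today's lecture." },
  { startSec: 3.2, endSec: 7.8, text: "We will start with a short review." },
]);
console.log(vtt);
```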

Prototypes
Based on these core requirements, several prototype systems have been developed and evaluated in live classroom tests, including IBM ViaScribe, a Caption Editing System, and a Microsoft Speech API (SAPI)-based tool.

To capitalize on core technology advances, the Consortium is continuing research and development of new applications.

The IBM Human Ability and Accessibility Center is developing a web-based system that uses either a local speech recognizer or a web-based recognizer to transcribe speech. The system also includes an online translation feature.
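
As a rough illustration of how a browser-side recognizer can drive a live caption display (this is not IBM's implementation), the sketch below uses the Web Speech API, where the browser supports it, to show interim and finalized text with minimal lag. The "captions" element id is a hypothetical placeholder.

```typescript
// Minimal sketch of browser-based live captioning using the Web Speech API,
// assuming the browser exposes it (sometimes under a vendor prefix).
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognizer = new SpeechRecognitionCtor();
recognizer.continuous = true;      // keep listening for the whole lecture
recognizer.interimResults = true;  // show partial hypotheses with minimal lag

// Hypothetical caption display element.
const display = document.getElementById("captions") as HTMLElement;
let finalText = "";

recognizer.onresult = (event: any) => {
  let interim = "";
  // Accumulate finalized text and append the current partial hypothesis.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript;
    if (event.results[i].isFinal) {
      finalText += transcript + " ";
    } else {
      interim += transcript;
    }
  }
  display.textContent = finalText + interim;
};

recognizer.start();
```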

Partners from the University of Massachusetts Boston, the University of Southampton, and Purdue University are also developing captioning applications using Nuance's Dragon NaturallySpeaking SDK Client Edition software.