The Consortium is working to develop speech-recognition-based “captioning” systems that automatically convert spontaneous speech into accurate, readable text that can be displayed as captions to improve classroom accessibility.

Using speech recognition to caption live speech is a complex challenge that differs significantly from tasks such as dictation. The spontaneous speech of a lecture is acoustically, linguistically, and structurally different from read speech or speech produced to create written documents.

The Consortium’s blueprint for a classroom speech recognition captioning interface includes the following guiding principles:

  1. Minimal training and enrollment requirements, so the system places little burden on instructors
  2. Text displayed simultaneously, with minimal lag, as the spoken lecture is digitized
  3. Displayed text that is readable
  4. Accurate speech recognition output
  5. Recognition output synchronized with the source audio in a format open to students (see the sketch following this list)
  6. Editing tools that allow third parties to correct recognition errors
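
Principles 2 and 5 amount to emitting timestamped text that players can align with the lecture audio. As one illustrative possibility (the Consortium's actual output format is not specified here), the Python sketch below serializes hypothetical recognition segments, each carrying start and end offsets into the audio, as WebVTT, an open caption format:

    from dataclasses import dataclass

    @dataclass
    class CaptionSegment:
        start: float  # seconds from the start of the lecture audio
        end: float
        text: str

    def to_timestamp(seconds: float) -> str:
        """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
        hours, rem = divmod(seconds, 3600)
        minutes, secs = divmod(rem, 60)
        return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

    def to_webvtt(segments: list[CaptionSegment]) -> str:
        """Serialize recognized segments as a WebVTT caption track."""
        lines = ["WEBVTT", ""]
        for seg in segments:
            lines.append(f"{to_timestamp(seg.start)} --> {to_timestamp(seg.end)}")
            lines.append(seg.text)
            lines.append("")  # blank line ends each cue
        return "\n".join(lines)

    # Example: two recognized phrases with their audio offsets
    segments = [
        CaptionSegment(0.0, 2.5, "Welcome to today's lecture."),
        CaptionSegment(2.5, 6.0, "We will cover the basics of photosynthesis."),
    ]
    print(to_webvtt(segments))

Because each caption carries its own audio offsets, a student can replay any passage of the lecture and see the matching text, regardless of which player or device is used.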

Prototypes
Based on these core requirements, prototype systems have been developed for classroom use, including IBM ViaScribe and the Caption Editing System.
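
Third-party correction of recognition errors (principle 6) essentially means replacing a caption's text while preserving its timestamps, so the edited caption stays synchronized with the source audio. A minimal sketch of that idea, using hypothetical data and a hypothetical apply_correction helper:

    def apply_correction(captions, index, corrected_text):
        """Replace one caption's text while keeping its start/end
        timestamps, so the correction stays aligned with the audio."""
        captions[index] = {**captions[index], "text": corrected_text}

    captions = [
        {"start": 0.0, "end": 2.5, "text": "Welcome to today's lecture."},
        {"start": 2.5, "end": 6.0, "text": "We will cover fence synthesis."},
    ]
    # A third-party editor fixes a misrecognized phrase
    apply_correction(captions, 1, "We will cover photosynthesis.")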

To capitalize on core technology advances, the Consortium is developing a new classroom speech recognition interface that can leverage multiple speech recognition engines. The prototype will be available for testing in April 2011 and will remain open to ongoing development.
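
A multi-engine design implies a thin abstraction layer between the classroom interface and whichever recognizer is in use. The Python sketch below is illustrative only; the names RecognitionEngine, ViaScribeEngine, transcribe_chunk, and caption_stream are hypothetical and do not reflect any actual vendor SDK or Consortium code:

    from abc import ABC, abstractmethod
    from typing import Iterable, Iterator

    class RecognitionEngine(ABC):
        """Common interface so the captioning front end can swap
        recognition engines without changing the classroom display."""

        @abstractmethod
        def transcribe_chunk(self, audio_chunk: bytes) -> str:
            """Return recognized text for one chunk of lecture audio."""

    class ViaScribeEngine(RecognitionEngine):
        """Hypothetical adapter; a real version would call the vendor SDK."""
        def transcribe_chunk(self, audio_chunk: bytes) -> str:
            raise NotImplementedError("wrap the vendor's recognition call here")

    def caption_stream(engine: RecognitionEngine,
                       chunks: Iterable[bytes]) -> Iterator[str]:
        """Emit text as each audio chunk is recognized, keeping display lag low."""
        for chunk in chunks:
            yield engine.transcribe_chunk(chunk)

Keeping engine-specific code behind one interface lets the same display, synchronization, and editing tools work unchanged as newer or more accurate recognition engines become available.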