Hosted Transcription Systems are prototype Speech Recognition platforms that automatically transcribe audio and video in order to make multimedia content accessible.
Hosted Transcription Systems perform offline transcription of audio and video files. They convert digital audio and video (such as mp3, MPG, WMV, AVI) into a format suitable for processing. To use these systems, users visit an online portal, log into their secure accounts, and then upload a media file for automatic transcription.
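The client-side workflow described above can be sketched as follows. This is a minimal illustration, not any particular system's API: the portal URL, the request structure, and the exact list of accepted formats are hypothetical placeholders.

```python
# Sketch of the upload workflow: verify the media container is supported,
# then assemble an authenticated upload request for the hosted portal.
# The URL and request shape are hypothetical, for illustration only.
from pathlib import Path

# Container formats named in the text; a real service may accept others.
SUPPORTED_FORMATS = {".mp3", ".mpg", ".wmv", ".avi"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is one the service accepts."""
    return Path(filename).suffix.lower() in SUPPORTED_FORMATS

def build_upload_request(filename: str, token: str) -> dict:
    """Assemble a (hypothetical) authenticated upload request."""
    if not is_supported(filename):
        raise ValueError(f"unsupported media format: {filename}")
    return {
        "url": "https://hts.example.org/api/upload",  # placeholder portal URL
        "headers": {"Authorization": f"Bearer {token}"},
        "file": filename,
    }
```

A caller would then POST the assembled request with any HTTP client and poll the portal for the finished transcript.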
Traditional transcripts are typically generated by converting a spoken language source (audio, video) into text, usually by listening and manually typing what is heard.
Multimedia Transcripts refer to speech recognition generated text that is synchronized with a spoken language source and possibly other media (slides, images, etc.). The Liberated Learning Consortium has been developing alternate multimedia transcript formats that increase access to transcribed lectures.
Based on the format of the recorded lecture and on the learner’s preferences, content can be viewed in a number of predefined layouts or adjusted dynamically. Although the content is automatically synchronized, disaggregated content is also available, which allows learners to self-select the individual learning objects and combinations (text only, text and audio, etc.) that suit their preferences.
To support the Liberated Learning Consortium’s objective to improve recognition accuracy, IBM Research and Nuance have engineered next generation speech recognition engines and customizable platforms that can automatically transcribe recorded media. These flexible systems can be deployed locally or in a cloud computing environment.
IBM Research developed the consortium’s first Hosted Transcription System, called IBM HTS. IBM HTS is a speaker-independent system, meaning no voice training is required. It uses a double-pass decoding technique that adjusts to the speaker’s voice to produce more accurate transcription results. IBM HTS was successfully used to create and share accessible media by Saint Mary’s University, Trent University, University of Toronto, University of Western Sydney, University of Southampton, UMass Boston, Algonquin College, and the Youth Initiative.
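The idea behind double-pass decoding can be illustrated with a toy model: a first pass decodes with a speaker-independent model, the resulting hypothesis is used to estimate a speaker-specific bias, and a second pass decodes again with the adapted model. The one-dimensional "acoustic features" and two-word vocabulary below are invented for illustration and bear no relation to how IBM HTS is actually implemented.

```python
# Toy illustration of double-pass decoding with speaker adaptation.
# Words are modeled as 1-D prototype values; decoding picks the nearest
# prototype. These numbers are illustrative, not a real acoustic model.

PROTOTYPES = {"yes": 1.0, "no": -1.0}  # speaker-independent word models

def decode(features, prototypes):
    """Assign each feature to the word with the nearest prototype."""
    return [min(prototypes, key=lambda w: abs(f - prototypes[w]))
            for f in features]

def two_pass_decode(features):
    # First pass: decode with the speaker-independent prototypes.
    first = decode(features, PROTOTYPES)
    # Estimate a per-speaker bias from the first-pass hypothesis.
    offset = sum(f - PROTOTYPES[w]
                 for f, w in zip(features, first)) / len(features)
    # Second pass: decode again with prototypes shifted toward the speaker.
    adapted = {w: p + offset for w, p in PROTOTYPES.items()}
    return first, decode(features, adapted)

# A speaker whose features are biased upward; true words: yes yes no no no.
features = [1.4, 1.3, -0.6, -0.55, 0.1]
first, second = two_pass_decode(features)
# The first pass misrecognizes the last word; the adapted second pass
# corrects it.
```

Even with errors in the first-pass hypothesis, the bias estimate is dominated by the correctly decoded words, so the second pass can recover words the unadapted model got wrong.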
Other Consortium partners have developed Hosted Transcription Systems. MIT’s Spoken Language Systems group created the MIT Lecture Browser through MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). The Lecture Browser uses speech recognition to transcribe and index hundreds of MIT lectures and allows users to search for key topics. The University of Sheffield implemented a system called webASR, which provides free access to state-of-the-art speech recognition for a widespread community.
IBM Human Ability and Accessibility Center’s AbilityLab is working with the Consortium to test Media Captioner and Editor. The IBM Watson group, led by researchers from IBM China, is working with the Consortium to develop a cloud-based system, iTranS (IBM Transcription Server), which leverages the Attila speech engine.
UMass Boston is also developing a new hosted system based on Nuance’s Dragon NaturallySpeaking SDK Server Edition.