Hosted Transcription Systems are prototype speech recognition platforms that automatically transcribe audio and video to create accessible multimedia transcripts.
Hosted Transcription Systems perform offline transcription of audio and video files. They convert digital audio and video (such as MP3, MPG, WMV, and AVI files) into a format suitable for processing. To use these systems, authenticated users visit an online portal, log into their secure accounts, and upload a media file for automatic transcription.
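As a minimal sketch of the pre-upload step, a portal might first check that a submitted file is in a format it can convert. The extension list below comes from the formats named above; the function name and the check itself are illustrative, not part of any actual portal's API.

```python
import os

# Container formats mentioned above; a real portal would likely accept more.
SUPPORTED_EXTENSIONS = {".mp3", ".mpg", ".wmv", ".avi"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is one the portal can process."""
    _, ext = os.path.splitext(filename.lower())  # case-insensitive match
    return ext in SUPPORTED_EXTENSIONS

print(is_supported("lecture01.MP3"))  # True
print(is_supported("slides.pdf"))     # False
```

A check like this lets the portal reject unusable files immediately, before the user waits through a full upload.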
Traditional transcripts are typically generated by converting a spoken language source (audio or video) into text, usually by listening to the recording and manually typing what is heard.
Multimedia transcripts are speech-recognition-generated text synchronized with a spoken language source and possibly other media (slides, images, etc.). The Liberated Learning Consortium has been developing alternate multimedia transcript formats that increase access to transcribed lectures.
Based on the format of the recorded lecture and on the learner's preferences, content can be viewed in a number of predefined layouts or adjusted dynamically. Although the content is automatically synchronized, disaggregated content is also available, which allows learners to self-select the individual learning objects and combinations (text only, text and audio, etc.) that suit individual preferences.
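One way to picture such a synchronized, disaggregatable transcript is as a list of timed segments, each carrying text and optional links to other media. This is a hypothetical representation for illustration only; the field names and structure are assumptions, not the Consortium's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Segment:
    start: float      # seconds into the recording
    end: float
    text: str         # recognized speech for this time range
    slide: str = ""   # hypothetical link to the slide shown at this time

@dataclass
class MultimediaTranscript:
    media_file: str
    segments: List[Segment] = field(default_factory=list)

    def text_only(self) -> str:
        """Disaggregated view: the transcript text alone."""
        return " ".join(s.text for s in self.segments)

    def segment_at(self, t: float) -> Segment:
        """Synchronized view: the segment playing at time t."""
        for s in self.segments:
            if s.start <= t < s.end:
                return s
        raise ValueError(f"no segment at {t}s")

transcript = MultimediaTranscript(
    "lecture01.mp3",
    [Segment(0.0, 4.2, "Welcome to the course.", "slide01.png"),
     Segment(4.2, 9.8, "Today we cover speech recognition.", "slide02.png")],
)
print(transcript.text_only())
print(transcript.segment_at(5.0).slide)  # slide shown five seconds in
```

Because each segment is an independent object, a player can combine or drop media per the learner's preference: text only, text plus audio, or text plus slides.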
To support the Liberated Learning Consortium’s objective to improve recognition accuracy, IBM Research and Nuance have engineered next generation speech recognition engines and customizable platforms that can automatically transcribe recorded media. These flexible systems can be implemented locally or virtually via a cloud computing environment.
2009–2011 Activities
IBM Research developed the Consortium's first Hosted Transcription System, called IBM HTS. IBM HTS is a speaker-independent system, meaning no voice training is required. It uses a double-pass decoding technique and adjusts to the speaker's voice to produce more accurate transcription results. IBM HTS is currently being used by Trent University, University of Southampton, UMASS Boston, Algonquin College, and the Youth Initiative.
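The control flow of double-pass decoding can be sketched as: decode once with a generic model, use that first hypothesis to adapt to the speaker, then decode again with the adapted model. The classes below are toy stand-ins, not IBM HTS internals; only the two-pass structure reflects the technique described above.

```python
class SpeakerIndependentModel:
    """Toy stand-in for a generic, speaker-independent recognizer."""

    def decode(self, audio):
        # Pass 1: produce a rough hypothesis (here, trivially lowercased).
        return [word.lower() for word in audio]

    def adapt(self, audio, hypothesis):
        # Use the first-pass hypothesis to fit the speaker's voice.
        # Real systems re-estimate acoustic parameters; this toy just
        # remembers the hypothesized vocabulary.
        return AdaptedModel(vocabulary=set(hypothesis))

class AdaptedModel:
    """Toy stand-in for a model tuned to one speaker."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def decode(self, audio):
        # Pass 2: re-decode, now constrained by the adaptation.
        return [w.lower() for w in audio if w.lower() in self.vocabulary]

def transcribe(audio, model):
    first_pass = model.decode(audio)          # pass 1: generic decoding
    adapted = model.adapt(audio, first_pass)  # estimate speaker adaptation
    return adapted.decode(audio)              # pass 2: adapted decoding

print(transcribe(["Hello", "World"], SpeakerIndependentModel()))
```

The design trades latency for accuracy: the audio is processed twice, which suits offline transcription of recorded lectures better than live captioning.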
Other Consortium partners have developed Hosted Transcription Systems. MIT's Spoken Language Systems group created the MIT Lecture Browser through MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). The Lecture Browser uses speech recognition to transcribe and index hundreds of MIT lectures and allows users to search for key topics. The University of Sheffield implemented a system called webASR, whose goal is to provide free access to state-of-the-art speech recognition to a widespread community. The IBM Human Ability and Accessibility Center's AbilityLab is working with the Consortium to test another Hosted Transcription System, called Media Captioner and Editor, which uses speech recognition to create captions automatically.
The Consortium is implementing two new Hosted Transcription platforms. IBM iTrans, developed by IBM Research, is a newer version of IBM HTS and operates in the same manner but with several key technical upgrades.
The Consortium is also developing a new system based on Nuance’s Dragon Software Development Kit. Collaborators include Nuance Communications, Saint Mary’s University, Purdue University, University of Southampton, Trent University, and Macquarie University.