Speech Media Laboratory
We study speech and natural language processing to automate metadata indexing and annotation of digital archives, both of which are vital for efficient access to multimedia digital content. We are also exploring methods to automatically generate thumbnails of audio-visual streams and to search large archives interactively through dialogue with users.

  • Automatic Speech Recognition and Indexing
       overview for the general public (PDF)    overview for speech researchers (PDF)
       introduction for graduate school applicants (PDF)
    • Large Vocabulary Continuous Speech Recognition Platform
    • ...Julius site
      We are developing open-source speech recognition software named Julius, which has become a widely used platform for speech research and application development.

    • Spontaneous Speech Recognition

    • We are studying next-generation speech recognition that can automatically transcribe real-world spontaneous speech such as lectures and meetings. Specifically, we investigate the modeling of acoustic and pronunciation variations and methods for language model adaptation.

    • Indexing and Annotation of Lectures & Meetings

    • We are also studying content-based indexing and annotation for speech archives. This involves transforming transcripts into document style, indexing key sentences and annotating discourse tags, as well as speaker indexing.

    • Robust Speech Processing

    • We are also conducting research on robust speech processing, which is necessary for deploying speech recognition systems in the real world. Topics include voice activity detection (VAD), denoising and dereverberation.

  • Spoken Language Understanding and Dialogue
    • Speech Understanding

    • An effective human-machine speech interface requires robust approaches to speech understanding and domain handling. We are studying concept-level key-phrase detection and verification, as well as in-domain classification and verification.

    • Spoken Dialogue Interface for Document Information Retrieval

    • Current information retrieval systems, such as Web search engines, require users to specify keywords but do not ask questions to narrow the results down to the intended items. We are studying interactive systems that enable users to efficiently search large archives through natural spoken dialogue.

  • CALL (Computer Assisted Language Learning)

  • We are involved in the research and development of a next-generation CALL system that can automatically check the pronunciation of foreign language learners and serve as a virtual language teacher for simulated conversation practice.