Speaker Indexing in a Meeting
- Technological fields: Cutting-edge Technologies
- Keywords: Speech retrieval, Voice activity detector, Minute taking
- Laboratory organization: NTT Communication Science Laboratories
These days we can record a huge amount of speech data during meetings on small, large-capacity recording media such as IC recorders. However, we do not yet have technologies that make such data conveniently usable. We think this is because it is difficult to find a desired utterance in a large amount of recorded data and because the recordings are contaminated by other voices and ambient noise.
Speaker indexing annotates recorded speech data with "who spoke when" information so that users can reach the speech of interest more easily. It obtains this information by estimating the speech periods in the recording and the direction of arrival of the speech. Our technique requires only a few microphones placed at the center of a table; it does not need as many microphones as there are attendees. However, recordings made with table-center microphones are still affected by noise and other voices. To obtain accurate speaker indexing under these conditions, we developed a voice activity detector (VAD) that can handle noisy signals. We also developed a speech enhancement method that suppresses noise and other voices so that the target speaker's utterance can be replayed clearly. Together, the speaker indexing and enhancement techniques let users easily access and replay the utterances of the speaker of interest, and they are applicable to meeting archiving and retrieval and to computer-aided minute taking.
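As a rough illustration of combining speech-period estimation with direction of arrival, the following Python sketch labels each frame of a two-microphone recording as speech or non-speech and attaches an inter-microphone delay to every speech frame. It is not the system described here: the energy threshold is a simple stand-in for a noise-robust VAD, the brute-force cross-correlation delay is only a crude proxy for direction of arrival, and every name and parameter (index_speakers, energy_threshold, max_lag, and so on) is an assumption made for this example.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, int]  # (start_sample, end_sample, lag_in_samples)


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def estimate_delay(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Return the lag that maximizes sum(left[i] * right[i + lag]).

    A positive lag means the sound reached the left microphone first.
    Brute-force search; assumed adequate only for short frames.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def index_speakers(
    left: Sequence[float],
    right: Sequence[float],
    frame_len: int,
    energy_threshold: float,
    max_lag: int,
) -> List[Segment]:
    """Return speech segments, each tagged with its inter-microphone delay."""
    segments: List[Segment] = []
    for start in range(0, len(left) - frame_len + 1, frame_len):
        l = left[start:start + frame_len]
        r = right[start:start + frame_len]
        # Stand-in VAD: skip frames whose average energy is below the threshold.
        if (frame_energy(l) + frame_energy(r)) / 2 < energy_threshold:
            continue
        lag = estimate_delay(l, r, max_lag)
        # Merge with the previous segment if it is adjacent and has the same delay.
        if segments and segments[-1][1] == start and segments[-1][2] == lag:
            segments[-1] = (segments[-1][0], start + frame_len, lag)
        else:
            segments.append((start, start + frame_len, lag))
    return segments
```

Each returned tuple can be read as one "who spoke when" entry once delays are mapped to seats around the table. A practical system would replace both stand-ins with methods that tolerate noise and overlapping voices.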
Our next steps include estimating more detailed information on "who" by using a speaker recognition technique. We will also attempt to extract "who spoke when and what" information by employing large-vocabulary automatic speech recognition. We will continue to seek techniques that facilitate daily communication.
Speaker indexing system

