
NTT's team wins 1st place
in Audio Captioning task at DCASE 2020 Challenge

Yuma Koizumi, with the Media Intelligence Laboratories of the Service Innovation Laboratory Group, and Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, and Kunio Kashino, with the Communication Science Laboratories of the Science and Core Technology Laboratory Group, won 1st place in the Audio Captioning task at the DCASE 2020 Challenge held from March to July this year.

  • Yuma Koizumi (Researcher)
  • Daiki Takeuchi
  • Yasunori Ohishi (Senior Research Scientist)
  • Noboru Harada (Senior Research Scientist, Supervisor)
  • Kunio Kashino (Senior Distinguished Researcher)

The DCASE* Challenge is an annual international competition officially recognized by the IEEE Audio and Acoustic Signal Processing Technical Committee; this year's event was the sixth. "Automated audio captioning" is a new task that DCASE introduced this year. The challenge is to automatically generate appropriate and accurate text descriptions or explanations for given audio signals of various non-speech sounds. Ten teams from around the world competed in the task.

NTT is one of the earliest research institutes in the world to work on the verbalization of sounds. To tackle the task, we took full advantage of the algorithms and knowledge accumulated by the above members, and combined various ideas ranging from pre-processing to post-processing and automated meta-parameter tuning.

Automated audio captioning is an emerging technology field, and no established method for achieving it exists yet. The capability to describe all kinds of sounds in text could bring many benefits to our lives in the near future. NTT will therefore continue its research to further strengthen the technology.


Copyright © 2020 Nippon Telegraph and Telephone Corporation