July 10, 2019
Nippon Telegraph and Telephone Corporation
Nippon Telegraph and Telephone Corporation (NTT; head office: Chiyoda-ku, Tokyo; President & CEO: Jun Sawada) has discovered that a deep neural network (DNN) *1 trained for sound recognition acquires sound representations similar to those in the mammalian brain.
Years of neurophysiological studies have examined how neurons represent sound information in the brains of a variety of animal species. This study addresses why the brain acquires the neural properties observed in those studies. We trained a DNN for natural sound recognition and examined the properties of its units with a method comparable to the techniques that neurophysiologists conventionally use to study animal neurons. We found that properties similar to those of the brain emerge in the DNN.
Our discovery suggests that the brain has acquired neural representations beneficial for sound recognition in the course of its evolution. This study will further advance cooperation between brain research and artificial intelligence. The work was published in the Journal of Neuroscience on July 10, 2019.
The brains of mammals, including humans, recognize a sound reaching the ear by processing and analyzing sound features at multiple stages from the brain stem to the cerebral cortex. Among sound features, amplitude modulation (AM) *2, a slow fluctuation of the amplitude waveform, is an important cue for sound recognition. Years of neurophysiological studies have revealed “how” neurons represent AM in multiple brain regions in the auditory nervous system (ANS) *3. However, the question of “why” neurons represent AM in such a way has yet to be answered because it is difficult, in principle, to examine experimentally the relationship between neural properties and the process of evolution.
Simulation with a computational model can complement experimental approaches. However, conventional computational models reproduce only the detailed properties of neural circuits that process specific sound features; they cannot explain how those properties relate to the whole process of natural sound recognition, which is an important function of the ANS.
Recently, artificial neural networks (ANNs) have attained the ability to recognize natural and complex sounds. Our study exploited this ability to explore the answer to the “why” question (Figure 1). A deep neural network (DNN), a type of ANN, has a structure similar to the ANS in the sense that it consists of multiple cascading layers each consisting of numerous units. However, other than that cascading structure, our DNN is not intentionally designed to resemble any specific neural circuit in the ANS. (Connections between the cascading layers are initially random and modified by training for our tasks.)
If a DNN could acquire properties similar to those in animal brains as a result of training for animal-level natural sound recognition, this would suggest the possibility that such properties in animal brains are a consequence of their adapting for sound recognition in the course of evolution.
In the present study, we trained a DNN for natural sound classification and analyzed it with methods used to examine animal brains in neurophysiological experiments. Specifically, we fed sounds with various AM rates (i.e., speeds of the modulation cycle) to the trained DNN and examined the outputs of its units (Figure 1). We found that properties comparable to those reported in previous studies on the animal ANS emerged in the DNN: many units responded selectively to specific AM rates, and the response characteristics changed systematically along the processing stages. We also found that properties similar to those of the brain developed gradually in the DNN in the course of training and that DNNs with higher sound recognition accuracy exhibited higher similarity to the brain. In addition, we did not observe similarity to the brain in an ANN that was not trained for natural sound recognition.
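As an illustration of what "sounds with various AM rates" can look like, the following is a minimal sketch of a stimulus generator. It is not the code used in the study: the white-noise carrier, the 16 kHz sampling rate, the full modulation depth, and the logarithmic spacing of AM rates are all assumptions chosen for the example, and the function name `sam_noise` is hypothetical.

```python
import numpy as np

def sam_noise(am_rate_hz, duration_s=1.0, fs=16000, depth=1.0, seed=0):
    """Return (t, waveform) for white noise with sinusoidal amplitude modulation.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    carrier = rng.standard_normal(n)  # assumed carrier: Gaussian white noise
    # Slow sinusoidal envelope; depth=1.0 means the envelope dips to zero.
    envelope = 1.0 + depth * np.sin(2.0 * np.pi * am_rate_hz * t)
    return t, envelope * carrier

# Hypothetical stimulus set: AM rates spaced logarithmically from 1 to 512 Hz.
am_rates = 2.0 ** np.arange(0, 10)
stimuli = {rate: sam_noise(rate)[1] for rate in am_rates}
```

Each waveform in `stimuli` could then be passed to a trained network, and the activation of every unit recorded for each AM rate.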
These results suggest that animal brains have acquired the current form of neural representation of AM through adaptation for sound recognition in the course of evolution.
There are various types of features for sound recognition other than AM. In the future, we would like to examine them and compare the sound representations acquired by trained DNNs with those in the brain. This will broaden our knowledge of the evolution of animal brains.
Takuya Koumura, Hiroki Terashima, Shigeto Furukawa. "Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition." Journal of Neuroscience, 10 July 2019, 39 (28) 5517-5533; DOI: 10.1523/JNEUROSCI.2914-18.2019
AM is an important cue for animals, including humans, to recognize sound (Figure 2), which is why a large number of physiological studies have investigated AM representation in neurons.
It is well known that most neurons respond with neural spikes (Figure 1, left) that are synchronized with the AM waveforms of sound stimuli. Since the temporal patterns of spike firing represent AM, this coding scheme is called “temporal coding”. There is an upper limit on the AM rate to which spikes can synchronize, which is called the upper cutoff frequency (UCF). Furthermore, some neurons synchronize strongly only to a specific AM rate. This property is called AM tuning, and the AM rate at the maximum synchrony is called the best modulation frequency (BMF). Generally, UCFs and BMFs tend to decrease along the axis from the peripheral brain regions to the central brain regions.
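The quantities defined above can be sketched in a few lines. The example below uses vector strength, a common neurophysiological measure of spike synchrony, as the synchrony metric; this release does not state which metric was used, so that choice is an assumption. The threshold that defines the UCF (here 0.1) and the function names are likewise hypothetical.

```python
import numpy as np

def vector_strength(spike_times_s, am_rate_hz):
    """Synchrony of spikes to the AM cycle: 0 = none, 1 = perfect phase locking."""
    spike_times_s = np.asarray(spike_times_s, dtype=float)
    if spike_times_s.size == 0:
        return 0.0
    # Map each spike to its phase within the modulation cycle.
    phases = 2.0 * np.pi * am_rate_hz * spike_times_s
    # Length of the mean unit vector over all spike phases.
    return float(np.abs(np.mean(np.exp(1j * phases))))

def bmf_and_ucf(am_rates_hz, synchrony, threshold=0.1):
    """Return (BMF, UCF) from a synchrony-versus-AM-rate tuning curve.

    BMF: AM rate with maximum synchrony.
    UCF: highest AM rate whose synchrony still reaches `threshold`
         (the threshold criterion is an assumption for this sketch).
    """
    am_rates_hz = np.asarray(am_rates_hz, dtype=float)
    synchrony = np.asarray(synchrony, dtype=float)
    bmf = am_rates_hz[np.argmax(synchrony)]
    above = am_rates_hz[synchrony >= threshold]
    ucf = above.max() if above.size else np.nan
    return float(bmf), float(ucf)
```

For instance, a neuron that fires exactly once per cycle of a 10 Hz modulation has a vector strength of 1 at 10 Hz, whereas spikes spread evenly over the cycle give a value near 0.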
Regardless of neurons’ synchrony to AM waveforms, the average response strength (firing rate) in some neurons changes depending on the AM rate. Such representation of AM with the firing rate is called “rate coding”. Rate coding is a more abstract coding scheme than temporal coding in the sense that it does not transmit the shapes of AM waveforms directly. As seen in temporal coding, rate-coding neurons have their AM tuning and upper limits of the AM rate to which they fire. Rate-coding neurons exist only in the brain regions higher than a certain processing stage.
By meta-analyzing previous studies on neural activities, we visualized these properties in neurons, which enabled us to quantitatively compare these properties with AM representations in DNNs (Figure 3).
Sound recognition is conducted in the ANS. The ANS of mammals, including humans, contains cascaded brain regions consisting of numerous neurons. In this sense, the ANS is analogous to a DNN because a DNN recognizes sounds with cascaded layers consisting of numerous units. The present study hypothesized that the ANS could be modeled by a DNN.
To maximize recognition accuracy, DNNs used for practical tasks such as sound recognition often take as inputs features derived by preprocessing sound waveforms. In addition, in scientific studies with auditory models, standard approaches use preprocessed signals that simulate outputs of the inner ear as inputs to the model. However, the choice of preprocessing method and its details reflects hypotheses that researchers make in advance, which can affect the interpretation of the results.
The present study employed a DNN that takes raw waveforms as inputs without preprocessing and performs generic sound classification tasks. In this way, we could simulate all the stages of the ANS from the periphery to the central region in a unified model, with as few predetermined hypotheses as possible. Specifically, the present study used a dilated convolutional network, in which the temporal resolution of the signals is not degraded as they are processed in the layers.
We trained the DNN for natural sound classification. Natural sound classification is a suitable task for examining the general relationship between neural properties and animal evolution, since it can be considered important for the survival of any animal species. After training, the DNN could classify natural sounds with good accuracy.
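To illustrate why dilated convolution keeps the temporal resolution intact, here is a minimal NumPy sketch of a stack of dilated 1-D convolutions. It is not the network used in the study: the causal zero padding, the ReLU nonlinearity, the kernel size of 2, and the doubling dilation factors are assumptions made for the example, and both function names are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D convolution whose taps are spaced `dilation` samples apart.

    The output has the same length as the input, so no time resolution is lost.
    """
    x = np.asarray(x, dtype=float)
    pad = dilation * (len(kernel) - 1)
    padded = np.concatenate([np.zeros(pad), x])  # zero history before t = 0
    y = np.zeros_like(x)
    for k, w in enumerate(kernel):
        # Tap k reads the input k * dilation samples in the past.
        start = pad - k * dilation
        y += w * padded[start:start + len(x)]
    return y

def dilated_stack(x, kernels, dilations):
    """Apply dilated convolutions in sequence, with a ReLU after each layer."""
    h = np.asarray(x, dtype=float)
    for kernel, d in zip(kernels, dilations):
        h = np.maximum(dilated_conv1d(h, kernel, d), 0.0)
    return h

# An impulse reveals how far one input sample spreads through the stack.
impulse = np.zeros(32)
impulse[0] = 1.0
out = dilated_stack(impulse, kernels=[[1.0, 1.0]] * 4, dilations=[1, 2, 4, 8])
```

With four layers of kernel size 2 and dilations 1, 2, 4, and 8, each output sample depends on 16 input samples, yet the output still has one value per input sample. Pooling or strided convolution would enlarge the receptive field at the cost of a coarser time axis.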
To compare the AM representation in the trained DNN directly with the results of previous neurophysiological studies, we analyzed AM representations in the DNN with neurophysiological methods. Sounds with sinusoidal AM were fed to the DNN, and the response of each unit was measured, from which synchrony to the AM and the average activity were calculated (corresponding to temporal and rate coding, respectively). As a result, we found that properties similar to those of the ANS described above emerged in the DNN. These included the existence of units with AM tuning, the emergence of rate-coding units in the higher layers, and systematic changes in the BMFs and UCFs along the axis from the input to the output layers (Figure 4). Next, we quantified the similarity between regions in the ANS and layers in the DNN, and found that the entire cascade of DNN layers was similar to the entire cascade of the ANS from the periphery to the central region (Figure 5). This is the first finding to relate sound representation across the entire cascade of a DNN to that across the entire cascade of the ANS.
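The two per-unit measurements described above, synchrony to the AM and average activity, can be sketched for a continuous activation trace as follows. This is an illustrative assumption about how such metrics might be computed, not the analysis code of the study: synchrony is taken here as an activation-weighted vector strength, and the function name `am_response_metrics` is hypothetical.

```python
import numpy as np

def am_response_metrics(activation, am_rate_hz, fs):
    """Return (synchrony, mean_activity) for one unit's non-negative activation.

    synchrony:     activation-weighted vector strength at the AM rate
                   (analogue of temporal coding; assumed metric).
    mean_activity: time-averaged activation (analogue of rate coding).
    """
    a = np.asarray(activation, dtype=float)
    t = np.arange(a.size) / fs
    mean_activity = float(a.mean())
    total = a.sum()
    if total <= 0:
        return 0.0, mean_activity  # a silent unit has no defined phase locking
    # Magnitude of the Fourier component at the AM rate, relative to the total.
    component = np.sum(a * np.exp(-2j * np.pi * am_rate_hz * t))
    return float(np.abs(component) / total), mean_activity

# Toy unit whose activation follows a 10 Hz modulation (half-wave rectified).
fs = 1000
t = np.arange(fs) / fs
toy_activation = np.maximum(np.sin(2.0 * np.pi * 10.0 * t), 0.0)
sync, rate = am_response_metrics(toy_activation, am_rate_hz=10.0, fs=fs)
```

Repeating this measurement over a range of AM rates yields, for each unit, one tuning curve for synchrony and one for average activity, from which values such as the BMF and UCF can be read off.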
We also found that DNNs with higher recognition accuracy tended to be more similar to the brain. These results suggest that properties in the ANS are effective for sound recognition.
From the perspective of machine learning, this study also implies the effectiveness of employing neurophysiological methods for analyzing DNNs. Although DNNs have the ability to recognize signals (such as sounds and images) with high accuracy, what representations they utilize to recognize them is still unclear. A variety of methods for analyzing data representation in ANNs have been proposed. Our study presents the possibility that methods that have been applied to examine sound and image representation in brains are applicable to DNNs.
Science and Core Technology Laboratory Group, Public Relations