The NTT R&D Forum introduces the results of research at NTT Research and Development Centers. The R&D Forum is generally held at the beginning of the year in February. This time, however, the NTT R&D Forum 2018 (Autumn) was held almost three months earlier than usual, over two days from Thursday, November 29 to Friday, November 30, 2018, at the NTT Musashino Research and Development Center (Musashino City, Tokyo) under the concept “Transforming Your Digital Visions into Reality.”
In addition to the keynote lectures by Jun Sawada, President and CEO, and Katsuhiko Kawazoe, Head of Research and Development Planning, the Forum held three special sessions.
The exhibits presented research and development around themes that included media and user interfaces, AI/IoT, security, networking and basic research as well as initiatives toward a Smart World that utilizes technologies focused on AI and IoT.
This report summarizes some of the research that garnered particular interest during the exhibition.
Photo1 NTT R&D Forum 2018 (Autumn) venue packed with guests
These exhibits introduced accessibility information collection technology (MaPiece) and guide message generation technology to realize the concept of diversity navigation, which supports safe and convenient mobility for everyone from the elderly to persons with disabilities, parents with children, and tourists from foreign countries.
MaPiece is made up of three technological approaches—MaPiece for surveying, MaPiece for posting, and MaPiece for sensing. MaPiece for surveying was developed first to let volunteers survey surroundings without any specialized knowledge. This service is already in use as a collection tool for barrier information. New to the platform this year are MaPiece for posting, which collects barrier-free information from postings about accessibility information discovered in daily life, and MaPiece for sensing, which collects road surface information from smartphones with the dedicated app installed.
In addition, guide message generation technology helps visually impaired people move safely and confidently with easy-to-understand guide messages generated automatically from the collected road surface information.
First, as a demonstration of MaPiece for sensing, the exhibit showed the collection of road surface information, such as stairs along a road, by pushing a stroller with a smartphone attached over barriers on a simulated urban roadway. The collected sensor data is sent to a server, where analysis estimates the state of the road surface.
As a demonstration of the guide message generation technology, the exhibit also showed a visually impaired person walking without tripping on stairs and steps that braille blocks give no warning of, thanks to voice navigation based on the collected road surface information.
In the future, this research will press toward the realization of diversity navigation.
Photo2 Pushing a stroller with a smartphone attached (left)/Collecting road surface information such as stairs and braille blocks (right)
Photo3 A visually impaired person moving without tripping by using voice navigation
Kirari! ultra-realistic communication technology aims to transmit and reconstruct the entirety of sporting and live performance spaces far away in real time over a network.
In this exhibition, each elemental technology supporting Kirari! was introduced. First, “Kirari! for Arena” was the technology that most caught the eyes of visitors. Kirari! for Arena extracts objects from their backgrounds in real time without a green screen, transmits them together with positional information using Advanced MMT, generates pseudo 3D images corresponding to the perspective at the destination, and thereby recreates a sporting arena anywhere in the world.
Until now, Kirari! for Arena had not supported live broadcasts, but the development of a system that tracks the position of objects in real time using distance sensors and integrates video data while simultaneously transmitting the information via Advanced MMT has made live broadcasts possible. This transmission format is being standardized in ITU-T SG16 Immersive Live Experience, with the aim of issuing a recommendation by 2020.
On the opposite side of Kirari! for Arena, “surround video stitching and synchronous transmission technology” and “wave field synthesis technology” were exhibited. Surround video stitching and synchronous transmission technology synthesizes wide-angle video from images captured by multiple 4K cameras and transmits it in real time. Furthermore, the new technology adopts parallel processing on GPUs, which decreases processing time and reduces the number of servers required. The exhibit also demonstrated wave field synthesis sound technology that reflects an audio signal emitted from spherical speakers as a beam off the ceiling, using the ceiling as a virtual speaker. This technology enables horizontal and vertical surround sound, even at venues where loudspeakers cannot be placed on the ceiling.
In addition, the booth also exhibited a concept of digital content supply chain configuration using blockchains and a demonstration of remote reception desk that uses real-time image extraction technology.
Photo4 Exhibit demonstrations of Kirari! for Arena; Displaying a juggler (left) in the pseudo 3D display in the next room with depth perspective (right)
Photo5 Using the ceiling as a virtual speaker by reflecting an audio signal from right and left spherical speakers
This exhibit presented technology that provides OMOTENASHI (hospitality) services to users by simply “holding” the CUzo Card in front of real scenery to display a wide range of information on the transparent display. Applications in development as OMOTENASHI services include hold for guidance, translation, informational distribution and navigation services.
The hold for guidance service displays information about an object when the user holds the transparent display over it. The exhibit demonstrated this by displaying available seating at a restaurant and lounge when the transparent display was held up to an informational board similar to those at an airport.
For translation, people can converse face-to-face in different languages by talking while holding the transparent display up between them.
In addition, users can look through the device at video for informational broadcasts to display Japanese subtitles for English narration as well as use sensors for navigation to receive guidance to the booth inside the venue.
Both of these services simplify the device through technology that distributes resources via cloud or edge computing necessary for data processing and content generation.
Usage scenarios include tourist information for visitors from foreign countries, information about articles displayed at art galleries and museums, and operational support for airport and train station staff.
Photo6 Displaying seating availability of a café and bar by holding the display over a restaurant icon on an information sign
Photo7 Displaying a translated sentence of words spoken to the display
This exhibit presented a projection mapping technique, called Ukuzo, to give visual depth impressions to two-dimensional real objects such as printed images, pictures and even hand-written text by projecting a shadow pattern on the objects.
To give the depth impression, the system first recognizes the shape of an object. Based on the recognized shape, the system can immediately create a shadow pattern that gives a depth impression to the object when projected. Ukuzo is able to adjust the level of the depth impression by changing the distance between an object of interest and the projected shadow pattern.
The application scenarios include advertisements, art, and playful entertainment for children.
Photo8 Ukuzo system
Photo9 Giving depth impressions to hand-written letters immediately without advance preparation
The main showcase for Smart World was divided into six categories— “Smart City”, “Smart Mobility”, “Smart Agriculture”, “Smart Shipping”, “Smart Factory”, and “Smart Healthcare” —to present exhibits related to the realization of a Smart World together with research and development partners using primarily AI/IoT.
Crowds caused by a large number of spectators moving place-to-place at the same time before and after popular events often cause people to need a lot of time to reach the train station and other destinations. This congestion not only costs people a lot of time but also comes with dangers, such as people falling over like dominoes due to the overcrowding. Formulating plans to control the flow of people is vital to avoid these types of situations.
The technology in this exhibit proposes people-flow control plans that combine various guidance measures. Defining coefficients customizes the optimization according to the objective, such as increasing the congestion coefficient to prioritize greater safety or increasing the arrival coefficient to prioritize an earlier arrival.
Usually, due to the enormous number of potential guidance combinations when considering these people-flow control plans, finding the best result among all of the potential combinations requires a vast amount of time. However, machine learning can obtain close to optimal guidance in a limited amount of time.
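The role of the two coefficients can be illustrated with a toy weighted-cost model. This is only a sketch under stated assumptions: the guidance options, their congestion and arrival scores, the route names, and the cost formula are all hypothetical, and the search is exhaustive rather than the machine-learning approach used in the exhibit.

```python
from itertools import product

# Hypothetical toy model: each exit route gets one guidance option.
# Every option has an assumed (congestion, arrival_minutes) score.
OPTIONS = {
    "open":    (0.9, 10.0),   # fastest, most crowded
    "stagger": (0.5, 14.0),
    "detour":  (0.2, 20.0),   # slowest, least crowded
}
ROUTES = ["north_gate", "south_gate"]


def plan_cost(plan, congestion_coeff, arrival_coeff):
    """Weighted sum of peak congestion and mean arrival time for one plan."""
    congestion = max(OPTIONS[o][0] for o in plan)
    arrival = sum(OPTIONS[o][1] for o in plan) / len(plan)
    return congestion_coeff * congestion + arrival_coeff * arrival


def best_plan(congestion_coeff, arrival_coeff):
    """Exhaustively score every combination and return the cheapest."""
    candidates = product(OPTIONS, repeat=len(ROUTES))
    return min(candidates, key=lambda p: plan_cost(p, congestion_coeff, arrival_coeff))


safety_first = best_plan(congestion_coeff=100.0, arrival_coeff=1.0)
speed_first = best_plan(congestion_coeff=1.0, arrival_coeff=1.0)
print(safety_first, speed_first)
```

With three options and two routes there are only nine combinations, but the count grows exponentially with the number of routes, which is why exhaustive search stops being practical for realistic plans.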
In the future, this technology aims to expand to a wide range of fields from security, transportation and tourism to urban planning and distribution.
Photo10 Set the evaluation measures according to the objective, such as guidance emphasizing safety (left) or guidance emphasizing efficiency (right)
The hitoe® functional material has been demonstrated at previous R&D Forums through medical, sports and various other example applications. This exhibit introduced an example of using hitoe® in an activity monitoring system for rehabilitation.
Combining posture, walking and other activity estimation logic with vital data collected by hitoe® makes it possible to acquire continuous heart rate and activity data from patients in rehabilitation over the long term and to visualize that data.
The use of this technology for patients in rehabilitation has expanded the apparel applications of hitoe® from conventional synthetic fibers to blended cotton fabrics. The blended fabrics provide greater sweat absorption than conventional synthetic fabrics and stay drier even while the wearer sweats during rehabilitation.
hitoe® originally transferred data from a transmitter to smartphones. However, this system has been improved to transfer data via IoT gateways installed in buildings. This innovation enhances the comfort of patients by eliminating the need to always carry a smartphone.
The data obtained from this monitoring system is collected and stored on a server via a network for doctors, physical therapists, nurses and other medical professionals to view in order to offer the appropriate feedback to patients.
In the future, hitoe® aims to reduce the burden on patients and medical professionals in not only rehabilitation but also home care assistance for remote rehabilitation.
Photo11 Potential use of hitoe® for patients in rehabilitation
Photo12 Vital data collected via hitoe® (left) and feedback to patients based on the activity estimation logic (right)
The AI booths were divided into three categories— “AI Supporting People”, “AI Supporting Society”, and “AI Infrastructure Technologies” —under the main AI theme to present AI technologies from the NTT Group that realize new value creation.
The IoT booths were divided into three categories— “Sense, Connect & Drive”, “Data & Software Logistics”, and “Analytics & Prediction” —under the main IoT theme to present IoT technologies that accelerate the digital transformation of customers as value partners.
Angle-free Object Information Retrieval recognizes 3D objects with high accuracy even from only a small number of reference images. In this exhibition, Angle-free Rigid and Non-rigid Object Information Retrieval was presented. New functionality has been added that recognizes an object with an unknown deformation as the same object as the non-deformed object image in the database.
This technology uses Rigid and Non-Rigid Object Image Matching that accurately specifies the correct correspondence of the image features by applying the geometric constraint to multiple partial regions instead of applying it to the entire object. This innovation allows the accurate recognition of deformed objects. Achieving the same functionality in conventional commercial technology is time-consuming due to the need for preparation of many database images capturing multiple deformation patterns.
The potential applications of this technology include support for product picking and inventory management in warehouses in addition to unattended registers in convenience stores.
Photo13 Comparison of Angle-free Object Information Retrieval and conventional technologies
Photo14 Recognizing products with soft packaging such as snack foods and jelly drinks
This exhibit presented totto, an android robot of Tetsuko Kuroyanagi. The totto production committee developed totto as an android robot in 2017 and equipped it with a spoken dialogue system in 2018 to engage in autonomous conversation. This gave totto an even more natural conversation style.
“Where are you from?” The conversation begins with an interview style unique to Tetsuko Kuroyanagi before naturally conversing with respondents on the basis of their answers while chiming in with responses such as, “Oh really?”
The interaction is so smooth that it has no synthetic feel. However, equipping an android robot with a new spoken dialogue system in fact leveraged a vast range of technology.
The clear voice and characteristics which provide the experience of talking with the real-life Tetsuko Kuroyanagi were achieved by using speech synthesis techniques realized through the study of old broadcast content via a neural network rather than recording Tetsuko Kuroyanagi speaking.
We leveraged a wide variety of technologies for conversations, including mannerisms and other techniques to interact with people generated as patterns according to language comprehension, techniques to generate questions and expressions to agree with remarks of the person the android robot is speaking with based on learning through a neural network, in addition to techniques to select responses with high accuracy through analysis results of conversations extracted from broadcast content via a neural network.
Furthermore, techniques to automatically generate physical movements tailored to the utterance of the speaker, such as head movement, gaze, facial expression, hand gestures and posture, as well as techniques to stop and wait to speak when a dialogue suddenly starts or is interrupted, achieved even smoother verbal interactions.
In the future, totto aims for potential applications that include services to converse with virtual YouTubers or existing characters that use the features adopted by totto as the dialogue engine.
Photo15 totto having a natural conversation while incorporating both body and hand gestures
This exhibit demonstrated the SpeakerBeam technology for extracting the voice of a target speaker from the voices of multiple people interacting in a conversation.
SpeakerBeam needs to register in advance audio data of the voice of the target speaker to be able to extract the voice of that speaker. However, it does not require many hours of audio data of the target speaker. Only about ten seconds are sufficient. We use a deep learning based AI system trained on a broad range of Japanese audio data to isolate the characteristics of the voice of the target speaker from the registered audio data then extract the target speaker based on these characteristics and regardless of the position of the speakers. Moreover, by integrating microphone array technologies, we can achieve high quality audio.
In this exhibit, we demonstrated extraction of the voice of a researcher from a recording of the researcher speaking at the same time as several other people. The voice of the researcher was extremely hard to hear over the other voices before the extraction. However, the voice of the researcher after extraction was almost completely isolated, resulting in extremely clear audio.
Application scenarios include robot or home assistants that react only to the target speaker as well as hearing aids or voice recorders that pick up only the voice of a target speaker.
Photo16 SpeakerBeam framework
Photo17 Demonstration showing only the voice of the researcher extracted from a recording of multiple speakers
When a Japanese person learning English sets the AI assistant built into smartphones to English, the AI assistant is often unable to recognize the English as spoken.
The technology in this exhibit displays text more accurately representing spoken English using AI which has learned the traits of English spoken by Japanese people. Using this technology for language learning, such as speaking practice and tests, heightens motivation to learn a language by encouraging people who are just starting to learn English to speak.
The demonstration introduced English spoken by someone intentionally leaving off the “s/es” of the third person singular as well as “a/the” as articles as examples of common mistakes new English learners make when speaking English. The AI displayed the English in text accurately as spoken with corrections indicated in red.
Photo18 Application screen for language learning using this technology
The networking booths were divided into three categories— “Achieving Flexible and High-speed Networks”, “Network with Rapid Service Delivery and Recovery”, and “Network and IT Solutions” —under the main network theme to present more network technologies that realize economization and flexibility.
Laser processing refers to processing of an object through the thermal energy produced when exposed to a laser beam. This type of processing is widely used in automotive, aircraft and other manufacturing for applications that include welding, cutting, and drilling.
This laser processing system provides a single-mode laser beam and a multi-mode laser beam. The single-mode laser beam provides a narrow output beam for highly precise processing, but the transmission distance is limited to several meters. On the other hand, the multi-mode laser beam provides long-distance transmission up to several hundred meters, but the processing precision is inferior due to a wide output beam.
This exhibit presented a photonic crystal fiber that can transmit a kW-class single-mode laser beam from several tens to hundreds of meters, a KTN crystal that flexibly controls the [Direction] of the output beam, and computer holography that flexibly controls the [Shape] of the output beam.
Leveraging these technologies can dramatically increase operational efficiency because the system is not influenced as much by limitations, such as the place or size of object for processing.
Photo19 Image of the new laser processing system
Photo20 Display of KTN crystal (left) and photonic crystal fiber preform (right)
Flexible Access System Architecture (FASA) is a new architecture for an access system able to modularize and softwarize access system functions. The advantages of FASA are the ability to configure an access system based on general-purpose hardware, and provide flexible, low-cost services according to customer requirements.
This exhibit presented the box-type OLT and the module-type OLT as two options based on FASA. An Optical Line Terminal (OLT) is an endpoint device at the central office for FTTH services, which currently employ service-specific equipment.
The box-type OLT is assumed to be used in environments such as the central offices of carriers. In addition to conventional FTTH services, applications such as mobile base station accommodation are also expected.
The module-type OLT provides compact modules that implement only the hardware-oriented functions among the conventional OLT functions. Working together with a general-purpose server that implements the softwarized OLT functions, this access system is well suited to LANs in facilities such as factories, campuses and office buildings.
Both of these OLT options implement a common API that enables replacement of the software module for the bandwidth allocation function, which governs system performance. By developing software modules that use this common API, FASA can support various service applications.
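The idea of a replaceable bandwidth allocation module behind a common API can be sketched as follows. This is not the actual FASA API: the `BandwidthAllocator` interface, the `allocate` signature, the two allocation policies, and the `Olt` class are all hypothetical names chosen for illustration.

```python
from typing import Protocol


class BandwidthAllocator(Protocol):
    """Hypothetical stand-in for the common API described above."""

    def allocate(self, requests: dict[str, int], capacity: int) -> dict[str, int]:
        ...


class EqualShare:
    """Split capacity evenly, never granting more than was requested."""

    def allocate(self, requests, capacity):
        share = capacity // max(len(requests), 1)
        return {onu: min(req, share) for onu, req in requests.items()}


class Proportional:
    """Grant capacity in proportion to each request."""

    def allocate(self, requests, capacity):
        total = sum(requests.values())
        if total <= capacity:
            return dict(requests)
        return {onu: req * capacity // total for onu, req in requests.items()}


class Olt:
    """Toy OLT whose allocation module can be swapped at run time."""

    def __init__(self, allocator: BandwidthAllocator, capacity: int):
        self.allocator = allocator
        self.capacity = capacity

    def grant(self, requests):
        return self.allocator.allocate(requests, self.capacity)


olt = Olt(EqualShare(), capacity=1000)
requests = {"onu1": 800, "onu2": 200, "onu3": 600}
equal = olt.grant(requests)
olt.allocator = Proportional()   # replace the module, keep the OLT
proportional = olt.grant(requests)
print(equal, proportional)
```

Because the toy OLT depends only on the interface, a service with different requirements could supply its own allocation policy without any change to the rest of the system.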
Photo21 Display of box-type OLT based on FASA
Research and development is advancing the fifth generation of cellular mobile communications (5G), aiming to launch services in 2020. 5G is expected to dramatically improve network performance and bring about attractive new services. However, the technology in this exhibition aims to go beyond a 5G network.
Presently, testing has proven that an ultra-high-speed IC operating on an unused terahertz frequency band can provide wireless transmission of 100 Gbit/s with one wave. Imagining how fast this transmission actually is can be difficult, but at this speed an entire DVD worth of data can be downloaded in less than one second.
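The DVD comparison can be checked with a short calculation. The 4.7 GB figure is an assumption (a single-layer DVD in decimal gigabytes), and the result is an ideal transfer time that ignores any protocol overhead.

```python
LINK_RATE_GBIT_S = 100   # rate reported in the exhibit
DVD_GBYTES = 4.7         # assumption: single-layer DVD, decimal gigabytes

dvd_gbit = DVD_GBYTES * 8                  # bytes -> bits
transfer_s = dvd_gbit / LINK_RATE_GBIT_S   # ideal time, no protocol overhead
print(f"{transfer_s:.3f} s")
```

Under these assumptions the transfer takes well under half a second, consistent with the claim of less than one second.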
Even though this ultra-high-speed IC is still in the testing phase, the transmission capacity is ten times that of a 5G network. Research and development continues to bring out the full potential of device technology to come after the 5G network.
Photo22 Testing 100 Gbit/s wireless transmission on the 300-GHz band
Photo23 Transmitter (left) and receiver module (right) running on the 300-GHz band
The security booths presented advanced security technologies that contributed to enhancing measures against cyber attacks and early realization of a society leveraging data under the main security theme.
The adoption of Anonymously Processed Information has been raised as one important revision in the Amended Act on the Protection of Personal Information, which was put into full force in 2017. Before the revision, the Act on the Protection of Personal Information required that individuals agree to the use of personal information outside the purpose of use or the provision of personal information to a third party. However, the Amended Act on the Protection of Personal Information allows the use of personal information outside of the purpose of use or provision of personal information to a third party as long as the personal information has been anonymized. This revision allows businesses to take advantage of personal information.
This exhibition introduced personal data anonymization software that processes personal information according to the criteria for creating Anonymously Processed Information. The processing implements anonymization that satisfies k-anonymity, which is a major indicator of anonymity.
In addition to standard methods, such as deletion to exclude data of rare individuals and generalization to make the values of items more common, the personal data anonymization software offers methods that can be chosen according to the type of data and purpose of analysis, such as the Pk-anonymity uniquely developed by NTT. Furthermore, the anonymity and utility of anonymized data can be evaluated based on multiple indicators.
Application scenarios include anonymized medical data utilization for drug discovery or healthcare AI research as well as anonymized purchasing data sharing for marketing and product development.
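The two standard methods, generalization and deletion, and the k-anonymity check can be sketched on a toy table. The records, the choice of quasi-identifiers, and the generalization rules below are hypothetical, and the sketch covers only plain k-anonymity, not the Pk-anonymity developed by NTT.

```python
from collections import Counter

# Hypothetical records: (age, postal_code, diagnosis). Age and postal code
# are treated as quasi-identifiers; the diagnosis is the sensitive value.
RECORDS = [
    (34, "180-0001", "flu"),
    (36, "180-0002", "asthma"),
    (38, "180-0003", "flu"),
    (52, "181-0001", "diabetes"),
    (57, "181-0004", "flu"),
    (71, "190-0001", "asthma"),
]


def generalize(record):
    """Coarsen age to a decade and postal code to its first three digits."""
    age, postal, diagnosis = record
    return (age // 10 * 10, postal[:3], diagnosis)


def anonymize(records, k):
    """Generalize, then delete records whose group is smaller than k."""
    generalized = [generalize(r) for r in records]
    sizes = Counter(r[:2] for r in generalized)
    return [r for r in generalized if sizes[r[:2]] >= k]


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    sizes = Counter(r[:2] for r in records)
    return all(n >= k for n in sizes.values())


released = anonymize(RECORDS, k=2)
print(len(released), is_k_anonymous(released, 2))
```

In this toy table the single record in its seventies is rare even after generalization, so it is deleted. That is the kind of utility loss the evaluation indicators are meant to expose.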
Photo24 Screenshots of Personal data anonymization software (Left: for anonymization; Right: for evaluation)
Recently, the growing number of people working remotely has made it common to access a terminal at a company from home or another remote place using a Virtual Private Network (VPN). Encryption technology secures remote access over a VPN.
However, the VPN server has to use a large amount of server resources if many VPN connection requests are received all at once, such as at the start of the workday, because it exchanges encryption keys with an enormous number of users. The extra server load reduces the quality of the entire VPN service, from slowing the VPN connections of users already working remotely to requiring more time to establish a VPN connection for new users logging on to work remotely.
Multicast Key Distribution technology handles user authentication on the VPN server and safely outsources the key exchange process to users who have already received an encryption key. This outsourcing reduces the load on the VPN server and securely processes a larger number of connection requests while maintaining the stability of the VPN.
This exhibit compared the CPU loads of VPN servers, showing almost no CPU load when using Multicast Key Distribution technology while the conventional SSL-VPN connection reached 100% each time. A comparison of SSL-VPN connection status when 50 users accessed the network simultaneously demonstrated that conventional technology drove up the CPU load on the VPN server and timed out close to half of the users, resulting in connection failures. Multicast Key Distribution technology, on the other hand, succeeded in authenticating the users and establishing stable VPN connections.
Application scenarios include connection processing when users try to access popular video content all at once in addition to corporate VPN connections.
Photo25 Lower CPU load on the VPN server for SSL-VPN connections via Multicast Key Distribution (red line)
Photo26 Close to half of the connections failing for conventional SSL-VPN connections (left screen)
Basic research booths presented the latest product of basic research into innovations to propel society into the future under the main basic research theme.
Today, we are surrounded by all different kinds of devices. In the future, the presence of these devices could become an eyesore if their number continues to grow. This research presented a transparent battery as part of the goal to develop devices that adapt to the surroundings.
A transparent battery is achieved by selecting a material that easily suppresses light absorption for the electrode in addition to fabricating an electrode so that the structure easily suppresses the absorption and reflection of light. As a result, the light transmittance of the battery is 23%, which is comparable to the transmittance of ordinary sunglasses.
The battery has achieved an average battery voltage of 1.7 V and a discharge capacity of 0.03 mAh at a current density of 0.01 mA/cm². A battery size of approximately 4.5 m² would have capacity equivalent to a CR1025 coin battery. The transparent battery also operates as a rechargeable secondary battery which can light an LED even after being discharged and charged 100 times.
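The relation between the reported capacity and the 4.5 m² figure can be explored with a rough calculation. It rests on two assumptions not stated in the text: that a CR1025 has a nominal capacity of about 30 mAh (a typical datasheet value), and that the 0.03 mAh figure refers to a single test cell whose capacity scales linearly with area.

```python
CELL_CAPACITY_MAH = 0.03     # discharge capacity reported above
EQUIVALENT_AREA_M2 = 4.5     # area reported as matching a CR1025
CR1025_CAPACITY_MAH = 30.0   # assumption: typical nominal datasheet value

cells_needed = CR1025_CAPACITY_MAH / CELL_CAPACITY_MAH
implied_cell_area_cm2 = EQUIVALENT_AREA_M2 * 10_000 / cells_needed
areal_capacity = CELL_CAPACITY_MAH / implied_cell_area_cm2  # mAh per cm^2
print(round(cells_needed), round(implied_cell_area_cm2, 1))
```

Under these assumptions, roughly a thousand test cells of a few tens of square centimeters each would be needed to match one coin battery, which illustrates the trade-off between transparency and capacity.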
Moreover, this battery realizes flexibility in addition to transparency by forming the electrodes on conductive films and gelating electrolytes. The transparent battery is expected to have a wide range of potential applications.
Applications include wearable devices, informational displays and integration into construction materials such as the windows of buildings. The balance between battery performance and transparency according to application is a challenge to overcome in future research.
Photo27 Flexible transparent film battery (back left) and transparent glass plate battery (front right)
This exhibit presented the Coherent Ising Machine ‘LASOLV’, which solves difficult problems using laser light. The middleware and applications that run on the hardware were also exhibited.
LASOLV differs from digital computers, which are made from semiconductors and electronic circuits and solve problems with algorithms. LASOLV instead employs a physical phenomenon to solve problems: the stable states of light pulses circulating around a loop of optical fiber. The phase (zero or π) of each optical pulse is regarded as a spin, equivalent to the N or S pole of a magnet.
LASOLV is good at solving combinatorial optimization problems, which are a matter of finding the best option from a very large number of options. They include problems such as max-cut problems and graph coloring problems. These problems are difficult for current computers because they require a lot of calculation time to solve.
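How a max-cut problem maps onto spins can be sketched in a few lines. Each node carries a spin of +1 or -1 (standing in for an optical phase of zero or π), and with a coupling of +1 on every edge the lowest-energy spin configuration corresponds to the largest cut. The graph is a hypothetical example, and the brute-force search below is a classical stand-in, not a model of how LASOLV itself operates.

```python
from itertools import product

# Hypothetical 4-node graph: a square with one diagonal.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N = 4


def cut_size(spins):
    """Number of edges whose endpoints carry opposite spins."""
    return sum(1 for i, j in EDGES if spins[i] != spins[j])


def ising_energy(spins):
    """Energy with coupling +1 on every edge; lower energy means a larger cut."""
    return sum(spins[i] * spins[j] for i, j in EDGES)


# Brute force over all 2**N spin assignments (+1 or -1 per node).
best = min(product((1, -1), repeat=N), key=ising_energy)
print(best, cut_size(best), ising_energy(best))
```

The energy equals the number of edges minus twice the cut size, so minimizing one maximizes the other. The search space doubles with every added node, which is what makes large instances slow on conventional computers.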
The previous exhibit demonstrated solving max-cut problems. This time, a live demonstration solved an example graph coloring problem: painting neighboring prefectures of Japan with different colors. LASOLV actually took about five milliseconds to find the answer, and the demonstration showed the result as an animation lasting about ten seconds.
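The graph coloring problem itself can be stated compactly in code. The sketch below uses classical backtracking on four prefectures, purely to show what counts as a valid answer. The adjacency list is an assumption made for illustration, and the method is unrelated to the optical approach used by LASOLV.

```python
# Assumed adjacency for the four Shikoku prefectures, for illustration only.
NEIGHBORS = {
    "Kagawa":    ["Tokushima", "Ehime"],
    "Tokushima": ["Kagawa", "Ehime", "Kochi"],
    "Ehime":     ["Kagawa", "Tokushima", "Kochi"],
    "Kochi":     ["Tokushima", "Ehime"],
}
COLORS = ["red", "green", "blue"]


def color_map(order, assigned=None):
    """Backtracking search: return a valid coloring dict, or None."""
    assigned = assigned or {}
    if len(assigned) == len(order):
        return assigned
    region = order[len(assigned)]
    for color in COLORS:
        # A color is allowed only if no already-colored neighbor uses it.
        if all(assigned.get(n) != color for n in NEIGHBORS[region]):
            result = color_map(order, {**assigned, region: color})
            if result is not None:
                return result
    return None


coloring = color_map(list(NEIGHBORS))
print(coloring)
```

A coloring is valid when no two neighboring regions share a color. Scaling the same constraint to all prefectures of Japan greatly enlarges the search space.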
This exhibit also presented an environment that is accessible not only to researchers but also to programmers, and showed how providing libraries and an SDK increases the variety of problems that LASOLV can solve. In the future, research and development will further enhance the integration between the hardware and software to bring out the full potential of LASOLV and facilitate its application to various combinatorial optimization problems in society and industry.
Photo28 Image illustrating the operation of the Coherent Ising Machine
Photo29 Solving a graph coloring problem using a map of Japan (Left: Before coloring; Right: After coloring)