NTT Press Releases

August 10, 2015

NTT Develops High-Precision Scene Search Technology with Frame Accurate Video Identification
- Technology dramatically improves efficiency of video production workflows by eliminating visual comparisons -

Nippon Telegraph and Telephone Corporation (headquartered in Chiyoda-ku, Tokyo, Japan, with President and CEO Hiroo Unoura, hereinafter “NTT”) has developed a high-precision moving picture search technology that can flawlessly identify the precise frames used in an edited video from a large collection of video data. This new technology can reduce manual visual comparisons in the moving picture production workflow and thus dramatically improve the efficiency of cinema and television production processes.
NTT started conducting field tests of this technology in July 2013 in collaboration with PIX SYSTEM, LLC (hereinafter “PIX”)1, whose clients include major Hollywood movie studios. The field tests evaluated the speed and accuracy with which this technology could find edited sequences in daily movie footage, in order to assess its practicality in real production workflows. NTT will continue to conduct field tests and develop technology for making moving picture production workflows more efficient.

1. Video Production Issues

Filmmaking ordinarily involves shooting far more content than will make the final cut; in addition, each scene is often shot many times. This original footage often includes content that is extremely similar, including sequences with very little movement. The post-production process generates even more sequences, modified by color adjustments, special effects, graphic overlays, and so on.
Though metadata2 plays an important role in the management of large video collections, it is often insufficient. Video is repeatedly replaced and processed during the editing process, making it extremely complicated to keep track of media source assets. As a result, filmmakers currently must perform manual visual examination to determine which video was the original and which frames were used. This work entails significant costs in terms of both time and money.

  • 1PIX System, LLC provides content collaboration systems and media asset management services for cinema and television production. PIX clients include major Hollywood studios and production companies in the U.S., Europe and other countries. Further details can be found on the company’s website at 
  • 2“Metadata” refers to information that describes the content of a video. It covers a wide range of values and measurements, including a video’s title, timestamp, location, scene/cut number, data format, codecs, resolution, and frame rate.

2. Overview of the Development Process

NTT has developed a high-precision scene search technology that uses feature data extracted from motion picture footage to quickly and flawlessly identify, down to the frame, which of a large collection of similar shots was used as the source of an edited query video. By eliminating tedious visual comparisons from video production workflows, this technology can make searching through videos substantially more efficient.
NTT has conducted field tests of this technology in collaboration with PIX since July 2013. By demonstrating the technology to, and collecting feedback from, actual film and video production professionals, NTT and PIX evaluated the speed and accuracy of the technology against practical standards for searching through videos. The technology demonstrated benefits in sequential video editing workflows, such as the one shown in Figure 1 (video is recorded on location, editing instructions are given by the director, production work is done offshore, and then the edited video is checked by the director and producer).

Figure 1: Use of video search in a video production workflow during field tests

3. Technical Points: High-Precision Scene Search

To allow filmmakers to search for similar videos from the same script or production, examining both the originals and post-processed versions, NTT developed an extension to its Robust Media Search (RMS)3 technology, which it has been developing and refining for many years. As shown in Figure 2, the technology generates feature data at different sampling levels and uses a multistage process to narrow down the search space, allowing it to speed up searches while conserving memory.
NTT also developed a new method for detecting frame positions regardless of a video’s type or quality. Most existing methods for identifying shots at the frame level, which take the statistical mode of similar frames, have been unable to detect edited video created from a combination of similar videos. However, by focusing on the amount by which video is shifted to synchronize frames and determine whether they match, the new method achieved a practical detection accuracy of at least 90%. Furthermore, by setting adaptive threshold values when detecting the beginning and end of a shot, the new method is also able to detect minute changes in position while reducing false positives detected outside of replaced sections.
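The idea of focusing on the shift between matching frames can be illustrated with a minimal sketch. This is not NTT's implementation: the function name `vote_offset`, the `tol` parameter, and the use of a single number as each frame's feature are all assumptions made for illustration. Every pair of matching frames votes for the shift (source index minus query index) that would align them, and the shift with the most votes is taken as the frame position.

```python
from collections import Counter

def vote_offset(query, source, tol=0.0):
    """Return (offset, votes) for the shift that aligns the most frames.

    `query` and `source` are lists of per-frame feature values
    (hypothetical one-number features, assumed for this sketch).
    """
    votes = Counter()
    for qi, q in enumerate(query):
        for si, s in enumerate(source):
            if abs(q - s) <= tol:
                # This pair agrees if the query is shifted by (si - qi).
                votes[si - qi] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]

# A source clip with many near-identical frames (value 5).
source = [5, 5, 5, 7, 9, 4, 5, 5]
query = [5, 7, 9]

# The first query frame on its own matches five source positions...
ambiguous = [si for si, s in enumerate(source) if s == query[0]]
print(ambiguous)

# ...but only one shift is consistent across all three query frames.
print(vote_offset(query, source))
```

In this toy data the first query frame is ambiguous by itself, while the shared shift singles out one alignment. The adaptive thresholds for shot boundaries described above are not modeled here.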
The technology involves a system that extracts two types of feature data with different densities from video data in advance; these two types of feature data are then used separately as part of a multistage search process for narrowing down the search space. The technology can also be applied to audio data: by comparing feature data extracted from audio signals, it can identify sections containing the same audio data.
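The multistage search described above can be sketched roughly as follows. This is a simplified illustration, not the RMS algorithm: the names `sparse_signature`, `dense_match`, and `search`, the `step` and `quantum` parameters, and the one-number-per-frame features are hypothetical. A sparse signature built in advance from every `step`-th frame cheaply rules out most clips, and the dense per-frame comparison runs only on the clips that survive.

```python
def sparse_signature(frames, step=2, quantum=1):
    """Low-density feature: quantized values of every `step`-th frame."""
    return {int(v // quantum) for v in frames[::step]}

def dense_match(query, clip):
    """High-density stage: slide the query over the clip frame by frame."""
    best_start, best_dist = None, float("inf")
    for start in range(len(clip) - len(query) + 1):
        window = clip[start:start + len(query)]
        dist = sum(abs(q - c) for q, c in zip(query, window)) / len(query)
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist

def search(query, library, step=2, quantum=1):
    """Return ((clip name, start frame, distance), candidate names)."""
    # Stage 0: sparse signatures; in practice these would be precomputed.
    index = {name: sparse_signature(f, step, quantum)
             for name, f in library.items()}
    # Stage 1: keep only clips whose signature shares a value with the query.
    query_keys = {int(v // quantum) for v in query}
    candidates = [name for name, sig in index.items() if sig & query_keys]
    # Stage 2: dense comparison on the reduced candidate set.
    best = None
    for name in candidates:
        start, dist = dense_match(query, library[name])
        if start is not None and (best is None or dist < best[2]):
            best = (name, start, dist)
    return best, candidates

library = {
    "take_1": [10, 11, 12, 13, 14, 15, 16, 17],
    "take_2": [10, 11, 12, 30, 31, 32, 33, 17],
    "take_3": [50, 51, 52, 53, 54, 55, 56, 57],
}
query = [30, 31, 32, 33]

best, candidates = search(query, library)
print(candidates)  # clips that survive the sparse stage
print(best)        # (clip name, start frame, mean distance)
```

In this sketch a query copied unchanged from a clip passes the sparse stage whenever it is at least `step` frames long, because at least one of its frames then coincides with a sampled frame. Handling color adjustments or other post-processing would need more robust features than the exact values assumed here.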

Figure 2: Implementation of a high-precision scene search

  • 3Robust Media Search (RMS) is a technology developed by NTT. Using audio and video fragments as keys, RMS looks for matches in a large database of audio and video data. One of RMS’s defining characteristics is that it is more robust than ordinary media search technologies. A press release on RMS can be found at the following URL:

4. Future Plans

NTT will continue to improve search technology with support for an increasing variety of video editing and processing techniques while continuing to add functionality to collaborative remote production pipelines using communication networks.
The NTT Data Corporation (headquartered in Koto-ku, Tokyo, with president and CEO Toshio Iwamoto) is planning to conduct trials for businesses using this technology.

Information is current as of the date of issue of the individual press release.
Please be advised that information may be outdated after that point.
