1. Background

In parallel with the evolution of broadband networks and digital
audio equipment, information rates for delivery and storage have risen rapidly
owing to the demand for high-quality audio signals (high sampling rates, high
word resolution, and multiple channels). NTT Communication Science Labs recognized
the importance of lossless compression technology for audio signals and of its
standardization, considering interoperability, long-term maintenance, and clear
IPR status. The Laboratories took the initiative in promoting this technology
as a standard in the ISO/IEC*3 MPEG group.
2. Progress toward international standardization

For this standardization work, NTT initiated discussions on its
need and requirements and prepared the technical call for technologies. In
line with the normal standardization process, a number of improvements and
integration efforts were carried out on top of the initial reference model. Partners in this
standardization work included the Technical University of Berlin (Germany), RealNetworks
Corp. (USA), and I2R (Singapore).
After the specification had been tentatively defined, it was voted on twice by
23 national bodies. The last ballot closed last week, and it has been disclosed
that the standard has been approved. This means that the specification for lossless
coding has now been officially established as [14496-3 3rd ED AMD 2 (ALS: Audio
Lossless)*4].
It is expected that this standard will be used in common tools for various applications
and that it will continue to be maintained so that compressed files can be perfectly
decoded even after 100 years. The MPEG group will continue working on the reference
software and conformance testing. It is also expected that a consortium of essential
patent holders will be organized for the collection and distribution of patent royalties.
3. Technical merits

[Main Points]
- Assured perfect reconstruction even after the compression
- State-of-the-art compression performance
- Significant reduction of transmission and storage cost with minor decoding time

Several standard audio coding schemes are already in everyday use, such as
MP3, AAC*5, and the scheme used for MiniDisc. These are all perceptual
coding schemes that offer a high compression ratio at the penalty of minor waveform
distortion at the decoder. They carefully control the quantization distortion
based on the characteristics of human hearing. The decoded waveform differs from
the original, although it is perceptually very close to it.
In contrast to perceptual coding, lossless coding assures perfect reconstruction
of the waveform without a single bit of difference. This is very important for
applications such as waveform editing and archiving of high-quality audio signals.
The price of perfect reconstruction is a limited compression ratio: the
compressed file size varies from 15 to 70% of the original, depending on the
statistical properties of the original waveform.
The compression performance nevertheless exceeds that of ZIP*6. Figure 1 compares the
compression performance of MPEG-4 ALS with that of other available compression tools for
audio signals. The vertical axis denotes the compression ratio (the compressed file size
divided by the original size; the smaller, the lower the cost), and the horizontal
axis shows the decoding time (the shorter, the more convenient). The standardized
specification offers a wide range of flexibility in selecting the operation mode
at the encoder. One can select a very fast mode with lower compression performance or a very
high compression mode at the cost of slower encoding and decoding. The proprietary
decoder can improve the speed. We can see that the standardized specification
provides state-of-the-art technology.

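The compression ratio defined above can be tried out with a small sketch. It is not MPEG-4 ALS: it uses Python's `zlib` (DEFLATE, the method commonly used inside ZIP files) as a stand-in general-purpose compressor, a synthetic test tone invented for this example, and a deliberately crude first-order predictor. All function names are hypothetical. The point is only to show how the ratio is computed and why removing predictable structure before entropy coding tends to help on audio-like data.

```python
import math
import random
import struct
import zlib

def synthetic_pcm(num_samples=20000, seed=0):
    """Hypothetical test signal: a 440 Hz tone plus slight noise, within 16-bit range."""
    rng = random.Random(seed)
    return [
        round(8000 * math.sin(2 * math.pi * 440 * n / 44100)) + rng.randint(-2, 2)
        for n in range(num_samples)
    ]

def pack16(samples):
    """Serialize integers as little-endian signed 16-bit words."""
    return struct.pack("<%dh" % len(samples), *samples)

def unpack16(data):
    """Inverse of pack16."""
    return list(struct.unpack("<%dh" % (len(data) // 2), data))

def to_residual(samples):
    """Crude first-order prediction: keep the difference to the previous sample."""
    return [s - p for s, p in zip(samples, [0] + samples[:-1])]

def from_residual(residual):
    """Undo to_residual by accumulating the differences."""
    out, prev = [], 0
    for e in residual:
        prev += e
        out.append(prev)
    return out

def ratio(compressed, original):
    """Compression ratio as used in the text: compressed size / original size."""
    return len(compressed) / len(original)

pcm = synthetic_pcm()
raw = pack16(pcm)
zip_only = zlib.compress(raw, 9)
predicted = zlib.compress(pack16(to_residual(pcm)), 9)

print("DEFLATE on raw PCM:      %.2f" % ratio(zip_only, raw))
print("DEFLATE on the residual: %.2f" % ratio(predicted, raw))

# Lossless means the decoded samples match the input exactly.
restored = from_residual(unpack16(zlib.decompress(predicted)))
print("bit-exact:", restored == pcm)
```

The figures printed for this synthetic tone say nothing about the results in Figure 1; real music and the actual ALS tools behave differently.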
MPEG-4 ALS accepts a variety of input formats:
- Sampling rates of up to 192 kHz (44.1 kHz for CD)
- Various integer PCM formats of up to 32 bits per sample (16 bits for CD)
- 32-bit floating-point data in the IEEE 754 format (integer for CD)
- Up to 65536 channels (2 channels for CD)

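To put these formats in perspective, the uncompressed data rate is simply sampling rate times bits per sample times number of channels. The short calculation below is an illustration only; the high-resolution stereo configuration is a hypothetical example, and the 15 to 70% range is the one quoted earlier in this release.

```python
def raw_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd = raw_bit_rate(44_100, 16, 2)
hi_res = raw_bit_rate(192_000, 32, 2)   # hypothetical high-resolution stereo stream

print("CD:              %d bit/s" % cd)
print("192 kHz, 32 bit: %d bit/s" % hi_res)
print("ratio:           %.1f x" % (hi_res / cd))

# Range of CD rates after lossless coding, using the 15 to 70 % figures.
best_case = cd * 15 // 100
worst_case = cd * 70 // 100
print("CD after lossless coding: %d to %d bit/s" % (best_case, worst_case))
```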
It can be used for almost all applications. Decoding is generally very fast,
running at least 10 times faster than real-time playback of the music. File
compression naturally reduces the size of archive files. It is also useful
for downloading, since the download time of a compressed file is significantly shorter
and the decoding time is much shorter than the playback or download time.

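A rough calculation illustrates the download argument. Every number below except the "10 times faster" decoding bound is an assumption made for this example: a 4-minute CD-quality track, a 1 Mbit/s link, and a compression ratio of 0.5 picked from within the 15 to 70% range.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Time needed to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second

PLAYBACK_S = 240                    # hypothetical 4-minute track
CD_BYTES_PER_S = 44_100 * 2 * 2     # 16-bit samples (2 bytes), 2 channels
LINK_BPS = 1_000_000                # assumed 1 Mbit/s link
RATIO = 0.5                         # assumed; the text quotes 15 to 70 %

original_bytes = PLAYBACK_S * CD_BYTES_PER_S
compressed_bytes = round(original_bytes * RATIO)

download_raw = transfer_seconds(original_bytes, LINK_BPS)
download_compressed = transfer_seconds(compressed_bytes, LINK_BPS)
decode_at_most = PLAYBACK_S / 10    # decoding at least 10 times faster than playback

print("uncompressed download: %.1f s" % download_raw)
print("compressed download:   %.1f s" % download_compressed)
print("decoding, upper bound: %.1f s" % decode_at_most)
print("saved overall:         %.1f s" % (download_raw - download_compressed - decode_at_most))
```

On a much faster link the download saving shrinks while the decoding bound stays the same, so the net benefit depends on the link speed.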
The specification features a number of technologies for reducing the rate. In
particular, NTT contributed to the development of the following elementary tools:
- Time-domain linear prediction based on PARCOR coefficients
- Multi-channel coding (collaborative work between NTT and the University of Tokyo)
- Long-term prediction (collaborative work between NTT and the University of Tokyo)
- Common factor coding and masked compression for floating-point data
- Progressive order prediction for random accessibility

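The first and last items in the list can be illustrated with a minimal sketch. It is not the normative ALS algorithm: the linear 7-bit quantizer, the constant `SCALE`, the test signal, and all function names are assumptions made for this example, and it uses floating-point arithmetic where a real codec would presumably use fixed-point rules so that encoder and decoder agree on every platform. It shows the general idea only: PARCOR (reflection) coefficients are obtained with the Levinson-Durbin recursion, quantized, and converted to predictors of increasing order, so that the first samples of a block can be predicted without any earlier data. Sign conventions for PARCOR coefficients vary between texts.

```python
import math
import random

SCALE = 128  # assumed 7-bit linear quantizer for the PARCOR coefficients

def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def step_up(a, k):
    """Extend an order-m predictor to order m+1 using reflection coefficient k."""
    m = len(a)
    return [a[j] - k * a[m - 1 - j] for j in range(m)] + [k]

def parcor_from_signal(x, order):
    """Levinson-Durbin recursion; returns the reflection (PARCOR) coefficients."""
    r = autocorrelation(x, order)
    a, parcor, err = [], [], float(r[0])
    for i in range(order):
        if err <= 0.0:
            break
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        a = step_up(a, k)
        parcor.append(k)
        err *= 1.0 - k * k
    return parcor

def quantize(parcor):
    """Round to integers and clamp so each coefficient stays strictly inside (-1, 1)."""
    return [max(-SCALE + 1, min(SCALE - 1, round(k * SCALE))) for k in parcor]

def predictors(q):
    """Predictors of order 0, 1, ..., len(q), all derived from the same PARCOR set."""
    table = [[]]
    for v in q:
        table.append(step_up(table[-1], v / SCALE))
    return table

def predict(a, past):
    """Rounded prediction from past samples (most recent first)."""
    return math.floor(sum(c * p for c, p in zip(a, past)) + 0.5)

def encode(x, order):
    q = quantize(parcor_from_signal(x, order))
    table = predictors(q)
    residual = []
    for n, sample in enumerate(x):
        a = table[min(n, len(q))]   # progressive order: sample n uses at most n taps
        residual.append(sample - predict(a, x[n - len(a):n][::-1]))
    return q, residual

def decode(q, residual):
    table = predictors(q)
    x = []
    for n, e in enumerate(residual):
        a = table[min(n, len(q))]
        x.append(e + predict(a, x[n - len(a):n][::-1]))
    return x

# Hypothetical test signal: two tones plus slight noise.
rng = random.Random(1)
signal = [
    round(8000 * math.sin(2 * math.pi * 440 * n / 44100)
          + 3000 * math.sin(2 * math.pi * 1000 * n / 44100)) + rng.randint(-4, 4)
    for n in range(2000)
]
q, residual = encode(signal, order=4)
print("quantized PARCOR:", q)
print("mean |sample|:   %.1f" % (sum(map(abs, signal)) / len(signal)))
print("mean |residual|: %.1f" % (sum(map(abs, residual)) / len(residual)))
print("bit-exact:", decode(q, residual) == signal)
```

Because the decoder rebuilds the same predictors from the same quantized coefficients, the rounded predictions match and the residual restores every sample exactly. The smaller residual is what a subsequent entropy coder would then store.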
In parallel with its standardization activities, NTT Labs have developed
proprietary technologies for efficient algorithms and efficient implementation
while maintaining compliance with the standard.
4. Future tasks

NTT Communication Science Labs will continue to support the standardization
of the conformance and reference software and the enhancement of encoder
performance.
In parallel, NTT Communications Corp. will design and provide integrated delivery
or archiving systems by making use of practical software compliant with this standard.
In addition, NTT group companies will develop products, through collaboration with partners
or through licensing, for various applications, including professional audio editing
tools, portable music players, and the editing or archiving of medical or environmental
data.
<Terminology>
1. MPEG

Moving Picture Experts Group: a standardization group in ISO/IEC
JTC1/SC29/WG11. This group has established a number of important compression schemes
for video and audio since 1988.
2. PARCOR coefficient

Partial autocorrelation: a set of predictive parameters
invented by NTT Musashino Lab in 1972. This set has the properties of stability and
easy quantization and is therefore widely used in speech coding and synthesis
and in other signal processing areas.
3. ISO/IEC

ISO (International Organization for Standardization) and IEC
(International Electrotechnical Commission) are organizations that seek to establish
international standards in various fields.
4. 14496-3 3rd ED AMD 2 (ALS)

MPEG-4 Audio, 3rd edition, Amendment 2. It is usually called ALS.
5. AAC

Advanced Audio Coding: an efficient multi-channel audio coder established
in 1997. Its perceptual quality is better than that of MP3. The coder is used in
the Japanese digital broadcasting system and in some portable music players.
6. ZIP

A general-purpose lossless compression tool that adaptively updates
its codebook depending on the input sequence. It can compress text and program
sources and has been incorporated into the OS.