www.belle-nuit.com archive
title: video compression for desktop applications
author: lawrence a. rowe
source: www.bmrc.berkeley.edu
date: 10.7.98
Berkeley Multimedia Research Center
Published: April 1995
Berkeley, CA, USA
http://www.bmrc.berkeley.edu
1.1 Introduction
1.2 Current State of the Art
1.2.1 MPEG on Every Desktop
1.2.2 Motion JPEG for Editing
1.2.3 H.261 for Video Conferencing
1.2.4 What's the User to Do
1.3 Research Problems
1.3.1 Multiple Format Stored Representations
1.3.2 Perceptual Coding
1.3.3 Multiple CPU/Chip Implementations
1.3.4 Continuous Media Infrastructure
1.4 Wireless Audio/Video Compression
1.5 Conclusions
1.6 References
This paper discusses the current state of compression for digital
video on the desktop. Today there are many choices for video
compression that yield different performance in terms of compression
factor, quality, bitrate, and cost. Users want a single low-cost
solution which, unfortunately, does not exist today. Consequently,
users will have to develop applications in an environment with
multiple representations for digital video unless PC's can be
assigned to dedicated applications. Alternatively, programmable
compression/decompression boards can be used to solve the problem.
Eventually, special-purpose hardware solutions will be replaced by
general-purpose software running on desktop parallel processors
implemented with multiple CPU's per chip.
1.1 Introduction
This paper presents my opinion of the current state of the art for compression for desktop digital video applications. Put simply, there are too many compression algorithms and standards and too few low-cost boards that implement the major standards.
1.2 Current State of the Art
There are numerous video compression algorithms, including Apple's
Roadpizza, Supermac's CINEPACK, fractals, H.261, Intel's INDEO,
motion JPEG (MJPEG), MPEG-1, MPEG-2, Sun's CELLB, and wavelets. Users
are confused by all these choices. They want to know which technology
to use so they can make intelligent investment decisions.
Unfortunately, the current situation is not very good because there
is no single technology that can be used for all applications. For
example, Apple's Roadpizza and Supermac's CINEPACK are designed for
playback applications with software-only decoding, H.261 is designed
for video teleconferencing, MPEG-1 is designed for low bitrate (e.g.,
1.5 Mbits/sec) audio and video playback applications, and MPEG-2 is
designed for high bitrate, high quality playback applications with
full-sized images (e.g., CCIR 601 with studio quality at 4-10 Mbits/sec).
Users want one solution, but one solution does not exist. In the next
couple of years, I see the following trends.

1.2.1 MPEG on Every Desktop
Low cost MPEG-1 decoder chips will be on every desktop. Add-in
boards cost around $350 today, and the next generation multimedia PC
will have audio and video decoder chips on the motherboard.
Manufacturers of video games and CD-ROM titles will use MPEG-1 video
to add excitement to their products.
MPEG hardware for workstations will be less readily available and
more costly because workstation manufacturers can provide credible
software-only decoders for MPEG. Early experiments on software-only
MPEG decoding showed that small-sized images (e.g., QCIF, which is
160x120) can be decoded in real-time and medium-sized images (e.g.,
CIF, which is 320x240) can be decoded in near real-time (16 fps
compared to 24 fps) on RISC processors [Rowe93].
Subsequent work by DEC showed that tuning the decoder to a specific
processor can achieve real-time decoding of CIF images [Ho94].
Recently, HP released a software-only MPEG audio and video decoder
for their HP Snake processors that runs in real-time on CIF images
[Lee94]. The HP software uses special-purpose instructions added to
the architecture that speed up Huffman decoding and 8-bit arithmetic
operations (using saturation arithmetic). And, they use hardware to
convert YCrCb to RGB and dither to an 8-bit color map. In the other
cases, color space conversion was done in software, which can account
for as much as 30% of the computation. Nevertheless, the HP software
is impressive.
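To make these steps concrete, here is a minimal sketch in C of a scalar
YCrCb-to-RGB conversion for a single pixel with explicit clamping to the
8-bit range, i.e., saturation arithmetic done in software. It is
illustrative only, not HP's code; the fixed-point coefficients are the
common ITU-R BT.601 approximations scaled by 256.

    #include <stdint.h>

    /* Clamp an intermediate result to the 8-bit range (saturation arithmetic). */
    static inline uint8_t clamp8(int v)
    {
        if (v < 0)   return 0;
        if (v > 255) return 255;
        return (uint8_t)v;
    }

    /* Convert one YCrCb pixel to RGB using fixed-point coefficients.
     * This per-pixel work is the kind of load that can consume a large
     * share of a software decoder's time when no hardware support exists. */
    void ycrcb_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                      uint8_t *r, uint8_t *g, uint8_t *b)
    {
        int c = y;
        int d = cb - 128;
        int e = cr - 128;

        *r = clamp8(c + ((359 * e) >> 8));            /* + 1.402 * Cr            */
        *g = clamp8(c - ((88 * d + 183 * e) >> 8));   /* - 0.344*Cb - 0.714*Cr   */
        *b = clamp8(c + ((454 * d) >> 8));            /* + 1.772 * Cb            */
    }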
These experiments illustrate that software-only decoders will
eventually replace all hardware decoders. I believe that it will be
at least 4-6 years before hardware decoders for MPEG-1 are outdated.
By that time, hardware decoders for MPEG-2, which supports higher
quality video and audio at higher bitrates, will be widely available.
Some users will upgrade to higher quality rather than continue with
low quality at no cost. General-purpose MPEG-2 decoding of full-sized
images (e.g., 640x480 or 768x576) will require multiple processors.
The biggest problem with MPEG is the cost of encoders. High quality,
real-time encoders cost between $50K and $500K. Almost all high end
encoders use parallel processors, either general-purpose
supercomputers (e.g., IBM) or custom-designed video processors (e.g.,
C-Cube). Lower quality real-time encoders for PC platforms that use
fewer processors cost around $20K (e.g., FutureTel, Optibase,
Optivision, etc.). While the cost of these low end systems will
decline over the next couple of years, they will still be too
expensive for most users.
1.2.2 Motion JPEG for Editing

Non-linear video editors are typically used in broadcast TV,
commercial post-production, and high-end corporate media departments.
Low bitrate MPEG-1 quality is unacceptable to these customers, and it
is difficult to edit video sequences that use inter-frame
compression. Consequently, non-linear editors (e.g., AVID, Matrox,
FAST, etc.) will continue to use motion JPEG with low compression
factors (e.g., 6:1 to 10:1).
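As a rough illustration of the editing problem (a sketch, not any
product's implementation): with intra-only motion JPEG every frame is a
legal edit point, while in an inter-frame coded sequence a cut-in must
land on an I-frame unless the following frames are re-encoded. The GOP
pattern below is just a common example.

    #include <stdio.h>

    /* Frame types in a typical MPEG group of pictures (GOP). */
    typedef enum { I_FRAME, P_FRAME, B_FRAME } frame_type;

    /* Return the nearest frame at or after 'cut' where an edit can begin
     * without re-encoding, i.e., the next I frame.  With motion JPEG,
     * every frame is intra-coded, so any frame would qualify. */
    int nearest_cut_in(const frame_type *seq, int n, int cut)
    {
        for (int i = cut; i < n; i++)
            if (seq[i] == I_FRAME)
                return i;
        return -1;   /* no valid cut-in point in the remaining sequence */
    }

    int main(void)
    {
        /* Two 12-frame GOPs with the common I B B P B B P B B P B B pattern. */
        frame_type seq[24];
        for (int i = 0; i < 24; i++)
            seq[i] = (i % 12 == 0) ? I_FRAME : (i % 3 == 0) ? P_FRAME : B_FRAME;

        printf("requested cut at frame 5, usable cut-in at frame %d\n",
               nearest_cut_in(seq, 24, 5));   /* prints 12 */
        return 0;
    }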
Motion JPEG compression has also been used in some desktop video
conferencing applications (e.g., Insoft) because affordable
workstation boards that support real-time encoding and decoding have
been available. Typical boards cost $4K to $10K. Motion JPEG boards
are now being sold for PC's that cost $1K to $4K.
1.2.3 H.261 for Video Conferencing

Video conferencing has been an active research and product area for many years. Although most commercial room-sized conferencing systems use proprietary standards, they are now adopting the H.261 ITU standard for video conferencing. Moreover, most desktop video conferencing systems are using H.261 (e.g., AT&T, Compression Labs, Intel, PictureTel, etc.). Most of these systems use ISDN lines, although a few are starting to support packet-switched networks. And, several research laboratories are developing software that uses H.261 boards on PC's and workstations.
1.2.4 What's the User to Do

What is the user to do who wants to provide ubiquitous digital video, that is, video in all applications including email, documents, conferencing, hypermedia courseware, and databases? Users have two choices:

1. Select one compression standard and try to acquire applications that will use it.
2. Acknowledge that you need support for multiple compression standards.

My opinion is that users will have to make the second choice, which means either a programmable compression/decompression board or multiple compression boards. Programmable boards exist, but they are not widely available, and they are expensive. In addition, vendors do not yet provide microcode for the variety of compression standards needed, but I believe that eventually the software will be readily available and relatively inexpensive. The question is whether the software for programmable boards will be available before desktop parallel processors that can run general-purpose software arrive.
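A minimal sketch, in C, of what "support for multiple compression
standards" looks like at the software level: a table that maps a
stream's format tag to a decoder entry point so one application can
accept whichever representation arrives. The codec tags and function
names are hypothetical, and the decoders are stubs.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical decoder entry point: decode one compressed frame into
     * an RGB buffer.  Returns 0 on success. */
    typedef int (*decode_frame_fn)(const unsigned char *in, size_t in_len,
                                   unsigned char *rgb_out);

    /* Stubs standing in for real per-standard decoders. */
    static int mjpeg_decode_frame(const unsigned char *in, size_t n, unsigned char *out)
    { (void)in; (void)n; (void)out; return 0; }
    static int mpeg1_decode_frame(const unsigned char *in, size_t n, unsigned char *out)
    { (void)in; (void)n; (void)out; return 0; }
    static int h261_decode_frame(const unsigned char *in, size_t n, unsigned char *out)
    { (void)in; (void)n; (void)out; return 0; }

    /* Registry mapping a stream's format tag to its decoder. */
    struct codec_entry {
        const char      *name;
        decode_frame_fn  decode;
    };

    static const struct codec_entry codecs[] = {
        { "mjpeg", mjpeg_decode_frame },
        { "mpeg1", mpeg1_decode_frame },
        { "h261",  h261_decode_frame  },
    };

    decode_frame_fn lookup_decoder(const char *name)
    {
        for (size_t i = 0; i < sizeof codecs / sizeof codecs[0]; i++)
            if (strcmp(codecs[i].name, name) == 0)
                return codecs[i].decode;
        return NULL;   /* unsupported representation */
    }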
1.3 Research Problems
This section discusses some possible research problems. Some
researchers argue we need improved compression technology such as
wavelet-based algorithms. Except in the case of wireless
communication discussed below, I disagree. I believe that research
should be directed to improving the existing technologies and
developing improved implementations, systems infrastructure, and
applications. Unless a new technology can provide significantly
better performance (i.e., at least 2:1 improvement in space) than the
current JPEG, MPEG, and H.261 standards, users will be better served
by improving the existing techniques and applications.
Some proposed compression standards provide other services such as
multiresolution sequences (i.e., different applications can request
different sized images at different bitrates from the same compressed
representation) and variable quality (i.e., different quality at
different bitrates). While these features are reasonable to request,
I do not believe you need a completely different compression
technology to support them. The MPEG-2 standard has provisions,
albeit somewhat controversial, for image size, quality (S/N ratio),
and frame rate scalability. I believe it makes more sense to develop
the technology supporting these standards than it does to propose a
completely different technology unless you get the compression
improvement mentioned above.
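To illustrate the scalability idea (a toy sketch with invented layer
sizes and bitrates): a single layered representation is stored, and
each client simply decodes as many layers as its bandwidth allows, so
no separate encodings or transcoding are needed.

    #include <stdio.h>

    /* A hypothetical layered stream: a base layer plus enhancement layers,
     * in the spirit of MPEG-2 spatial/SNR/temporal scalability.  Each
     * layer adds bitrate and either resolution, quality, or frame rate. */
    struct layer {
        const char *name;
        int kbits_per_sec;   /* added bitrate for this layer */
    };

    static const struct layer layers[] = {
        { "base: 352x240 @ 15 fps",          1200 },
        { "spatial enhancement: 704x480",    2500 },
        { "SNR enhancement: higher quality", 1500 },
    };

    /* Pick the largest prefix of layers that fits the client's bandwidth.
     * Different applications request different subsets of the same stored
     * representation. */
    int layers_for_bandwidth(int budget_kbps)
    {
        int used = 0, count = 0;
        for (int i = 0; i < 3; i++) {
            if (used + layers[i].kbits_per_sec > budget_kbps)
                break;
            used += layers[i].kbits_per_sec;
            count++;
        }
        return count;
    }

    int main(void)
    {
        printf("1.5 Mb/s client decodes %d layer(s)\n", layers_for_bandwidth(1500));
        printf("6 Mb/s client decodes %d layer(s)\n", layers_for_bandwidth(6000));
        return 0;
    }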
1.3.1 Multiple Format Stored Representations

Suppose you wanted to develop a video server for a heterogeneous computing environment that included desktop computers with different decompression capabilities (e.g., motion JPEG, H.261, and MPEG-1). The problem is what representation to store. You could store one of these representations and then provide a real-time transcoder somewhere on the network that converts between the different representations. Another alternative is to store a representation that makes it easy to generate any of these sequences. For example, there are differences in the block and macroblock structure of these streams, but it should be possible to devise a stored representation that can easily generate any of them. Here are a couple of ideas:

1. Store several motion vectors for a macroblock. For example, MPEG vectors can be arbitrarily far from the origin of the source block, they can be on half-pixel boundaries, and, in the case of B frames, they can be forward, backward, or an average of a forward and a backward block. H.261 motion vectors can only be +/- 15 pixels, they cannot be on half-pixel boundaries, and they can only reference the previous frame. So the idea is to store two motion vectors for blocks whose MPEG vector is not valid for H.261 and select the appropriate one when constructing the stream to be transmitted.

2. Store the Huffman-encoded representations of frames and create the rest of the stream syntax on the fly. For example, an H.261 stream can skip up to 2 frames between every frame displayed, and although there is a requirement to refresh every block within some number of frames, there is no requirement to include the equivalent of a complete frame (i.e., an MPEG I-frame). The H.261 stream could easily be generated from an appropriate MPEG-like frame structure similar to the one suggested above.

3. Provide support for scalable H.261 and motion JPEG using the MPEG scalable representations.
A shrewd data structure and efficient algorithm implementation (e.g., possibly using frequency domain operations [Smith94]) should produce a more flexible system.
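As a rough sketch of the first idea above, the C fragment below checks
whether a stored MPEG-style motion vector also satisfies the H.261
constraints (integer-pel, +/- 15 pixels, previous-frame reference only)
and falls back to a second, H.261-legal vector when it does not. The
data layout is invented for illustration.

    #include <stdbool.h>
    #include <stddef.h>

    /* Motion vector in half-pixel units, as MPEG allows. */
    struct mv {
        int dx_half;        /* horizontal displacement, half-pel units */
        int dy_half;        /* vertical displacement, half-pel units */
        bool from_future;   /* true if it references a future frame (B frame) */
    };

    /* A stored macroblock keeps the MPEG vector plus an optional fallback
     * vector that is already legal under H.261. */
    struct stored_mb {
        struct mv mpeg_mv;
        bool      has_h261_mv;
        struct mv h261_mv;
    };

    /* H.261 restrictions: integer-pel vectors, range +/- 15 pixels, and
     * prediction only from the previous frame. */
    static bool mv_ok_for_h261(const struct mv *v)
    {
        if (v->from_future)
            return false;
        if (v->dx_half % 2 != 0 || v->dy_half % 2 != 0)
            return false;                      /* half-pel not allowed */
        int dx = v->dx_half / 2, dy = v->dy_half / 2;
        return dx >= -15 && dx <= 15 && dy >= -15 && dy <= 15;
    }

    /* Pick the vector to emit when generating an H.261 stream from the
     * shared stored representation.  Returns NULL if the block would
     * have to be intra-coded instead. */
    const struct mv *select_h261_mv(const struct stored_mb *mb)
    {
        if (mv_ok_for_h261(&mb->mpeg_mv))
            return &mb->mpeg_mv;
        if (mb->has_h261_mv)
            return &mb->h261_mv;
        return NULL;
    }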
1.3.2 Perceptual Coding

Much work remains to be done understanding the human visual system and developing models that can be used to implement better coders. Surprisingly, perceptual coding of audio is ahead of perceptual coding of video [Jayant93]. Today, most researchers are working on the best possible coding with infinite time to encode. The target bitrates are typically 1.2 Mbits/sec for CD-ROM and 2, 3, or 6 Mbits/sec for video-on-demand. There are many other points in the design space. For example, suppose you wanted to encode CIF images on a typical PC and you were willing to provide only a statistical guarantee on bitrate. The idea is to relax the bitrate requirement because real-time transport protocols are being designed to provide statistical guarantees, so why should the coder work hard to satisfy a strict bitrate bound when doing so may mean a significantly poorer picture? The coding strategy for this implementation will be very different from the strategy used in current coders. This idea is only one of several ways to change the basic model.
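A toy sketch of the relaxed rate control suggested above: the quantizer
tracks a long-run average bitrate instead of enforcing a hard bound on
every frame, so short-term overshoot is tolerated in exchange for
steadier picture quality. The control law and thresholds are
illustrative, not taken from any standard encoder.

    /* Toy rate controller: adjust the quantizer so that the long-run mean
     * bitrate approaches the target, while tolerating short-term overshoot.
     * A strict constant-bitrate coder would instead clamp every buffer
     * excursion, at the cost of picture quality. */
    struct rate_ctl {
        double target_bps;   /* desired long-run average, bits per second */
        double avg_bps;      /* exponentially weighted running average */
        int    quant;        /* current quantizer scale, e.g. 1 (fine) .. 31 */
    };

    void rate_ctl_update(struct rate_ctl *rc, int frame_bits, double frame_rate)
    {
        const double alpha = 0.05;                  /* smoothing factor */
        double inst_bps = frame_bits * frame_rate;  /* this frame's rate */

        rc->avg_bps = (1.0 - alpha) * rc->avg_bps + alpha * inst_bps;

        /* Nudge the quantizer only when the average drifts well away from
         * the target (here +/- 10%), rather than reacting to every frame. */
        if (rc->avg_bps > 1.10 * rc->target_bps && rc->quant < 31)
            rc->quant++;                            /* coarser, fewer bits */
        else if (rc->avg_bps < 0.90 * rc->target_bps && rc->quant > 1)
            rc->quant--;                            /* finer, more bits */
    }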
1.3.3 Multiple CPU/Chip Implementations

Future desktop computer architectures will use microprocessors
that support multiple CPU's per chip. For example, a RISC processor
requires 1M to 3M transistors, and chip technology will soon be able
to put 100M transistors on a chip, so the question is how best to use
them. One design will put many different processor architectures on a
chip so that a system can run different software. Another design will
put many copies of the same processor on the chip.
An interesting research problem is to understand the effect of
different architectures on compression and decompression. One
possibility, which is probably already being done in industrial
research labs, is to look at high performance parallel decoders for
HDTV images (e.g., 1920x1080) using general purpose processors.
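One way such a parallel decoder might be organized, sketched with POSIX
threads: the independently decodable slices of each picture are divided
among a pool of worker threads, one per on-chip CPU. The slice decoder
itself and the processor and slice counts are placeholders.

    #include <pthread.h>

    #define NUM_CPUS   8    /* assumed number of on-chip processors */
    #define NUM_SLICES 68   /* e.g., macroblock rows of a 1080-line HDTV picture */

    /* Decode one slice: per-slice VLC decoding, inverse DCT, and motion
     * compensation would go here.  Slices within a picture do not
     * reference each other, so they can be decoded in parallel. */
    static void decode_slice(int picture, int slice)
    {
        (void)picture; (void)slice;
    }

    struct work { int picture; int first_slice, last_slice; };

    static void *worker(void *arg)
    {
        struct work *w = arg;
        for (int s = w->first_slice; s <= w->last_slice; s++)
            decode_slice(w->picture, s);
        return 0;
    }

    /* Decode one picture by splitting its slices across NUM_CPUS threads. */
    void decode_picture_parallel(int picture)
    {
        pthread_t   tid[NUM_CPUS];
        struct work w[NUM_CPUS];
        int per_cpu = (NUM_SLICES + NUM_CPUS - 1) / NUM_CPUS;

        for (int i = 0; i < NUM_CPUS; i++) {
            w[i].picture     = picture;
            w[i].first_slice = i * per_cpu;
            w[i].last_slice  = (i + 1) * per_cpu - 1;
            if (w[i].last_slice >= NUM_SLICES)
                w[i].last_slice = NUM_SLICES - 1;
            pthread_create(&tid[i], 0, worker, &w[i]);
        }
        for (int i = 0; i < NUM_CPUS; i++)
            pthread_join(tid[i], 0);
    }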
1.3.4 Continuous Media Infrastructure

There is currently no portable toolkit for developing distributed
continuous media applications (i.e., digital audio and video) such as
desktop conferencing systems, distance learning systems, and
distributed video playback systems. Many excellent research systems
have been developed, but they are typically not distributed, and they
support few hardware platforms and audio/video boards [Anderson91,
Gibbs91, Hamakawa92, Koegel93, Rossum93, Steinmetz91, Trehan93,
Hewlett-Packard93]. There are several standards groups and
large companies trying to establish common architectures and
protocols for developing distributed applications, but these efforts
have yet to succeed. The consequence is that anyone who wants to
develop an application faces the problem of developing the
infrastructure.
Our research group has developed such an infrastructure, called the
Berkeley Continuous Media Toolkit, that supports motion JPEG
and MPEG video boards, several audio standards, and runs on a variety
of platforms. It is based on the Tcl scripting language, the Tk
interface toolkit, and the Tcl-DP package for distributed
client/server computing. We have developed a network playback system
[Rowe92] and desktop video conferencing
system using the toolkit [Chaffee94].
You might wonder how a research project at a university can compete
with large companies. The answer is we cannot. However, by
distributing our source code and working with other researchers we
can build a common infrastructure. This approach has worked for CAD
tools, Tcl/Tk, and the INGRES relational DBMS to name three examples
from Berkeley.
However, we still need the equivalent of the PBMPLUS library for
manipulating digital video data. The idea is to develop tools and
libraries so that different researchers can experiment with
components of the infrastructure and with applications built using
it.
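As a sketch of what one tool in such a PBMPLUS-like video library might
look like (the raw frame format and the filter are invented for
illustration): a pipe filter that reads uncompressed frames on stdin,
transforms them, and writes them to stdout, so that tools can be
chained from the shell in the same spirit as the PBMPLUS image filters.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical video pipe filter: read raw 320x240 8-bit grayscale
     * frames from stdin, apply a trivial per-pixel operation, and write
     * the result to stdout. */
    #define W 320
    #define H 240

    int main(void)
    {
        unsigned char frame[W * H];

        while (fread(frame, 1, sizeof frame, stdin) == sizeof frame) {
            for (size_t i = 0; i < sizeof frame; i++)
                frame[i] = 255 - frame[i];        /* example filter: invert */
            if (fwrite(frame, 1, sizeof frame, stdout) != sizeof frame)
                return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }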
1.4 Wireless Audio/Video Compression
Wireless computing links are very different from conventional
communication links. First, bandwidth is limited (e.g., approximately
2 Mbits/sec aggregate bandwidth in a cell). And, communication errors
are inversely proportional to the power used on the portable device.
Power is the scarce resource, so algorithms and implementations that
perform adequately with less power are better. Some researchers argue
that portable devices should have limited computational power to
reduce power requirements, which means that audio and video
compression must be very simple [Broderson94].
Compression algorithms that work well in this environment are an
interesting challenge. Some people are looking at pyramid and subband
coding using vector quantization. Vector quantization is simple to
decode and pyramid and subband coding can be used to partition the
stream into high priority data that will be sent with more power to
reduce errors and low priority data that will be sent with less
power.
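A toy one-level subband split along these lines (unnormalized Haar-like
averaging and differencing, not a production filter bank): the low band
carries a coarse version of the signal and would be sent at higher
power, while the detail band can tolerate more errors. Real systems
would use longer filters and vector-quantize the bands.

    /* One-level analysis of a row of samples: sums form a low-frequency
     * band (high priority, sent with more power) and differences form a
     * detail band (low priority, tolerating more errors). */
    void analyze_row(const unsigned char *in, int n,   /* n must be even */
                     int *low, int *high)
    {
        for (int i = 0; i < n / 2; i++) {
            int a = in[2 * i], b = in[2 * i + 1];
            low[i]  = a + b;   /* unnormalized average: coarse approximation */
            high[i] = a - b;   /* difference: fine detail */
        }
    }

    /* Exact reconstruction when both bands arrive.  If the detail band is
     * lost or corrupted, the decoder can still approximate both samples
     * by low[i] / 2, which is the point of the priority split. */
    void synthesize_row(const int *low, const int *high, int n,
                        unsigned char *out)
    {
        for (int i = 0; i < n / 2; i++) {
            out[2 * i]     = (unsigned char)((low[i] + high[i]) / 2);
            out[2 * i + 1] = (unsigned char)((low[i] - high[i]) / 2);
        }
    }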
Needless to say, this architecture will create many problems if the
rest of the digital video infrastructure is dominated by the block
transform coding standards as I believe it will be.
1.5 Conclusions
Compression researchers have developed numerous technologies that
have been used to define a series of compression standards that will
dominate desktop digital video. Today, and for at least the next 5-10
years, application developers and users face a difficult choice of
which hardware and software to use. Eventually, desktop parallel
processors will allow many different compression algorithms,
implemented in general-purpose software, to be used.

Many research problems remain, but my opinion is that effort should
be directed to improving existing implementations, software systems
infrastructure, and applications.
1.6 References

[Anderson91] D.P. Anderson and P. Chan, "Toolkit Support for Multiuser Audio/Video Applications," Proc. 2nd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg, Germany, November 1991.
[Broderson94] R. Broderson, "The Infopad Project's Home Page," World-Wide Web page, http://infopad.eecs.berkeley.edu/.
[Chaffee94] G. Chaffee, personal communication, May 1994.
[Gibbs91] S. Gibbs, et al., "A Programming Environment for Multimedia Applications," Proc. 2nd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg, Germany, November 1991.
[Hamakawa92] R. Hamakawa, et al., "Audio and Video Extensions to Graphical User Interface Toolkits," Proc. 3rd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, San Diego, CA, November 1992.
[Hewlett-Packard93] Hewlett-Packard, IBM, and SunSoft, "Multimedia System Services (Version 1.0)," response to the Multimedia System Services Request for Technology, Interactive Multimedia Association, 1993.
[Ho94] S. Ho, personal communication, February 1994.
[Jayant93] N. Jayant, J. Johnston, and R. Safranek, "Signal Compression Based on Models of Human Perception," Proc. of the IEEE, Vol. 81, No. 10, October 1993, pp. 1385-1422.
[Koegel93] J.F. Koegel, et al., "HyOctane: A HyTime Engine for an MMIS," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.
[Lee94] R. Lee, personal communication, May 1994.
[Rossum93] G. van Rossum, et al., "CMIFed: A Presentation Environment for Portable Hypermedia Documents," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.
[Rowe92] L.A. Rowe and B.C. Smith, "A Continuous Media Player," Proc. 3rd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, San Diego, CA, November 1992.
[Rowe93] L.A. Rowe, K. Patel, and B.C. Smith, "Performance of a Software MPEG Video Decoder," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.
[Smith94] B.C. Smith, "Fast Software Processing of Motion JPEG Video," to appear, ACM Multimedia 94, October 1994.
[Steinmetz91] R. Steinmetz and J.C. Fritzsche, "Abstractions for Continuous-Media Programming," Proc. 2nd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg, Germany, November 1991.
[Trehan93] R. Trehan, et al., "Toolkit for Shared Hypermedia on a Distributed Object Oriented Architecture," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.