
Berkeley Multimedia Research Center
Published: April 1995
Berkeley, CA
USA
http://www.bmrc.berkeley.edu

Video Compression for Desktop Applications

Sections

1.1 Introduction
1.2 Current State of the Art
1.2.1 MPEG on Every Desktop
1.2.2 Motion JPEG for Editing
1.2.3 H.261 for Video Conferencing
1.2.4 What's the User to Do
1.3 Research Problems
1.3.1 Multiple Format Stored Representations
1.3.2 Perceptual Coding
1.3.3 Multiple CPU/Chip Implementations
1.3.4 Continuous Media Infrastructure
1.4 Wireless Audio/Video Compression
1.5 Conclusions
1.6 References


1 Video Compression for Desktop Applications


Lawrence A. Rowe
University of California, Berkeley, CA, USA

 

 



This paper discusses the current state of compression for digital video on the desktop. Today there are many choices for video compression that differ in compression factor, quality, bitrate, and cost. Users want a single low-cost solution which, unfortunately, does not exist today. Consequently, users will have to develop applications in an environment with multiple representations for digital video unless PC's can be assigned to dedicated applications. Alternatively, programmable compression/decompression boards can be used to solve the problem. Eventually, special-purpose hardware solutions will be replaced by general-purpose software running on desktop parallel processors, which will be implemented with multiple CPU's per chip.


1.1 Introduction


This paper presents my opinion of the current state of the art for compression for desktop digital video applications. Put simply, there are too many compression algorithms and standards and too few low-cost boards that implement the major standards.


1.2 Current State of the Art


There are numerous video compression algorithms including Apple's Road Pizza, SuperMac's Cinepak, fractals, H.261, Intel's Indeo, motion JPEG (MJPEG), MPEG-1, MPEG-2, Sun's CellB, and wavelets. Users are confused by all these choices. They want to know which technology to use so they can make intelligent investment decisions.
Unfortunately, the current situation is not very good because there is no single technology that can be used for all applications. For example, Apple's Road Pizza and SuperMac's Cinepak are designed for playback applications with software-only decoding, H.261 is designed for video teleconferencing, MPEG-1 is designed for low bitrate (e.g., 1.5 Mb/s) audio and video playback applications, and MPEG-2 is designed for high bitrate, high quality playback applications with full-sized images (e.g., CCIR 601 with studio quality at 4-10 Mb/s).
Users want one solution, but one solution does not exist. In the next couple of years, I see the following trends.

1.2.1 MPEG on Every Desktop


Low cost MPEG-1 decoder chips will be on every desktop. Add-in boards cost around $350 today, and the next generation multimedia PC will have audio and video decoder chips on the motherboard. Manufacturers of video games and CD-ROM titles will use MPEG-1 video to add excitement to their products.
MPEG hardware for workstations will be less readily available and more costly because these manufacturers can provide creditable software-only decoders for MPEG. Early experiments on software-only MPEG decoding showed that small-sized images (e.g., QCIF, which is 160x120) can be decoded in real-time and medium-sized images (e.g., CIF, which is 320x240) can be decoded in near real-time (16 fps compared to 24 fps) on RISC processors [Rowe93]. Subsequent work by DEC showed that tuning the decoder to a specific processor can achieve real-time decoding of CIF images [Ho94]. Recently, HP released a software-only MPEG audio and video decoder for their HP Snake processors that runs in real-time on CIF images [Lee94]. The HP software uses special-purpose instructions added to the architecture that speed up Huffman decoding and 8-bit arithmetic operations (using saturation arithmetic). They also use hardware to convert YCrCb to RGB and dither to an 8-bit color map. Color space conversion was done in software in the other cases, where it can account for as much as 30% of the computation. Nevertheless, the HP software is impressive.
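To illustrate the conversion step that consumes so much of a software decoder's time, here is a minimal C sketch of a fixed-point YCrCb-to-RGB conversion for one pixel. This is not the HP implementation; the ITU-R BT.601 coefficients and the explicit clamp (standing in for hardware saturation arithmetic) are assumptions for illustration.

/* Minimal sketch of fixed-point YCrCb-to-RGB conversion for one pixel,
 * using BT.601 coefficients scaled by 256.  clamp() stands in for the
 * saturation arithmetic provided by special-purpose instructions. */
#include <stdint.h>

static inline uint8_t clamp(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* y in [16,235], cb and cr in [16,240]. */
static void ycrcb_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                         uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y - 16, d = cb - 128, e = cr - 128;

    *r = clamp((298 * c + 409 * e + 128) >> 8);
    *g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp((298 * c + 516 * d + 128) >> 8);
}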
These experiments illustrate that software-only decoders will eventually replace all hardware decoders. I believe that it will be at least 4-6 years before hardware decoders for MPEG-1 are outdated. By that time, hardware decoders for MPEG-2, which supports higher quality video and audio at higher bitrates, will be widely available. Some users will upgrade to higher quality rather than continue with low quality at no cost. Software-only MPEG-2 decoding of full-sized images (e.g., 640x480 or 768x576) on general-purpose processors will require multiple CPU's.
The biggest problem with MPEG is the cost of encoders. High quality, real-time encoders cost between $50K and $500K. Almost all high end encoders use parallel processors, either general-purpose supercomputers (e.g., IBM) or custom-designed video processors (e.g., C-Cube). Lower quality real-time encoders for PC platforms that use fewer processors cost around $20K (e.g., FutureTel, Optibase, Optivision, etc.). While the cost of these low end systems will decline over the next couple of years, they will still be too expensive for most users.

1.2.2 Motion JPEG for Editing


Non-linear video editors are typically used in broadcast TV, commercial post production, and high-end corporate media departments. Low bitrate MPEG-1 quality is unacceptable to these customers, and it is difficult to edit video sequences that use inter-frame compression. Consequently, non-linear editors (e.g., AVID, Matrox, FAST, etc.) will continue to use motion JPEG with low compression factors (e.g., 6:1 to 10:1).
Motion JPEG compression has also been used in some desktop video conferencing applications (e.g., Insoft) because affordable workstation boards that support real-time encoding and decoding have been available. Typical boards cost $4K to $10K. Motion JPEG boards are now being sold for PC's that cost $1K to $4K.

1.2.3 H.261 for Video Conferencing


Video conferencing has been an active research and product area for many years. Although most commercial room-sized conferencing systems use proprietary standards, they are now adopting the H.261 ITU standard for video conferencing*. Moreover, most desktop video conferencing systems are using H.261 (e.g., AT&T, Compression Labs, Intel, PictureTel, etc.). Most of these systems use ISDN lines, although a few are starting to support packet-switched networks. And, several research laboratories are developing software that uses H.261 boards on PC's and workstations.

1.2.4 What's the User to Do


What is the user to do who wants to provide ubiquitous digital video, that is, video in all applications including email, documents, conferencing, hypermedia courseware, and databases? Users have two choices:

1. Select one compression standard and try to acquire applications that will use it.
2. Acknowledge that you need support for multiple compression standards.

My opinion is that users will have to make the second choice, which means either a programmable compression/decompression board or multiple compression boards. Programmable boards exist, but they are not widely available, and they are expensive. In addition, vendors do not yet provide microcode for the variety of compression standards needed, but I believe that eventually the software will be readily available and relatively inexpensive. The question is whether the software will be available for programmable boards before desktop parallel processors that can run general-purpose software arrive.


*Actually, H.261 is just the video standard. A video conferencing system must also support the appropriate audio standards (e.g., G.72x) and system-level standards.
In the meantime, users must develop applications that are open so that new compression technology can be introduced and so that real-time conversion is supported. For example, QuickTime from Apple and Video for Windows from Microsoft are the dominant storage systems for PC video. Both systems support multiple compression standards.
Better support is needed in applications to convert between different representations because most applications are closed. For example, a desktop video conferencing system should allow video transmitted in H.261 format to be converted to an MPEG stream so that PC users can view remote presentations.


1.3 Research Problems


This section discusses some possible research problems. Some researchers argue we need improved compression technology such as wavelet-based algorithms. Except in the case of wireless communication discussed below, I disagree. I believe that research should be directed to improving the existing technologies and developing improved implementations, systems infrastructure, and applications. Unless a new technology can provide significantly better performance (i.e., at least 2:1 improvement in space) than the current JPEG, MPEG, and H.261 standards, users will be better served by improving the existing techniques and applications.
Some proposed compression standards provide other services such as multiresolution sequences (i.e., different applications can request different sized images at different bitrates from the same compressed representation) and variable quality (i.e., different quality at different bitrates). While these features are reasonable to request, I do not believe you need a completely different compression technology to support them. The MPEG-2 standard has provisions, albeit somewhat controversial, for image size, quality (S/N ratio), and frame rate scalability. I believe it makes more sense to develop the technology supporting these standards than it does to propose a completely different technology unless you get the compression improvement mentioned above.

1.3.1 Multiple Format Stored Representations


Suppose you wanted to develop a video server for a heterogeneous computing environment that included desktop computers with different decompression capabilities (e.g., motion JPEG, H.261, and MPEG-1). The problem is which representation to store. You could store one of these representations and then provide a real-time transcoder somewhere on the network that will convert between the different representations. Another alternative is to store a representation that makes it easy to generate any of these sequences. For example, there are differences in the block and macroblock structure of these streams, but it should be possible to devise a stored representation that can easily generate any of the representations. Here are a few ideas:

 

1. Store several motion vectors for a macroblock. For example, MPEG vectors can be arbitrarily far from the origin of the source block, they can be on half-pixel boundaries, and, in the case of B-frames, they can reference a forward block, a backward block, or an average of a forward and backward block. H.261 motion vectors can only be +/- 15 pixels, they cannot be on half-pixel boundaries, and they can only reference the previous frame. So, the idea is to store two motion vectors for blocks whose MPEG vector is not valid for H.261 and select the appropriate one when constructing the stream to be transmitted (a sketch of this idea follows the list).
2. Store the Huffman-encoded representations of frames and create the rest of the stream syntax on the fly. For example, an H.261 stream can skip up to 2 frames between every frame displayed, and although there is a requirement to refresh every block within some number of frames, there is no requirement to include the equivalent of a complete frame (i.e., an MPEG I-frame). The H.261 stream could easily be generated from an appropriate MPEG-like frame structure similar to the one suggested above.
3. Provide support for scalable H.261 and motion JPEG using the MPEG scalable representations.

A clever data structure and an efficient algorithm implementation (e.g., possibly using frequency-domain operations [Smith94]) should produce a more flexible system.
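To make the first idea concrete, the following C sketch shows one hypothetical stored representation for a macroblock that carries both an MPEG-style motion vector and an H.261-legal fallback, plus the selection logic. All names, fields, and the exact legality test are illustrative assumptions, not part of any standard or existing system.

/* Hypothetical stored-representation record for one macroblock: an
 * MPEG-style vector (long range, half-pel, possibly bidirectional)
 * plus a fallback that satisfies H.261's constraints (+/-15 full-pel,
 * previous frame only).  Names and fields are illustrative only. */
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int  dx, dy;       /* displacement in half-pel units          */
    bool backward;     /* true if the reference is a future frame */
} MotionVector;

typedef struct {
    MotionVector mpeg;     /* vector used when emitting MPEG      */
    MotionVector h261;     /* fallback vector legal for H.261     */
    bool         has_h261; /* set when mpeg is not H.261-legal    */
    /* ... quantized coefficients, coded block pattern, etc. ...  */
} StoredMacroblock;

/* Is an MPEG vector directly usable in an H.261 stream? */
static bool h261_legal(const MotionVector *mv)
{
    return !mv->backward &&
           mv->dx % 2 == 0 && mv->dy % 2 == 0 &&    /* full-pel only */
           abs(mv->dx / 2) <= 15 && abs(mv->dy / 2) <= 15;
}

/* Pick the vector to emit for the requested output format. */
static const MotionVector *select_vector(const StoredMacroblock *mb,
                                         bool emit_h261)
{
    if (emit_h261 && !h261_legal(&mb->mpeg) && mb->has_h261)
        return &mb->h261;
    return &mb->mpeg;
}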

1.3.2 Perceptual Coding


Much work remains to be done understanding the human visual system and developing models that can be used to implement better coders. Surprisingly, perceptual coding of audio is ahead of perceptual coding of video [Jayant93]. Today, most researchers are working on the best possible coding given unlimited time to encode. The target bitrates are typically 1.2 Mb/s for CD-ROM and 2, 3, or 6 Mb/s for video-on-demand. There are many other points in the design space. For example, suppose you wanted to encode CIF images on a typical PC and you were willing to accept a statistical guarantee on bitrate. The idea is to relax the bitrate requirement: real-time transport protocols are being designed to provide statistical guarantees, so why should the coder work hard to satisfy a strict bitrate bound when doing so may mean a significantly poorer picture? The coding strategy for this implementation will be very different from the strategy used in current coders. This idea is only one of several ways to change the basic model.
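A minimal sketch of what such a relaxed, statistically bounded rate controller might look like follows. The smoothing constant, thresholds, and single-step quantizer adjustment are assumptions for illustration; a real coder would be considerably more sophisticated.

/* Sketch of a relaxed rate controller: adjust the quantizer only when
 * a running average of the output rate drifts from the target, rather
 * than forcing every frame under a hard budget. */
typedef struct {
    double target_bps;   /* long-run target, e.g. 1.2e6 for CD-ROM   */
    double avg_bps;      /* exponentially weighted running average   */
    int    qscale;       /* quantizer scale, 1 (fine) .. 31 (coarse) */
} RateControl;

/* Call once per coded frame with the number of bits it produced. */
static void rate_update(RateControl *rc, double frame_bits, double fps)
{
    double inst_bps = frame_bits * fps;

    /* Smooth over roughly the last few seconds of video. */
    rc->avg_bps = 0.95 * rc->avg_bps + 0.05 * inst_bps;

    /* React to the average, not to the instantaneous rate. */
    if (rc->avg_bps > 1.1 * rc->target_bps && rc->qscale < 31)
        rc->qscale++;            /* coarser quantization            */
    else if (rc->avg_bps < 0.9 * rc->target_bps && rc->qscale > 1)
        rc->qscale--;            /* spend the spare bits on quality */
}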

1.3.3 Multiple CPU/Chip Implementations


Future desktop computer architectures will use microprocessors that support multiple CPU's per chip. For example, a RISC processor requires 1M to 3M transistors, and chip technology will soon be able to put 100M transistors on a chip. So the question is how best to use those transistors. One design will put many different processor architectures on a chip so that a system can run different software. Another design will put many copies of the same processor on the chip.
An interesting research problem is to understand the effect of different architectures on compression and decompression. One possibility, which is probably already being pursued in industrial research labs, is to look at high-performance parallel decoders for HDTV images (e.g., 1920x1080) using general-purpose processors.
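One simple way such a multiprocessor might be exploited for decoding is to decode independent horizontal bands of slices concurrently, as in the following C sketch. The four-way split and the decode_band() routine are hypothetical placeholders; a real decoder would also have to balance entropy decoding and reconstruction work across the CPUs.

/* Sketch: decode independent bands of slices on separate CPUs.
 * decode_band() is a hypothetical placeholder for the slice decoder. */
#include <pthread.h>

#define NBANDS 4                 /* assume four CPUs on the chip */

typedef struct {
    int first_row, last_row;     /* macroblock rows in this band */
    const void *bitstream;
} Band;

extern void decode_band(int first_row, int last_row, const void *bs);

static void *band_worker(void *arg)
{
    Band *b = (Band *)arg;
    decode_band(b->first_row, b->last_row, b->bitstream);
    return NULL;
}

/* Decode one 1920x1080 picture, split into NBANDS bands of rows. */
static void decode_picture_parallel(const void *bitstream)
{
    pthread_t tid[NBANDS];
    Band band[NBANDS];
    int rows = (1080 + 15) / 16;         /* 68 macroblock rows */

    for (int i = 0; i < NBANDS; i++) {
        band[i].first_row = i * rows / NBANDS;
        band[i].last_row  = (i + 1) * rows / NBANDS - 1;
        band[i].bitstream = bitstream;
        pthread_create(&tid[i], NULL, band_worker, &band[i]);
    }
    for (int i = 0; i < NBANDS; i++)
        pthread_join(tid[i], NULL);
}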

1.3.4 Continuous Media Infrastructure


There is currently no portable toolkit for developing distributed continuous media applications (i.e., digital audio and video) such as desktop conferencing systems, distance learning systems, and distributed video playback systems. Many excellent research systems have been developed, but they are typically not distributed, and they support few hardware platforms and audio/video boards [Anderson91, Gibbs91, Hamakawa92, Koegel93, Rossum93, Steinmetz91, Trehan93, Hewlett-Packard93]. There are several standards groups and large companies trying to establish common architectures and protocols for developing distributed applications, but these efforts have yet to succeed. The consequence is that anyone who wants to develop an application faces the problem of developing the infrastructure.
Our research group has developed such an infrastructure, called the Berkeley Continuous Media Toolkit, that supports motion JPEG and MPEG video boards, several audio standards, and runs on a variety of platforms. It is based on the Tcl scripting language, the Tk interface toolkit, and the Tcl-DP package for distributed client/server computing. We have developed a network playback system [Rowe92] and desktop video conferencing system using the toolkit [Chaffee94].
You might wonder how a research project at a university can compete with large companies. The answer is we cannot. However, by distributing our source code and working with other researchers we can build a common infrastructure. This approach has worked for CAD tools, Tcl/Tk, and the INGRES relational DBMS to name three examples from Berkeley.
However, we still need the equivalent of the PBMPLUS library for manipulating digital video data. The idea is to develop tools and libraries so that different researchers can experiment with components of the infrastructure and with applications built using it.


1.4 Wireless Audio/Video Compression


Wireless computing links are very different from conventional communication links. First, bandwidth is limited (e.g., approximately 2 Mb/s aggregate bandwidth in a cell). Second, communication errors are inversely proportional to the power used on the portable device. Power is the scarce resource, so algorithms and implementations that perform adequately with less power are better. Some researchers argue that portable devices should have limited computational power to reduce power requirements, which means that audio and video compression must be very simple [Broderson94].
Designing compression algorithms that work well in this environment is an interesting challenge. Some people are looking at pyramid and subband coding using vector quantization. Vector quantization is simple to decode, and pyramid and subband coding can be used to partition the stream into high priority data that will be sent with more power to reduce errors and low priority data that will be sent with less power.
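The decoding simplicity of vector quantization can be seen in a short C sketch: reconstruction is little more than a table lookup per image block. The 4x4 block size and 256-entry codebook below are assumptions for illustration.

/* Sketch of vector-quantization decoding: one table lookup per block,
 * which is why it suits a low-power portable device. */
#include <stdint.h>
#include <string.h>

#define BLOCK 16                 /* 4x4 pixels per codebook vector */
#define CODES 256                /* one-byte index per block       */

/* Reconstruct nblocks blocks from their codebook indices. */
static void vq_decode(const uint8_t *indices, int nblocks,
                      const uint8_t codebook[CODES][BLOCK],
                      uint8_t *out)
{
    for (int i = 0; i < nblocks; i++)
        memcpy(out + i * BLOCK, codebook[indices[i]], BLOCK);
}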
Needless to say, this architecture will create many problems if the rest of the digital video infrastructure is dominated by the block transform coding standards as I believe it will be.


1.5 Conclusions


Compression researchers have developed numerous technologies that have been used to create a series of compression standards that will dominate desktop digital video. Today, and for at least the next 5-10 years, application developers and users face a difficult choice of which hardware and software to use. Eventually, desktop parallel processors will allow many different compression algorithms, implemented in general-purpose software, to be used.
Many research problems remain, but my opinion is that effort should be directed at improving existing implementations, software systems infrastructure, and applications.


1.6 References


[Anderson91] D.P. Anderson and P. Chan, "Toolkit Support for Multiuser Audio/Video Applications," Proc. 2nd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg, Germany, November 1991.

[Broderson94] R. Broderson, "The Infopad Project's Home Page," World-Wide Web Page, http://infopad.eecs.berkeley.edu/.

[Chaffee94] G. Chaffee, personal communication, May 1994.

[Gibbs91] S. Gibbs, et al., "A Programming Environment for Multimedia Applications," Proc. 2nd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg, Germany, November 1991.

[Hamakawa92] R. Hamakawa, et al., "Audio and Video Extensions to Graphical User Interface Toolkits," Proc. 3rd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, San Diego, CA, November 1992.

[Hewlett-Packard93] Hewlett-Packard, IBM, and Sunsoft, "Multimedia Systems Services (Version 1.0)," response to Multimedia System Services Request for Technology, Interactive Multimedia Association, 1993.

[Ho94] S. Ho, personal communication, February 1994.

[Jayant93] N. Jayant, J. Johnston, and R. Safranek, "Signal Compression Based on Models of Human Perception," Proc. of the IEEE, Vol. 81, No. 10, October 1993, pp. 1385-1422.

[Koegel93] J.F. Koegel, et al., "HyOctane: A HyTime Engine for an MMIS," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.

[Lee94] R. Lee, personal communication, May 1994.

[Rossum93] G. van Rossum, et al., "CMIFed: A Presentation Environment for Portable Hypermedia Documents," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.

[Rowe92] L.A. Rowe and B.C. Smith, "A Continuous Media Player," Proc. 3rd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, San Diego, CA, November 1992.

[Rowe93] L.A. Rowe, K. Patel and B.C. Smith, "Performance of a Software MPEG Video Decoder," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.

[Smith94] B.C. Smith, "Fast Software Processing of Motion JPEG Video," to appear ACM Multimedia 94, October 1994.

[Steinmetz91] R. Steinmetz and J.C. Fritzsche, "Abstractions for Continuous-Media Programming," Proc. 2nd Int'l. Workshop on Network and Operating System Support for Digital Audio and Video, Heidelberg, Germany, November 1991.

[Trehan93] R. Trehan, et al., "Toolkit for Shared Hypermedia on a Distributed Object Oriented Architecture," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.

 
