www.belle-nuit.com archive
title: english video glossary
source: compilation of various sources
date: 10.7.98
A ratio of sampling frequencies used to digitise the luminance and colour difference components (Y, R-Y, B-Y) of a video signal. It is generally used as shorthand for ITU-R 601. The term 4:2:2 describes that for every four samples of Y, there are 2 samples each of R-Y and B-Y, giving more chrominance bandwidth in relation to luminance compared to 4:1:1 sampling.
ITU-R 601, 4:2:2 is the standard for digital studio equipment, and the terms '4:2:2' and '601' are commonly (but technically incorrectly) used synonymously. The sampling frequency of Y is 13.5 MHz and that of R-Y and B-Y is each 6.75 MHz, providing a maximum colour bandwidth of 3.37 MHz - enough for high quality chroma keying.
See also: CCIR 601
Although sometimes used interchangeably, advanced and high-definition television (HDTV) are not one and the same. Advanced television (ATV) would distribute wide-screen television signals with resolution substantially better than current systems. It requires changes to current emission regulations, including transmission standards. In addition, ATV would offer at least two-channel, CD-quality audio.
The a:b:c notation for sampling ratios, as found in the CCIR-601 specifications, has the following meaning:
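In outline, a:b:c relates the luminance sampling rate to the sampling of the two colour difference signals: 4:2:2 means 2:1 horizontal subsampling of the colour difference signals and no vertical subsampling (four Y samples for every two Cb and two Cr samples on each line); 4:1:1 means 4:1 horizontal subsampling; and 4:2:0 - despite what the notation suggests - means 2:1 subsampling both horizontally and vertically.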
Not only is this notation not internally consistent, but it is incapable of being extended to represent any unusual sampling ratios, eg different ratios for the Cb and Cr channels.
Perhaps the major drawback to each of the Huffman encoding techniques is their poor performance when processing texts where one symbol has a probability of occurrence approaching unity. Although the entropy associated with such symbols is extremely low, each symbol must still be encoded as a discrete value.
Arithmetic coding removes this restriction by representing messages as intervals of the real numbers between 0 and 1. Initially, the range of values for coding a text is the entire interval [0, 1]. As encoding proceeds, this range narrows while the number of bits required to represent it expands. Frequently occurring characters reduce the range less than characters occurring infrequently, and thus add fewer bits to the length of an encoded message.
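As a minimal sketch of this narrowing process, in Python (the two-symbol model and its probabilities are invented for illustration; a practical coder uses incremental fixed-precision integer arithmetic rather than floats):

    # Toy arithmetic encoder: narrow [0, 1) once per symbol.
    def arith_encode(message, probs):
        ranges, acc = {}, 0.0
        for sym, p in probs.items():          # assign each symbol a sub-interval
            ranges[sym] = (acc, acc + p)
            acc += p
        low, high = 0.0, 1.0
        for sym in message:
            span = high - low
            s_lo, s_hi = ranges[sym]
            high = low + span * s_hi          # frequent symbols (large p) shrink
            low = low + span * s_lo           # the interval less: fewer bits added
        return (low + high) / 2               # any number in [low, high) identifies the message

    code = arith_encode("aab", {"a": 0.8, "b": 0.2})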
ATM (Asynchronous Transfer Mode) is a switching/transmission technique where data is transmitted in small, fixed-size cells (5 byte header, 48 byte payload). The cells lend themselves both to the time-division-multiplexing characteristics of the transmission media and the packet switching characteristics desired of data networks. At each switching node, the ATM header identifies a virtual path or virtual circuit that the cell contains data for, enabling the switch to forward the cell to the correct next-hop trunk. The virtual path is set up through the involved switches when two endpoints wish to communicate. This type of switching can be implemented in hardware, almost essential when trunk speeds range from 45 Mb/s to 1 Gb/s.
The human visual system has much less acuity for spatial variation of colour than for brightness. Rather than conveying RGB, it is advantageous to convey luma in one channel, and colour information that has had luma removed in the two other channels. In an analog system, the two colour channels can have less bandwidth, typically one-third that of luma. In a digital system each of the two colour channels can have considerably less data rate (or data capacity) than luma.
Green dominates the luma channel: about 59% of the luma signal comprises green information. Therefore it is sensible, and advantageous for signal-to-noise reasons, to base the two colour channels on blue and red. The simplest way to remove luma from each of these is to subtract it, forming the difference between a primary colour and luma. Hence, the basic video colour-difference pair is (B-Y), (R-Y) [pronounced "B minus Y, R minus Y"].
The (B-Y) signal reaches its extreme values at blue (R=0, G=0, B=1; Y=0.114; B-Y=+0.886) and at yellow (R=1, G=1, B=0; Y=0.886; B-Y=-0.886). Similarly, the extrema of (R-Y), +-0.701, occur at red and cyan. These are inconvenient values for both digital and analog systems. The colour spaces YPbPr, YCbCr, PhotoYCC and YUV are simply scaled versions of (Y, B-Y, R-Y) that place the extrema of the colour difference channels at more convenient values.
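The quoted extrema follow directly from the conventional 601 luma weights, Y = 0.299 R + 0.587 G + 0.114 B; a few lines of Python confirm them:

    # Check the colour-difference extrema quoted above.
    def luma(r, g, b):
        return 0.299 * r + 0.587 * g + 0.114 * b

    for name, (r, g, b) in {"blue": (0, 0, 1), "yellow": (1, 1, 0),
                            "red": (1, 0, 0), "cyan": (0, 1, 1)}.items():
        y = luma(r, g, b)
        print(name, "B-Y =", round(b - y, 3), "R-Y =", round(r - y, 3))
    # blue: B-Y = 0.886; yellow: B-Y = -0.886; red: R-Y = 0.701; cyan: R-Y = -0.701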
Bridges are devices that connect similar and dissimilar LANs at the data link layer (OSI layer 2), regardless of the physical layer protocols or media being used. Bridges require that the networks have consistent addressing schemes and packet frame sizes. Current introductions have been termed learning bridges since they are capable of updating node address (tracking) tables as well as overseeing the transmission of data between two Ethernet LANs.
Brouters are bridge/router hybrid devices that offer the best capabilities of both devices in one unit. Brouters are actually bridges capable of intelligent routing and therefore are used as generic components to integrate workgroup networks. The bridge function filters information that remains internal to the network and is capable of supporting multiple higher-level protocols at once.
The router component maps out the optimal paths for the movement of data from one point on the network to another. Since the brouter can handle the functions of both bridges and routers, as well as bypass the need for the translation across application protocols with gateways, the device offers significant cost reductions in network development and integration.
Comité Consultatif International Télégraphique et Téléphonique (CCITT). A committee of the International Telecommunication Union responsible for making technical recommendations about telephone and data communication systems for PTTs and suppliers. Plenary sessions are held every four years to adopt new standards.
CD-DA (Compact Disc-Digital Audio) discs are standard music CDs. CD-DA begat CD-ROM when people realized that you could store a whole bunch of computer data on a 12 cm optical disc (650 MB). CD-ROM drives are simply another kind of digital storage media for computers, albeit read-only. They are peripherals just like hard disks and floppy drives. (Incidentally, the convention is that when referring to magnetic media, it is spelled disk. Optical media like CDs, LaserDisc, and all the other formats are spelled disc.)
CD-I means Compact Disc Interactive. It is meant to provide a standard platform for mass consumer interactive multimedia applications. So it is more akin to CD-DA, in that it is a full specification for both the data/code and standalone playback hardware: a CD-I player has a CPU, RAM, ROM, OS, and audio/video/(MPEG) decoders built into it. Portable players add an LCD screen and speakers/phone jacks. It has limited motion video and still image compression capabilities. It was announced in 1986, and was in beta test by Spring 1989.
This is a consumer electronics format that uses the optical disc in combination with a computer to provide a home entertainment system that delivers music, graphics, text, animation, and video in the living room. Unlike a CD-ROM drive, a CD-I player is a standalone system that requires no external computer. It plugs directly into a TV and stereo system and comes with a remote control to allow the user to interact with software programs sold on discs. It looks and feels much like a CD player except that you get images as well as music out of it and you can actively control what happens. In fact, it is a CD-DA player and all of your standard music CDs will play on a CD-I player; there is just no video in that case.
For a CD-I disc, there may be as few as 1 or as many as 99 data tracks. The sector size in the data tracks of a CD-I disc is approximately 2 kbytes. Sectors are randomly accessible, and, in the case of CD-I, sectors can be multiplexed in up to 16 channels for audio and 32 channels for all other data types. For audio these channels are equivalent to having 16 parallel audio data channels instantly accessible during the playing of a disc.
CD-ROM means "Compact Disc Read Only Memory". A CD-ROM is physically identical to a Digital Audio Compact Disc used in a CD player, but the bits recorded on it are interpreted as computer data instead of music. You need to buy a CD-ROM Drive and attach it to your computer in order to use CD-ROMs.
A CD-ROM has several advantages over other forms of data storage, and a few disadvantages. A CD-ROM can hold about 650 megabytes of data, the equivalent of thousands of floppy disks. CD-ROMs are not damaged by magnetic fields or the X-rays in airport scanners. The data on a CD-ROM can be accessed much faster than a tape, but CD-ROMs are 10 to 20 times slower than hard disks.
You cannot write to a CD-ROM. You buy a disc with the data already recorded on it. There are thousands of titles available.
CD-XA is a CD-ROM extension being designed to support digital audio and still images.
Announced in August 1988 by Microsoft, Philips, and Sony, the CD-ROM XA (for Extended Architecture) format incorporates audio from the CD-I format. It is consistent with ISO 9660 (the volume and file structure of CD-ROM), is an application extension of the Yellow Book, and draws on the Green Book.
CD-XA defines another way of formatting sectors on a CD-ROM, including headers in the sectors that describe the type (audio, video, data) and some additional info (markers, resolution in case of a video or audio sector, file numbers, etc).
The data written on a CD-XA can still be in ISO9660 file system format and therefore be readable by MSCDEX and Unix CD-ROM file system translators. A CD-I player can also read CD-XA discs even if its own `Green Book' file system only resembles ISO9660 and isn't fully compatible. However, when a disc is inserted in a CD-I player, the player tries to load an executable application from the CD-XA, normally some 68000 application in the /CDI directory. Its name is stored in the disc's primary volume descriptor. CD-XA bridge discs, like Kodak's PhotoCDs, do have such an application, ordinary CD-XA discs don't.
A CD-XA drive is a CD-ROM drive but with some of the compressed audio capabilities found in a CD-I player (called ADPCM). This allows interleaving of audio and other data so that an XA drive can play audio and display pictures (or other things) simultaneously. There is special hardware in an XA drive controller to handle the audio playback. This format came from a desire to inject some of the features of CD-I back into the professional market.
Cell is a compression technique developed by SMI (Sun Microsystems, Inc.). The compression algorithms, the bit-stream definition, and the decompression algorithms are open; that is, Sun will tell anybody who is interested about them. Cell compression is similar to MPEG and H.261 in that there is a lot of room for value-add on the compressor end. Getting the highest quality image from a given bit count at a reasonable amount of compute is an art. In addition the bit-stream completely defines the compression format and defines what the decoder must do, so there is less art in the decoder.
There are two flavors of Cell: the original, called Cell or CellA, and a newer flavor called CellB. CellA is designed for use-many-times video, where one does not mind that the encoder runs at less than real time - for example, CD-ROM playback, distance learning, video help for applications. CellB is designed for use-once video, where the encoder must run at real-time (interactive) rates - for example, video mail and video conferencing.
Both flavors of Cell use the same basic technique of representing each 4x4 pixel block with a 16-bit bitmask and two 8-bit vector quantized codebook indices. This produces a compression of 12:1 from 24-bit pixels (or 8:1 from 16-bit pixels), since each 16-pixel block is represented by 32 bits (16-bit mask and two 8-bit codebook indices). In both flavors, further compression is accomplished by checking current blocks against the spatially equivalent block in the previous frame. If the new block is "close enough" to the old block, the block is coded as a skip code. Consecutive skip codes are run-length encoded for further compression. Modifying the definition of close enough allows one to trade off quality and compression rates. Both versions of Cell typically compress video images down to about 0.75 to 0.5 bits/pixel.
Both flavors have many similar steps in the early part of compression. For each 4x4 block, the compressor calculates the average luma of the 16 pixels. It then partitions the pixels into two groups, those whose luma is above the average and those whose luma is below the average. The compressor sets the 16-bit bitmask based on which pixels are in each partition. The compressor then calculates a color to represent each partition.
In Cell, the compressor calculates an average color of each partition; it then does a vector quantization against the Cell codebook (which is just a color-map). The encoded block is the 16-bit mask and the two 8-bit colormap indices. The compressor maintains statistics about how much error each codebook entry is responsible for and how many times each codebook entry is used. It uses these numbers to adaptively refine the codebook on each frame. Changed codebooks are sent in the bitstream.
In CellB, the compressor calculates the average luma for each partition and the average chroma for the entire block. This gives two colors [Y_lo, Cb_ave, Cr_ave] and [Y_hi, Cb_ave, Cr_ave]. The pair [Y_lo, Y_hi] is vector quantized against the Y/Y codebook and the pair [Cb_ave, Cr_ave] is vector quantized against the Cr/Cb codebook. Here the encoded block is the 16-bit mask and the two 8-bit VQ indices. Both of CellB's codebooks are fixed. This allows both the compressor and decompressor to run at high-speed by using table lookups. Both codebooks are designed with the human visual system in mind. They are not just uniform partition of the Y/Y or Cr/Cb space. Each codebook has fewer than 256 entries.
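The shared partitioning step can be sketched in a few lines of Python (the layout and names here are illustrative; codebook training and lookup are omitted):

    # Partition a 4x4 block (16 luma values in raster order) around its mean,
    # producing the 16-bit mask and the two per-partition mean lumas that
    # CellB then vector-quantizes against its Y/Y codebook.
    def cell_partition(block):
        avg = sum(block) / 16.0
        mask, hi, lo = 0, [], []
        for i, y in enumerate(block):
            if y > avg:
                mask |= 1 << i          # bit set: pixel is in the above-average group
                hi.append(y)
            else:
                lo.append(y)
        y_lo = sum(lo) / len(lo) if lo else avg
        y_hi = sum(hi) / len(hi) if hi else avg
        return mask, y_lo, y_hi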
Cell (or CellA) is supported in XIL 1.0 from SMI. It is part of Solaris 2.2. CellB is supported in XIL 1.1 from SMI. It will be part of Solaris 2.3 when that becomes available. Complete bitstream definitions for both flavors of cell are in the XIL 1.1 programmer's guide. There is some discussion of the CellA bitstream in the XIL 1.0 programmer's guide.
CellB was used for the SMI Scott McNealy holiday broadcast, in which he talked to the company in real time over the Sun wide area network. This broadcast reached from Tokyo, Japan to Munich, Germany, with over 3000 known viewers.
Common Image Format. The standardization of the structure of the samples that represent the picture information of a single frame in digital HDTV, independent of frame rate and sync/blank structure.
The uncompressed bit rate for transmitting CIF (352 x 288 luminance samples) at 29.97 frames/sec is 36.45 Mbit/sec.
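This figure can be reconstructed from the CIF sampling structure, assuming 8 bit samples and 2:1 chroma subsampling in both directions: (352 x 288 + 2 x 176 x 144) samples x 8 bits x 29.97 frames/sec is approximately 36.45 Mbit/sec.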
The ratio of the data in the uncompressed digital video signal to the compressed version. Modern compression techniques start with the CCIR 601 component digital television signal so the amount of data of the uncompressed video is well defined - 75 Gbytes/hour for 625/50 and 76 Gbytes/hour for the 525/60 standard.
The compression ratio should not be used as the only method to assess the quality of a compressed signal. For a given technique, greater compression can be expected to result in worse quality; but different techniques give widely differing quality of results for the same compression ratio. At the same time results will vary depending on picture content. The only sure method of judgement is to make a very close inspection of the resulting pictures. See also: JPEG , Storage capacity
A format for digital video tape recording working to the CCIR 601, 4:2:2 standard using 19 mm wide tape, allowing up to 94 minutes to be recorded on a cassette. Second generation equipment offers new features including stunt modes - slow, fast and reverse motion etc.
Being a component recording system it is ideal for studio or post production work, with its high chrominance bandwidth allowing excellent chroma keying. At the same time multiple generations are possible with little degradation, and D1 equipment can integrate without transcoding to most digital effects systems, telecines, graphics devices, disk recorders, etc. Being component there are no colour framing requirements. Despite the advantages, D1 equipment is not extensively used in general areas of TV production - at least partly due to its high cost.
See also: DVTR
Differential pulse code modulation (DPCM) is a source coding scheme that was developed for encoding sources with memory.
The reason for using the DPCM structure is that for most sources of practical interest, the variance of the prediction error is substantially smaller than that of the source.
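A minimal DPCM sketch with the simplest possible predictor, the previous sample (a real system would also quantize the prediction error, a step omitted here):

    def dpcm_encode(samples):
        prev, errors = 0, []
        for x in samples:
            errors.append(x - prev)   # prediction error: typically low variance
            prev = x                  # predictor: the previous sample
        return errors

    def dpcm_decode(errors):
        prev, out = 0, []
        for e in errors:
            prev += e                 # undo the prediction step
            out.append(prev)
        return out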
Digital Video Cassette. The next generation of consumer VCRs, under development and due to appear in 1995, is a cooperation between Hitachi, JVC, Matsushita, Mitsubishi, Philips, Sanyo, Sharp, Thomson and Toshiba. It will be digital and use 6.35 mm wide tape to record 525/60, 625/50 and HDTV television. The proposed format uses digital intra field DCT compression (about 5:1) to record 13.5 MHz, 8 bit 4:1:1 (525/60) or 4:2:0 (625/50) video plus two 16 bit/48 or 44.1 kHz audio channels, onto a 4.5 hour standard cassette (14.6 x 78 x 125 mm) or smaller 1 hour cassette (12.2 x 48 x 66 mm). The video recording rate is 25 Mbits/sec.
Digital Video Interactive (DVI) technology brings television to the microcomputer. DVI's concept is simple: information is digitized and stored on a random-access device such as a hard disk or a CD-ROM, and is accessed by a computer. DVI requires extensive compression and real-time decompression of images. Until recently this capability was missing. DVI enables new applications. For example, a DVI CD-ROM disk on twentieth-century artists might consist of 20 minutes of motion video; 1,000 high-res still images, each with a minute of audio; and 50,000 pages of text. DVI uses the YUV system, which is also used by the European PAL color television system. The Y channel encodes luminance and the U and V channels encode chrominance. For DVI, we subsample 4-to-1 both vertically and horizontally in U and V, so that each of these components requires only 1/16 the information of the Y component. This provides a compression from the 24-bit RGB space of the original to 9-bit YUV space.
The DVI concept originated in 1983 in the inventive environment of the David Sarnoff Research Center in Princeton, New Jersey, then also known as RCA Laboratories. The ongoing research and development of television since the early days of the Laboratories was extending into the digital domain, with work on digital tuners, and digital image processing algorithms that could be reduced to cost-effective hardware for mass-market consumer television.
DVTR - Digital Video Tape Recorder. The first DVTR for commercial use was shown in 1986, working to the CCIR 601 component digital standard and the associated D1 standard for DVTRs. It uses 19 mm cassettes recording 34, 78 or (using thinner tape) 94 minutes.
Today many DVTR formats are available. D2 and D3, both recording composite signals, are designed mainly to replace C format analogue machines; DCT and Digital Betacam both make use of mild data compression (around 2:1) to record the CCIR 601 video. D5, like D1, records the full, un-compressed CCIR 601 signal but on 1/2 inch tape cassettes. At least one more format is on the horizon - DVC will record compressed CCIR 601 onto 6.35 mm tape.
Multiple generations on DVTRs do not suffer from degradation due to tape noise, moire, etc. However the tape is subject to wear and tear, and the possibility of this producing errors and drop-outs necessitates error concealment circuitry. In extreme cases multiple passes can introduce cumulative texturing or other artifacts.
Also see: D1 , DVC
European Association of Consumer Electronics Manufacturers
Extended [or Enhanced] Definition Television. A television system that offers picture quality substantially improved over conventional 525-line or 625-line receivers, by employing techniques at the transmitter and at the receiver that are transparent to (and cause no visible quality degradation to) existing 525-line or 625-line receivers. One example of EDTV is the improved separation of luminance and colour components by pre-combing the signals prior to transmission, using techniques that have been suggested by Faroudja, Central Dynamics and Dr William Glenn.
Entropy, the average amount of information represented by a symbol in a message, is a function of the model used to produce that message and can be reduced by increasing the complexity of the model so that it better reflects the actual distribution of source symbols in the original message.
Entropy is a measure of the information contained in a message; it is the lower bound for lossless compression.
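For a source with symbol probabilities p(i), the entropy is H = -sum over i of p(i) log2 p(i) bits per symbol. For example, a two-symbol source with probabilities 0.9 and 0.1 has H of about 0.47 bits/symbol; a Huffman code must still spend at least one whole bit per symbol, which is exactly the shortfall that arithmetic coding avoids.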
Economics and Statistics Advisory Committee
European Strategic Programme for Research and Development in Information Technology
European Telecommunication Standard Institute
Gateways provide functional bridges between networks by receiving protocol transactions on a layer-by-layer basis from one protocol (SNA) and transforming them into comparable functions for the other protocol (OSI). In short, the gateway provides a connection with protocol translation between networks that use different protocols. Interestingly enough, gateways, unlike the bridge, do not require that the networks have consistent addressing schemes and packet frame sizes. Most proprietary gateways (such as IBM SNA gateways) provide protocol converter functions up through layer six of the OSI, while OSI gateways perform protocol translations up through OSI layer seven.
Recognizing the need for providing ubiquitous video services using the Integrated Services Digital Network (ISDN), CCITT (International Telegraph and Telephone Consultative Committee) Study Group XV established a Specialist Group on Coding for Visual Telephony in 1984 with the objective of recommending a video coding standard for transmission at m x 384 kbit/s (m=1,2,..., 5). Later in the study period after new discoveries in video coding techniques, it became clear that a single standard, p x 64 kbit/s (p = 1,2,..., 30), can cover the entire ISDN channel capacity. After more than five years of intensive deliberation, CCITT Recommendation H.261, Video Codec for Audiovisual Services at p x 64 kbit/s, was completed and approved in December 1990. A slightly modified version of this Recommendation was also adopted for use in North America.
The intended applications of this international standard are for videophone and videoconferencing. Therefore, the recommended video coding algorithm has to be able to operate in real time with minimum delay. For p = 1 or 2, due to severely limited available bit rate, only desktop face-to-face visual communication (often referred to as videophone) is appropriate. For p>=6, due to the additional available bit rate, more complex pictures can be transmitted with better quality. This is, therefore, more suitable for videoconferencing.
High-Definition Television. A television system with approximately twice the horizontal and twice the vertical resolution of current 525-line and 625-line systems, component colour coding (e.g. RGB or YCbCr), a picture aspect ratio of 16:9 and a frame rate of at least 24 Hz. Currently there are a number of proposed HDTV standards, including HD-MAC, HiVision and others.
In the archetypal hybrid coder, an estimate of the next frame to be processed is formed from the current frame and the difference is then encoded by some purely intraframe mechanism. In recent years, the most attention has been paid to the motion-compensated DCT coder where the estimate is formed by a two-dimensional warp of the previous frame and the difference is encoded using a block transform (the Discrete Cosine Transform).
This system is the basis for international standards for videotelephony, is used for some HDTV demonstrations, and is the prototype from which MPEG was designed. Its utility has been demonstrated for video sequences: the motion-compensated prediction removes much of the temporal redundancy, and the DCT concentrates the remaining energy into a small number of transform coefficients that can be quantized and compactly represented.
The key feature of this coder is the presence of a complete decoder within it. The difference between the current frame as represented at the receiver and the incoming frame is processed. In the basic design, therefore, the receiver must track the transmitter precisely: the decoder at the receiver and the decoder at the transmitter must match. The system is sensitive to channel errors and does not permit random access. However, it is on the order of three to four times as efficient as one that uses no prediction.
In practice, this coder is modified to suit specific applications. The standard telephony model uses a forced update of the decoded frame so that channel errors do not propagate. When a participant enters the conversation late or alternates between image sources, residual errors die out and a clear image is obtained after a few frames. Similar techniques are used in versions of this coder being developed for direct satellite television broadcasting.
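The loop can be sketched as follows (a Python skeleton with deliberately trivial stand-ins: a zero-motion predictor instead of a true motion search, and coarse scalar quantization in place of the DCT and entropy coder):

    # Hybrid coding skeleton: note the embedded decoder.
    def hybrid_encode(frames, step=8):
        reference = [0] * len(frames[0])          # decoder state, known at both ends
        for frame in frames:
            residual = [x - r for x, r in zip(frame, reference)]
            coeffs = [round(e / step) for e in residual]   # 'transform + quantize'
            yield coeffs                                   # what gets transmitted
            # embedded decoder: reconstruct exactly what the receiver will see,
            # so transmitter and receiver stay in lockstep
            reference = [r + c * step for r, c in zip(reference, coeffs)]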
For a given character distribution, by assigning short codes to frequently occurring characters and longer codes to infrequently occurring characters, Huffman's minimum redundancy encoding minimizes the average number of bits required to represent the characters in a text.
Static Huffman encoding uses a fixed set of codes, based on a representative sample of data, for processing texts. Although encoding is achieved in a single pass, the data on which the compression is based may bear little resemblance to the actual text being compressed.
Dynamic Huffman encoding, on the other hand, reads each text twice; once to determine the frequency distribution of the characters in the text and once to encode the data. The codes used for compression are computed on the basis of the statistics gathered during the first pass with compressed texts being prefixed by a copy of the Huffman encoding table for use with the decoding process.
By using a single-pass technique, where each character is encoded on the basis of the preceding characters in a text, Gallager's adaptive Huffman encoding avoids many of the problems associated with either the static or dynamic method.
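A compact construction of a static Huffman code from a (toy) frequency table, as a Python sketch:

    import heapq

    def huffman_codes(freqs):
        # heap entries are (frequency, tiebreak, tree); a tree is a symbol
        # (leaf) or a pair of subtrees (internal node)
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)       # merge the two least
            f2, _, t2 = heapq.heappop(heap)       # frequent subtrees
            heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
            count += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"
        # frequent symbols end up nearer the root, i.e. with shorter codes
        walk(heap[0][2], "")
        return codes

    print(huffman_codes({"e": 12, "t": 9, "q": 1}))   # {'q': '00', 't': '01', 'e': '1'}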
Improved Definition Television. A television system that offers picture quality substantially improved over conventional receivers, for signals originated in standard 525-line or 625-line format, by processing that involves the use of field store and/or frame store (memory) techniques at the receiver. One example is the use of field or frame memory to implement de-interlacing at the receiver in order to reduce interline twitter compared to that of an interlaced display. IDTV techniques are implemented entirely at the receiver and involve no change to picture origination equipment and no change to emission standards.
International Electrotechnical Commission (IEC). A standardisation body at the same level as ISO.
Interactive video-disc is another video related technology, using an analog approach. It has been available since the early 1980s, and is supplied in the U.S. primarily by Pioneer, Sony, and IBM.
ISDN stands for "Integrated Services Digital Networks", and it's a CCITT term for a relatively new telecommunications service package. ISDN is basically the telephone network turned all-digital end to end, using existing switches and wiring (for the most part) upgraded so that the basic call is a 64 kbps end-to-end channel, with bit-diddling as needed (but not when not needed!). Packet and maybe frame modes are thrown in for good measure, too, in some places. It's offered by local telephone companies, but most readily in Australia, France, Japan, and Singapore, with the UK and Germany somewhat behind, and USA availability rather spotty.
A Basic Rate Interface (BRI) is two 64K bearer ("B") channels and a single delta ("D") channel. The B channels are used for voice or data, and the D channel is used for signaling and/or X.25 packet networking. This is the variety most likely to be found in residential service. Another flavor of ISDN is the Primary Rate Interface (PRI). Inside the US, this consists of 24 channels, usually divided into 23 B channels and 1 D channel, and runs over the same physical interface as T1. Outside of the US, PRI has 31 user channels, usually divided into 30 B channels and 1 D channel. It is typically used for connections such as one between a PBX and a CO or IXC.
This standard defines the encoding parameters of digital television for studios. It is the international standard for digitising component television video in both 525 and 625 line systems and is derived from the SMPTE RP125 and the EBU Tech. 3246-E. ITU-R 601 deals with both colour difference (Y, R-Y, B-Y) and RGB video, and defines sampling systems, RGB/Y, R-Y, B-Y matrix values and filter characteristics. It does not actually define the electro-mechanical interface - see ITU-R 656.
ITU-R 601 is normally taken to refer to colour difference component digital video (rather than RGB), for which it defines 4:2:2 sampling at 13.5 MHz with 720 luminance samples per active line and 8 or 10 bit digitising.
Some headroom is allowed, with black at level 16 and white at level 235, to minimise clipping of noise and overshoots. Using 8 bit digitising, approximately 16 million unique colours are possible: 2^24 = 16,777,216.
The sampling frequency of 13.5 MHz was chosen to provide a politically acceptable common sampling standard between 525/60 and 625/50 systems, being a multiple of 2.25 MHz, the lowest common frequency to provide a static sampling pattern for both.
See also: 4:2:2
Interfaces for digital component video signals in 525-line and 625-line television systems. The international standard for interconnecting digital television equipment operating to the 4:2:2 standard defined in ITU-R 601, derived from the SMPTE RP125 and EBU Tech 3246-E. It defines blanking, embedded sync words, the video multiplexing formats used by both the parallel and serial interfaces, the electrical characteristics of the interface and the mechanical details of the connectors.
Joint Photographic Experts Group, ISO/ITU-T. JPEG is a standard for the data compression of still pictures (intra field). In particular its work has been involved with pictures coded to the CCIR 601 standard. JPEG uses DCT and offers data compression of between 5 and 100 times and three levels of processing are defined: the baseline, extended and 'lossless' encoding.
In general, compression can be expected to impose some form of loss or degradation on the picture, its degree depending on the algorithm used as well as the compression ratio and the contents of the picture itself.
See also: Compression ratio
A television system that limits the recording or transmission of useful picture information to about three-quarters of the available vertical picture height of the distribution format (e.g. 525-line) in order to offer program material that has a wide picture aspect ratio.
Video originates with linear-light (tristimulus) RGB primary components, conventionally contained in the range 0 (black) to +1 (white). From the RGB triple, three gamma-corrected primary signals are computed; each is essentially the 0.45-power of the corresponding tristimulus value, similar to a square-root function.
In a practical system such as a television camera, however, in order to minimize noise in the dark regions of the picture it is necessary to limit the slope (gain) of the curve near black. It is now standard to limit gain to 4.5 below a tristimulus value of +0.018, and to stretch the remainder of the curve to place the Y-intercept at -0.099 in order to maintain function and tangent continuity at the breakpoint:
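The resulting transfer function, in the usual Rec. 709-style formulation, is: Y' = 4.5 L for L < 0.018, and Y' = 1.099 L^0.45 - 0.099 for L >= 0.018. Both branches meet at L = 0.018 with Y' = 0.081 and matching slopes.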
The luma coefficients are also a function of the white point (or chromaticity of reference white). Computer users commonly have a white point with a colour temperature in the range of 9300 K, which contains twice as much blue as the daylight reference CIE D65 used in television. This is reflected in pictures and monitors that look too blue.
Although television primaries have changed over the years since the adoption of the NTSC standard in 1953, the coefficients of the luma equation for 525 and 625 line video have remained unchanged. For HDTV, the primaries are different and the luma coefficients have been standardized with somewhat different values.
Algorithm used by the Unix compress command to reduce the size of files, e.g. for archival or transmission. The algorithm relies on repetition of byte sequences (strings) in its input. It maintains a table mapping input strings to their associated output codes. The table initially contains mappings for all possible strings of length one. Input is taken one byte at a time to find the longest initial string present in the table. The code for that string is output and then the string is extended with one more input byte, b. A new entry is added to the table mapping the extended string to the next unused code (obtained by incrementing a counter). The process repeats, starting from byte b. The number of bits in an output code, and hence the maximum number of entries in the table, is usually fixed, and once this limit is reached no more entries are added.
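A straightforward sketch of this loop in Python, emitting integer codes (a real implementation packs the codes into fixed-width bit fields and caps the table size, as described above):

    def lzw_compress(data):
        table = {bytes([i]): i for i in range(256)}   # all strings of length one
        next_code, s, out = 256, b"", []
        for b in data:
            sb = s + bytes([b])
            if sb in table:
                s = sb                      # keep extending the current match
            else:
                out.append(table[s])        # emit code for the longest match
                table[sb] = next_code       # map extended string to next unused code
                next_code += 1
                s = bytes([b])              # restart from byte b
        if s:
            out.append(table[s])
        return out

    print(lzw_compress(b"abababab"))   # [97, 98, 256, 258, 98]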
Communicating a higher-level model of the image than pixels is an active area of research. The idea is to have the transmitter and receiver agree on the basic model for the image; the transmitter then sends parameters to manipulate this model in lieu of picture elements themselves. Model-based decoders are similar to computer graphics rendering programs.
The model-based coder trades generality for extreme efficiency in its restricted domain. Better rendering and extending of the domain are research themes.
An electronic device for converting between serial data (typically RS-232) from a computer and an audio signal suitable for transmission over telephone lines. The audio signal is usually composed of silence (no data) or one of two frequencies representing 0 and 1. Modems are distinguished primarily by the baud rates they support which can range from 75 baud up to 19200 and beyond.
Data to the computer is sometimes at a lower rate than data from the computer on the assumption that the user cannot type more than a few characters per second. Various data compression and error correction algorithms are required to support the highest speeds. Other optional features are auto-dial (auto-call) and auto-answer, which allow the computer to initiate and accept calls without human intervention.
National Association of Broadcasters
Nippon Hoso Kyokai, principal Japanese broadcaster.
USA video standard with image format 4:3, 525 lines, 60 Hz and 4 MHz video bandwidth with a total 6 MHz of video channel width. NTSC uses YIQ.
The Open Systems Interconnection Reference Model was formally initiated by the International Organization for Standardization (ISO) in March 1977, in response to the international need for an open set of communications standards. OSI's objectives are:
The model is similar in structure to that of SNA. It consists of seven architectural layers: the physical layer, the data link layer, the network layer, the transport layer, the session layer, the presentation layer and the application layer.
The physical and data link layers provide the same functions as their SNA counterparts (physical control and data link control layers). The network layer selects routing services, segments blocks and messages, and provides error detection, recovery, and notification.
The transport layer controls point-to-point information interchange, data packet size determination and transfer, and the connection/disconnection of session entities.
The session layer serves to organize and synchronize the application process dialog between presentation entities, manage the exchange of data (normal and expedited) during the session, and monitor the establishment/release of transport connections as requested by session entities.
The presentation layer is responsible for the meaningful display of information to application entities.
More specifically, the presentation layer identifies and negotiates the choice of communications transfer syntax and the subsequent data conversion or transformation as required. The application layer affords the interfacing of application processes to system interconnection facilities to assist with information exchange. The application layer is also responsible for the management of application processes including initialization, maintenance and termination of communications, allocation of costs and resources, prevention of deadlocks, and transmission security.
European video standard with image format 4:3, 625 lines, 50 Hz and 4 MHz video bandwidth with a total 8 MHz of video channel width. PAL uses YUV.
Quarter Common Source Intermediate Format (1/4 CIF, i.e. 176 x 144 luminance samples).
The uncompressed bit rate for transmitting QCIF at 29.97 frames/sec is 9.115 Mbit/s.
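As with CIF, the figure follows from the sampling structure: (176 x 144 + 2 x 88 x 72) samples x 8 bits x 29.97 frames/sec is approximately 9.115 Mbit/s.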
Region Coding has received attention because of the ease with which it can be decoded and the fact that a coder of this type is used in Intel's Digital Video Interactive system (DVI), the only commercially available system designed expressly for low-cost, low-bandwidth multimedia video. Its operation is relatively simple. The basic design is due to Kunt.
Envision a decoder that can reproduce certain image primitives well. A typical set might consist of rectangular areas of constant color, smooth shaded patches and some textures. The image is analyzed into regions that can be expressed in terms of these primitives. The analysis is usually performed using a tree-structured decomposition where each part of the image is successively divided into smaller regions until a patch that meets either the bandwidth constraints or the quality desired can be fitted. Only the tree description and the parameters for each leaf need then be transmitted. Since the decoder is optimized for the reconstruction of these primitives, it is relatively simple to build.
To account for image data that does not encode easily using the available primitives, actual image data can also be encoded and transmitted, but this is not as efficient as fitting a patch.
This coder can also be combined with prediction (as it is in DVI), and the predicted difference image can then be region coded. A key element in the encoding operation is a region growing step where adjacent image patches that are distinct leaves of the tree are combined into a single patch. This approach has been considered highly asymmetric in that significantly more processing is required for encoding/analysis than for decoding. It is harder to grow a tree than to climb one.
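A tree-structured decomposition of this kind can be sketched as a quadtree split (the constant-value patch test and the names here are illustrative, not DVI's actual format):

    # Split a region until it is approximated by a constant-value patch
    # within a tolerance; 'image' is a list of rows of grey values.
    def quadtree(image, x, y, w, h, tol, leaves):
        pixels = [image[j][i] for j in range(y, y + h) for i in range(x, x + w)]
        mean = sum(pixels) / len(pixels)
        if max(abs(p - mean) for p in pixels) <= tol or w == 1 or h == 1:
            leaves.append((x, y, w, h, mean))     # the patch fits: emit a leaf
        else:
            hw, hh = w // 2, h // 2               # otherwise split into quadrants
            for dx, dy, sw, sh in ((0, 0, hw, hh), (hw, 0, w - hw, hh),
                                   (0, hh, hw, h - hh), (hw, hh, w - hw, h - hh)):
                quadtree(image, x + dx, y + dy, sw, sh, tol, leaves)
        return leaves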
While hardware implementations of the hybrid DCT coder have been built for extremely low bandwidth teleconferencing and for HDTV, there is no hardware for a region coder. However, such an assessment is deceptive since much of the processing used in DVI compression is in the motion predictor, a function common to both methods. In fact, all compression schemes are asymmetric, the difference is a matter of degree rather than one of essentials.
Repeaters are transparent devices used to interconnect segments of an extended network with identical protocols and speeds at the physical layer (OSI layer 1). An example of a repeater connection would be the linkage of two carrier sense multiple access/collision detection (CSMA/CD) segments within a network.
Routers connect networks at OSI layer 3. Routers interpret packet contents according to specified protocol sets, serving to connect networks with the same protocols (DECnet to DECnet, TCP/IP (Transmission Control Protocol/Internet Protocol) to TCP/IP). Routers are protocol-dependent; therefore, one router is needed for each protocol used by the network. Routers are also responsible for the determination of the best path for data packets by routing them around failed segments of the network.
European video standard with image format 4:3, 625 lines, 50 Hz and 6 MHz video bandwidth with a total 8 MHz of video channel width.
SMPTE is the Society of Motion Picture and Television Engineers. There is an SMPTE time code standard (hr:min:sec:frame) used to identify video frames.
Systems Network Architecture entered the market in 1974 as a hierarchical, single-host network structure. Since then, SNA has developed steadily in two directions. The first direction involved tying together mainframes and unintelligent terminals in a master-to-slave relationship. The second direction transformed the SNA architecture to support a cooperative-processing environment, whereby remote terminals link up with mainframes as well as each other in a peer-to-peer relationship (termed Low Entry Networking (LEN) by IBM). LEN depends on the implementation of two protocols: Logical Unit 6.2, also known as APPC, and Physical Unit 2.1, which affords point-to-point connectivity between peer nodes without requiring host computer control.
The SNA model is concerned with both logical and physical units. Logical units (LUs) serve as points of access by which users can utilize the network; they can be viewed as terminals that provide users access to application programs and other services on the network. Physical units (PUs), like LUs, are not defined as specific devices within the SNA architecture but instead are representations of the devices and communication links of the network.
Every country has a national standards body where experts from industry and universities develop standards for all kinds of engineering problems.
The International Organization for Standardization, ISO, in Geneva is the head organization of all these national standardization bodies. Together with the International Electrotechnical Commission, IEC, ISO concentrates its efforts on harmonizing national standards all over the world. The results of these activities are published as ISO standards. Among them are, for instance, the metric system of units, international stationery sizes, all kinds of bolts and nuts, rules for technical drawings, electrical connectors, security regulations, computer protocols, file formats, bicycle components, ID cards, programming languages, International Standard Book Numbers (ISBN), ... Over 10,000 ISO standards have been published so far, and you surely come in contact every day with many things that conform to ISO standards you have never heard of. By the way, "ISO" is not an acronym for the organization in any language. It's a wordplay based on the English/French initials and the Greek-derived prefix "iso-" meaning "same".
Within ISO, ISO/IEC Joint Technical Committee 1 (JTC1) deals with information technology.
The International Telecommunication Union, ITU, is the United Nations specialized agency dealing with telecommunications. At present there are 164 member countries. One of its bodies is the International Telegraph and Telephone Consultative Committee, CCITT. A Plenary Assembly of the CCITT, which takes place every few years, draws up a list of 'Questions' about possible improvements in international electronic communication. In Study Groups, experts from different countries develop 'Recommendations' which are published after they have been adopted. Especially relevant to computing are the V series of recommendations on modems (e.g. V.32, V.42), the X series on data networks and OSI (e.g. X.25, X.400), the I and Q series that define ISDN, the Z series that defines specification and programming languages (SDL, CHILL), the T series on text communication (teletext, fax, videotext, ODA) and the H series on digital sound and video encoding.
Since 1961, the European Computer Manufacturers Association, ECMA, has been a forum for data processing experts where agreements have been prepared and submitted for standardization to ISO, CCITT and other standards organizations.
Using the CCIR 601 standard each picture occupies a large amount of storage space - especially when related to computer storage devices such as DRAM and disks - so much so that the numbers can become confusing unless a few benchmark statistics are remembered. Fortunately the units of mega, giga and tera make it easy to express the very large numbers involved. The capacities can all be worked out directly from the 601 standard. Bearing in mind that sync words and blanking can be re-generated and added at the output, only the active picture area need be stored.
For the 625 line TV standard the active picture is:
720 pixels (Y) + 360 pixels (Cr) + 360 pixels (Cb) = 1440 pixels/line. 576 active lines/picture means 1440 x 576 = 829,440 pixels per picture. Sampling at 8 bits the picture takes 829,440 bytes, or 830 kbytes, of storage. 1 second takes 830 x 25 = 20,750 kbytes, or 21 Mbytes.
For the 525 line TV standard the line data is:
720 pixels (Y) + 360 pixels (Cr) + 360 pixels (Cb) = 1440 pixels/line. 487 active lines/picture means 1440 x 487 = 701,280 pixels per picture.
Sampling at 8 bits the picture takes 701,280 bytes, or 701.3 kbytes of storage. 1 second takes 701.3 x 30 = 21,039 kbytes, or 21 Mbytes.
Thus both 625 and 525 line systems require approximately the same amount of storage for a given time.
1 minute takes 21 x 60 = 1,260 Mbytes, or 1.26 Gbytes.
1 hour takes 1.26 x 60 = 76 Gbytes. This will accommodate both standards.
It is also useful to remember that 1 Gbyte will hold 47 seconds of video.
Note that the above figures apply to uncompressed video.
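The arithmetic above condenses into a few lines of Python (the numbers are the entry's own):

    def bytes_per_second(active_lines, fps, bits=8):
        samples_per_line = 720 + 360 + 360        # Y + Cr + Cb per active line
        frame_bytes = samples_per_line * active_lines * bits // 8
        return frame_bytes * fps

    print(bytes_per_second(576, 25))   # 625/50: 20,736,000 bytes/sec (~21 Mbytes)
    print(bytes_per_second(487, 30))   # 525/60: 21,038,400 bytes/sec (~21 Mbytes)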
Sub-band coding for images has roots in work done in the 1950s by Bedford and on Mixed Highs image compression done by Kretzmer in 1954. Schreiber and Buckley explored general two channel coding of still pictures where the low spatial frequency channel was coarsely sampled and finely quantized and the high spatial frequency channel was finely sampled and coarsely quantized. More recently, Karlsson and Vetterli have extended this to multiple subbands. Adelson et al. have shown how a recursive subdivision called a pyramid decomposition can be used both for compression and other useful image processing tasks.
A pure sub-band coder performs a set of filtering operations on an image to divide it into spectral components. Usually, the result of the analysis phase is a set of sub-images, each of which represents some region in spatial or spatio-temporal frequency space. For example, in a still image, there might be a small sub-image that represents the low-frequency components of the input picture and that is directly viewable as either a minified or blurred copy of the original. To this are added successively higher spectral bands that contain the edge information necessary to reproduce the sharpness of the original at successively larger scales. As with the DCT coder, to which it is related, much of the image energy is concentrated in the lowest frequency band.
For equal visual quality, each band need not be represented with the same signal-to-noise ratio; this is the basis for sub-band coder compression. In many coders, some bands are eliminated entirely, and others are often compressed with a vector or lattice quantizer. Succeedingly higher frequency bands are more coarsely quantized, analogous to the truncation of the high frequency coefficients of the DCT. A sub-band decomposition can be the intraframe coder in a predictive loop, thus minimizing the basic distinctions between DCT-based hybrid coders and their alternatives.
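The simplest instance of such a filter bank is a two-band Haar split; applied recursively to the low band it yields the pyramid decomposition mentioned above (sketch assumes an even number of samples):

    def haar_split(samples):
        pairs = list(zip(samples[::2], samples[1::2]))
        low = [(a + b) / 2 for a, b in pairs]    # half-rate, viewable 'minified' band
        high = [(a - b) / 2 for a, b in pairs]   # detail band carrying edge information
        return low, high

    def haar_merge(low, high):
        out = []
        for l, h in zip(low, high):              # exact reconstruction: (l+h, l-h)
            out += [l + h, l - h]
        return out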
The T1Q1.5 Video Teleconferencing/Video Telephony (VTC/VT) ANSI Subworking Group (SWG) was formed to draft a performance standard for digital video. Important questions were asked relating to the digital video performance characteristics of video teleconferencing/video telephony:
The VTC/VT Subworking Group's goal is to answer these questions. This has become the first step in the process of constructing the performance standard.
Trellis coding is a source coding technique that has resulted in numerous publications and some very effective source codes. Unfortunately, the computational burden of these codes is tremendous and grows exponentially with the encoding rate.
A trellis is a transition diagram for a finite state machine that takes time into account. Populating a trellis means specifying output symbols for each branch; specifying an initial state then yields a set of allowable output sequences.
A trellis coder is defined as follows: given a trellis populated with symbols from an output alphabet and an input sequence x of length n, a trellis coder outputs the sequence of bits corresponding to the allowable output sequence that maximizes the SNR of the encoding of x.
A standard networking protocol suite approved by the CCITT and ISO. This protocol suite defines standard physical, link, and networking layers (OSI layers 1 through 3). X.25 networks are in use throughout the world.
The set of CCITT communications standards covering mail services provided by data networks.
Kodak's PhotoYCC colour space (for PhotoCD) is similar to YCbCr, except that Y is coded with lots of headroom and no footroom, and the scaling of Cb and Cr is different from that of Rec. 601-1 in order to accommodate a wider colour gamut:
The international standard CCIR-601-1 specifies eight-bit digital coding for component video, with black at luma code 16 and white at luma code 235, and chroma in eight-bit two's complement form centred on 128 with a peak at code 224. This coding has a slightly smaller excursion for luma than for chroma: luma has 219 risers compared to 224 for Cb and Cr. The notation CbCr distinguishes this set from PbPr where the luma and chroma excursions are identical.
For Rec. 601-1 coding in eight bits per component,
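the conversion from R'G'B' components in the range [0, 1] is usually given as: Y = 16 + 65.481 R + 128.553 G + 24.966 B; Cb = 128 - 37.797 R - 74.203 G + 112 B; Cr = 128 + 112 R - 93.786 G - 18.214 B. (The luma coefficients sum to 219 and each chroma row sums to zero, matching the excursions described above.)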
Rec. CCIR-601-1 calls for two-to-one horizontal subsampling of Cb and Cr, to achieve 2/3 the data rate of RGB with virtually no perceptible penalty. This is denoted 4:2:2. A few digital video systems have utilized horizontal subsampling by a factor of four, denoted 4:1:1. JPEG and MPEG normally subsample Cb and Cr two-to-one horizontally and also two-to-one vertically, to get 1/2 the data rate of RGB. No standard nomenclature has been adopted to describe vertical subsampling. To get good results using subsampling you should not just drop and replicate pixels, but implement proper decimation and interpolation filters.
YCbCr coding is employed by D-1 component digital video equipment.
If three components are to be conveyed in three separate channels with identical unity excursions, then the Pb and Pr colour difference components are used:
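The usual scalings are Pb = (0.5/0.886) (B-Y) = 0.564 (B-Y) and Pr = (0.5/0.701) (R-Y) = 0.713 (R-Y), which place both colour difference signals in the range -0.5 to +0.5 while Y spans 0 to 1.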
YPbPr is part of the CCIR Rec. 709 HDTV standard, although different luma coefficients are used, and it is denoted E'Pb and E'Pr with subscript arrangement too complicated to be written here.
YPbPr is employed by component analog video equipment such as M-II and BetaCam; Pb and Pr bandwidth is half that of luma.
The U and V signals above must be carried with equal bandwidth, albeit less than that of luma. However, the human visual system has less spatial acuity for magenta-green transitions than it does for red-cyan. Thus, if signals I and Q are formed from a 123 degree rotation of U and V respectively [sic], the Q signal can be more severely filtered than I (to about 600 kHz, compared to about 1.3 MHz) without being perceptible to a viewer at typical TV viewing distance. YIQ is equivalent to YUV with a 33 degree rotation and an axis flip in the UV plane. The first edition of W.K. Pratt "Digital Image Processing", and presumably other authors that follow that bible, has a matrix that erroneously omits the axis flip; the second edition corrects the error.
Since an analog NTSC decoder has no way of knowing whether the encoder was encoding YUV or YIQ, it cannot detect whether the encoder was running at 0 degree or 33 degree phase. In analog usage the terms YUV and YIQ are often used somewhat interchangeably. YIQ was important in the early days of NTSC but most broadcasting equipment now encodes equiband U and V.
The D-2 composite digital DVTR (and the associated interface standard) conveys NTSC modulated on the YIQ axes in the 525-line version and PAL modulated on the YUV axes in the 625-line version.
In composite NTSC, PAL or S-video systems, it is necessary to scale (B-Y) and (R-Y) so that the composite NTSC or PAL signal (luma plus modulated chroma) is contained within the range -1/3 to +4/3. These limits reflect the capability of the composite signal recording or transmission channel. The scale factors are obtained by two simultaneous equations involving both B-Y and R-Y, because the limits of the composite excursion are reached at combinations of B-Y and R-Y that are intermediate to primary colours. The scale factors are as follows:
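The commonly quoted results are U = 0.492 (B-Y) and V = 0.877 (R-Y).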
It is conventional for an NTSC luma signal in a composite environment (NTSC or S-video) to have 7.5% setup:
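That is, Y_setup = 7.5 + (92.5/100) Y in IRE units: black is raised to 7.5 IRE while white remains at 100 IRE.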
The two signals Y (or Y_setup) and C can be conveyed separately across an S-video interface, or Y and C can be combined (encoded) into composite NTSC or PAL:
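In outline, the chroma signal is C = U sin(wt) + V cos(wt), where w is the colour subcarrier frequency (PAL alternates the phase of the V component from line to line), and the composite signal is Y (or Y_setup) plus C.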
Digital compression techniques are used to reduce the very large storage requirements and data speeds required for CCIR 601 digital video. Their applications include allowing PCs to handle video (as seen with off-line editing), digital transmission, and some digital VTRs, as a means of both reducing the required data rate and increasing capacity. The quality of the results varies widely. For digital equipment, using CCIR 601 has become a mark of the highest quality, to the point where quality issues disappear, but they are re-opened by compression. The user now needs to understand what compromises are made, or rather, what quality of result can be expected when using compression.
Compression is not new to TV. The PAL, NTSC and SECAM colour systems were all devised to compress the full bandwidth R, G and B signals from cameras and telecines into a single 5.5 or 4.2 MHz channel. Experience with such signals shows they are generally good for recording and transmission but have a more limited use in production and especially post production. Likewise digitally compressed pictures suffer some loss or degradation to the extent that many have ruled out their use in post production.
Digital image compression is relatively new. Early talk that innovative techniques, always just around the corner, would bring better and better quality led some to believe that perfection may be achieved. However, just as greater computing power and improved techniques will always fail to make weather forecasting 100% accurate, so it will always be with compression. Digital compression analyses pictures to resolve them into a series of patterns and colours. It makes assumptions about pictures just as forecasting makes assumptions about weather patterns - when both pictures and weather are naturally random.
In some applications the quality of the pictures is not too much of an issue. For example in off-line editing many regard it as only necessary to recognise the pictures to be able to make cutting decisions. In others, especially those which are in the main picture path from origination to final viewer, quality is carefully supervised. So how can the quality of compressed pictures be measured? At the moment there is no instrumental test. It is up to you and your eye to look at the pictures and make up your mind - is it good enough? But that's not all. Because pictures are random and compression is not, different pictures will produce different quality results. So where a system may produce quite acceptable results on slightly soft and clean pictures, sharp and noisy footage may look far worse. Then again, first generation compression may be acceptable but what happens if second, third, and fourth generations are used? If the same pictures are subjected to more than one type of compression what is the result? Some have claimed 'Betacam quality' but it is clear from the above that such a statement cannot be taken literally. MOSAIC (Methods for Optimisation and Subjective Assessment in Image Communications) is a European Union project with one of its aims being to produce test picture sequences and define assessment methods for subjectively analysing compressed and processed digital video systems. This is needed.
No techniques have been developed for directly processing compressed pictures. With intra frame compression, using only the data from one frame, as in JPEG, cuts can be made directly but a dissolve, DVE or any other effect requires the picture to be decompressed and the result re-compressed. If the compression is inter frame, taking data from several frames, such as MPEG, then even cuts will require the data to be decoded and re-coded. Not only does this raise the issue of multi-generation quality - just as it did with analogue VTRs - but it means the appropriate coder and decoder must be available.
It is clear that using compression adds significant equipment overheads so there have to be strong reasons for using it - and there are. In early applications compression provided a way for PCs to handle video. They are incapable of handling the continuous data rate of 21 Mbytes/second needed for CCIR 601 video - the world's digital television standard. The cost of non-compressed disk storage would make them an unattractive proposition for off-line use in nonlinear editing. In such applications, intra field compression ratios typically of between 10:1 and 100:1 are used. In some cases only a few TV fields are displayed per second.
Both Digital Betacam and DCT VTRs use compression, or data reduction, to reduce the quantity of video data to be recorded onto tape. Here the reduction is quite mild, at around 2:1, and excellent results are obtained but neither VTR system is instrumentally perfect.
Now the user has a choice from absolute quality of uncompressed recorders, D1 or D5 on tape, or disk based systems such as Henry, Edit Box or Hal, to various levels of compression to fit applications - and budgets. For example news footage is often shot on small format cameras and in conditions not best suited to high quality results. News Box uses a small amount of compression to provide on-line non-linear editing for news - where speed and flexibility are especially important.
As technology develops so ever higher capacity disk drives continue to become available and prices fall. It could be that the need to compress to save disk storage will fall away leaving only uncompressed systems in the future.
The one area that is destined to rely on compression is that of TV transmission - either by cable, fibre, or over the air by satellite or terrestrial means. Here there are real advantages in that several digital compressed channels can be transmitted in the bandwidth of one analogue channel. For HDTV it is compression that will be used to bring the pictures to our homes. As compression will be used in this area there is a case for avoiding it in the main production processes of quality programmes.
Compressed pictures will play a major part in the explosion of multimedia products now heading our way. To fit the pictures into the storage, down the data highways and into our PCs depends absolutely on compression. Whereas there are strong reasons why the use of compression may decline in the mainstream of TV production, it will become standard in transmission and multimedia.