Single-pass constant- and variable-bit-rate MPEG-2 video compression

IBM Journal of Research and Development, Jul 1999 by Mohsenian, N, Rajagopalan, R, Gonzales, C A

The robustness of any constant-bit-rate control algorithm depends directly on the speed with which it can respond to changes in the video content of an incoming stream. For real-time applications in which an efficient hardware implementation must be realized with dedicated integrated circuits, the reliability of the RC algorithm may be severely tested under stressful conditions. This is because such customized solutions do not use any preprocessing functions to pre-analyze the nature of the stream; consequently, the true encoding parameters of a picture become available only after values have already been assigned to it. A real-time RC algorithm must process and memorize a few GOPs of the same complexity before it can reach optimal performance. We argue, on the basis of the following observations, that our new approach to CBR video compression, as formulated in Equation (22), offers a quicker response and better results than the conventional method of CBR encoding.

Consider an encoder that has processed a few "difficult-to-encode" pictures, and suppose further that this new group of pictures belongs to the same GOP. The increase in picture difficulty may be due to the sudden (or gradual) appearance of a high degree of spatial image detail, an increase in the velocity of many moving objects of different scales, directional changes of objects, or some higher-order combination of these. At the start of a new GOP, we must encode an I-picture. Since we have already compressed one difficult I-picture, there is a high probability that the next I-picture is also "difficult to encode," as reflected by the complexity of the previously analyzed picture. Our CBR scheme can adjust to this picture much more quickly because it recognizes that the I-picture is significantly different from the rest of the video sequence. This observation is made by comparing the output of the IIR filter against the estimated I-complexity. As a result, the I-picture consumes more bits than the normal bit allocation generated by a conventional method of CBR encoding. Therefore, a higher-quality I-picture is reconstructed at the decoder output. I-pictures are used as references to predict blocks of pixels in P- and B-pictures. A better prediction is now obtained for the non-intracoded pictures, resulting in better reconstruction of such pictures at the cost of a small number of bits. Hence, we improve the perceptual quality of a GOP while still adhering to the constant bit rate of the stream. If an easy picture is to be encoded, it consumes fewer bits, and the remaining bit budget is used to encode the future pictures of the GOP. Overall, the number of bits and the picture quality average out over an "easy-to-encode" GOP. It should be noted that the human observer finds image distortions in "difficult-to-encode" pictures most annoying; for long video programs, a small degradation in "easy-to-encode" pictures is tolerated.
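The comparison between the IIR filter output and the estimated I-complexity can be illustrated with a minimal sketch. Equation (22) is not reproduced in this excerpt, so the code below is not the authors' formulation: the first-order filter, the function names, and the parameters `alpha` and `max_boost` are all assumptions chosen only to show the idea of giving a difficult I-picture more than its nominal allocation.

```python
def iir_update(avg, sample, alpha=0.25):
    """First-order IIR filter: move the running average toward the new sample.

    alpha is a hypothetical smoothing constant, not a value from the paper.
    """
    return (1.0 - alpha) * avg + alpha * sample


def i_picture_target(nominal_bits, filtered_complexity, i_complexity,
                     max_boost=1.5):
    """Scale the nominal I-picture target by the ratio of the estimated
    I-complexity to the long-term filtered complexity.

    The ratio is clamped to [1/max_boost, max_boost] (an assumed safeguard)
    so that a single outlier cannot exhaust the GOP budget.
    """
    if filtered_complexity <= 0:
        return nominal_bits
    ratio = i_complexity / filtered_complexity
    boost = min(max(ratio, 1.0 / max_boost), max_boost)
    return int(nominal_bits * boost)


# Long-term complexity tracked over three previous I-pictures.
avg = 100.0
for observed in [100.0, 100.0, 180.0]:
    avg = iir_update(avg, observed)

# A difficult I-picture (complexity well above the filtered average)
# receives more than the nominal 400,000 bits; an easy one receives fewer.
hard_target = i_picture_target(400_000, avg, 180.0)
easy_target = i_picture_target(400_000, avg, 60.0)
```

In a CBR encoder the extra bits given to the I-picture would have to be taken from the remaining pictures of the same GOP; that bookkeeping is omitted here.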
Further assessment of the argument presented in this section and comparison with the method described in Section 2 are provided in Section 5, which discusses the simulation results.

The method in (26) is intended to provide an upper bound on the quality of pictures which belong to soft video segments. When soft segments are analyzed, it is very likely that we have (Qh, > Q). Therefore, a quantization scaler, larger than the values normally assigned by the CBR encoder, is assigned to pictures in soft segments. A VBR stream is produced by distributing the surplus bits among the hard segments of the video. Qfi, may be obtained through experimentation with a large number of sequences, but finding a near-optimum value is difficult if not impossible. A better scheme can be formulated by calculating a Q,, scaler in real time using prior image statistics [7]. The concept behind the method in [7] is to combine the Q,, approach of [8] with that of the CBR RC algorithm.
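One reading of the passage above is that the quantization scaler of each picture is the larger of the CBR-assigned value and a threshold value. Equation (26) is not reproduced in this excerpt, so the sketch below is only an assumed form; the function name and `q_floor` (standing in for the threshold scaler) are hypothetical, and the numbers are made up for illustration.

```python
def vbr_quant_scale(q_cbr, q_floor):
    """Return the quantization scaler to use for a picture.

    q_cbr   -- value the CBR rate control would assign
    q_floor -- assumed threshold scaler that caps quality in soft segments
    """
    return max(q_cbr, q_floor)


# Hypothetical CBR-assigned scalers: the first two pictures are "soft"
# (CBR would spend bits on a very fine quantizer), the last two are "hard".
q_floor = 8.0
cbr_scales = [4.0, 6.0, 12.0, 20.0]
vbr_scales = [vbr_quant_scale(q, q_floor) for q in cbr_scales]
```

Soft pictures are quantized more coarsely than CBR would quantize them, which frees bits for hard segments; hard pictures keep the CBR-assigned value.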

Our real-time single-pass VBR encoder exploits an R-Q compression model to differentiate the degree of "hardness" or "softness" of video segments, each segment corresponding to a particular hyperbola similar to the one defined by (2). The actual encoding parameters of the video segments are computed along this hyperbola. We also use an R-Q perceptual model to prioritize the video segments in terms of visual importance. To satisfy the average rate of the VBR stream, the position of the perceptual model must be changed over time. In this paper we specify two methods for meeting this condition. The perceptual model and the VBR RC algorithms are described next.
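The idea that each segment lies on its own R-Q hyperbola can be sketched as follows. Equation (2) is not shown in this excerpt; the code assumes the simplest hyperbolic form, R · Q = X, where X is a per-segment complexity. The function names, the reference complexity `x_ref`, and the sample numbers are hypothetical.

```python
def complexity(bits, q):
    """Complexity X of a segment under the assumed model R * Q = X."""
    return bits * q


def bits_at(q, x):
    """Predicted bits for complexity x when encoded at quantization scaler q."""
    return x / q


def classify(x, x_ref):
    """Label a segment 'hard' if its hyperbola lies above the reference one."""
    return "hard" if x > x_ref else "soft"


# Two previously encoded segments: (bits produced, scaler used).
x_a = complexity(600_000, 10.0)   # 6,000,000
x_b = complexity(200_000, 5.0)    # 1,000,000
x_ref = 3_000_000.0               # assumed reference complexity

labels = [classify(x_a, x_ref), classify(x_b, x_ref)]

# Moving along each hyperbola to a common scaler shows how differently
# the two segments consume bits at the same quantization.
bits_a = bits_at(8.0, x_a)
bits_b = bits_at(8.0, x_b)
```

Under this assumed model, choosing encoding parameters "along the hyperbola" amounts to picking one of Q or R and reading off the other from X.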

VBR rate-control algorithms

The efficiency of a single-pass VBR encoder is assessed by the speed with which its rate-control algorithm can learn and adjust itself to the "softness" or "hardness" of the video stream. For regions where image discontinuities or special effects occur, degradations in picture quality should be minimized. Since in single-pass encoding the available image statistics are limited to the previously analyzed pictures, the learning rate of the RC algorithm must be fast enough to predict the content of future video intervals, yet not so aggressive that it causes algorithmic instabilities. One way to solve this twofold problem is to adjust the quality of the encoded stream for every time interval and let the RC algorithm learn the local content of each picture within that time interval.
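The trade-off between learning speed and stability can be illustrated with a generic per-interval update. This is not the algorithm of the paper, whose details follow later; it is a sketch assuming a simple proportional correction with a bounded step, and `gain` and `max_step` are hypothetical tuning parameters.

```python
def update_interval_quant(q, bits_used, bits_budget, gain=0.5, max_step=0.1):
    """Return the quantization scaler for the next time interval.

    The relative budget error of the interval just encoded drives the
    correction; the step is clamped to +/- max_step so that one unusual
    interval cannot make the scaler oscillate.
    """
    error = (bits_used - bits_budget) / bits_budget
    step = max(-max_step, min(max_step, gain * error))
    return q * (1.0 + step)


# Interval overshoots its budget by 40%: the correction is clamped to +10%.
q_after_overshoot = update_interval_quant(10.0, 1_400_000, 1_000_000)

# Interval undershoots by 10%: the scaler is lowered by 5% (unclamped).
q_after_undershoot = update_interval_quant(10.0, 900_000, 1_000_000)
```

A larger `gain` makes the encoder follow content changes faster, while a smaller `max_step` limits how far a single interval can move the scaler; the right balance depends on the material.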

 


Content provided in partnership with ProQuest