Motion compensation
Motion compensation is a method used by various video formats to reduce file size. For many frames of a movie, the only difference between one frame and the next is the result of either the camera moving or an object in the frame moving. For a video file, this means that much of the information representing one frame is the same as the information used in the next frame. Motion compensation takes advantage of this to create frames of a movie from a reference frame.[1]
For example, in principle, if a movie is shot at 24 frames per second, motion compensation would allow the movie file to store the full information for only every fourth frame. For the frames in between, the file would store only the information needed to transform the previous frame into the next one. If one frame is 1 MB in size, then one second of this film is 24 MB uncompressed. Using motion compensation, one second of the film could be reduced to a little over 6 MB: six full frames plus the much smaller transformation data for the other eighteen.
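The arithmetic in this example can be sketched as follows. The function name and the per-frame size of the transformation data (`delta_mb`) are assumptions made for illustration; the text only implies that this data is small.

```python
import math

def one_second_size_mb(fps=24, frame_mb=1.0, key_interval=4, delta_mb=0.02):
    """Estimate the size in MB of one second of video.

    delta_mb is a hypothetical size for the data that transforms one
    frame into the next; it is not a figure from the article.
    """
    full_frames = math.ceil(fps / key_interval)   # frames stored in full
    predicted_frames = fps - full_frames          # frames stored as transformations
    return full_frames * frame_mb + predicted_frames * delta_mb

uncompressed_mb = 24 * 1.0                # every frame stored in full
compressed_mb = one_second_size_mb()      # 6 full frames + 18 small deltas
```

With the assumed 0.02 MB per predicted frame, the result is about 6.36 MB, in line with "a little over 6 MB".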
More formally, in video compression, motion compensation is a technique for describing a picture in terms of the transformation of a reference picture to the current picture. The reference picture may be previous in time or even from the future. When images can be accurately synthesized from previously transmitted or stored images, compression efficiency improves.
Motion compensation in MPEG
In MPEG, images are predicted from previous frames (P-frames) or bidirectionally from previous and future frames (B-frames). B-frames are less popular because the image sequence must be transmitted and stored out of order so that the future frame is available when the B-frames are generated.[2]
After predicting a frame using motion compensation, the coder computes the prediction error (residual), which is then compressed using the discrete cosine transform (DCT) and transmitted.
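As a rough illustration of this step, the sketch below computes a residual and applies an orthonormal 2-D DCT-II to it in plain Python. The function names are hypothetical, and quantization and entropy coding, which a real coder applies afterwards, are left out.

```python
import math

def residual(current, predicted):
    """Per-pixel prediction error: current block minus predicted block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]

def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# A prediction that is off by a constant leaves only a DC coefficient.
predicted = [[10 * y + x for x in range(4)] for y in range(4)]
current = [[value + 2 for value in row] for row in predicted]
coeffs = dct_2d(residual(current, predicted))
```

When the prediction is good, the residual is small and smooth, so most of its DCT coefficients are close to zero and cost few bits to transmit.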
Global motion compensation
In global motion compensation, the motion model reflects camera motions such as dolly (forward, backward), track (left, right), boom (up, down), pan (left, right), tilt (up, down) and roll (along the view axis). It works best for still scenes without moving objects. Global motion compensation has several advantages:
- It models the dominant motion usually found in video sequences with just a few parameters. The share in bit-rate of these parameters is negligible.
- It does not partition the frames. This avoids artifacts at partition borders.
- A straight line (in the time direction) of pixels with equal spatial positions in the frame corresponds to a continuously moving point in the real scene. Other MC schemes introduce discontinuities in the time direction.
MPEG-4 ASP supports GMC with three reference points, although some implementations can make use of only one. A single reference point allows only translational motion, which, given its relatively large performance cost, provides little advantage over block-based motion compensation.
Moving objects within a frame are not sufficiently represented by global motion compensation. Thus, local motion estimation is also needed.
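A minimal sketch of the idea, assuming a six-parameter affine model: the whole frame is predicted from one small parameter set, with no partitioning. The function name, the parameter layout, the nearest-neighbour sampling and the clamping at the frame edges are all simplifications chosen for illustration, not the MPEG-4 ASP method.

```python
def global_compensate(ref, params):
    """Predict a frame by mapping every pixel through one affine model.

    params = (a, b, c, d, e, f): the predicted pixel at (x, y) is read
    from reference position (a*x + b*y + c, d*x + e*y + f).
    """
    h, w = len(ref), len(ref[0])
    a, b, c, d, e, f = params
    predicted = []
    for y in range(h):
        row = []
        for x in range(w):
            # nearest-neighbour sampling, clamped to the frame borders
            sx = min(max(round(a * x + b * y + c), 0), w - 1)
            sy = min(max(round(d * x + e * y + f), 0), h - 1)
            row.append(ref[sy][sx])
        predicted.append(row)
    return predicted

ref = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11]]
pan_right = global_compensate(ref, (1, 0, 1, 0, 1, 0))   # pure translation by one pixel
```

Here six numbers describe the motion of every pixel, which is why the share of these parameters in the bit rate is negligible.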
Block motion compensation
In block motion compensation (BMC), the frames are partitioned into blocks of pixels (e.g. macroblocks of 16×16 pixels in MPEG). Each block is predicted from a block of equal size in the reference frame. The blocks are not transformed in any way apart from being shifted to the position of the predicted block. This shift is represented by a motion vector.
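One way an encoder might find such a motion vector is an exhaustive search that minimises the sum of absolute differences (SAD). This is a sketch under assumptions: the function names, the SAD criterion and the full-search strategy are illustrative choices, and practical encoders usually use faster search patterns.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block of cur and a shifted block of ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def find_motion_vector(cur, ref, bx, by, n, search):
    """Exhaustive search for the shift (dx, dy) that best predicts one n x n block.

    (bx, by) is the top-left corner of the block in the current frame.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip candidates whose source block falls outside the reference frame
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# Test pattern: the current frame shows the reference shifted by (2, 1).
ref = [[x + 100 * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
vector = find_motion_vector(cur, ref, 2, 2, 4, 2)   # (2, 1)
```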
To exploit the redundancy between neighboring block vectors (e.g. for a single moving object covered by multiple blocks), it is common to encode only the difference between the current and previous motion vector in the bitstream. The result of this differencing process is mathematically equivalent to a global motion compensation capable of panning. Further down the encoding pipeline, an entropy coder takes advantage of the resulting statistical distribution of the motion vectors around the zero vector to reduce the output size.
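The differencing step can be sketched as below, assuming the predictor is simply the previous vector in scan order and the first predictor is the zero vector. Both function names are hypothetical, and real codecs may derive the predictor from several neighbouring blocks instead.

```python
def encode_vectors(vectors):
    """Replace each motion vector by its difference from the previous one."""
    prev = (0, 0)
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return diffs

def decode_vectors(diffs):
    """Rebuild the motion vectors by accumulating the differences."""
    prev = (0, 0)
    vectors = []
    for ddx, ddy in diffs:
        prev = (prev[0] + ddx, prev[1] + ddy)
        vectors.append(prev)
    return vectors

# Four blocks covering one object that moves almost uniformly.
vectors = [(3, 1), (3, 1), (3, 1), (4, 1)]
diffs = encode_vectors(vectors)   # [(3, 1), (0, 0), (0, 0), (1, 0)]
```

Most of the differences cluster at or near the zero vector, which is the distribution the entropy coder exploits.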
It is possible to shift a block by a non-integer number of pixels, which is called sub-pixel precision. The in-between pixels are generated by interpolating neighboring pixels. Commonly, half-pixel or quarter-pixel precision (Qpel, used by H.264 and MPEG-4/ASP) is used. Sub-pixel precision is computationally much more expensive because of the extra processing required for interpolation and, on the encoder side, the much greater number of potential source blocks to be evaluated.
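A minimal sketch of generating an in-between pixel, assuming plain bilinear interpolation with positions given in quarter-pixel units. The function name and the clamping at the frame edges are illustrative assumptions, and actual standards may specify longer interpolation filters than this.

```python
def sample_quarter_pel(ref, x4, y4):
    """Sample ref at position (x4 / 4, y4 / 4) using bilinear interpolation.

    x4 and y4 are coordinates in quarter-pixel units, so x4 = 6 means
    1.5 pixels from the left edge.
    """
    h, w = len(ref), len(ref[0])
    x0, fx = divmod(x4, 4)          # integer pixel and quarter-pel fraction (0..3)
    y0, fy = divmod(y4, 4)
    x1, y1 = x0 + 1, y0 + 1

    def clamp(v, hi):
        return min(max(v, 0), hi)

    x0, x1 = clamp(x0, w - 1), clamp(x1, w - 1)
    y0, y1 = clamp(y0, h - 1), clamp(y1, h - 1)
    top = (4 - fx) * ref[y0][x0] + fx * ref[y0][x1]
    bottom = (4 - fx) * ref[y1][x0] + fx * ref[y1][x1]
    return ((4 - fy) * top + fy * bottom) / 16

ref = [[0, 4],
       [8, 12]]
half = sample_quarter_pel(ref, 2, 0)      # halfway between 0 and 4
quarter = sample_quarter_pel(ref, 1, 0)   # a quarter of the way from 0 to 4
centre = sample_quarter_pel(ref, 2, 2)    # centre of the four pixels
```

At quarter-pixel precision each integer position gains fifteen fractional neighbours, which illustrates why the encoder has many more candidate source blocks to evaluate.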
The main disadvantage of block motion compensation is that it introduces discontinuities at the block borders (blocking artifacts). These artifacts appear in the form of sharp horizontal and vertical edges which are easily spotted by the human eye and produce ringing effects (large coefficients in high frequency sub-bands) in the Fourier-related transform used for transform coding of the residual frames.
Block motion compensation divides up the current frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come from (a common misconception is that the previous frame is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move to). The source blocks typically overlap in the source frame. Some video compression algorithms assemble the current frame out of pieces of several different previously-transmitted frames.
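The direction of the mapping can be made concrete with a small sketch: the loop runs over the non-overlapping blocks of the current frame, and each vector says where in the reference frame to fetch from. The function name and the clamping at the frame edges are assumptions for illustration.

```python
def predict_frame(ref, vectors, n):
    """Build the prediction of the current frame block by block.

    vectors[by][bx] = (dx, dy) tells where the n x n block at block column
    bx and block row by of the *current* frame comes from in ref.
    """
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for by in range(h // n):
        for bx in range(w // n):
            dx, dy = vectors[by][bx]
            for j in range(n):
                for i in range(n):
                    y, x = by * n + j, bx * n + i
                    sy = min(max(y + dy, 0), h - 1)   # clamp to the frame
                    sx = min(max(x + dx, 0), w - 1)
                    pred[y][x] = ref[sy][sx]
    return pred

ref = [[x + 10 * y for x in range(4)] for y in range(4)]
vectors = [[(1, 0), (0, 0)],
           [(0, 0), (-1, -1)]]
pred = predict_frame(ref, vectors, 2)
```

In this example the two upper blocks both read reference column 2, so the source blocks overlap in the reference frame even though the predicted blocks do not.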
Frames can also be predicted from future frames. The future frames then need to be encoded before the predicted frames and thus, the encoding order does not necessarily match the real frame order. Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede or follow the predicted frame. These bidirectionally predicted frames are called B-frames. A coding scheme could, for instance, be IBBPBBPBBPBB.
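The reordering can be sketched as follows, assuming each B-frame depends only on the nearest I- or P-frame on either side. The function name is hypothetical, and trailing B-frames, which in practice wait for the first anchor of the next group, are simply appended here.

```python
def coding_order(display_types):
    """Reorder frames so that each B-frame follows both of its anchor frames.

    display_types is a string such as "IBBPBBP" in display order.
    Returns a list of (display_index, frame_type) in coding order.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))   # must wait for the next anchor
        else:
            order.append((index, frame_type))       # I- or P-frame is coded first
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)   # simplification: trailing B-frames have no future anchor here
    return order

order = coding_order("IBBPBBP")
```

For "IBBPBBP" this yields the display indices 0, 3, 1, 2, 6, 4, 5: each P-frame is coded ahead of the two B-frames that precede it in display order.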
Variable block-size motion compensation
Variable block-size motion compensation (VBSMC) is the use of BMC with the ability for the encoder to dynamically select the size of the blocks. When coding video, the use of larger blocks can reduce the number of bits needed to represent the motion vectors, while the use of smaller blocks can result in a smaller amount of prediction residual information to encode. Older designs such as H.261 and MPEG-1 video typically use a fixed block size, while newer ones such as H.263, MPEG-4 Part 2, H.264/MPEG-4 AVC, and VC-1 give the encoder the ability to dynamically choose what block size will be used to represent the motion.
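The trade-off between vector bits and residual can be sketched as a simple cost comparison. Everything here is an assumption for illustration: the function name, the choice between one 16×16 block and four 8×8 blocks, the fixed bit cost per vector and the weighting factor `lam`. Real encoders use their own rate-distortion measures.

```python
def choose_partition(sad_whole, sad_quarters, vector_bits=12, lam=4.0):
    """Pick between one 16x16 block and four 8x8 blocks.

    sad_whole:    residual cost when the macroblock uses a single vector
    sad_quarters: residual costs of the four 8x8 blocks, each with its own vector
    vector_bits:  hypothetical bit cost of one motion vector
    lam:          hypothetical weight converting bits into residual units
    """
    cost_whole = sad_whole + lam * vector_bits
    cost_split = sum(sad_quarters) + lam * 4 * vector_bits
    if cost_whole <= cost_split:
        return "16x16", cost_whole
    return "8x8", cost_split

# Splitting pays off when it removes a lot of residual...
detailed = choose_partition(1000, [100, 100, 100, 100])
# ...but not when the extra vectors cost more than they save.
uniform = choose_partition(500, [120, 120, 120, 120])
```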
Overlapped block motion compensation
Overlapped block motion compensation (OBMC) is a good solution to the blocking artifacts of block motion compensation because it not only increases prediction accuracy but also avoids these artifacts. When using OBMC, blocks are typically twice as big in each dimension and overlap quadrant-wise with all 8 neighbouring blocks. Thus, each pixel belongs to 4 blocks. In such a scheme, there are 4 predictions for each pixel, which are combined into a weighted mean. For this purpose, blocks are associated with a window function that has the property that the sum of 4 overlapped windows is equal to 1 everywhere.
Studies of methods for reducing the complexity of OBMC have shown that the contribution to the window function is smallest for the diagonally adjacent block. Reducing the weight for this contribution to zero and increasing the other weights by an equal amount leads to a substantial reduction in complexity without a large penalty in quality. In such a scheme, each pixel then belongs to 3 blocks rather than 4, and rather than using 8 neighboring blocks, only 4 are used for each block to be compensated. Such a scheme is found in the H.263 Annex F Advanced Prediction mode.
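A window with the required property can be sketched as below. The triangular (bilinear) shape and the function names are assumptions chosen for illustration; the text does not prescribe a particular window.

```python
def window_1d(n):
    """Triangular window of length 2n; copies shifted by n sum to 1."""
    return [(i + 0.5) / n if i < n else (2 * n - i - 0.5) / n
            for i in range(2 * n)]

def window_2d(n):
    """Separable 2-D window for a block of size 2n x 2n."""
    w = window_1d(n)
    return [[wy * wx for wx in w] for wy in w]

def overlap_sum(n):
    """Sum of the four window values that cover each pixel of one n x n cell."""
    w = window_2d(n)
    return [[w[j][i] + w[j][i + n] + w[j + n][i] + w[j + n][i + n]
             for i in range(n)] for j in range(n)]

sums = overlap_sum(4)   # every entry is 1 up to rounding error
```

Each pixel's final prediction is then the sum of its four block predictions multiplied by the corresponding window values, which blends the predictions smoothly across block borders.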
Quarter-pixel (QPel) and half-pixel motion compensation
In motion compensation, quarter and half samples are sub-samples interpolated at the positions addressed by fractional motion vectors. Based on the vectors and the full samples, the sub-samples can be calculated using bicubic or bilinear 2-D filtering. See subclause 8.4.2.2, "Fractional sample interpolation process", of the H.264 standard.
3D image coding techniques
In video, time is often considered as the third dimension. Still image coding techniques can be expanded to an extra dimension.
JPEG 2000 uses wavelets, and these can also be used to encode motion adaptively and without gaps between blocks. Fractional-pixel affine transformations lead to bleeding between adjacent pixels. If no higher internal resolution is used, the delta images mostly have to counteract this smearing of the image. The delta image can also be encoded as a wavelet, so that the borders of the adaptive blocks match.
Expanding the 8×8 JPEG blocks into the third dimension, that is, into 8×8×8 cubes, and modifying the DCT to behave more like a DFT enables compression of linear translations at speeds below and around one pixel per frame (sub-pixel precision).
External links
- A New FFT Architecture and Chip Design for Motion Compensation based on Phase Correlation
- DCT and DFT coefficients are related by simple factors
- DCT better than DFT also for video
- why DCT is better than DFT
- John Wiseman, An Introduction to MPEG Video Compression
- DCT and motion compensation
- Compatibility between DCT, motion compensation and other methods
Applications
- video compression
- change of framerate for playback of 24 frames per second movies on 60 Hz LCDs or 100 Hz interlaced cathode ray tubes
References
- [1] Yao Wang, "Video Coding Using Motion Compensation", presentation, Polytechnic University, March 2006: http://eeweb.poly.edu/~yao/EL612/videocoding.pdf
- [2] "Why do some people hate B-pictures?"
External links
- Temporal Rate Conversion - article giving an overview of motion compensation techniques.

