Abstract: Motion estimation (ME) plays an important role in modern video coders since it consumes approximately 60–80% of the entire encoder's computations. In this paper, three novel techniques are proposed to speed up the ME process. First, a smart prediction technique is proposed for deciding an initial search center (ISC). Second, a zero motion prejudgment technique is proposed to decide accurately whether the pre-estimated ISC can be taken as the best-match motion vector (MV), thereby saving the computations required for the MV refinement process. Finally, a variable padding pixels ME technique is proposed to adaptively determine the number of padding pixels required for the search window, for further computational savings. The three techniques are combined and applied to block-based ME to reduce the computational complexity of the ME process. The performance of the proposed techniques is tested in both pixel domain ME and frequency domain ME in terms of quantitative visual quality (peak signal-to-noise ratio, PSNR), computational complexity, and bit rate. Experimental results demonstrate that the proposed fast ME technique achieves approximately a 99.4% reduction in ME time compared to the conventional full search block-based ME (FSBB-ME), with negligible degradation in both PSNR and bit rate. The experimental results also demonstrate the effectiveness of the proposed techniques when they are combined with any block-based ME technique, such as the fast extended diamond enhanced predictive zonal search. In addition, the conventional discrete cosine transform phase correlation ME (DCT-PC-ME) in the frequency domain saves at least an additional 72% in ME time compared to the conventional FSBB-ME technique in the pixel domain. Compared to the conventional DCT-PC-ME, applying the proposed techniques to the DCT-PC-ME saves up to 89% in ME time.
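The zero motion prejudgment idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the fixed SAD threshold, and the fallback to a full search around the ISC are all assumptions made for illustration, and the ISC is assumed to be supplied by some predictor and to lie inside the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block
    of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def estimate_mv(cur, ref, bx, by, n, isc, threshold, search_range):
    """Return (mv, refined).

    Hypothetical prejudgment rule: if the SAD at the initial search
    center `isc` is below `threshold`, accept `isc` as the motion vector
    and skip refinement; otherwise run a full search around `isc`.
    """
    h, w = len(ref), len(ref[0])
    best_mv = isc
    best = sad(cur, ref, bx, by, isc[0], isc[1], n)
    if best < threshold:
        return best_mv, False  # prejudgment succeeded, no refinement

    for dy in range(isc[1] - search_range, isc[1] + search_range + 1):
        for dx in range(isc[0] - search_range, isc[0] + search_range + 1):
            # Skip candidates whose block falls outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv, True


# Toy frames: `cur` is `ref` shifted by one pixel horizontally.
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
cur = [[x + 1 + 10 * y for x in range(8)] for y in range(8)]
static = estimate_mv(ref, ref, 2, 2, 2, (0, 0), 1, 1)
moving = estimate_mv(cur, ref, 2, 2, 2, (0, 0), 1, 1)
```

In this toy setup the static block is accepted at the ISC without any refinement, while the shifted block fails the prejudgment and falls through to the search. How the threshold is chosen in practice is not stated in the abstract.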
Abstract: H.264/AVC is a new video coding standard of the ITU-T Video Coding Experts Group that significantly improves rate-distortion efficiency compared with previous standards. However, it performs an exhaustive motion search across multiple block sizes and multiple reference frames, leading to a linear increase in processing time. Although the encoding quality is improved, the complexity of the encoder and its computational cost increase at the same time. In this paper, we reduce the computational cost by reducing the number of candidate pixels required to compute the sum of absolute differences (SAD) for each block, using two early stop search techniques. These techniques are applied to two scan search patterns (raster and spiral) and compared with the conventional Full Search (FS), Three Step Search (TSS), and Diamond Search (DS) algorithms. Results show at least a 98% reduction in computations with a maximum loss of 0.1 dB compared with the conventional FS algorithm.
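As an illustration of how an early stop can cut the number of pixels visited during SAD computation, the following Python sketch uses partial distortion elimination, a commonly used early-termination scheme. It is not necessarily either of the two techniques proposed in the paper. All names are hypothetical, and the "spiral" order is approximated by sorting candidates by ring distance from the center.

```python
def spiral_offsets(r):
    """Candidate offsets within +/-r, ordered by increasing ring distance
    from the center (a simple stand-in for a spiral scan)."""
    offs = [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return sorted(offs, key=lambda o: max(abs(o[0]), abs(o[1])))


def sad_early_stop(cur, ref, bx, by, dx, dy, n, best):
    """Accumulate the SAD row by row; return None as soon as the partial
    sum reaches `best`, since the candidate can no longer win."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        if total >= best:
            return None
    return total


def search(cur, ref, bx, by, n, offsets):
    """Return (best_mv, best_sad) over the given candidate offsets."""
    h, w = len(ref), len(ref[0])
    best, best_mv = float("inf"), None
    for dx, dy in offsets:
        # Skip candidates whose block falls outside the reference frame.
        if not (0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h):
            continue
        cost = sad_early_stop(cur, ref, bx, by, dx, dy, n, best)
        if cost is not None:
            best, best_mv = cost, (dx, dy)
    return best_mv, best


# Toy frames: `cur` is `ref` shifted by one pixel horizontally.
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
cur = [[x + 1 + 10 * y for x in range(8)] for y in range(8)]
result = search(cur, ref, 2, 2, 2, spiral_offsets(1))
```

Visiting the center first tends to establish a small `best` early when motion is small, so later candidates are rejected after fewer rows; a raster order can be tried by passing a differently ordered offset list.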
 
Abstract: For JPEG2000 real-time applications, Embedded Block Coding with Optimized Truncation (EBCOT) is a time-consuming part and is becoming a bottleneck for the entire system throughput. Since the Arithmetic Encoder (AE) is one part of EBCOT, low AE performance can significantly degrade the performance of EBCOT. AE is inherently a serial process with high dependency, and parallelizing it is difficult. To achieve high system throughput, some pipelined AE architectures have been proposed. Whatever the pipelined architecture, all previous work shares one property: only one context is processed per clock cycle, so the system throughput equals the clock rate. In this paper, a partially parallel algorithm for AE is proposed. One distinct characteristic of the proposed algorithm is that two contexts can be processed in one clock cycle. Based on the proposed algorithm, a pipelined architecture is implemented. Experimental results on standard test image benchmarks show that the proposed algorithm and architecture achieve about a 24% improvement in system throughput compared with the architecture based on the original AE algorithm.
Abstract: Power consumption is critical for portable video applications. During compression, the motion estimation unit consumes the largest portion of power since it performs a huge amount of computation. Different low-power architectures for implementing full-search block-matching (FSBM) motion estimation are discussed. Architectural enhancements that further reduce the power consumed during FSBM motion estimation without sacrificing throughput or optimality are also presented. The proposed approach achieves these power savings by disabling portions of the architecture that perform unnecessary computations. The different architectures, including our enhancements and others, are compared using simulation and analysis. Different benchmarks are used to test and compare the discussed architectures. Analytical and simulation results show the effectiveness of the enhancements.