view h264dspenc.c @ 12492:58a960d6e34c libavcodec
Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.
Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
author | rbultje |
---|---|
date | Tue, 14 Sep 2010 13:36:26 +0000 |
parents | 7dd2a45249a9 |
children |
/*
 * H.264/MPEG-4 Part 10 (Base profile) encoder.
 *
 * DSP functions
 *
 * Copyright (c) 2006 Expertisecentrum Digitale Media, UHasselt
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

/**
 * @file
 * H.264 encoder related DSP utils
 *
 */

#include "dsputil.h"

#define H264_DCT_PART1(X) \
    a = block[0][X]+block[3][X]; \
    c = block[0][X]-block[3][X]; \
    b = block[1][X]+block[2][X]; \
    d = block[1][X]-block[2][X]; \
    pieces[0][X] = a+b; \
    pieces[2][X] = a-b; \
    pieces[1][X] = (c<<1)+d; \
    pieces[3][X] = c-(d<<1);

#define H264_DCT_PART2(X) \
    a = pieces[X][0]+pieces[X][3]; \
    c = pieces[X][0]-pieces[X][3]; \
    b = pieces[X][1]+pieces[X][2]; \
    d = pieces[X][1]-pieces[X][2]; \
    block[0][X] = a+b; \
    block[2][X] = a-b; \
    block[1][X] = (c<<1)+d; \
    block[3][X] = c-(d<<1);

/**
 * Transform the provided matrix using the H.264 modified DCT.
 * @note
 * we'll always work with transposed input blocks, to avoid having to make a
 * distinction between C and mmx implementations.
 *
 * @param block transposed input block
 */
static void h264_dct_c(DCTELEM block[4][4])
{
    DCTELEM pieces[4][4];
    DCTELEM a, b, c, d;

    H264_DCT_PART1(0);
    H264_DCT_PART1(1);
    H264_DCT_PART1(2);
    H264_DCT_PART1(3);
    H264_DCT_PART2(0);
    H264_DCT_PART2(1);
    H264_DCT_PART2(2);
    H264_DCT_PART2(3);
}

av_cold void ff_h264dspenc_init(DSPContext* c, AVCodecContext *avctx)
{
    c->h264_dct = h264_dct_c;
}
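
The two-pass butterfly in h264_dct_c() can be tried outside the FFmpeg tree with a small standalone sketch. It is not part of this changeset. The names h264_dct_sketch, butterfly4 and h264_dct_sketch_demo are hypothetical, DCTELEM is assumed to be a 16-bit signed type, and the sketch keeps intermediates in int and multiplies by 2 instead of shifting, so it should match the macros only while all values stay within 16-bit range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for FFmpeg's DCTELEM (a 16-bit signed type). */
typedef int16_t DCTELEM;

/* One 4-point butterfly, the same arithmetic as one H264_DCT_PART* step.
 * It is equivalent to multiplying in[] by the rows
 *   [1  1  1  1], [2  1 -1 -2], [1 -1 -1  1], [1 -2  2 -1]. */
static void butterfly4(const int in[4], int out[4])
{
    int a = in[0] + in[3];
    int c = in[0] - in[3];
    int b = in[1] + in[2];
    int d = in[1] - in[2];

    out[0] = a + b;
    out[2] = a - b;
    out[1] = 2 * c + d;   /* (c<<1)+d in the macros */
    out[3] = c - 2 * d;   /* c-(d<<1) in the macros */
}

/* Hypothetical loop-based equivalent of h264_dct_c(). */
void h264_dct_sketch(DCTELEM block[4][4])
{
    int pieces[4][4];
    int in[4], out[4];
    int x, k;

    /* Pass 1 (mirrors H264_DCT_PART1): transform column x of block. */
    for (x = 0; x < 4; x++) {
        for (k = 0; k < 4; k++)
            in[k] = block[k][x];
        butterfly4(in, out);
        for (k = 0; k < 4; k++)
            pieces[k][x] = out[k];
    }

    /* Pass 2 (mirrors H264_DCT_PART2): transform row x of pieces and
     * store the result as column x of block. */
    for (x = 0; x < 4; x++) {
        for (k = 0; k < 4; k++)
            in[k] = pieces[x][k];
        butterfly4(in, out);
        for (k = 0; k < 4; k++)
            block[k][x] = (DCTELEM)out[k];
    }
}

/* A flat block has only a DC component: sixteen ones give 16 at [0][0]. */
void h264_dct_sketch_demo(void)
{
    DCTELEM flat[4][4];
    int r, col;

    for (r = 0; r < 4; r++)
        for (col = 0; col < 4; col++)
            flat[r][col] = 1;

    h264_dct_sketch(flat);

    for (r = 0; r < 4; r++)
        for (col = 0; col < 4; col++)
            assert(flat[r][col] == ((r == 0 && col == 0) ? 16 : 0));
}
```

Calling h264_dct_sketch_demo() from a test program exercises the flat-block case. Because pass 2 reads rows of pieces but writes columns of block, the output is the transform of the transposed input, which is why the original comment asks for a transposed input block.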