libavcodec.hg: snow.h annotate

annotate snow.h @ 12197:fbf4d5b1b664 libavcodec

Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag, FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that have been checked specifically on such CPUs and are actually faster than their MMX counterparts. In addition, use this flag to enable particular VP8 and LPC SSE2 functions that are faster than their MMX counterparts. Based on a patch by Loren Merritt <lorenm AT u washington edu>.

author	rbultje
date	Mon, 19 Jul 2010 22:38:23 +0000
parents	0f0cd6b5791f
children

rev	line source
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	1 /*
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	2 * Copyright (C) 2004 Michael Niedermayer <michaelni@gmx.at>
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	3 * Copyright (C) 2006 Robert Edele <yartrebo@earthlink.net>
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	4 *
3947 c8c591fe26f8 Change license headers to say 'FFmpeg' instead of 'this program/this library' diego parents: 3582 diff changeset	5 * This file is part of FFmpeg.
c8c591fe26f8 Change license headers to say 'FFmpeg' instead of 'this program/this library' diego parents: 3582 diff changeset	6 *
c8c591fe26f8 Change license headers to say 'FFmpeg' instead of 'this program/this library' diego parents: 3582 diff changeset	7 * FFmpeg is free software; you can redistribute it and/or
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	8 * modify it under the terms of the GNU Lesser General Public
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	9 * License as published by the Free Software Foundation; either
3947 c8c591fe26f8 Change license headers to say 'FFmpeg' instead of 'this program/this library' diego parents: 3582 diff changeset	10 * version 2.1 of the License, or (at your option) any later version.
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	11 *
3947 c8c591fe26f8 Change license headers to say 'FFmpeg' instead of 'this program/this library' diego parents: 3582 diff changeset	12 * FFmpeg is distributed in the hope that it will be useful,
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	13 * but WITHOUT ANY WARRANTY; without even the implied warranty of
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	14 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	15 * Lesser General Public License for more details.
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	16 *
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	17 * You should have received a copy of the GNU Lesser General Public
3947 c8c591fe26f8 Change license headers to say 'FFmpeg' instead of 'this program/this library' diego parents: 3582 diff changeset	18 * License along with FFmpeg; if not, write to the Free Software
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	19 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	20 */
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	21
7760 c4a4495715dd Globally rename the header inclusion guard names. stefano parents: 5830 diff changeset	22 #ifndef AVCODEC_SNOW_H
c4a4495715dd Globally rename the header inclusion guard names. stefano parents: 5830 diff changeset	23 #define AVCODEC_SNOW_H
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	24
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	25 #include "dsputil.h"
11485 0f0cd6b5791f Separate DWT from snow and dsputil mru parents: 11460 diff changeset	26 #include "dwt.h"
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	27
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	28 #define MID_STATE 128
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	29
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	30 #define MAX_PLANES 4
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	31 #define QSHIFT 5
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	32 #define QROOT (1<<QSHIFT)
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	33 #define LOSSLESS_QLOG -128
5587 3ae03eacbe9f use 16bit IDWT (a SIMD implementation of it should be >2x faster then with michael parents: 5565 diff changeset	34 #define FRAC_BITS 4
3314 aea2230e6033 Snow multiple reference frames lorenm parents: 3223 diff changeset	35 #define MAX_REF_FRAMES 8
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	36
3206 c1add9fe5c65 Snow mmx + sse2 part 2 corey parents: 3198 diff changeset	37 #define LOG2_OBMC_MAX 8
3198 6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	38 #define OBMC_MAX (1<<(LOG2_OBMC_MAX))
6b9f0c4fbdbe First part of a series of speed-enchancing patches. gpoirier parents: diff changeset	39
3223 8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	40 /* C bits used by mmx/sse2/altivec */
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	41
5587 3ae03eacbe9f use 16bit IDWT (a SIMD implementation of it should be >2x faster then with michael parents: 5565 diff changeset	42 static av_always_inline void snow_interleave_line_header(int * i, int width, IDWTELEM * low, IDWTELEM * high){
3223 8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	43 (*i) = (width) - 2;
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	44
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	45 if (width & 1){
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	46 low[(*i)+1] = low[((*i)+1)>>1];
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	47 (*i)--;
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	48 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	49 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	50
5587 3ae03eacbe9f use 16bit IDWT (a SIMD implementation of it should be >2x faster then with michael parents: 5565 diff changeset	51 static av_always_inline void snow_interleave_line_footer(int * i, IDWTELEM * low, IDWTELEM * high){
3223 8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	52 for (; (*i)>=0; (*i)-=2){
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	53 low[(*i)+1] = high[(*i)>>1];
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	54 low[*i] = low[(*i)>>1];
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	55 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	56 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	57
5587 3ae03eacbe9f use 16bit IDWT (a SIMD implementation of it should be >2x faster then with michael parents: 5565 diff changeset	58 static av_always_inline void snow_horizontal_compose_lift_lead_out(int i, IDWTELEM * dst, IDWTELEM * src, IDWTELEM * ref, int width, int w, int lift_high, int mul, int add, int shift){
3223 8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	59 for(; i<w; i++){
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	60 dst[i] = src[i] - ((mul * (ref[i] + ref[i + 1]) + add) >> shift);
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	61 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	62
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	63 if((width^lift_high)&1){
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	64 dst[w] = src[w] - ((mul * 2 * ref[w] + add) >> shift);
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	65 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	66 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	67
5587 3ae03eacbe9f use 16bit IDWT (a SIMD implementation of it should be >2x faster then with michael parents: 5565 diff changeset	68 static av_always_inline void snow_horizontal_compose_liftS_lead_out(int i, IDWTELEM * dst, IDWTELEM * src, IDWTELEM * ref, int width, int w){
3223 8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	69 for(; i<w; i++){
5565 93082c591c8b Change rounding of the horizontal DWT to match the vertical one. michael parents: 5552 diff changeset	70 dst[i] = src[i] + ((ref[i] + ref[(i+1)]+W_BO + 4 * src[i]) >> W_BS);
3223 8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	71 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	72
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	73 if(width&1){
5565 93082c591c8b Change rounding of the horizontal DWT to match the vertical one. michael parents: 5552 diff changeset	74 dst[w] = src[w] + ((2 * ref[w] + W_BO + 4 * src[w]) >> W_BS);
3223 8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	75 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	76 }
8f048c3295ff altivec support for snow lu_zero parents: 3206 diff changeset	77
7760 c4a4495715dd Globally rename the header inclusion guard names. stefano parents: 5830 diff changeset	78 #endif /* AVCODEC_SNOW_H */
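
The two interleave helpers in the listing are easier to follow with concrete numbers. The following is a minimal scalar sketch, not the FFmpeg code itself: it assumes IDWTELEM is a 16-bit type (the real typedef lives in dwt.h), drops the unused `high` argument from the header helper, and introduces the hypothetical names `interleave_line` and `check_interleave` purely for illustration. In the SIMD versions, a vectorized loop would presumably sit between the header and footer calls; here the footer does all of the work.

```c
#include <assert.h>

/* Assumption: 16-bit coefficient type; the real typedef comes from dwt.h. */
typedef short IDWTELEM;

/* Scalar restatement of snow_interleave_line_header() (simplified signature). */
static void interleave_header(int *i, int width, IDWTELEM *low)
{
    *i = width - 2;
    if (width & 1) {
        /* Odd width: the last output sample is the final low coefficient. */
        low[*i + 1] = low[(*i + 1) >> 1];
        (*i)--;
    }
}

/* Scalar restatement of snow_interleave_line_footer(). */
static void interleave_footer(int *i, IDWTELEM *low, const IDWTELEM *high)
{
    /* Walk from the end so low coefficients not yet read are not overwritten. */
    for (; *i >= 0; *i -= 2) {
        low[*i + 1] = high[*i >> 1];
        low[*i]     = low[*i >> 1];
    }
}

/* Hypothetical wrapper: header, then footer, with no SIMD loop in between. */
static void interleave_line(IDWTELEM *low, const IDWTELEM *high, int width)
{
    int i;
    interleave_header(&i, width, low);
    interleave_footer(&i, low, high);
}

/* Hypothetical self-check: three low and two high coefficients, width 5. */
static void check_interleave(void)
{
    IDWTELEM low[5]  = {10, 20, 30, 0, 0};   /* low band in the first 3 slots */
    IDWTELEM high[2] = {1, 2};               /* high band */
    const IDWTELEM expected[5] = {10, 1, 20, 2, 30};

    interleave_line(low, high, 5);
    for (int k = 0; k < 5; k++)
        assert(low[k] == expected[k]);
}
```

The `low` buffer must have room for the full `width` samples, since the interleaved result is written in place over the packed low band.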
