annotate libpostproc/postprocess_template.c @ 2601:c31a28f27d9a libavcodec

increasing precision of the quantization parameter; this is needed as the quantization stepsize for each subband is also in this precision, and insignificant changes to the wavelet, like scaling its coefficients slightly differently, would lead to wildly varying PSNR and bitrate. note, an encoder could also simply choose to leave the least significant bits of the quantization parameters zero, which would give the exact previous behaviour except for a very tiny number of bits in the header
author michael
date Sat, 09 Apr 2005 22:15:48 +0000
parents ace6e273f318
children 240e17c3cb2d
rev   line source
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1 /*
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
2 Copyright (C) 2001-2002 Michael Niedermayer (michaelni@gmx.at)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
4 This program is free software; you can redistribute it and/or modify
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
5 it under the terms of the GNU General Public License as published by
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
6 the Free Software Foundation; either version 2 of the License, or
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
7 (at your option) any later version.
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
8
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
9 This program is distributed in the hope that it will be useful,
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
10 but WITHOUT ANY WARRANTY; without even the implied warranty of
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
12 GNU General Public License for more details.
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
13
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
14 You should have received a copy of the GNU General Public License
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
15 along with this program; if not, write to the Free Software
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
16 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
17 */
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
18
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
19 /**
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
20 * @file postprocess_template.c
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
21 * mmx/mmx2/3dnow postprocess code.
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
22 */
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
23
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
24
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
25 #ifdef ARCH_X86_64
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
26 # define REGa rax
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
27 # define REGc rcx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
28 # define REGd rdx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
29 # define REG_a "rax"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
30 # define REG_c "rcx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
31 # define REG_d "rdx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
32 # define REG_SP "rsp"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
33 # define ALIGN_MASK "$0xFFFFFFFFFFFFFFF8"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
34 #else
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
35 # define REGa eax
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
36 # define REGc ecx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
37 # define REGd edx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
38 # define REG_a "eax"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
39 # define REG_c "ecx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
40 # define REG_d "edx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
41 # define REG_SP "esp"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
42 # define ALIGN_MASK "$0xFFFFFFF8"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
43 #endif
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
44
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
45
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
46 #undef PAVGB
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
47 #undef PMINUB
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
48 #undef PMAXUB
104
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
49
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
50 #ifdef HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
51 #define REAL_PAVGB(a,b) "pavgb " #a ", " #b " \n\t"
104
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
52 #elif defined (HAVE_3DNOW)
2295
rfelker
parents: 2293
diff changeset
53 #define REAL_PAVGB(a,b) "pavgusb " #a ", " #b " \n\t"
104
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
54 #endif
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
55 #define PAVGB(a,b) REAL_PAVGB(a,b)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
56
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
57 #ifdef HAVE_MMX2
2c469e390117 dering in c
michael
parents: 133
diff changeset
58 #define PMINUB(a,b,t) "pminub " #a ", " #b " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
59 #elif defined (HAVE_MMX)
2c469e390117 dering in c
michael
parents: 133
diff changeset
60 #define PMINUB(b,a,t) \
2c469e390117 dering in c
michael
parents: 133
diff changeset
61 "movq " #a ", " #t " \n\t"\
2c469e390117 dering in c
michael
parents: 133
diff changeset
62 "psubusb " #b ", " #t " \n\t"\
2c469e390117 dering in c
michael
parents: 133
diff changeset
63 "psubb " #t ", " #a " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
64 #endif
2c469e390117 dering in c
michael
parents: 133
diff changeset
65
2c469e390117 dering in c
michael
parents: 133
diff changeset
66 #ifdef HAVE_MMX2
2c469e390117 dering in c
michael
parents: 133
diff changeset
67 #define PMAXUB(a,b) "pmaxub " #a ", " #b " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
68 #elif defined (HAVE_MMX)
2c469e390117 dering in c
michael
parents: 133
diff changeset
69 #define PMAXUB(a,b) \
2c469e390117 dering in c
michael
parents: 133
diff changeset
70 "psubusb " #a ", " #b " \n\t"\
2c469e390117 dering in c
michael
parents: 133
diff changeset
71 "paddb " #a ", " #b " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
72 #endif
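The plain-MMX fallbacks for PMINUB and PMAXUB above build a byte-wise minimum and maximum out of unsigned saturating subtraction (psubusb) followed by a wrapping subtract or add. The following is a minimal scalar sketch of that identity for a single byte. The helper names (sat_sub_u8, min_u8_emul, max_u8_emul, check_minmax_emul) are illustrative assumptions and are not part of libpostproc.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar analogue of psubusb on one byte: x - y, clamped at 0. */
static uint8_t sat_sub_u8(uint8_t x, uint8_t y)
{
    return x > y ? (uint8_t)(x - y) : 0;
}

/* Mirrors the MMX PMINUB fallback: t = a; t = sat(t - b); a = a - t. */
static uint8_t min_u8_emul(uint8_t a, uint8_t b)
{
    uint8_t t = sat_sub_u8(a, b);   /* a - b if a > b, else 0 */
    return (uint8_t)(a - t);        /* b if a > b, else a */
}

/* Mirrors the MMX PMAXUB fallback: b = sat(b - a); b = b + a. */
static uint8_t max_u8_emul(uint8_t a, uint8_t b)
{
    uint8_t t = sat_sub_u8(b, a);   /* b - a if b > a, else 0 */
    return (uint8_t)(t + a);        /* b if b > a, else a */
}

/* Hypothetical self-check: compare against plain C for every byte pair. */
static void check_minmax_emul(void)
{
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            assert(min_u8_emul((uint8_t)a, (uint8_t)b) == (a < b ? a : b));
            assert(max_u8_emul((uint8_t)a, (uint8_t)b) == (a > b ? a : b));
        }
    }
}
```

The temporary t corresponds to the third operand of the three-argument PMINUB form, which needs a scratch register because the source operand must survive the saturating subtraction.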
2c469e390117 dering in c
michael
parents: 133
diff changeset
73
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
74 //FIXME? |255-0| = 1 (shouldn't be a problem ...)
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
75 #ifdef HAVE_MMX
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
76 /**
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
77 * Check if the middle 8x8 Block in the given 8x16 block is flat
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
78 */
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
79 static inline int RENAME(vertClassify)(uint8_t src[], int stride, PPContext *c){
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
80 int numEq= 0, dcOk;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
81 src+= stride*4; // src points to the beginning of the 8x8 block
119
b2f0e40866b1 optimizations (+2% speedup)
michael
parents: 118
diff changeset
82 asm volatile(
1331
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
83 "movq %0, %%mm7 \n\t"
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
84 "movq %1, %%mm6 \n\t"
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
85 : : "m" (c->mmxDcOffset[c->nonBQP]), "m" (c->mmxDcThreshold[c->nonBQP])
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
86 );
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
87
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
88 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
89 "lea (%2, %3), %%"REG_a" \n\t"
119
b2f0e40866b1 optimizations (+2% speedup)
michael
parents: 118
diff changeset
90 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
91 // %1 eax eax+%2 eax+2%2 %1+4%2 ecx ecx+%2 ecx+2%2 %1+8%2 ecx+4%2
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
92
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
93 "movq (%2), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
94 "movq (%%"REG_a"), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
95 "movq %%mm0, %%mm3 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
96 "movq %%mm0, %%mm4 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
97 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
98 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
99 "psubb %%mm1, %%mm0 \n\t" // mm0 = difference
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
100 "paddb %%mm7, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
101 "pcmpgtb %%mm6, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
102
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
103 "movq (%%"REG_a",%3), %%mm2 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
104 PMAXUB(%%mm2, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
105 PMINUB(%%mm2, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
106 "psubb %%mm2, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
107 "paddb %%mm7, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
108 "pcmpgtb %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
109 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
110
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
111 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
112 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
113 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
114 "psubb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
115 "paddb %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
116 "pcmpgtb %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
117 "paddb %%mm2, %%mm0 \n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
118
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
119 "lea (%%"REG_a", %3, 4), %%"REG_a" \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
120
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
121 "movq (%2, %3, 4), %%mm2 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
122 PMAXUB(%%mm2, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
123 PMINUB(%%mm2, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
124 "psubb %%mm2, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
125 "paddb %%mm7, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
126 "pcmpgtb %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
127 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
128
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
129 "movq (%%"REG_a"), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
130 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
131 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
132 "psubb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
133 "paddb %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
134 "pcmpgtb %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
135 "paddb %%mm2, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
136
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
137 "movq (%%"REG_a", %3), %%mm2 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
138 PMAXUB(%%mm2, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
139 PMINUB(%%mm2, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
140 "psubb %%mm2, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
141 "paddb %%mm7, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
142 "pcmpgtb %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
143 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
144
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
145 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
146 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
147 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
148 "psubb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
149 "paddb %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
150 "pcmpgtb %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
151 "paddb %%mm2, %%mm0 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
152 "psubusb %%mm3, %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
153
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
154 " \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
155 #ifdef HAVE_MMX2
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
156 "pxor %%mm7, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
157 "psadbw %%mm7, %%mm0 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
158 #else
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
159 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
160 "psrlw $8, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
161 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
162 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
163 "psrlq $16, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
164 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
165 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
166 "psrlq $32, %%mm0 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
167 "paddb %%mm1, %%mm0 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
168 #endif
1331
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
169 "movq %4, %%mm7 \n\t" // QP,..., QP
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
170 "paddusb %%mm7, %%mm7 \n\t" // 2QP ... 2QP
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
171 "psubusb %%mm7, %%mm4 \n\t" // Diff <= 2QP -> 0
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
172 "packssdw %%mm4, %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
173 "movd %%mm0, %0 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
174 "movd %%mm4, %1 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
175
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
176 : "=r" (numEq), "=r" (dcOk)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
177 : "r" (src), "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
178 : "%"REG_a
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
179 );
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
180
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
181 numEq= (-numEq) &0xFF;
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
182 if(numEq > c->ppMode.flatnessThreshold){
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
183 if(dcOk) return 0;
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
184 else return 1;
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
185 }else{
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
186 return 2;
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
187 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
188 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
189 #endif
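In vertClassify above, every successful pcmpgtb comparison leaves 0xFF (that is, -1) in a byte, paddb accumulates those bytes, and the non-MMX2 branch then folds the eight byte counters into the low byte with shifts and adds before numEq= (-numEq) &0xFF recovers a positive count. Here is a scalar sketch of that reduction, modelling an MMX register as a uint64_t. The function names are hypothetical, and the sketch assumes the total count stays below 256, which holds for the 56 comparisons made here.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise wrapping add of two 64-bit values, the scalar analogue of paddb. */
static uint64_t paddb_u64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t s = (uint8_t)((a >> (8 * i)) + (b >> (8 * i)));
        r |= (uint64_t)s << (8 * i);
    }
    return r;
}

/* Shift-and-add reduction: the low byte ends up holding the sum of all
 * eight bytes modulo 256. */
static uint8_t hsum_bytes_mod256(uint64_t v)
{
    v = paddb_u64(v, (v >> 8) & 0x00FF00FF00FF00FFULL); /* psrlw $8, paddb  */
    v = paddb_u64(v, v >> 16);                           /* psrlq $16, paddb */
    v = paddb_u64(v, v >> 32);                           /* psrlq $32, paddb */
    return (uint8_t)v;
}

/* Each match added 0xFF (-1) to a byte counter, so the number of matches
 * is the negated byte sum, taken modulo 256. */
static int count_matches(uint64_t acc)
{
    int numEq = hsum_bytes_mod256(acc);
    return (-numEq) & 0xFF;
}

/* Hypothetical self-check: 7 matches in each of the 8 columns gives 56. */
static void check_count_matches(void)
{
    assert(count_matches(0xF9F9F9F9F9F9F9F9ULL) == 56);
    assert(count_matches(0) == 0);
}
```

The same negate-and-mask step also works after the psadbw branch, because a sum of k bytes equal to 0xFF is 255k, and -255k is congruent to k modulo 256.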
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
190
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
191 /**
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
192 * Do a vertical low pass filter on the 8x16 block (only write to the 8x8 block in the middle)
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
193 * using the 9-Tap Filter (1,1,2,2,4,2,2,1,1)/16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
194 */
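The kernel named in the comment above has weights that sum to 16, so a flat column passes through unchanged. Below is a scalar sketch for one 10-sample column (lines 0 to 9, writing only lines 1 to 8). It assumes that taps falling outside lines 1 to 8 are replaced by an edge value chosen with the "diff <= QP" test used in the assembly, and that the result is rounded by adding 8 before the shift. The rounding and all function names are assumptions; the pavgb-based MMX2/3DNow path may round differently.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar analogue of psubusb on one byte. */
static uint8_t sat_sub_u8(uint8_t x, uint8_t y)
{
    return x > y ? (uint8_t)(x - y) : 0;
}

/* Absolute difference as two saturating subtractions OR'ed together, then
 * keep the outer line if it is within qp of the inner one, else the inner. */
static uint8_t pick_edge(uint8_t outer, uint8_t inner, uint8_t qp)
{
    uint8_t diff = (uint8_t)(sat_sub_u8(outer, inner) | sat_sub_u8(inner, outer));
    return sat_sub_u8(diff, qp) == 0 ? outer : inner;
}

/* Apply (1,1,2,2,4,2,2,1,1)/16 to lines 1..8 of a 10-line column. */
static void lowpass9_column(uint8_t col[10], uint8_t qp)
{
    static const int w[9] = {1, 1, 2, 2, 4, 2, 2, 1, 1};
    const uint8_t first = pick_edge(col[0], col[1], qp);
    const uint8_t last  = pick_edge(col[9], col[8], qp);
    uint8_t out[10];

    for (int i = 1; i <= 8; i++) {
        int sum = 8;                              /* assumed rounding term */
        for (int k = 0; k < 9; k++) {
            int j = i + k - 4;                    /* line index of this tap */
            int s = j < 1 ? first : (j > 8 ? last : col[j]);
            sum += w[k] * s;
        }
        out[i] = (uint8_t)(sum >> 4);
    }
    for (int i = 1; i <= 8; i++)
        col[i] = out[i];                          /* lines 0 and 9 stay untouched */
}

/* Hypothetical self-check: a flat column is left unchanged. */
static void check_lowpass9_flat(void)
{
    uint8_t flat[10] = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100};
    lowpass9_column(flat, 10);
    for (int i = 0; i < 10; i++)
        assert(flat[i] == 100);
}
```

For line 1 the four taps above it all collapse onto the edge value, which gives the weights 6, 4, 2, 2, 1, 1 listed in the register-layout comment inside the function.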
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
195 #ifndef HAVE_ALTIVEC
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
196 static inline void RENAME(doVertLowPass)(uint8_t *src, int stride, PPContext *c)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
197 {
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
198 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
199 src+= stride*3;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
200 asm volatile( //"movv %0 %1 %2\n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
201 "movq %2, %%mm0 \n\t" // QP,..., QP
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
202 "pxor %%mm4, %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
203
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
204 "movq (%0), %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
205 "movq (%0, %1), %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
206 "movq %%mm5, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
207 "movq %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
208 "psubusb %%mm6, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
209 "psubusb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
210 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
211 "psubusb %%mm0, %%mm2 \n\t" // diff <= QP -> 0
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
212 "pcmpeqb %%mm4, %%mm2 \n\t" // diff <= QP -> FF
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
213
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
214 "pand %%mm2, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
215 "pandn %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
216 "por %%mm2, %%mm6 \n\t"// First Line to Filter
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
217
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
218 "movq (%0, %1, 8), %%mm5 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
219 "lea (%0, %1, 4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
220 "lea (%0, %1, 8), %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
221 "sub %1, %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
222 "add %1, %0 \n\t" // %0 points to line 1 not 0
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
223 "movq (%0, %1, 8), %%mm7 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
224 "movq %%mm5, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
225 "movq %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
226 "psubusb %%mm7, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
227 "psubusb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
228 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
229 "psubusb %%mm0, %%mm2 \n\t" // diff <= QP -> 0
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
230 "pcmpeqb %%mm4, %%mm2 \n\t" // diff <= QP -> FF
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
231
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
232 "pand %%mm2, %%mm7 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
233 "pandn %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
234 "por %%mm2, %%mm7 \n\t" // First Line to Filter
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
235
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
236
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
237 // 1 2 3 4 5 6 7 8
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
238 // %0 %0+%1 %0+2%1 eax %0+4%1 eax+2%1 ecx eax+4%1
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
239 // 6 4 2 2 1 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
240 // 6 4 4 2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
241 // 6 8 2
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
242
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
243 "movq (%0, %1), %%mm0 \n\t" // 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
244 "movq %%mm0, %%mm1 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
245 PAVGB(%%mm6, %%mm0) //1 1 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
246 PAVGB(%%mm6, %%mm0) //3 1 /4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
247
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
248 "movq (%0, %1, 4), %%mm2 \n\t" // 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
249 "movq %%mm2, %%mm5 \n\t" // 1
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
250 PAVGB((%%REGa), %%mm2) // 11 /2
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
251 PAVGB((%0, %1, 2), %%mm2) // 211 /4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
252 "movq %%mm2, %%mm3 \n\t" // 211 /4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
253 "movq (%0), %%mm4 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
254 PAVGB(%%mm4, %%mm3) // 4 211 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
255 PAVGB(%%mm0, %%mm3) //642211 /16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
256 "movq %%mm3, (%0) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
257 // mm1=2 mm2=3(211) mm4=1 mm5=5 mm6=0 mm7=9
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
258 "movq %%mm1, %%mm0 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
259 PAVGB(%%mm6, %%mm0) //1 1 /2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
260 "movq %%mm4, %%mm3 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
261 PAVGB((%0,%1,2), %%mm3) // 1 1 /2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
262 PAVGB((%%REGa,%1,2), %%mm5) // 11 /2
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
263 PAVGB((%%REGa), %%mm5) // 211 /4
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
264 PAVGB(%%mm5, %%mm3) // 2 2211 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
265 PAVGB(%%mm0, %%mm3) //4242211 /16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
266 "movq %%mm3, (%0,%1) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
267 // mm1=2 mm2=3(211) mm4=1 mm5=4(211) mm6=0 mm7=9
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
268 PAVGB(%%mm4, %%mm6) //11 /2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
269 "movq (%%"REG_c"), %%mm0 \n\t" // 1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
270 PAVGB((%%REGa, %1, 2), %%mm0) // 11/2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
271 "movq %%mm0, %%mm3 \n\t" // 11/2
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
272 PAVGB(%%mm1, %%mm0) // 2 11/4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
273 PAVGB(%%mm6, %%mm0) //222 11/8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
274 PAVGB(%%mm2, %%mm0) //22242211/16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
275 "movq (%0, %1, 2), %%mm2 \n\t" // 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
276 "movq %%mm0, (%0, %1, 2) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
277 // mm1=2 mm2=3 mm3=6(11) mm4=1 mm5=4(211) mm6=0(11) mm7=9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
278 "movq (%%"REG_a", %1, 4), %%mm0 \n\t" // 1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
279 PAVGB((%%REGc), %%mm0) // 11 /2
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
280 PAVGB(%%mm0, %%mm6) //11 11 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
281 PAVGB(%%mm1, %%mm4) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
282 PAVGB(%%mm2, %%mm1) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
283 PAVGB(%%mm1, %%mm6) //1122 11 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
284 PAVGB(%%mm5, %%mm6) //112242211 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
285 "movq (%%"REG_a"), %%mm5 \n\t" // 1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
286 "movq %%mm6, (%%"REG_a") \n\t" // X
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
287 // mm0=7(11) mm1=2(11) mm2=3 mm3=6(11) mm4=1(11) mm5=4 mm7=9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
288 "movq (%%"REG_a", %1, 4), %%mm6 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
289 PAVGB(%%mm7, %%mm6) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
290 PAVGB(%%mm4, %%mm6) // 11 11 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
291 PAVGB(%%mm3, %%mm6) // 11 2211 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
292 PAVGB(%%mm5, %%mm2) // 11 /2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
293 "movq (%0, %1, 4), %%mm4 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
294 PAVGB(%%mm4, %%mm2) // 112 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
295 PAVGB(%%mm2, %%mm6) // 112242211 /16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
296 "movq %%mm6, (%0, %1, 4) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
297 // mm0=7(11) mm1=2(11) mm2=3(112) mm3=6(11) mm4=5 mm5=4 mm7=9
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
298 PAVGB(%%mm7, %%mm1) // 11 2 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
299 PAVGB(%%mm4, %%mm5) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
300 PAVGB(%%mm5, %%mm0) // 11 11 /4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
301 "movq (%%"REG_a", %1, 2), %%mm6 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
302 PAVGB(%%mm6, %%mm1) // 11 4 2 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
303 PAVGB(%%mm0, %%mm1) // 11224222 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
304 "movq %%mm1, (%%"REG_a", %1, 2) \n\t" // X
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
305 // mm2=3(112) mm3=6(11) mm4=5 mm5=4(11) mm6=6 mm7=9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
306 PAVGB((%%REGc), %%mm2) // 112 4 /8
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
307 "movq (%%"REG_a", %1, 4), %%mm0 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
308 PAVGB(%%mm0, %%mm6) // 1 1 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
309 PAVGB(%%mm7, %%mm6) // 1 12 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
310 PAVGB(%%mm2, %%mm6) // 1122424 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
311 "movq %%mm6, (%%"REG_c") \n\t" // X
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
312 // mm0=8 mm3=6(11) mm4=5 mm5=4(11) mm7=9
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
313 PAVGB(%%mm7, %%mm5) // 11 2 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
314 PAVGB(%%mm7, %%mm5) // 11 6 /8
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
315
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
316 PAVGB(%%mm3, %%mm0) // 112 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
317 PAVGB(%%mm0, %%mm5) // 112246 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
318 "movq %%mm5, (%%"REG_a", %1, 4) \n\t" // X
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
319 "sub %1, %0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
320
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
321 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
322 : "r" (src), "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
323 : "%"REG_a, "%"REG_c
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
324 );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
325 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
326 const int l1= stride;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
327 const int l2= stride + l1;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
328 const int l3= stride + l2;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
329 const int l4= stride + l3;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
330 const int l5= stride + l4;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
331 const int l6= stride + l5;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
332 const int l7= stride + l6;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
333 const int l8= stride + l7;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
334 const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
335 int x;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
336 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
337 for(x=0; x<BLOCK_SIZE; x++)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
338 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
339 const int first= ABS(src[0] - src[l1]) < c->QP ? src[0] : src[l1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
340 const int last= ABS(src[l8] - src[l9]) < c->QP ? src[l9] : src[l8];
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
341
2038
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
342 int sums[10];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
343 sums[0] = 4*first + src[l1] + src[l2] + src[l3] + 4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
344 sums[1] = sums[0] - first + src[l4];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
345 sums[2] = sums[1] - first + src[l5];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
346 sums[3] = sums[2] - first + src[l6];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
347 sums[4] = sums[3] - first + src[l7];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
348 sums[5] = sums[4] - src[l1] + src[l8];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
349 sums[6] = sums[5] - src[l2] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
350 sums[7] = sums[6] - src[l3] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
351 sums[8] = sums[7] - src[l4] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
352 sums[9] = sums[8] - src[l5] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
353
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
354 src[l1]= (sums[0] + sums[2] + 2*src[l1])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
355 src[l2]= (sums[1] + sums[3] + 2*src[l2])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
356 src[l3]= (sums[2] + sums[4] + 2*src[l3])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
357 src[l4]= (sums[3] + sums[5] + 2*src[l4])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
358 src[l5]= (sums[4] + sums[6] + 2*src[l5])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
359 src[l6]= (sums[5] + sums[7] + 2*src[l6])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
360 src[l7]= (sums[6] + sums[8] + 2*src[l7])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
361 src[l8]= (sums[7] + sums[9] + 2*src[l8])>>4;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
362
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
363 src++;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
364 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
365 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
366 }
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
367 #endif //HAVE_ALTIVEC
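The C fallback above keeps a running 7-sample sum and combines two of those sums per output, which gives the 1 1 2 2 4 2 2 1 1 (/16) kernel noted in the asm comments. The following is a minimal standalone sketch of that idea for a single column. `lowpass_column` and the `col[10]` layout are hypothetical names introduced here for illustration and are not part of libpostproc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (assumption, not libpostproc API): low-pass one column.
 * col[0] is the sample above the filtered span, col[1..8] are the eight
 * samples that get filtered, col[9] is the sample below it. */
static void lowpass_column(uint8_t col[10], int qp)
{
    /* Use the outer neighbour for padding only if it is close enough. */
    const int first = abs(col[0] - col[1]) < qp ? col[0] : col[1];
    const int last  = abs(col[8] - col[9]) < qp ? col[9] : col[8];
    int sums[10];
    int i;

    /* sums[i] = sum of positions i-3 .. i+3 (padded with first/last) + 4 */
    sums[0] = 4 * first + col[1] + col[2] + col[3] + 4;
    for (i = 1; i < 10; i++) {
        const int leaving  = (i <= 4)     ? first      : col[i - 4];
        const int entering = (i + 3 <= 8) ? col[i + 3] : last;
        sums[i] = sums[i - 1] - leaving + entering;
    }

    /* Two overlapping window sums plus twice the centre sample give the
     * weights 1 1 2 2 4 2 2 1 1; the two "+4" terms round to nearest. */
    for (i = 1; i <= 8; i++)
        col[i] = (uint8_t)((sums[i - 1] + sums[i + 1] + 2 * col[i]) >> 4);
}
```

Writing `col[i]` in place is safe in this sketch because all window sums are computed before the first store, and each output reads only its own unfiltered centre sample. A flat column is left unchanged, and a hard step is spread over the eight filtered samples.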
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
368
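The weight annotations in the asm comments (for example `//1 1 /2` followed by `//3 1 /4`) come from chaining rounding byte averages. As a rough illustration, assuming the `PAVGB` macro maps to an instruction with `pavgb` semantics, i.e. `(a + b + 1) >> 1` per byte, the chain can be emulated in scalar C as below. `pavgb` and `avg_3_1` are hypothetical helper names used only for this sketch.

```c
#include <stdint.h>

/* Scalar emulation of one lane of a rounding byte average (pavgb-style). */
static uint8_t pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical helper: two chained averages approximate (3*f + x) / 4.
 * Each step rounds up, so the result can be biased slightly upward. */
static uint8_t avg_3_1(uint8_t f, uint8_t x)
{
    return pavgb(f, pavgb(f, x));
}
```

Because every step rounds up, a long chain accumulates a small positive bias compared with a single exact division by 16, which is one reason the SIMD and C paths need not be bit-identical.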
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
369 #if 0
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
370 /**
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
371 * Experimental implementation of the filter (Algorithm 1) described in a paper from Ramkishor & Karandikar
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
372 * values are correctly clipped (MMX2)
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
373 * values wrap around (C)
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
374 * conclusion: it's fast, but introduces ugly horizontal patterns if there is a continuous gradient
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
375 0 8 16 24
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
376 x = 8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
377 x/2 = 4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
378 x/8 = 1
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
379 1 12 12 23
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
380 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
381 static inline void RENAME(vertRK1Filter)(uint8_t *src, int stride, int QP)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
382 {
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
383 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
384 src+= stride*3;
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
385 // FIXME rounding
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
386 asm volatile(
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
387 "pxor %%mm7, %%mm7 \n\t" // 0
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
388 "movq "MANGLE(b80)", %%mm6 \n\t" // MIN_SIGNED_BYTE
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
389 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
390 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
391 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
392 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
393 "movq "MANGLE(pQPb)", %%mm0 \n\t" // QP,..., QP
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
394 "movq %%mm0, %%mm1 \n\t" // QP,..., QP
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
395 "paddusb "MANGLE(b02)", %%mm0 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
396 "psrlw $2, %%mm0 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
397 "pand "MANGLE(b3F)", %%mm0 \n\t" // QP/4,..., QP/4
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
398 "paddusb %%mm1, %%mm0 \n\t" // QP*1.25 ...
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
399 "movq (%0, %1, 4), %%mm2 \n\t" // line 4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
400 "movq (%%"REG_c"), %%mm3 \n\t" // line 5
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
401 "movq %%mm2, %%mm4 \n\t" // line 4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
402 "pcmpeqb %%mm5, %%mm5 \n\t" // -1
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
403 "pxor %%mm2, %%mm5 \n\t" // -line 4 - 1
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
404 PAVGB(%%mm3, %%mm5)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
405 "paddb %%mm6, %%mm5 \n\t" // (l5-l4)/2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
406 "psubusb %%mm3, %%mm4 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
407 "psubusb %%mm2, %%mm3 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
408 "por %%mm3, %%mm4 \n\t" // |l4 - l5|
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
409 "psubusb %%mm0, %%mm4 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
410 "pcmpeqb %%mm7, %%mm4 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
411 "pand %%mm4, %%mm5 \n\t" // d/2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
412
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
413 // "paddb %%mm6, %%mm2 \n\t" // line 4 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
414 "paddb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
415 // "psubb %%mm6, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
416 "movq %%mm2, (%0,%1, 4) \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
417
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
418 "movq (%%"REG_c"), %%mm2 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
419 // "paddb %%mm6, %%mm2 \n\t" // line 5 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
420 "psubb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
421 // "psubb %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
422 "movq %%mm2, (%%"REG_c") \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
423
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
424 "paddb %%mm6, %%mm5 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
425 "psrlw $2, %%mm5 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
426 "pand "MANGLE(b3F)", %%mm5 \n\t"
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
427 "psubb "MANGLE(b20)", %%mm5 \n\t" // (l5-l4)/8
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
428
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
429 "movq (%%"REG_a", %1, 2), %%mm2 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
430 "paddb %%mm6, %%mm2 \n\t" // line 3 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
431 "paddsb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
432 "psubb %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
433 "movq %%mm2, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
434
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
435 "movq (%%"REG_c", %1), %%mm2 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
436 "paddb %%mm6, %%mm2 \n\t" // line 6 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
437 "psubsb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
438 "psubb %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
439 "movq %%mm2, (%%"REG_c", %1) \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
440
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
441 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
442 : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
443 : "%"REG_a, "%"REG_c
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
444 );
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
445 #else
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
446 const int l1= stride;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
447 const int l2= stride + l1;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
448 const int l3= stride + l2;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
449 const int l4= stride + l3;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
450 const int l5= stride + l4;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
451 const int l6= stride + l5;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
452 // const int l7= stride + l6;
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
453 // const int l8= stride + l7;
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
454 // const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
455 int x;
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
456 const int QP15= QP + (QP>>2);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
457 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
458 for(x=0; x<BLOCK_SIZE; x++)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
459 {
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
460 const int v = (src[x+l5] - src[x+l4]);
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
461 if(ABS(v) < QP15)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
462 {
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
463 src[x+l3] +=v>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
464 src[x+l4] +=v>>1;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
465 src[x+l5] -=v>>1;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
466 src[x+l6] -=v>>3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
467
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
468 }
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
469 }
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
470
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
471 #endif
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
472 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
473 #endif
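The scalar (#else) branch of the filter that ends above can be read on its own. What follows is a minimal standalone sketch of that branch, not the file's actual code: the name soften_edge_sketch, the constant SKETCH_BLOCK_SIZE (assumed to match BLOCK_SIZE = 8) and the plain int qp parameter are illustrative assumptions. Like the original, it does not clip the results to 0..255.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { SKETCH_BLOCK_SIZE = 8 }; /* assumed value of BLOCK_SIZE */

/* Hypothetical standalone version of the scalar branch above.
 * src points at the top row of an 8-pixel-wide block and must be
 * followed by at least 10 rows of `stride` bytes each. */
static inline void soften_edge_sketch(uint8_t *src, int stride, int qp)
{
    assert(stride >= SKETCH_BLOCK_SIZE);
    const int qp15 = qp + (qp >> 2);   /* threshold of about 1.25 * QP */
    uint8_t *p = src + stride * 3;     /* same rebasing as the original */

    for (int x = 0; x < SKETCH_BLOCK_SIZE; x++) {
        uint8_t *l3 = p + x + 3 * stride;
        uint8_t *l4 = l3 + stride;
        uint8_t *l5 = l4 + stride;
        uint8_t *l6 = l5 + stride;
        const int v = *l5 - *l4;       /* step across the block edge */

        if (abs(v) < qp15) {
            /* >> on a negative int is an arithmetic shift on common
             * compilers, which the original code also relies on. */
            *l3 += v >> 3;
            *l4 += v >> 1;
            *l5 -= v >> 1;
            *l6 -= v >> 3;
        }
    }
}
```

With a step of 4 across the edge and qp = 8, the two rows next to the edge each move by 2 toward one another, while the outer rows move by 4>>3 = 0. A step at or above the threshold is treated as real image content and left untouched.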
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
474
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
475 /**
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
476 * Experimental Filter 1
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
477 * will not damage linear gradients
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
478 * Flat blocks should look like they were passed through the (1,1,2,2,4,2,2,1,1) 9-Tap filter
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
479 * can only smooth blocks at the expected locations (it can't smooth them if they moved)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
480 * MMX2 version does correct clipping, C version doesn't
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
481 */
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
482 static inline void RENAME(vertX1Filter)(uint8_t *src, int stride, PPContext *co)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
483 {
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
484 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
485 src+= stride*3;
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
486
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
487 asm volatile(
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
488 "pxor %%mm7, %%mm7 \n\t" // 0
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
489 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
490 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
491 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
492 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
493 "movq (%%"REG_a", %1, 2), %%mm0 \n\t" // line 3
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
494 "movq (%0, %1, 4), %%mm1 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
495 "movq %%mm1, %%mm2 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
496 "psubusb %%mm0, %%mm1 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
497 "psubusb %%mm2, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
498 "por %%mm1, %%mm0 \n\t" // |l3 - l4|
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
499 "movq (%%"REG_c"), %%mm3 \n\t" // line 5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
500 "movq (%%"REG_c", %1), %%mm4 \n\t" // line 6
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
501 "movq %%mm3, %%mm5 \n\t" // line 5
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
502 "psubusb %%mm4, %%mm3 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
503 "psubusb %%mm5, %%mm4 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
504 "por %%mm4, %%mm3 \n\t" // |l5 - l6|
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
505 PAVGB(%%mm3, %%mm0) // (|l3 - l4| + |l5 - l6|)/2
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
506 "movq %%mm2, %%mm1 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
507 "psubusb %%mm5, %%mm2 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
508 "movq %%mm2, %%mm4 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
509 "pcmpeqb %%mm7, %%mm2 \n\t" // (l4 - l5) <= 0 ? -1 : 0
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
510 "psubusb %%mm1, %%mm5 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
511 "por %%mm5, %%mm4 \n\t" // |l4 - l5|
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
512 "psubusb %%mm0, %%mm4 \n\t" //d = MAX(0, |l4-l5| - (|l3-l4| + |l5-l6|)/2)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
513 "movq %%mm4, %%mm3 \n\t" // d
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
514 "movq %2, %%mm0 \n\t"
334
3912b37ba121 x1 deblocking filter bugfix
michael
parents: 224
diff changeset
515 "paddusb %%mm0, %%mm0 \n\t"
3912b37ba121 x1 deblocking filter bugfix
michael
parents: 224
diff changeset
516 "psubusb %%mm0, %%mm4 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
517 "pcmpeqb %%mm7, %%mm4 \n\t" // d <= 2*QP ? -1 : 0
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
518 "psubusb "MANGLE(b01)", %%mm3 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
519 "pand %%mm4, %%mm3 \n\t" // d <= 2*QP ? d : 0
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
520
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
521 PAVGB(%%mm7, %%mm3) // d/2
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
522 "movq %%mm3, %%mm1 \n\t" // d/2
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
523 PAVGB(%%mm7, %%mm3) // d/4
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
524 PAVGB(%%mm1, %%mm3) // 3*d/8
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
525
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
526 "movq (%0, %1, 4), %%mm0 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
527 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l4-1 : l4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
528 "psubusb %%mm3, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
529 "pxor %%mm2, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
530 "movq %%mm0, (%0, %1, 4) \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
531
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
532 "movq (%%"REG_c"), %%mm0 \n\t" // line 5
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
533 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l5-1 : l5
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
534 "paddusb %%mm3, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
535 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
536 "movq %%mm0, (%%"REG_c") \n\t" // line 5
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
537
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
538 PAVGB(%%mm7, %%mm1) // d/4
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
539
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
540 "movq (%%"REG_a", %1, 2), %%mm0 \n\t" // line 3
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
541 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l3-1 : l3
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
542 "psubusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
543 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
544 "movq %%mm0, (%%"REG_a", %1, 2) \n\t" // line 3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
545
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
546 "movq (%%"REG_c", %1), %%mm0 \n\t" // line 6
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
547 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l6-1 : l6
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
548 "paddusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
549 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
550 "movq %%mm0, (%%"REG_c", %1) \n\t" // line 6
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
551
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
552 PAVGB(%%mm7, %%mm1) // d/8
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
553
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
554 "movq (%%"REG_a", %1), %%mm0 \n\t" // line 2
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
555 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l2-1 : l2
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
556 "psubusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
557 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
558 "movq %%mm0, (%%"REG_a", %1) \n\t" // line 2
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
559
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
560 "movq (%%"REG_c", %1, 2), %%mm0 \n\t" // line 7
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
561 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l7-1 : l7
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
562 "paddusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
563 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
564 "movq %%mm0, (%%"REG_c", %1, 2) \n\t" // line 7
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
565
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
566 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
567 : "r" (src), "r" ((long)stride), "m" (co->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
568 : "%"REG_a, "%"REG_c
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
569 );
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
570 #else
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
571
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
572 const int l1= stride;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
573 const int l2= stride + l1;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
574 const int l3= stride + l2;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
575 const int l4= stride + l3;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
576 const int l5= stride + l4;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
577 const int l6= stride + l5;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
578 const int l7= stride + l6;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
579 // const int l8= stride + l7;
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
580 // const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
581 int x;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
582
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
583 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
584 for(x=0; x<BLOCK_SIZE; x++)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
585 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
586 int a= src[l3] - src[l4];
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
587 int b= src[l4] - src[l5];
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
588 int c= src[l5] - src[l6];
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
589
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
590 int d= ABS(b) - ((ABS(a) + ABS(c))>>1);
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
591 d= MAX(d, 0);
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
592
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
593 if(d < co->QP*2)
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
594 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
595 int v = d * SIGN(-b);
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
596
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
597 src[l2] +=v>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
598 src[l3] +=v>>2;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
599 src[l4] +=(3*v)>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
600 src[l5] -=(3*v)>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
601 src[l6] -=v>>2;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
602 src[l7] -=v>>3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
603
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
604 }
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
605 src++;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
606 }
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
607 #endif
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
608 }
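The MMX2/3DNow branch of vertX1Filter builds everything from three unsigned byte operations. The sketch below emulates them one byte at a time to show how the absolute difference, the d/2, d/4, 3d/8 and d/8 correction terms, and the clipped update come out of them. All helper names (subus8, avg8, absdiff8, x1_steps_from_d, move_by) are hypothetical, and it assumes that the b01 constant holds 0x01 in every byte; it is an illustration of the idioms, not a replacement for the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* psubusb: unsigned saturating byte subtract, max(a - b, 0). */
static inline uint8_t subus8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* pavgb: unsigned byte average that rounds up, (a + b + 1) >> 1. */
static inline uint8_t avg8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* psubusb twice, then por: one of the two differences is always 0,
 * so OR-ing them gives |a - b|. */
static inline uint8_t absdiff8(uint8_t a, uint8_t b)
{
    return (uint8_t)(subus8(a, b) | subus8(b, a));
}

typedef struct { uint8_t c38, c14, c18; } x1_steps; /* hypothetical */

/* Correction magnitudes from the PAVGB chain. d is first reduced by 1
 * (the assumed b01 constant) to offset pavgb's upward rounding. */
static inline x1_steps x1_steps_from_d(uint8_t d)
{
    x1_steps s;
    uint8_t half, quarter;

    d = subus8(d, 1);
    half = avg8(d, 0);             /* ~d/2 */
    quarter = avg8(half, 0);       /* ~d/4 */
    s.c38 = avg8(quarter, half);   /* ~3d/8, used on lines 4 and 5 */
    s.c14 = quarter;               /* ~d/4,  used on lines 3 and 6 */
    s.c18 = avg8(quarter, 0);      /* ~d/8,  used on lines 2 and 7 */
    return s;
}

/* pxor / psubusb / pxor: with mask 0x00 this is a saturating x - c,
 * with mask 0xFF it becomes a saturating x + c, because x ^ 0xFF
 * equals 255 - x. */
static inline uint8_t move_by(uint8_t x, uint8_t c, uint8_t mask)
{
    return (uint8_t)(subus8((uint8_t)(x ^ mask), c) ^ mask);
}

/* A few sanity checks that double as a usage example. */
static inline void x1_idiom_self_check(void)
{
    assert(absdiff8(10, 3) == 7 && absdiff8(3, 10) == 7);
    assert(move_by(250, 10, 0xFF) == 255); /* clipped at the top */
    assert(move_by(5, 10, 0x00) == 0);     /* clipped at the bottom */
}
```

The saturating update in move_by is the reason the header comment can say the MMX2 version clips correctly: the scalar branch adds and subtracts on plain bytes and can wrap around at 0 or 255.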
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
609
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
610 #ifndef HAVE_ALTIVEC
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
611 static inline void RENAME(doVertDefFilter)(uint8_t src[], int stride, PPContext *c)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
612 {
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
613 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
614 /*
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
615 uint8_t tmp[16];
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
616 const int l1= stride;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
617 const int l2= stride + l1;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
618 const int l3= stride + l2;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
619 const int l4= (int)tmp - (int)src - stride*3;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
620 const int l5= (int)tmp - (int)src - stride*3 + 8;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
621 const int l6= stride*3 + l3;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
622 const int l7= stride + l6;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
623 const int l8= stride + l7;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
624
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
625 memcpy(tmp, src+stride*7, 8);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
626 memcpy(tmp+8, src+stride*8, 8);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
627 */
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
628 src+= stride*4;
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
629 asm volatile(
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
630
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
631 #if 0 //slightly more accurate and slightly slower
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
632 "pxor %%mm7, %%mm7 \n\t" // 0
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
633 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
634 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
635 // 0 1 2 3 4 5 6 7
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
636 // %0 %0+%1 %0+2%1 eax+2%1 %0+4%1 eax+4%1 ecx+%1 ecx+2%1
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
637 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
638
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
639
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
640 "movq (%0, %1, 2), %%mm0 \n\t" // l2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
641 "movq (%0), %%mm1 \n\t" // l0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
642 "movq %%mm0, %%mm2 \n\t" // l2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
643 PAVGB(%%mm7, %%mm0) // ~l2/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
644 PAVGB(%%mm1, %%mm0) // ~(l2 + 2l0)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
645 PAVGB(%%mm2, %%mm0) // ~(5l2 + 2l0)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
646
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
647 "movq (%%"REG_a"), %%mm1 \n\t" // l1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
648 "movq (%%"REG_a", %1, 2), %%mm3 \n\t" // l3
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
649 "movq %%mm1, %%mm4 \n\t" // l1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
650 PAVGB(%%mm7, %%mm1) // ~l1/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
651 PAVGB(%%mm3, %%mm1) // ~(l1 + 2l3)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
652 PAVGB(%%mm4, %%mm1) // ~(5l1 + 2l3)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
653
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
654 "movq %%mm0, %%mm4 \n\t" // ~(5l2 + 2l0)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
655 "psubusb %%mm1, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
656 "psubusb %%mm4, %%mm1 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
657 "por %%mm0, %%mm1 \n\t" // ~|2l0 - 5l1 + 5l2 - 2l3|/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
658 // mm1= |lenergy|, mm2= l2, mm3= l3, mm7=0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
659
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
660 "movq (%0, %1, 4), %%mm0 \n\t" // l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
661 "movq %%mm0, %%mm4 \n\t" // l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
662 PAVGB(%%mm7, %%mm0) // ~l4/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
663 PAVGB(%%mm2, %%mm0) // ~(l4 + 2l2)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
664 PAVGB(%%mm4, %%mm0) // ~(5l4 + 2l2)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
665
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
666 "movq (%%"REG_c"), %%mm2 \n\t" // l5
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
667 "movq %%mm3, %%mm5 \n\t" // l3
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
668 PAVGB(%%mm7, %%mm3) // ~l3/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
669 PAVGB(%%mm2, %%mm3) // ~(l3 + 2l5)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
670 PAVGB(%%mm5, %%mm3) // ~(5l3 + 2l5)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
671
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
672 "movq %%mm0, %%mm6 \n\t" // ~(5l4 + 2l2)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
673 "psubusb %%mm3, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
674 "psubusb %%mm6, %%mm3 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
675 "por %%mm0, %%mm3 \n\t" // ~|2l2 - 5l3 + 5l4 - 2l5|/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
676 "pcmpeqb %%mm7, %%mm0 \n\t" // SIGN(2l2 - 5l3 + 5l4 - 2l5)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
677 // mm0= SIGN(menergy), mm1= |lenergy|, mm2= l5, mm3= |menergy|, mm4=l4, mm5= l3, mm7=0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
679 "movq (%%"REG_c", %1), %%mm6 \n\t" // l6
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
680 "movq %%mm6, %%mm5 \n\t" // l6
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
681 PAVGB(%%mm7, %%mm6) // ~l6/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
682 PAVGB(%%mm4, %%mm6) // ~(l6 + 2l4)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
683 PAVGB(%%mm5, %%mm6) // ~(5l6 + 2l4)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
684
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
685 "movq (%%"REG_c", %1, 2), %%mm5 \n\t" // l7
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
686 "movq %%mm2, %%mm4 \n\t" // l5
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
687 PAVGB(%%mm7, %%mm2) // ~l5/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
688 PAVGB(%%mm5, %%mm2) // ~(l5 + 2l7)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
689 PAVGB(%%mm4, %%mm2) // ~(5l5 + 2l7)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
690
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
691 "movq %%mm6, %%mm4 \n\t" // ~(5l6 + 2l4)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
692 "psubusb %%mm2, %%mm6 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
693 "psubusb %%mm4, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
694 "por %%mm6, %%mm2 \n\t" // ~|2l4 - 5l5 + 5l6 - 2l7|/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
695 // mm0= SIGN(menergy), mm1= |lenergy|/8, mm2= |renergy|/8, mm3= |menergy|/8, mm7=0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
696
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
697
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
698 PMINUB(%%mm2, %%mm1, %%mm4) // MIN(|lenergy|,|renergy|)/8
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
699 "movq %2, %%mm4 \n\t" // QP //FIXME QP+1 ?
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
700 "paddusb "MANGLE(b01)", %%mm4 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
701 "pcmpgtb %%mm3, %%mm4 \n\t" // |menergy|/8 < QP
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
702 "psubusb %%mm1, %%mm3 \n\t" // d=|menergy|/8-MIN(|lenergy|,|renergy|)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
703 "pand %%mm4, %%mm3 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
704
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
705 "movq %%mm3, %%mm1 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
706 // "psubusb "MANGLE(b01)", %%mm3 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
707 PAVGB(%%mm7, %%mm3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
708 PAVGB(%%mm7, %%mm3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
709 "paddusb %%mm1, %%mm3 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
710 // "paddusb "MANGLE(b01)", %%mm3 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
711
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
712 "movq (%%"REG_a", %1, 2), %%mm6 \n\t" //l3
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
713 "movq (%0, %1, 4), %%mm5 \n\t" //l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
714 "movq (%0, %1, 4), %%mm4 \n\t" //l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
715 "psubusb %%mm6, %%mm5 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
716 "psubusb %%mm4, %%mm6 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
717 "por %%mm6, %%mm5 \n\t" // |l3-l4|
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
718 "pcmpeqb %%mm7, %%mm6 \n\t" // SIGN(l3-l4)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
719 "pxor %%mm6, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
720 "pand %%mm0, %%mm3 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
721 PMINUB(%%mm5, %%mm3, %%mm0)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
722
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
723 "psubusb "MANGLE(b01)", %%mm3 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
724 PAVGB(%%mm7, %%mm3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
725
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
726 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
727 "movq (%0, %1, 4), %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
728 "pxor %%mm6, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
729 "pxor %%mm6, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
730 "psubb %%mm3, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
731 "paddb %%mm3, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
732 "pxor %%mm6, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
733 "pxor %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
734 "movq %%mm0, (%%"REG_a", %1, 2) \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
735 "movq %%mm2, (%0, %1, 4) \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
736 #endif
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
737
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
738 "lea (%0, %1), %%"REG_a" \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
739 "pcmpeqb %%mm6, %%mm6 \n\t" // -1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
740 //   0        1        2        3        4        5        6        7
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
741 //   %0       %0+%1    %0+2%1   eax+2%1  %0+4%1   eax+4%1  ecx+%1   ecx+2%1
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
742 //   %0       eax      eax+%1   eax+2%1  %0+4%1   ecx      ecx+%1   ecx+2%1
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
743
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
744
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
745 "movq (%%"REG_a", %1, 2), %%mm1 \n\t" // l3
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
746 "movq (%0, %1, 4), %%mm0 \n\t" // l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
747 "pxor %%mm6, %%mm1 \n\t" // -l3-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
748 PAVGB(%%mm1, %%mm0) // -q+128 = (l4-l3+256)/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
749 // mm1=-l3-1, mm0=128-q
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
750
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
751 "movq (%%"REG_a", %1, 4), %%mm2 \n\t" // l5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
752 "movq (%%"REG_a", %1), %%mm3 \n\t" // l2
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
753 "pxor %%mm6, %%mm2 \n\t" // -l5-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
754 "movq %%mm2, %%mm5 \n\t" // -l5-1
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
755 "movq "MANGLE(b80)", %%mm4 \n\t" // 128
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
756 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
757 PAVGB(%%mm3, %%mm2) // (l2-l5+256)/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
758 PAVGB(%%mm0, %%mm4) // ~(l4-l3)/4 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
759 PAVGB(%%mm2, %%mm4) // ~(l2-l5)/4 +(l4-l3)/8 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
760 PAVGB(%%mm0, %%mm4) // ~(l2-l5)/8 +5(l4-l3)/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
761 // mm1=-l3-1, mm0=128-q, mm3=l2, mm4=menergy/16 + 128, mm5= -l5-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
762
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
763 "movq (%%"REG_a"), %%mm2 \n\t" // l1
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
764 "pxor %%mm6, %%mm2 \n\t" // -l1-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
765 PAVGB(%%mm3, %%mm2) // (l2-l1+256)/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
766 PAVGB((%0), %%mm1) // (l0-l3+256)/2
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
767 "movq "MANGLE(b80)", %%mm3 \n\t" // 128
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
768 PAVGB(%%mm2, %%mm3) // ~(l2-l1)/4 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
769 PAVGB(%%mm1, %%mm3) // ~(l0-l3)/4 +(l2-l1)/8 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
770 PAVGB(%%mm2, %%mm3) // ~(l0-l3)/8 +5(l2-l1)/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
771 // mm0=128-q, mm3=lenergy/16 + 128, mm4= menergy/16 + 128, mm5= -l5-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
772
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
773 PAVGB((%%REGc, %1), %%mm5) // (l6-l5+256)/2
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
774 "movq (%%"REG_c", %1, 2), %%mm1 \n\t" // l7
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
775 "pxor %%mm6, %%mm1 \n\t" // -l7-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
776 PAVGB((%0, %1, 4), %%mm1) // (l4-l7+256)/2
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
777 "movq "MANGLE(b80)", %%mm2 \n\t" // 128
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
778 PAVGB(%%mm5, %%mm2) // ~(l6-l5)/4 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
779 PAVGB(%%mm1, %%mm2) // ~(l4-l7)/4 +(l6-l5)/8 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
780 PAVGB(%%mm5, %%mm2) // ~(l4-l7)/8 +5(l6-l5)/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
781 // mm0=128-q, mm2=renergy/16 + 128, mm3=lenergy/16 + 128, mm4= menergy/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
782
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
783 "movq "MANGLE(b00)", %%mm1 \n\t" // 0
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
784 "movq "MANGLE(b00)", %%mm5 \n\t" // 0
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
785 "psubb %%mm2, %%mm1 \n\t" // 128 - renergy/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
786 "psubb %%mm3, %%mm5 \n\t" // 128 - lenergy/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
787 PMAXUB(%%mm1, %%mm2) // 128 + |renergy/16|
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
788 PMAXUB(%%mm5, %%mm3) // 128 + |lenergy/16|
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
789 PMINUB(%%mm2, %%mm3, %%mm1) // 128 + MIN(|lenergy|,|renergy|)/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
790
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
791 // mm0=128-q, mm3=128 + MIN(|lenergy|,|renergy|)/16, mm4= menergy/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
792
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
793 "movq "MANGLE(b00)", %%mm7 \n\t" // 0
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
794 "movq %2, %%mm2 \n\t" // QP
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
795 PAVGB(%%mm6, %%mm2) // 128 + QP/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
796 "psubb %%mm6, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
797
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
798 "movq %%mm4, %%mm1 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
799 "pcmpgtb %%mm7, %%mm1 \n\t" // SIGN(menergy)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
800 "pxor %%mm1, %%mm4 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
801 "psubb %%mm1, %%mm4 \n\t" // 128 + |menergy|/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
802 "pcmpgtb %%mm4, %%mm2 \n\t" // |menergy|/16 < QP/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
803 "psubusb %%mm3, %%mm4 \n\t" //d=|menergy|/16 - MIN(|lenergy|,|renergy|)/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
804 // mm0=128-q, mm1= SIGN(menergy), mm2= |menergy|/16 < QP/2, mm4= d/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
805
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
806 "movq %%mm4, %%mm3 \n\t" // d
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
807 "psubusb "MANGLE(b01)", %%mm4 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
808 PAVGB(%%mm7, %%mm4) // d/32
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
809 PAVGB(%%mm7, %%mm4) // (d + 32)/64
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
810 "paddb %%mm3, %%mm4 \n\t" // 5d/64
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
811 "pand %%mm2, %%mm4 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
812
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
813 "movq "MANGLE(b80)", %%mm5 \n\t" // 128
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
814 "psubb %%mm0, %%mm5 \n\t" // q
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
815 "paddsb %%mm6, %%mm5 \n\t" // fix bad rounding
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
816 "pcmpgtb %%mm5, %%mm7 \n\t" // SIGN(q)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
817 "pxor %%mm7, %%mm5 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
818
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
819 PMINUB(%%mm5, %%mm4, %%mm3) // MIN(|q|, 5d/64)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
820 "pxor %%mm1, %%mm7 \n\t" // SIGN(d*q)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
821
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
822 "pand %%mm7, %%mm4 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
823 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
824 "movq (%0, %1, 4), %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
825 "pxor %%mm1, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
826 "pxor %%mm1, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
827 "paddb %%mm4, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
828 "psubb %%mm4, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
829 "pxor %%mm1, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
830 "pxor %%mm1, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
831 "movq %%mm0, (%%"REG_a", %1, 2) \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
832 "movq %%mm2, (%0, %1, 4) \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
833
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
834 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
835 : "r" (src), "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
836 : "%"REG_a, "%"REG_c
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
837 );
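The asm statement that ends here builds its weighted sums (for example the "~(5l6 + 2l4)/8" comments) out of chained PAVGB steps instead of unpacking to 16 bits. A minimal scalar sketch of that idea follows, assuming PAVGB expands to a rounding unsigned byte average as `pavgb`/`pavgusb` do. The names `avg_u8`, `approx_5a_2b_div8`, `biased_half_diff` and `check_chain_error` are hypothetical and do not appear in libpostproc.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding unsigned byte average: the per-byte behaviour of pavgb. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Three chained averages approximating (5*a + 2*b)/8.
 * Every step rounds up, so the result may overshoot the exact
 * value by a little less than 1 (hence the "~" in the comments). */
static uint8_t approx_5a_2b_div8(uint8_t a, uint8_t b)
{
    uint8_t t = avg_u8(0, a);   /* ~a/2         */
    t = avg_u8(b, t);           /* ~(a + 2b)/4  */
    return avg_u8(a, t);        /* ~(5a + 2b)/8 */
}

/* Biased signed difference: avg(~a, b) == (b - a + 256) >> 1,
 * i.e. 128 + (b - a)/2 stored in an unsigned byte. */
static uint8_t biased_half_diff(uint8_t a, uint8_t b)
{
    return avg_u8((uint8_t)~a, b);
}

/* Exhaustive check of the overshoot: 0 <= 8*z - (5a + 2b) <= 7. */
static void check_chain_error(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++) {
            int z = approx_5a_2b_div8((uint8_t)a, (uint8_t)b);
            int err = 8 * z - (5 * a + 2 * b);
            assert(err >= 0 && err <= 7);
        }
}
```

The bias of 128 in `biased_half_diff` is what lets the asm carry signed quantities such as "128-q" through unsigned byte operations.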
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
838
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
839 /*
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
840 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
841 int x;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
842 src-= stride;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
843 for(x=0; x<BLOCK_SIZE; x++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
844 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
845 const int middleEnergy= 5*(src[l5] - src[l4]) + 2*(src[l3] - src[l6]);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
846 if(ABS(middleEnergy)< 8*QP)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
847 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
848 const int q=(src[l4] - src[l5])/2;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
849 const int leftEnergy= 5*(src[l3] - src[l2]) + 2*(src[l1] - src[l4]);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
850 const int rightEnergy= 5*(src[l7] - src[l6]) + 2*(src[l5] - src[l8]);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
851
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
852 int d= ABS(middleEnergy) - MIN( ABS(leftEnergy), ABS(rightEnergy) );
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
853 d= MAX(d, 0);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
854
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
855 d= (5*d + 32) >> 6;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
856 d*= SIGN(-middleEnergy);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
857
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
858 if(q>0)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
859 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
860 d= d<0 ? 0 : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
861 d= d>q ? q : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
862 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
863 else
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
864 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
865 d= d>0 ? 0 : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
866 d= d<q ? q : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
867 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
868
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
869 src[l4]-= d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
870 src[l5]+= d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
871 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
872 src++;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
873 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
874 src-=8;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
875 for(x=0; x<8; x++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
876 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
877 int y;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
878 for(y=4; y<6; y++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
879 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
880 int d= src[x+y*stride] - tmp[x+(y-4)*8];
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
881 int ad= ABS(d);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
882 static int max=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
883 static int sum=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
884 static int num=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
885 static int bias=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
886
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
887 if(max<ad) max=ad;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
888 sum+= ad>3 ? 1 : 0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
889 if(ad>3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
890 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
891 src[0] = src[7] = src[stride*7] = src[(stride+1)*7]=255;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
892 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
893 if(y==4) bias+=d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
894 num++;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
895 if(num%1000000 == 0)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
896 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
897 printf(" %d %d %d %d\n", num, sum, max, bias);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
898 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
899 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
900 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
901 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
902 */
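The first loop of the commented-out reference above can be condensed into a helper that filters a single column, which may be easier to experiment with than the asm. This is only a sketch under stated assumptions: `deblock_column` is a hypothetical name, the ten samples are passed as a contiguous array so that `p[1]`..`p[8]` stand in for `src[l1]`..`src[l8]`, and the statistics loop that follows in the reference is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of libpostproc.
 * p[0..9] are ten vertically adjacent samples; the block edge
 * lies between p[4] and p[5]. */
static void deblock_column(unsigned char *p, int qp)
{
    assert(qp >= 0);

    const int middle = 5 * (p[5] - p[4]) + 2 * (p[3] - p[6]);
    if (abs(middle) >= 8 * qp)
        return;                     /* too strong: treat as a real edge */

    const int q     = (p[4] - p[5]) / 2;
    const int left  = 5 * (p[3] - p[2]) + 2 * (p[1] - p[4]);
    const int right = 5 * (p[7] - p[6]) + 2 * (p[5] - p[8]);
    const int side  = abs(left) < abs(right) ? abs(left) : abs(right);

    int d = abs(middle) - side;
    if (d < 0)
        d = 0;
    d = (5 * d + 32) >> 6;
    if (middle >= 0)                /* d *= SIGN(-middle); d is 0 when middle is 0 */
        d = -d;

    /* Clamp d to the range between 0 and q so the step never overshoots. */
    if (q > 0) {
        if (d < 0) d = 0;
        if (d > q) d = q;
    } else {
        if (d > 0) d = 0;
        if (d < q) d = q;
    }

    p[4] -= d;
    p[5] += d;
}
```

For a flat step of 10 against 20 with `qp` set to 10, the two samples next to the edge move to 12 and 18; with `qp` set to 3 the step exceeds the `8*QP` threshold and the column is left unchanged.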
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
903 #elif defined (HAVE_MMX)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
904 src+= stride*4;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
905 asm volatile(
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
906 "pxor %%mm7, %%mm7 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
907 "lea -40(%%"REG_SP"), %%"REG_c" \n\t" // make space for 4 8-byte vars
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
908 "and "ALIGN_MASK", %%"REG_c" \n\t" // align
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
909 //   0        1        2        3        4        5        6        7
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
910 //   %0       %0+%1    %0+2%1   eax+2%1  %0+4%1   eax+4%1  edx+%1   edx+2%1
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
911 //   %0       eax      eax+%1   eax+2%1  %0+4%1   edx      edx+%1   edx+2%1
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
912
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
913 "movq (%0), %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
914 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
915 "punpcklbw %%mm7, %%mm0 \n\t" // low part of line 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
916 "punpckhbw %%mm7, %%mm1 \n\t" // high part of line 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
917
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
918 "movq (%0, %1), %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
919 "lea (%0, %1, 2), %%"REG_a" \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
920 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
921 "punpcklbw %%mm7, %%mm2 \n\t" // low part of line 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
922 "punpckhbw %%mm7, %%mm3 \n\t" // high part of line 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
923
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
924 "movq (%%"REG_a"), %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
925 "movq %%mm4, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
926 "punpcklbw %%mm7, %%mm4 \n\t" // low part of line 2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
927 "punpckhbw %%mm7, %%mm5 \n\t" // high part of line 2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
928
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
929 "paddw %%mm0, %%mm0 \n\t" // 2L0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
930 "paddw %%mm1, %%mm1 \n\t" // 2H0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
931 "psubw %%mm4, %%mm2 \n\t" // L1 - L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
932 "psubw %%mm5, %%mm3 \n\t" // H1 - H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
933 "psubw %%mm2, %%mm0 \n\t" // 2L0 - L1 + L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
934 "psubw %%mm3, %%mm1 \n\t" // 2H0 - H1 + H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
935
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
936 "psllw $2, %%mm2 \n\t" // 4L1 - 4L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
937 "psllw $2, %%mm3 \n\t" // 4H1 - 4H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
938 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
939 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
940
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
941 "movq (%%"REG_a", %1), %%mm2 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
942 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
943 "punpcklbw %%mm7, %%mm2 \n\t" // L3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
944 "punpckhbw %%mm7, %%mm3 \n\t" // H3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
945
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
946 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - L3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
947 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - H3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
948 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
949 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
950 "movq %%mm0, (%%"REG_c") \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
951 "movq %%mm1, 8(%%"REG_c") \n\t" // 2H0 - 5H1 + 5H2 - 2H3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
952
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
953 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
954 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
955 "punpcklbw %%mm7, %%mm0 \n\t" // L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
956 "punpckhbw %%mm7, %%mm1 \n\t" // H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
957
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
958 "psubw %%mm0, %%mm2 \n\t" // L3 - L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
959 "psubw %%mm1, %%mm3 \n\t" // H3 - H4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
960 "movq %%mm2, 16(%%"REG_c") \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
961 "movq %%mm3, 24(%%"REG_c") \n\t" // H3 - H4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
962 "paddw %%mm4, %%mm4 \n\t" // 2L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
963 "paddw %%mm5, %%mm5 \n\t" // 2H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
964 "psubw %%mm2, %%mm4 \n\t" // 2L2 - L3 + L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
965 "psubw %%mm3, %%mm5 \n\t" // 2H2 - H3 + H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
966
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
967 "lea (%%"REG_a", %1), %0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
968 "psllw $2, %%mm2 \n\t" // 4L3 - 4L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
969 "psllw $2, %%mm3 \n\t" // 4H3 - 4H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
970 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
971 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
972 //50 opcodes so far
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
973 "movq (%0, %1, 2), %%mm2 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
974 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
975 "punpcklbw %%mm7, %%mm2 \n\t" // L5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
976 "punpckhbw %%mm7, %%mm3 \n\t" // H5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
977 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - L5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
978 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - H5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
979 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - 2L5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
980 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - 2H5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
981
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
982 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
983 "punpcklbw %%mm7, %%mm6 \n\t" // L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
984 "psubw %%mm6, %%mm2 \n\t" // L5 - L6
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
985 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
986 "punpckhbw %%mm7, %%mm6 \n\t" // H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
987 "psubw %%mm6, %%mm3 \n\t" // H5 - H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
988
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
989 "paddw %%mm0, %%mm0 \n\t" // 2L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
990 "paddw %%mm1, %%mm1 \n\t" // 2H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
991 "psubw %%mm2, %%mm0 \n\t" // 2L4 - L5 + L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
992 "psubw %%mm3, %%mm1 \n\t" // 2H4 - H5 + H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
993
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
994 "psllw $2, %%mm2 \n\t" // 4L5 - 4L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
995 "psllw $2, %%mm3 \n\t" // 4H5 - 4H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
996 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
997 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
998
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
999 "movq (%0, %1, 4), %%mm2 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1000 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1001 "punpcklbw %%mm7, %%mm2 \n\t" // L7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1002 "punpckhbw %%mm7, %%mm3 \n\t" // H7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1003
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1004 "paddw %%mm2, %%mm2 \n\t" // 2L7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1005 "paddw %%mm3, %%mm3 \n\t" // 2H7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1006 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6 - 2L7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1007 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6 - 2H7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1008
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1009 "movq (%%"REG_c"), %%mm2 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1010 "movq 8(%%"REG_c"), %%mm3 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1011
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1012 #ifdef HAVE_MMX2
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1013 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1014 "psubw %%mm0, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1015 "pmaxsw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1016 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1017 "psubw %%mm1, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1018 "pmaxsw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1019 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1020 "psubw %%mm2, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1021 "pmaxsw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1022 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1023 "psubw %%mm3, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1024 "pmaxsw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1025 #else
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1026 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1027 "pcmpgtw %%mm0, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1028 "pxor %%mm6, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1029 "psubw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1030 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1031 "pcmpgtw %%mm1, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1032 "pxor %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1033 "psubw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1034 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1035 "pcmpgtw %%mm2, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1036 "pxor %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1037 "psubw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1038 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1039 "pcmpgtw %%mm3, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1040 "pxor %%mm6, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1041 "psubw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1042 #endif
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1043
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1044 #ifdef HAVE_MMX2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
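1045 "pminsw %%mm2, %%mm0 \n\t" // MIN(|2L0 - 5L1 + 5L2 - 2L3|, |2L4 - 5L5 + 5L6 - 2L7|)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1046 "pminsw %%mm3, %%mm1 \n\t" // MIN(|2H0 - 5H1 + 5H2 - 2H3|, |2H4 - 5H5 + 5H6 - 2H7|)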
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1047 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1048 "movq %%mm0, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1049 "psubusw %%mm2, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1050 "psubw %%mm6, %%mm0 \n\t" // MIN(mm0, mm2) without pminsw: a - max(a - b, 0)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1051 "movq %%mm1, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1052 "psubusw %%mm3, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1053 "psubw %%mm6, %%mm1 \n\t" // MIN(mm1, mm3) without pminsw: a - max(a - b, 0)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1054 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1055
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
1056 "movd %2, %%mm2 \n\t" // QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
1057 "punpcklbw %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
1058
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1059 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1060 "pcmpgtw %%mm4, %%mm6 \n\t" // sign(2L2 - 5L3 + 5L4 - 2L5)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1061 "pxor %%mm6, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1062 "psubw %%mm6, %%mm4 \n\t" // |2L2 - 5L3 + 5L4 - 2L5|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1063 "pcmpgtw %%mm5, %%mm7 \n\t" // sign(2H2 - 5H3 + 5H4 - 2H5)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1064 "pxor %%mm7, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1065 "psubw %%mm7, %%mm5 \n\t" // |2H2 - 5H3 + 5H4 - 2H5|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1066 // 100 opcodes
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1067 "psllw $3, %%mm2 \n\t" // 8QP
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1068 "movq %%mm2, %%mm3 \n\t" // 8QP
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1069 "pcmpgtw %%mm4, %%mm2 \n\t" // 8QP > |2L2 - 5L3 + 5L4 - 2L5| ? -1 : 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1070 "pcmpgtw %%mm5, %%mm3 \n\t" // 8QP > |2H2 - 5H3 + 5H4 - 2H5| ? -1 : 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1071 "pand %%mm2, %%mm4 \n\t" // zeroed where the 8QP threshold is not met
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1072 "pand %%mm3, %%mm5 \n\t" // zeroed where the 8QP threshold is not met
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1073
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1074
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1075 "psubusw %%mm0, %%mm4 \n\t" // ld
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1076 "psubusw %%mm1, %%mm5 \n\t" // hd
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1077
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1078
211
f1074f0d4969 fix mangling with runtime cpu detection
atmos4
parents: 210
diff changeset
1079 "movq "MANGLE(w05)", %%mm2 \n\t" // 5
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1080 "pmullw %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1081 "pmullw %%mm2, %%mm5 \n\t"
211
f1074f0d4969 fix mangling with runtime cpu detection
atmos4
parents: 210
diff changeset
1082 "movq "MANGLE(w20)", %%mm2 \n\t" // 32
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1083 "paddw %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1084 "paddw %%mm2, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1085 "psrlw $6, %%mm4 \n\t" // (5*d + 32) >> 6, low words
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1086 "psrlw $6, %%mm5 \n\t" // (5*d + 32) >> 6, high words
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1087
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1088 "movq 16(%%"REG_c"), %%mm0 \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1089 "movq 24(%%"REG_c"), %%mm1 \n\t" // H3 - H4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1090
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1091 "pxor %%mm2, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1092 "pxor %%mm3, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1093
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1094 "pcmpgtw %%mm0, %%mm2 \n\t" // sign (L3-L4)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1095 "pcmpgtw %%mm1, %%mm3 \n\t" // sign (H3-H4)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1096 "pxor %%mm2, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1097 "pxor %%mm3, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1098 "psubw %%mm2, %%mm0 \n\t" // |L3-L4|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1099 "psubw %%mm3, %%mm1 \n\t" // |H3-H4|
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1100 "psrlw $1, %%mm0 \n\t" // |L3 - L4|/2
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1101 "psrlw $1, %%mm1 \n\t" // |H3 - H4|/2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1102
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
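1103 "pxor %%mm6, %%mm2 \n\t" // -1 where sign(L3 - L4) != sign(2L2 - 5L3 + 5L4 - 2L5)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1104 "pxor %%mm7, %%mm3 \n\t" // -1 where sign(H3 - H4) != sign(2H2 - 5H3 + 5H4 - 2H5)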
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1105 "pand %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1106 "pand %%mm3, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1107
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1108 #ifdef HAVE_MMX2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1109 "pminsw %%mm0, %%mm4 \n\t" // MIN(d, |L3 - L4|/2)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1110 "pminsw %%mm1, %%mm5 \n\t" // MIN(d, |H3 - H4|/2)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1111 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1112 "movq %%mm4, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1113 "psubusw %%mm0, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1114 "psubw %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1115 "movq %%mm5, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1116 "psubusw %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1117 "psubw %%mm2, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1118 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1119 "pxor %%mm6, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1120 "pxor %%mm7, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1121 "psubw %%mm6, %%mm4 \n\t" // d negated where 2L2 - 5L3 + 5L4 - 2L5 < 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1122 "psubw %%mm7, %%mm5 \n\t" // d negated where 2H2 - 5H3 + 5H4 - 2H5 < 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1123 "packsswb %%mm5, %%mm4 \n\t"
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1124 "movq (%0), %%mm0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1125 "paddb %%mm4, %%mm0 \n\t"
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1126 "movq %%mm0, (%0) \n\t"
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1127 "movq (%0, %1), %%mm0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1128 "psubb %%mm4, %%mm0 \n\t"
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1129 "movq %%mm0, (%0, %1) \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1130
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1131 : "+r" (src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1132 : "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1133 : "%"REG_a, "%"REG_c
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1134 );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1135 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1136 const int l1= stride;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1137 const int l2= stride + l1;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1138 const int l3= stride + l2;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1139 const int l4= stride + l3;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1140 const int l5= stride + l4;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1141 const int l6= stride + l5;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1142 const int l7= stride + l6;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1143 const int l8= stride + l7;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1144 // const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
1145 int x;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1146 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
1147 for(x=0; x<BLOCK_SIZE; x++)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1148 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1149 const int middleEnergy= 5*(src[l5] - src[l4]) + 2*(src[l3] - src[l6]);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1150 if(ABS(middleEnergy) < 8*c->QP)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1151 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1152 const int q=(src[l4] - src[l5])/2;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1153 const int leftEnergy= 5*(src[l3] - src[l2]) + 2*(src[l1] - src[l4]);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1154 const int rightEnergy= 5*(src[l7] - src[l6]) + 2*(src[l5] - src[l8]);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1155
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1156 int d= ABS(middleEnergy) - MIN( ABS(leftEnergy), ABS(rightEnergy) );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1157 d= MAX(d, 0);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1158
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1159 d= (5*d + 32) >> 6;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1160 d*= SIGN(-middleEnergy);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1161
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1162 if(q>0)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1163 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1164 d= d<0 ? 0 : d;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1165 d= d>q ? q : d;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1166 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1167 else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1168 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1169 d= d>0 ? 0 : d;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1170 d= d<q ? q : d;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1171 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1172
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1173 src[l4]-= d;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1174 src[l5]+= d;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1175 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1176 src++;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1177 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1178 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1179 }
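The last steps of the filter above (lines 1159 to 1171) scale the correction `d` and then clamp it to the range between 0 and `q`. A minimal standalone sketch of that arithmetic follows. The helper names `scale_5_64` and `clamp_toward` are hypothetical, and the sketch assumes that `d` is non-negative before scaling and that `q` is a signed limit computed earlier in the function; neither is shown in this excerpt.

```c
/* Hypothetical helper: approximates d * 5 / 64 with rounding to nearest.
 * Assumes d >= 0 so that the right shift is well defined. */
static int scale_5_64(int d)
{
    return (5 * d + 32) >> 6;
}

/* Hypothetical helper: clamps d to the closed interval between 0 and q,
 * whichever sign q has. Mirrors the if/else at lines 1162 to 1171. */
static int clamp_toward(int d, int q)
{
    if (q > 0) {
        if (d < 0) d = 0;   /* wrong direction: no correction */
        if (d > q) d = q;   /* never move further than q */
    } else {
        if (d > 0) d = 0;
        if (d < q) d = q;
    }
    return d;
}
```

With `q == 0` the else branch forces the result to 0, so no correction is applied to `src[l4]` and `src[l5]` in that case.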
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
1180 #endif //HAVE_ALTIVEC
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
1181
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
1182 #ifndef HAVE_ALTIVEC
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1183 static inline void RENAME(dering)(uint8_t src[], int stride, PPContext *c)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1184 {
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1185 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1186 asm volatile(
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1187 "pxor %%mm6, %%mm6 \n\t" // mm6 = 0 (zero for unpacking, initial max)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1188 "pcmpeqb %%mm7, %%mm7 \n\t" // mm7 = 0xFF bytes (-1 per word, initial min)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1189 "movq %2, %%mm0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1190 "punpcklbw %%mm6, %%mm0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1191 "psrlw $1, %%mm0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1192 "psubw %%mm7, %%mm0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1193 "packuswb %%mm0, %%mm0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1194 "movq %%mm0, %3 \n\t" // pQPb2 = pQPb/2 + 1
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1195
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1196 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1197 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1198
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1199 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1200 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1201
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1202 #undef FIND_MIN_MAX
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1203 #ifdef HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1204 #define REAL_FIND_MIN_MAX(addr)\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1205 "movq " #addr ", %%mm0 \n\t"\
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1206 "pminub %%mm0, %%mm7 \n\t"\
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1207 "pmaxub %%mm0, %%mm6 \n\t"
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1208 #else
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1209 #define REAL_FIND_MIN_MAX(addr)\
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1210 "movq " #addr ", %%mm0 \n\t"\
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1211 "movq %%mm7, %%mm1 \n\t"\
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1212 "psubusb %%mm0, %%mm6 \n\t"\
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1213 "paddb %%mm0, %%mm6 \n\t"\
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1214 "psubusb %%mm0, %%mm1 \n\t"\
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1215 "psubb %%mm1, %%mm7 \n\t"
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1216 #endif
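The `#else` branch of `REAL_FIND_MIN_MAX` above has no `pminub`/`pmaxub` available, so it builds the per-byte minimum and maximum from saturating subtraction (`psubusb`) followed by a wrapping add or subtract. The scalar sketch below shows the same identities on a single byte. The function names are hypothetical and only illustrate the arithmetic; they are not part of libpostproc.

```c
#include <stdint.h>

/* Unsigned saturating subtract, the scalar analogue of psubusb:
 * returns a - b, or 0 if that would be negative. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* max(a, b) = satsub(a, b) + b
 * Corresponds to: psubusb mm0, mm6 ; paddb mm0, mm6 */
static uint8_t max_u8_sat(uint8_t a, uint8_t b)
{
    return (uint8_t)(sub_sat_u8(a, b) + b);
}

/* min(a, b) = a - satsub(a, b)
 * Corresponds to: movq mm7, mm1 ; psubusb mm0, mm1 ; psubb mm1, mm7 */
static uint8_t min_u8_sat(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - sub_sat_u8(a, b));
}
```

The add in `max_u8_sat` cannot overflow because the saturated difference is at most `255 - b`, which is why the plain `paddb` is sufficient in the assembly.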
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1217 #define FIND_MIN_MAX(addr) REAL_FIND_MIN_MAX(addr)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1218
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1219 FIND_MIN_MAX((%%REGa))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1220 FIND_MIN_MAX((%%REGa, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1221 FIND_MIN_MAX((%%REGa, %1, 2))
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1222 FIND_MIN_MAX((%0, %1, 4))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1223 FIND_MIN_MAX((%%REGd))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1224 FIND_MIN_MAX((%%REGd, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1225 FIND_MIN_MAX((%%REGd, %1, 2))
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1226 FIND_MIN_MAX((%0, %1, 8))
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1227
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1228 "movq %%mm7, %%mm4 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1229 "psrlq $8, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1230 #ifdef HAVE_MMX2
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1231 "pminub %%mm4, %%mm7 \n\t" // min of pixels
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1232 "pshufw $0xF9, %%mm7, %%mm4 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1233 "pminub %%mm4, %%mm7 \n\t" // min of pixels
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1234 "pshufw $0xFE, %%mm7, %%mm4 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1235 "pminub %%mm4, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1236 #else
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1237 "movq %%mm7, %%mm1 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1238 "psubusb %%mm4, %%mm1 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1239 "psubb %%mm1, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1240 "movq %%mm7, %%mm4 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1241 "psrlq $16, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1242 "movq %%mm7, %%mm1 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1243 "psubusb %%mm4, %%mm1 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1244 "psubb %%mm1, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1245 "movq %%mm7, %%mm4 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1246 "psrlq $32, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1247 "movq %%mm7, %%mm1 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1248 "psubusb %%mm4, %%mm1 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1249 "psubb %%mm1, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1250 #endif
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1251
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1252
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1253 "movq %%mm6, %%mm4 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1254 "psrlq $8, %%mm6 \n\t"
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1255 #ifdef HAVE_MMX2
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1256 "pmaxub %%mm4, %%mm6 \n\t" // max of pixels
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1257 "pshufw $0xF9, %%mm6, %%mm4 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1258 "pmaxub %%mm4, %%mm6 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1259 "pshufw $0xFE, %%mm6, %%mm4 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1260 "pmaxub %%mm4, %%mm6 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1261 #else
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1262 "psubusb %%mm4, %%mm6 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1263 "paddb %%mm4, %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1264 "movq %%mm6, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1265 "psrlq $16, %%mm6 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1266 "psubusb %%mm4, %%mm6 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1267 "paddb %%mm4, %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1268 "movq %%mm6, %%mm4 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1269 "psrlq $32, %%mm6 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1270 "psubusb %%mm4, %%mm6 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1271 "paddb %%mm4, %%mm6 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1272 #endif
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1273 "movq %%mm6, %%mm0 \n\t" // max
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1274 "psubb %%mm7, %%mm6 \n\t" // max - min
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1275 "movd %%mm6, %%ecx \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1276 "cmpb "MANGLE(deringThreshold)", %%cl \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1277 " jb 1f \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1278 "lea -24(%%"REG_SP"), %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1279 "and "ALIGN_MASK", %%"REG_c" \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1280 PAVGB(%%mm0, %%mm7) // a=(max + min)/2
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1281 "punpcklbw %%mm7, %%mm7 \n\t"
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1282 "punpcklbw %%mm7, %%mm7 \n\t"
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1283 "punpcklbw %%mm7, %%mm7 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1284 "movq %%mm7, (%%"REG_c") \n\t"
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1285
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1286 "movq (%0), %%mm0 \n\t" // L10
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1287 "movq %%mm0, %%mm1 \n\t" // L10
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1288 "movq %%mm0, %%mm2 \n\t" // L10
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1289 "psllq $8, %%mm1 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1290 "psrlq $8, %%mm2 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1291 "movd -4(%0), %%mm3 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1292 "movd 8(%0), %%mm4 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1293 "psrlq $24, %%mm3 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1294 "psllq $56, %%mm4 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1295 "por %%mm3, %%mm1 \n\t" // L00
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1296 "por %%mm4, %%mm2 \n\t" // L20
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1297 "movq %%mm1, %%mm3 \n\t" // L00
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1298 PAVGB(%%mm2, %%mm1) // (L20 + L00)/2
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1299 PAVGB(%%mm0, %%mm1) // (L20 + L00 + 2L10)/4
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1300 "psubusb %%mm7, %%mm0 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1301 "psubusb %%mm7, %%mm2 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1302 "psubusb %%mm7, %%mm3 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1303 "pcmpeqb "MANGLE(b00)", %%mm0 \n\t" // L10 > a ? 0 : -1
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1304 "pcmpeqb "MANGLE(b00)", %%mm2 \n\t" // L20 > a ? 0 : -1
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1305 "pcmpeqb "MANGLE(b00)", %%mm3 \n\t" // L00 > a ? 0 : -1
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1306 "paddb %%mm2, %%mm0 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1307 "paddb %%mm3, %%mm0 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1308
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1309 "movq (%%"REG_a"), %%mm2 \n\t" // L11
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1310 "movq %%mm2, %%mm3 \n\t" // L11
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1311 "movq %%mm2, %%mm4 \n\t" // L11
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1312 "psllq $8, %%mm3 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1313 "psrlq $8, %%mm4 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1314 "movd -4(%%"REG_a"), %%mm5 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1315 "movd 8(%%"REG_a"), %%mm6 \n\t"
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1316 "psrlq $24, %%mm5 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1317 "psllq $56, %%mm6 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1318 "por %%mm5, %%mm3 \n\t" // L01
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1319 "por %%mm6, %%mm4 \n\t" // L21
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1320 "movq %%mm3, %%mm5 \n\t" // L01
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1321 PAVGB(%%mm4, %%mm3) // (L21 + L01)/2
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1322 PAVGB(%%mm2, %%mm3) // (L21 + L01 + 2L11)/4
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1323 "psubusb %%mm7, %%mm2 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1324 "psubusb %%mm7, %%mm4 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1325 "psubusb %%mm7, %%mm5 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1326 "pcmpeqb "MANGLE(b00)", %%mm2 \n\t" // L11 > a ? 0 : -1
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1327 "pcmpeqb "MANGLE(b00)", %%mm4 \n\t" // L21 > a ? 0 : -1
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1328 "pcmpeqb "MANGLE(b00)", %%mm5 \n\t" // L01 > a ? 0 : -1
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1329 "paddb %%mm4, %%mm2 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1330 "paddb %%mm5, %%mm2 \n\t"
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1331 // 0, 2, 3, 1
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1332 #define REAL_DERING_CORE(dst,src,ppsx,psx,sx,pplx,plx,lx,t0,t1) \
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1333 "movq " #src ", " #sx " \n\t" /* src[0] */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1334 "movq " #sx ", " #lx " \n\t" /* src[0] */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1335 "movq " #sx ", " #t0 " \n\t" /* src[0] */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1336 "psllq $8, " #lx " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1337 "psrlq $8, " #t0 " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1338 "movd -4" #src ", " #t1 " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1339 "psrlq $24, " #t1 " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1340 "por " #t1 ", " #lx " \n\t" /* src[-1] */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1341 "movd 8" #src ", " #t1 " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1342 "psllq $56, " #t1 " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1343 "por " #t1 ", " #t0 " \n\t" /* src[+1] */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1344 "movq " #lx ", " #t1 " \n\t" /* src[-1] */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1345 PAVGB(t0, lx) /* (src[-1] + src[+1])/2 */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1346 PAVGB(sx, lx) /* (src[-1] + 2src[0] + src[+1])/4 */\
135
5083d662ff85 faster dering
michael
parents: 134
diff changeset
1347 PAVGB(lx, pplx) \
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1348 "movq " #lx ", 8(%%"REG_c") \n\t"\
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1349 "movq (%%"REG_c"), " #lx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1350 "psubusb " #lx ", " #t1 " \n\t"\
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1351 "psubusb " #lx ", " #t0 " \n\t"\
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1352 "psubusb " #lx ", " #sx " \n\t"\
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1353 "movq "MANGLE(b00)", " #lx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1354 "pcmpeqb " #lx ", " #t1 " \n\t" /* src[-1] > a ? 0 : -1*/\
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1355 "pcmpeqb " #lx ", " #t0 " \n\t" /* src[+1] > a ? 0 : -1*/\
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1356 "pcmpeqb " #lx ", " #sx " \n\t" /* src[0] > a ? 0 : -1*/\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1357 "paddb " #t1 ", " #t0 " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1358 "paddb " #t0 ", " #sx " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1359 \
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1360 PAVGB(plx, pplx) /* filtered */\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1361 "movq " #dst ", " #t0 " \n\t" /* dst */\
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1362 "movq " #t0 ", " #t1 " \n\t" /* dst */\
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1363 "psubusb %3, " #t0 " \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1364 "paddusb %3, " #t1 " \n\t"\
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1365 PMAXUB(t0, pplx)\
2c469e390117 dering in c
michael
parents: 133
diff changeset
1366 PMINUB(t1, pplx, t0)\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1367 "paddb " #sx ", " #ppsx " \n\t"\
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1368 "paddb " #psx ", " #ppsx " \n\t"\
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1369 "#paddb "MANGLE(b02)", " #ppsx " \n\t"\
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1370 "pand "MANGLE(b08)", " #ppsx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1371 "pcmpeqb " #lx ", " #ppsx " \n\t"\
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1372 "pand " #ppsx ", " #pplx " \n\t"\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1373 "pandn " #dst ", " #ppsx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1374 "por " #pplx ", " #ppsx " \n\t"\
135
5083d662ff85 faster dering
michael
parents: 134
diff changeset
1375 "movq " #ppsx ", " #dst " \n\t"\
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1376 "movq 8(%%"REG_c"), " #lx " \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1377
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1378 #define DERING_CORE(dst,src,ppsx,psx,sx,pplx,plx,lx,t0,t1) \
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1379 REAL_DERING_CORE(dst,src,ppsx,psx,sx,pplx,plx,lx,t0,t1)
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1380 /*
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1381 0000000
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1382 1111111
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1383
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1384 1111110
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1385 1111101
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1386 1111100
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1387 1111011
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1388 1111010
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1389 1111001
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1390
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1391 1111000
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1392 1110111
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1393
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1394 */
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1395 //DERING_CORE(dst,src ,ppsx ,psx ,sx ,pplx ,plx ,lx ,t0 ,t1)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1396 DERING_CORE((%%REGa),(%%REGa, %1) ,%%mm0,%%mm2,%%mm4,%%mm1,%%mm3,%%mm5,%%mm6,%%mm7)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1397 DERING_CORE((%%REGa, %1),(%%REGa, %1, 2) ,%%mm2,%%mm4,%%mm0,%%mm3,%%mm5,%%mm1,%%mm6,%%mm7)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1398 DERING_CORE((%%REGa, %1, 2),(%0, %1, 4) ,%%mm4,%%mm0,%%mm2,%%mm5,%%mm1,%%mm3,%%mm6,%%mm7)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1399 DERING_CORE((%0, %1, 4),(%%REGd) ,%%mm0,%%mm2,%%mm4,%%mm1,%%mm3,%%mm5,%%mm6,%%mm7)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1400 DERING_CORE((%%REGd),(%%REGd, %1) ,%%mm2,%%mm4,%%mm0,%%mm3,%%mm5,%%mm1,%%mm6,%%mm7)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1401 DERING_CORE((%%REGd, %1), (%%REGd, %1, 2),%%mm4,%%mm0,%%mm2,%%mm5,%%mm1,%%mm3,%%mm6,%%mm7)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1402 DERING_CORE((%%REGd, %1, 2),(%0, %1, 8) ,%%mm0,%%mm2,%%mm4,%%mm1,%%mm3,%%mm5,%%mm6,%%mm7)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1403 DERING_CORE((%0, %1, 8),(%%REGd, %1, 4) ,%%mm2,%%mm4,%%mm0,%%mm3,%%mm5,%%mm1,%%mm6,%%mm7)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1404
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1405 "1: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1406 : : "r" (src), "r" ((long)stride), "m" (c->pQPb), "m"(c->pQPb2)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1407 : "%"REG_a, "%"REG_d, "%"REG_c
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1408 );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1409 #else
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1410 int y;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1411 int min=255;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1412 int max=0;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1413 int avg;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1414 uint8_t *p;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1415 int s[10];
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1416 const int QP2= c->QP/2 + 1;
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1417
2c469e390117 dering in c
michael
parents: 133
diff changeset
1418 for(y=1; y<9; y++)
2c469e390117 dering in c
michael
parents: 133
diff changeset
1419 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1420 int x;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1421 p= src + stride*y;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1422 for(x=1; x<9; x++)
2c469e390117 dering in c
michael
parents: 133
diff changeset
1423 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1424 p++;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1425 if(*p > max) max= *p;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1426 if(*p < min) min= *p;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1427 }
2c469e390117 dering in c
michael
parents: 133
diff changeset
1428 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1429 avg= (min + max + 1)>>1; // rounded midpoint of min and max
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1430
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1431 if(max - min < deringThreshold) return; // range too small, leave the block unfiltered
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1432
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1433 for(y=0; y<10; y++)
2c469e390117 dering in c
michael
parents: 133
diff changeset
1434 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1435 int t = 0;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1436
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1437 if(src[stride*y + 0] > avg) t+= 1;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1438 if(src[stride*y + 1] > avg) t+= 2;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1439 if(src[stride*y + 2] > avg) t+= 4;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1440 if(src[stride*y + 3] > avg) t+= 8;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1441 if(src[stride*y + 4] > avg) t+= 16;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1442 if(src[stride*y + 5] > avg) t+= 32;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1443 if(src[stride*y + 6] > avg) t+= 64;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1444 if(src[stride*y + 7] > avg) t+= 128;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1445 if(src[stride*y + 8] > avg) t+= 256;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1446 if(src[stride*y + 9] > avg) t+= 512;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1447
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1448 t |= (~t)<<16;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1449 t &= (t<<1) & (t>>1);
2c469e390117 dering in c
michael
parents: 133
diff changeset
1450 s[y] = t;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1451 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1452
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1453 for(y=1; y<9; y++)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1454 {
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1455 int t = s[y-1] & s[y] & s[y+1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1456 t|= t>>16;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1457 s[y-1]= t;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1458 }
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1459
2c469e390117 dering in c
michael
parents: 133
diff changeset
1460 for(y=1; y<9; y++)
2c469e390117 dering in c
michael
parents: 133
diff changeset
1461 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1462 int x;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1463 int t = s[y-1];
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1464
2c469e390117 dering in c
michael
parents: 133
diff changeset
1465 p= src + stride*y;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1466 for(x=1; x<9; x++)
2c469e390117 dering in c
michael
parents: 133
diff changeset
1467 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1468 p++;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1469 if(t & (1<<x))
2c469e390117 dering in c
michael
parents: 133
diff changeset
1470 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1471 int f= (*(p-stride-1)) + 2*(*(p-stride)) + (*(p-stride+1))
2c469e390117 dering in c
michael
parents: 133
diff changeset
1472 +2*(*(p -1)) + 4*(*p ) + 2*(*(p +1))
2c469e390117 dering in c
michael
parents: 133
diff changeset
1473 +(*(p+stride-1)) + 2*(*(p+stride)) + (*(p+stride+1));
2c469e390117 dering in c
michael
parents: 133
diff changeset
1474 f= (f + 8)>>4;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1475
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1476 #ifdef DEBUG_DERING_THRESHOLD
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1477 asm volatile("emms\n\t":);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1478 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1479 static long long numPixels=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1480 if(x!=1 && x!=8 && y!=1 && y!=8) numPixels++;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1481 // if((max-min)<20 || (max-min)*QP<200)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1482 // if((max-min)*QP < 500)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1483 // if(max-min<QP/2)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1484 if(max-min < 20)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1485 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1486 static int numSkiped=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1487 static int errorSum=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1488 static int worstQP=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1489 static int worstRange=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1490 static int worstDiff=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1491 int diff= (f - *p);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1492 int absDiff= ABS(diff);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1493 int error= diff*diff;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1494
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1495 if(x==1 || x==8 || y==1 || y==8) continue;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1496
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1497 numSkiped++;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1498 if(absDiff > worstDiff)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1499 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1500 worstDiff= absDiff;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1501 worstQP= QP;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1502 worstRange= max-min;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1503 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1504 errorSum+= error;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1505
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1506 if(1024LL*1024LL*1024LL % numSkiped == 0)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1507 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1508 printf( "sum:%1.3f, skip:%d, wQP:%d, "
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1509 "wRange:%d, wDiff:%d, relSkip:%1.3f\n",
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1510 (float)errorSum/numSkiped, numSkiped, worstQP, worstRange,
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1511 worstDiff, (float)numSkiped/numPixels);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1512 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1513 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1514 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1515 #endif
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1516 if (*p + QP2 < f) *p= *p + QP2;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1517 else if(*p - QP2 > f) *p= *p - QP2;
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1518 else *p=f;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1519 }
2c469e390117 dering in c
michael
parents: 133
diff changeset
1520 }
2c469e390117 dering in c
michael
parents: 133
diff changeset
1521 }
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1522 #ifdef DEBUG_DERING_THRESHOLD
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1523 if(max-min < 20)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1524 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1525 for(y=1; y<9; y++)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1526 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1527 int x;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1528 int t = 0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1529 p= src + stride*y;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1530 for(x=1; x<9; x++)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1531 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1532 p++;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1533 *p = MIN(*p + 20, 255);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1534 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1535 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1536 // src[0] = src[7]=src[stride*7]=src[stride*7 + 7]=255;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1537 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1538 #endif
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1539 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1540 }
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
1541 #endif //HAVE_ALTIVEC
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1542
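Note on the C dering path above: the bit tricks on lines 1448-1456 decide, for each inner pixel, whether its whole 3x3 neighbourhood lies on one side of avg. The following is a minimal standalone sketch of that idea, not the library's code. The function name same_side_mask and its interface are hypothetical, and it uses unsigned arithmetic with an explicit 10-bit mask on the complemented half, where the original uses a plain int.

```c
#include <stdint.h>

/* Hypothetical helper (not part of libpostproc).
 * src points at a 10x10 block. For every inner pixel (x, y in 1..8),
 * bit x of out[y-1] is set when all nine pixels of its 3x3 neighbourhood
 * are on the same side of avg (all > avg, or all <= avg). */
static void same_side_mask(const uint8_t *src, int stride, int avg,
                           uint32_t out[8])
{
    uint32_t s[10];
    int x, y;

    for (y = 0; y < 10; y++) {
        uint32_t t = 0;
        for (x = 0; x < 10; x++)
            if (src[stride * y + x] > avg)
                t |= 1u << x;          /* low half: pixel is above avg */
        t |= (~t & 0x3FFu) << 16;      /* high half: pixel is not above avg */
        t &= (t << 1) & (t >> 1);      /* keep bits whose left and right neighbours agree */
        s[y] = t;
    }
    for (y = 1; y < 9; y++) {
        uint32_t t = s[y - 1] & s[y] & s[y + 1]; /* rows above and below agree too */
        out[y - 1] = (t | (t >> 16)) & 0x1FEu;   /* merge both cases, keep x = 1..8 */
    }
}
```

In the original, only pixels whose bit is set receive the 3x3 smoothing filter, so pixels whose neighbourhood straddles avg are left untouched.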
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1543 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1544 * Deinterlaces the given block by linearly interpolating every second line.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1545 * will be called for every 8x8 block and can read & write from line 4-15
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1546 * lines 0-3 have already been passed through the deblock / dering filters, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1547 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1548 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1549 static inline void RENAME(deInterlaceInterpolateLinear)(uint8_t src[], int stride)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1550 {
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1551 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1552 src+= 4*stride;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1553 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1554 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1555 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1556 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1557 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1558
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1559 "movq (%0), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1560 "movq (%%"REG_a", %1), %%mm1 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1561 PAVGB(%%mm1, %%mm0)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1562 "movq %%mm0, (%%"REG_a") \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1563 "movq (%0, %1, 4), %%mm0 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1564 PAVGB(%%mm0, %%mm1)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1565 "movq %%mm1, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1566 "movq (%%"REG_c", %1), %%mm1 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1567 PAVGB(%%mm1, %%mm0)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1568 "movq %%mm0, (%%"REG_c") \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1569 "movq (%0, %1, 8), %%mm0 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1570 PAVGB(%%mm0, %%mm1)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1571 "movq %%mm1, (%%"REG_c", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1572
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1573 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1574 : "%"REG_a, "%"REG_c
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1575 );
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1576 #else
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1577 int a, b, x;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1578 src+= 4*stride;
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1579
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1580 for(x=0; x<2; x++){
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1581 a= *(uint32_t*)&src[stride*0];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1582 b= *(uint32_t*)&src[stride*2];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1583 *(uint32_t*)&src[stride*1]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1584 a= *(uint32_t*)&src[stride*4];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1585 *(uint32_t*)&src[stride*3]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1586 b= *(uint32_t*)&src[stride*6];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1587 *(uint32_t*)&src[stride*5]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1588 a= *(uint32_t*)&src[stride*8];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1589 *(uint32_t*)&src[stride*7]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1590 src += 4;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1591 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1592 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1593 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1594
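Note on the C fallback of deInterlaceInterpolateLinear above: the expression (a|b) - (((a^b)&0xFEFEFEFEUL)>>1) averages four packed bytes at once, rounding up. It relies on the identity a + b = 2*(a|b) - (a^b), so (a + b + 1)>>1 equals (a|b) - ((a^b)>>1); the 0xFEFEFEFE mask stops the shift from leaking a bit into the neighbouring byte. A small sketch under those assumptions follows (the function names are hypothetical, not from the source file).

```c
#include <stdint.h>

/* Hypothetical helper: per-byte average of two packed 4-byte words,
 * rounding up, i.e. each lane is (a + b + 1) >> 1. */
static uint32_t avg4_round_up(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Scalar reference for one byte lane, used only for comparison. */
static uint8_t avg1_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}
```

For example, avg4_round_up(0x00FF0103, 0x00FF0200) gives 0x00FF0202: the byte pairs (0x01, 0x02) and (0x03, 0x00) both average to 0x02.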
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1595 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1596 * Deinterlaces the given block by cubic interpolation of every second line.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1597 * will be called for every 8x8 block and can read & write from line 4-15
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1598 * lines 0-3 have already been passed through the deblock / dering filters, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1599 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
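1600 * this filter will read lines 3-15 and write lines 6, 8, 10 and 12 (src is advanced by 3 lines, then src[stride*3], [stride*5], [stride*7] and [stride*9] are written)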
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1601 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1602 static inline void RENAME(deInterlaceInterpolateCubic)(uint8_t src[], int stride)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1603 {
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1604 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1605 src+= stride*3;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1606 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1607 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1608 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1609 "lea (%%"REG_d", %1, 4), %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1610 "add %1, %%"REG_c" \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1611 "pxor %%mm7, %%mm7 \n\t"
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1612 // 0 1 2 3 4 5 6 7 8 9 10
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1613 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1 ecx
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1614
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1615 #define REAL_DEINT_CUBIC(a,b,c,d,e)\
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1616 "movq " #a ", %%mm0 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1617 "movq " #b ", %%mm1 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1618 "movq " #d ", %%mm2 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1619 "movq " #e ", %%mm3 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1620 PAVGB(%%mm2, %%mm1) /* (b+d) /2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1621 PAVGB(%%mm3, %%mm0) /* (a+e) /2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1622 "movq %%mm0, %%mm2 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1623 "punpcklbw %%mm7, %%mm0 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1624 "punpckhbw %%mm7, %%mm2 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1625 "movq %%mm1, %%mm3 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1626 "punpcklbw %%mm7, %%mm1 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1627 "punpckhbw %%mm7, %%mm3 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1628 "psubw %%mm1, %%mm0 \n\t" /* L(a+e - (b+d))/2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1629 "psubw %%mm3, %%mm2 \n\t" /* H(a+e - (b+d))/2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1630 "psraw $3, %%mm0 \n\t" /* L(a+e - (b+d))/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1631 "psraw $3, %%mm2 \n\t" /* H(a+e - (b+d))/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1632 "psubw %%mm0, %%mm1 \n\t" /* L(9b + 9d - a - e)/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1633 "psubw %%mm2, %%mm3 \n\t" /* H(9b + 9d - a - e)/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1634 "packuswb %%mm3, %%mm1 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1635 "movq %%mm1, " #c " \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1636 #define DEINT_CUBIC(a,b,c,d,e) REAL_DEINT_CUBIC(a,b,c,d,e)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1637
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1638 DEINT_CUBIC((%0), (%%REGa, %1), (%%REGa, %1, 2), (%0, %1, 4), (%%REGd, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1639 DEINT_CUBIC((%%REGa, %1), (%0, %1, 4), (%%REGd), (%%REGd, %1), (%0, %1, 8))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1640 DEINT_CUBIC((%0, %1, 4), (%%REGd, %1), (%%REGd, %1, 2), (%0, %1, 8), (%%REGc))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1641 DEINT_CUBIC((%%REGd, %1), (%0, %1, 8), (%%REGd, %1, 4), (%%REGc), (%%REGc, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1642
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1643 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1644 : "%"REG_a, "%"REG_d, "%"REG_c
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1645 );
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1646 #else
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1647 int x;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1648 src+= stride*3;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1649 for(x=0; x<8; x++)
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1650 {
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1651 src[stride*3] = CLIP((-src[0] + 9*src[stride*2] + 9*src[stride*4] - src[stride*6])>>4);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1652 src[stride*5] = CLIP((-src[stride*2] + 9*src[stride*4] + 9*src[stride*6] - src[stride*8])>>4);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1653 src[stride*7] = CLIP((-src[stride*4] + 9*src[stride*6] + 9*src[stride*8] - src[stride*10])>>4);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1654 src[stride*9] = CLIP((-src[stride*6] + 9*src[stride*8] + 9*src[stride*10] - src[stride*12])>>4);
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1655 src++;
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1656 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1657 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1658 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1659
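Note on the C fallback of deInterlaceInterpolateCubic above: each missing line is rebuilt from its four nearest kept lines with the weights (-1, 9, 9, -1)/16 and then clipped to the 8-bit range. The sketch below isolates that per-pixel step. The name cubic_mid is hypothetical, and the clamp is written out by hand here instead of using the file's CLIP macro, whose definition is not shown in this section.

```c
#include <stdint.h>

/* Hypothetical helper: interpolate the sample midway between b and d,
 * where a, b, d, e are the four nearest kept lines (a above b, e below d). */
static uint8_t cubic_mid(int a, int b, int d, int e)
{
    int v = -a + 9 * b + 9 * d - e;   /* weights sum to 16 */

    if (v < 0)
        return 0;                     /* clamp before shifting a negative value */
    v >>= 4;                          /* divide by 16, as in the original */
    return (uint8_t)(v > 255 ? 255 : v);
}
```

On a linear ramp the result is the plain midpoint (cubic_mid(0, 20, 40, 60) is 30), while near sharp edges the negative outer taps can overshoot, which is why the clamp is needed (cubic_mid(0, 255, 255, 0) saturates at 255). The MMX/3DNow path above reaches its result through PAVGB averages, so its rounding may differ slightly from this scalar form.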
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1660 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1661 * Deinterlaces the given block by filtering every second line with a (-1 4 2 4 -1) filter.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1662 * will be called for every 8x8 block and can read & write from line 4-15
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1663 * lines 0-3 have already been passed through the deblock / dering filters, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1664 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1665 * this filter will read lines 4-13 and write 5-11
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1666 */
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1667 static inline void RENAME(deInterlaceFF)(uint8_t src[], int stride, uint8_t *tmp)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1668 {
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1669 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1670 src+= stride*4;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1671 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1672 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1673 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1674 "pxor %%mm7, %%mm7 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1675 "movq (%2), %%mm0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1676 // 0 1 2 3 4 5 6 7 8 9 10
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1677 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1 ecx
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1679 #define REAL_DEINT_FF(a,b,c,d)\
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1680 "movq " #a ", %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1681 "movq " #b ", %%mm2 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1682 "movq " #c ", %%mm3 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1683 "movq " #d ", %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1684 PAVGB(%%mm3, %%mm1) \
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1685 PAVGB(%%mm4, %%mm0) \
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1686 "movq %%mm0, %%mm3 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1687 "punpcklbw %%mm7, %%mm0 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1688 "punpckhbw %%mm7, %%mm3 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1689 "movq %%mm1, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1690 "punpcklbw %%mm7, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1691 "punpckhbw %%mm7, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1692 "psllw $2, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1693 "psllw $2, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1694 "psubw %%mm0, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1695 "psubw %%mm3, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1696 "movq %%mm2, %%mm5 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1697 "movq %%mm2, %%mm0 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1698 "punpcklbw %%mm7, %%mm2 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1699 "punpckhbw %%mm7, %%mm5 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1700 "paddw %%mm2, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1701 "paddw %%mm5, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1702 "psraw $2, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1703 "psraw $2, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1704 "packuswb %%mm4, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1705 "movq %%mm1, " #b " \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1706
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1707 #define DEINT_FF(a,b,c,d) REAL_DEINT_FF(a,b,c,d)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1708
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1709 DEINT_FF((%0) , (%%REGa) , (%%REGa, %1), (%%REGa, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1710 DEINT_FF((%%REGa, %1), (%%REGa, %1, 2), (%0, %1, 4), (%%REGd) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1711 DEINT_FF((%0, %1, 4), (%%REGd) , (%%REGd, %1), (%%REGd, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1712 DEINT_FF((%%REGd, %1), (%%REGd, %1, 2), (%0, %1, 8), (%%REGd, %1, 4))
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1713
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1714 "movq %%mm0, (%2) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1715 : : "r" (src), "r" ((long)stride), "r"(tmp)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1716 : "%"REG_a, "%"REG_d
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1717 );
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1718 #else
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1719 int x;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1720 src+= stride*4;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1721 for(x=0; x<8; x++)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1722 {
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1723 int t1= tmp[x];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1724 int t2= src[stride*1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1725
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1726 src[stride*1]= CLIP((-t1 + 4*src[stride*0] + 2*t2 + 4*src[stride*2] - src[stride*3] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1727 t1= src[stride*3];
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1728 src[stride*3]= CLIP((-t2 + 4*src[stride*2] + 2*t1 + 4*src[stride*4] - src[stride*5] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1729 t2= src[stride*5];
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1730 src[stride*5]= CLIP((-t1 + 4*src[stride*4] + 2*t2 + 4*src[stride*6] - src[stride*7] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1731 t1= src[stride*7];
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1732 src[stride*7]= CLIP((-t2 + 4*src[stride*6] + 2*t1 + 4*src[stride*8] - src[stride*9] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1733 tmp[x]= t1;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1734
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1735 src++;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1736 }
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1737 #endif
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1738 }
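The following is a minimal, standalone sketch of the (-1 4 2 4 -1)/8 kernel applied to every second row of a single column, centred on the odd row as the MMX path above computes it. The names `clip_u8` and `filter_ff_column` are hypothetical and not part of libpostproc; `clip_u8` stands in for the file's CLIP macro, and `above` plays the role of one byte of `tmp`. The taps sum to 8, which is why the result is rounded with +4 and shifted right by 3. The sketch assumes `>>` on a negative `int` is an arithmetic shift, as it is on the usual compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 8-bit range; stands in for the CLIP macro (assumption). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Filter rows 1, 3, ..., 2*pairs-1 of one column in place.
 * col[] must hold rows 0 .. 2*pairs+1.
 * *above is the unfiltered row -1 on entry and the unfiltered
 * last odd row on exit, ready for the block below.
 */
static void filter_ff_column(uint8_t *col, int pairs, uint8_t *above)
{
    int up2 = *above;                  /* unfiltered row two above the centre */

    assert(pairs >= 0);
    for (int r = 1; r < 2 * pairs; r += 2) {
        int centre = col[r];           /* save before overwriting */
        col[r] = clip_u8((-up2 + 4 * col[r - 1] + 2 * centre
                          + 4 * col[r + 1] - col[r + 2] + 4) >> 3);
        up2 = centre;
    }
    *above = up2;
}
```

With `pairs` set to 4 this touches rows 1, 3, 5 and 7 and reads up to row 9, the same footprint as the unrolled code. Even rows are never written, so they can be read directly; only the previous odd row has to be carried in a temporary.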
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1739
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1740 /**
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1741 * Deinterlaces the given block by filtering every line with a (-1 2 6 2 -1) filter.
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1742 * will be called for every 8x8 block and can read & write from line 4-15
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1743 * lines 0-3 have already been passed through the deblock / dering filters, but can be read too
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1744 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1745 * this filter will read lines 4-13 and write 4-11
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1746 */
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1747 static inline void RENAME(deInterlaceL5)(uint8_t src[], int stride, uint8_t *tmp, uint8_t *tmp2)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1748 {
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1749 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1750 src+= stride*4;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1751 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1752 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1753 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1754 "pxor %%mm7, %%mm7 \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1755 "movq (%2), %%mm0 \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1756 "movq (%3), %%mm1 \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1757 // 0 1 2 3 4 5 6 7 8 9 10
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1758 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1 ecx
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1759
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1760 #define REAL_DEINT_L5(t1,t2,a,b,c)\
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1761 "movq " #a ", %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1762 "movq " #b ", %%mm3 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1763 "movq " #c ", %%mm4 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1764 PAVGB(t2, %%mm3) \
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1765 PAVGB(t1, %%mm4) \
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1766 "movq %%mm2, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1767 "movq %%mm2, " #t1 " \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1768 "punpcklbw %%mm7, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1769 "punpckhbw %%mm7, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1770 "movq %%mm2, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1771 "paddw %%mm2, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1772 "paddw %%mm6, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1773 "movq %%mm5, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1774 "paddw %%mm5, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1775 "paddw %%mm6, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1776 "movq %%mm3, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1777 "punpcklbw %%mm7, %%mm3 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1778 "punpckhbw %%mm7, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1779 "paddw %%mm3, %%mm3 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1780 "paddw %%mm6, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1781 "paddw %%mm3, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1782 "paddw %%mm6, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1783 "movq %%mm4, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1784 "punpcklbw %%mm7, %%mm4 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1785 "punpckhbw %%mm7, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1786 "psubw %%mm4, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1787 "psubw %%mm6, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1788 "psraw $2, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1789 "psraw $2, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1790 "packuswb %%mm5, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1791 "movq %%mm2, " #a " \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1792
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1793 #define DEINT_L5(t1,t2,a,b,c) REAL_DEINT_L5(t1,t2,a,b,c)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1794
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1795 DEINT_L5(%%mm0, %%mm1, (%0) , (%%REGa) , (%%REGa, %1) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1796 DEINT_L5(%%mm1, %%mm0, (%%REGa) , (%%REGa, %1) , (%%REGa, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1797 DEINT_L5(%%mm0, %%mm1, (%%REGa, %1) , (%%REGa, %1, 2), (%0, %1, 4) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1798 DEINT_L5(%%mm1, %%mm0, (%%REGa, %1, 2), (%0, %1, 4) , (%%REGd) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1799 DEINT_L5(%%mm0, %%mm1, (%0, %1, 4) , (%%REGd) , (%%REGd, %1) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1800 DEINT_L5(%%mm1, %%mm0, (%%REGd) , (%%REGd, %1) , (%%REGd, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1801 DEINT_L5(%%mm0, %%mm1, (%%REGd, %1) , (%%REGd, %1, 2), (%0, %1, 8) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1802 DEINT_L5(%%mm1, %%mm0, (%%REGd, %1, 2), (%0, %1, 8) , (%%REGd, %1, 4))
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1803
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1804 "movq %%mm0, (%2) \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1805 "movq %%mm1, (%3) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1806 : : "r" (src), "r" ((long)stride), "r"(tmp), "r"(tmp2)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1807 : "%"REG_a, "%"REG_d
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1808 );
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1809 #else
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1810 int x;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1811 src+= stride*4;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1812 for(x=0; x<8; x++)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1813 {
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1814 int t1= tmp[x];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1815 int t2= tmp2[x];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1816 int t3= src[0];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1817
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1818 src[stride*0]= CLIP((-(t1 + src[stride*2]) + 2*(t2 + src[stride*1]) + 6*t3 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1819 t1= src[stride*1];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1820 src[stride*1]= CLIP((-(t2 + src[stride*3]) + 2*(t3 + src[stride*2]) + 6*t1 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1821 t2= src[stride*2];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1822 src[stride*2]= CLIP((-(t3 + src[stride*4]) + 2*(t1 + src[stride*3]) + 6*t2 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1823 t3= src[stride*3];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1824 src[stride*3]= CLIP((-(t1 + src[stride*5]) + 2*(t2 + src[stride*4]) + 6*t3 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1825 t1= src[stride*4];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1826 src[stride*4]= CLIP((-(t2 + src[stride*6]) + 2*(t3 + src[stride*5]) + 6*t1 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1827 t2= src[stride*5];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1828 src[stride*5]= CLIP((-(t3 + src[stride*7]) + 2*(t1 + src[stride*6]) + 6*t2 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1829 t3= src[stride*6];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1830 src[stride*6]= CLIP((-(t1 + src[stride*8]) + 2*(t2 + src[stride*7]) + 6*t3 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1831 t1= src[stride*7];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1832 src[stride*7]= CLIP((-(t2 + src[stride*9]) + 2*(t3 + src[stride*8]) + 6*t1 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1833
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1834 tmp[x]= t3;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1835 tmp2[x]= t1;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1836
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1837 src++;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1838 }
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1839 #endif
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1840 }
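As a rough illustration of the C fallback above, the unrolled body can be written as a loop over one column that carries the two most recent unfiltered rows forward. This is a sketch only: `clip_u8` and `filter_l5_column` are hypothetical names, `clip_u8` stands in for the CLIP macro, and `prev2` / `prev1` play the role of one byte of `tmp` / `tmp2`. The taps (-1 2 6 2 -1) sum to 8, hence the +4 rounding term and the shift by 3. An arithmetic right shift for negative intermediates is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 8-bit range; stands in for the CLIP macro (assumption). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Apply (-1 2 6 2 -1)/8 to rows 0 .. n-1 of one column in place.
 * col[] must hold n + 2 rows (two rows of lookahead).
 * *prev2 and *prev1 are the unfiltered rows -2 and -1 on entry and the
 * unfiltered rows n-2 and n-1 on exit.
 */
static void filter_l5_column(uint8_t *col, int n, uint8_t *prev2, uint8_t *prev1)
{
    int m2 = *prev2;
    int m1 = *prev1;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int cur = col[i];              /* save before overwriting */
        col[i] = clip_u8((-(m2 + col[i + 2]) + 2 * (m1 + col[i + 1])
                          + 6 * cur + 4) >> 3);
        m2 = m1;
        m1 = cur;
    }
    *prev2 = m2;
    *prev1 = m1;
}
```

Calling it with `n` set to 8 reads rows 0 to 9 and writes rows 0 to 7, and leaves the original rows 6 and 7 in `*prev2` and `*prev1`, which is what the unrolled code stores in `tmp[x]` and `tmp2[x]`. The rotating t1/t2/t3 variables in the unrolled version do the same job as `m2`, `m1` and `cur` here without the copies.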
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1841
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1842 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1843 * Deinterlaces the given block by filtering all lines with a (1 2 1) filter.
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1844 * will be called for every 8x8 block and can read & write from line 4-15
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1845 * lines 0-3 have already been passed through the deblock / dering filters, but can be read too
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1846 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1847 * this filter will read lines 4-13 and write 4-11
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1848 */
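Both code paths of the function below build the (1 2 1) filter from two byte-wise averages: first the rows above and below are averaged, then that result is averaged with the current row. The sketch here isolates the packed 32-bit trick used by the C fallback, which averages four pixels at once without letting carries cross byte lanes. The helper names `avg4_floor`, `avg4_ceil` and `blend121` are hypothetical and do not exist in libpostproc.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte average rounded down: (a + b) >> 1 in each of the four lanes.
 * a & b keeps the shared bits, (a ^ b) >> 1 adds half of the differing
 * bits; the 0xFE mask stops a lane's low bit leaking into its neighbour. */
static uint32_t avg4_floor(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Per-byte average rounded up: (a + b + 1) >> 1 in each lane. */
static uint32_t avg4_ceil(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* (1 2 1)/4 blend of three packed rows. Rounding the first average down
 * and the second up gives (above + 2*cur + below + 2) >> 2 in each lane. */
static uint32_t blend121(uint32_t above, uint32_t cur, uint32_t below)
{
    return avg4_ceil(avg4_floor(above, below), cur);
}
```

For example, `blend121(0x00000000, 0xFFFFFFFF, 0x00000000)` yields `0x80808080`, since (0 + 2*255 + 0 + 2) >> 2 is 128 in every lane. Mixing one floor and one ceil average avoids the upward bias that two rounding-up averages would accumulate.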
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1849 static inline void RENAME(deInterlaceBlendLinear)(uint8_t src[], int stride, uint8_t *tmp)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1850 {
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1851 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1852 src+= 4*stride;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1853 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1854 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1855 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1856 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1857 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1858
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1859 "movq (%2), %%mm0 \n\t" // L0
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1860 "movq (%%"REG_a"), %%mm1 \n\t" // L2
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1861 PAVGB(%%mm1, %%mm0) // L0+L2
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1862 "movq (%0), %%mm2 \n\t" // L1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1863 PAVGB(%%mm2, %%mm0)
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1864 "movq %%mm0, (%0) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1865 "movq (%%"REG_a", %1), %%mm0 \n\t" // L3
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1866 PAVGB(%%mm0, %%mm2) // L1+L3
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1867 PAVGB(%%mm1, %%mm2) // 2L2 + L1 + L3
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1868 "movq %%mm2, (%%"REG_a") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1869 "movq (%%"REG_a", %1, 2), %%mm2 \n\t" // L4
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1870 PAVGB(%%mm2, %%mm1) // L2+L4
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1871 PAVGB(%%mm0, %%mm1) // 2L3 + L2 + L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1872 "movq %%mm1, (%%"REG_a", %1) \n\t"
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1873 "movq (%0, %1, 4), %%mm1 \n\t" // L5
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1874 PAVGB(%%mm1, %%mm0) // L3+L5
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1875 PAVGB(%%mm2, %%mm0) // 2L4 + L3 + L5
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1876 "movq %%mm0, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1877 "movq (%%"REG_d"), %%mm0 \n\t" // L6
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1878 PAVGB(%%mm0, %%mm2) // L4+L6
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1879 PAVGB(%%mm1, %%mm2) // 2L5 + L4 + L6
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1880 "movq %%mm2, (%0, %1, 4) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1881 "movq (%%"REG_d", %1), %%mm2 \n\t" // L7
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1882 PAVGB(%%mm2, %%mm1) // L5+L7
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1883 PAVGB(%%mm0, %%mm1) // 2L6 + L5 + L7
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1884 "movq %%mm1, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1885 "movq (%%"REG_d", %1, 2), %%mm1 \n\t" // L8
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1886 PAVGB(%%mm1, %%mm0) // L6+L8
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1887 PAVGB(%%mm2, %%mm0) // 2L7 + L6 + L8
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1888 "movq %%mm0, (%%"REG_d", %1) \n\t"
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1889 "movq (%0, %1, 8), %%mm0 \n\t" // L9
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1890 PAVGB(%%mm0, %%mm2) // L7+L9
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1891 PAVGB(%%mm1, %%mm2) // 2L8 + L7 + L9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1892 "movq %%mm2, (%%"REG_d", %1, 2) \n\t"
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1893 "movq %%mm1, (%2) \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1894
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1895 : : "r" (src), "r" ((long)stride), "r" (tmp)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1896 : "%"REG_a, "%"REG_d
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1897 );
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1898 #else
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1899 int a, b, c, x;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1900 src+= 4*stride;
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1901
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1902 for(x=0; x<2; x++){
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1903 a= *(uint32_t*)&tmp[stride*0];
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1904 b= *(uint32_t*)&src[stride*0];
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1905 c= *(uint32_t*)&src[stride*1];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1906 a= (a&c) + (((a^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1907 *(uint32_t*)&src[stride*0]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1908
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1909 a= *(uint32_t*)&src[stride*2];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1910 b= (a&b) + (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1911 *(uint32_t*)&src[stride*1]= (c|b) - (((c^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1912
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1913 b= *(uint32_t*)&src[stride*3];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1914 c= (b&c) + (((b^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1915 *(uint32_t*)&src[stride*2]= (c|a) - (((c^a)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1916
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1917 c= *(uint32_t*)&src[stride*4];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1918 a= (a&c) + (((a^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1919 *(uint32_t*)&src[stride*3]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1920
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1921 a= *(uint32_t*)&src[stride*5];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1922 b= (a&b) + (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1923 *(uint32_t*)&src[stride*4]= (c|b) - (((c^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1924
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1925 b= *(uint32_t*)&src[stride*6];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1926 c= (b&c) + (((b^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1927 *(uint32_t*)&src[stride*5]= (c|a) - (((c^a)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1928
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1929 c= *(uint32_t*)&src[stride*7];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1930 a= (a&c) + (((a^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1931 *(uint32_t*)&src[stride*6]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1932
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1933 a= *(uint32_t*)&src[stride*8];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1934 b= (a&b) + (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1935 *(uint32_t*)&src[stride*7]= (c|b) - (((c^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1936
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1937 *(uint32_t*)&tmp[stride*0]= c;
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1938 src += 4;
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1939 tmp += 4;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1940 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1941 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1942 }
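The C path of the linear blend deinterlacer above averages four pixels at a time with the expressions `(a&b) + (((a^b)&0xFEFEFEFEUL)>>1)` and `(c|b) - (((c^b)&0xFEFEFEFEUL)>>1)`. The following is a minimal standalone sketch of that trick, not code from this file: the helper names `avg4_floor` and `avg4_ceil` are hypothetical. It relies on the identities a+b = 2(a&b) + (a^b) and a+b = 2(a|b) - (a^b). The 0xFEFEFEFE mask clears the lowest bit of each byte before the shift so that no bit crosses into the neighbouring byte.

```c
#include <stdint.h>

/* Hypothetical helper: per-byte average of four packed 8-bit values,
 * rounding down. Equivalent to (x + y) >> 1 applied to each byte lane. */
static uint32_t avg4_floor(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEU) >> 1);
}

/* Hypothetical helper: per-byte average of four packed 8-bit values,
 * rounding up. Equivalent to (x + y + 1) >> 1 applied to each byte lane. */
static uint32_t avg4_ceil(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEU) >> 1);
}
```

Mixing one round-down step with one round-up step, as the listing does, is presumably meant to keep the combined two-stage blend from drifting consistently in one direction.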
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1943
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1944 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1945 * Deinterlaces the given block by applying a median filter to every second line.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1946 * Will be called for every 8x8 block and can read & write lines 4-15,
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1947 * lines 0-3 have already been passed through the deblock / dering filters, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1948 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1949 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1950 static inline void RENAME(deInterlaceMedian)(uint8_t src[], int stride)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1951 {
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
1952 #ifdef HAVE_MMX
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1953 src+= 4*stride;
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
1954 #ifdef HAVE_MMX2
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1955 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1956 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1957 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1958 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1959 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1960
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1961 "movq (%0), %%mm0 \n\t" //
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1962 "movq (%%"REG_a", %1), %%mm2 \n\t" //
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1963 "movq (%%"REG_a"), %%mm1 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1964 "movq %%mm0, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1965 "pmaxub %%mm1, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1966 "pminub %%mm3, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1967 "pmaxub %%mm2, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1968 "pminub %%mm1, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1969 "movq %%mm0, (%%"REG_a") \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1970
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1971 "movq (%0, %1, 4), %%mm0 \n\t" //
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1972 "movq (%%"REG_a", %1, 2), %%mm1 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1973 "movq %%mm2, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1974 "pmaxub %%mm1, %%mm2 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1975 "pminub %%mm3, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1976 "pmaxub %%mm0, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1977 "pminub %%mm1, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1978 "movq %%mm2, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1979
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1980 "movq (%%"REG_d"), %%mm2 \n\t" //
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1981 "movq (%%"REG_d", %1), %%mm1 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1982 "movq %%mm2, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1983 "pmaxub %%mm0, %%mm2 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1984 "pminub %%mm3, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1985 "pmaxub %%mm1, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1986 "pminub %%mm0, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1987 "movq %%mm2, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1988
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1989 "movq (%%"REG_d", %1, 2), %%mm2 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1990 "movq (%0, %1, 8), %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1991 "movq %%mm2, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1992 "pmaxub %%mm0, %%mm2 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1993 "pminub %%mm3, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1994 "pmaxub %%mm1, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1995 "pminub %%mm0, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1996 "movq %%mm2, (%%"REG_d", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1997
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1998
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1999 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2000 : "%"REG_a, "%"REG_d
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2001 );
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2002
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2003 #else // MMX without MMX2
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2004 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2005 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2006 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2007 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2008 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2009 "pxor %%mm7, %%mm7 \n\t"
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2010
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2011 #define REAL_MEDIAN(a,b,c)\
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2012 "movq " #a ", %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2013 "movq " #b ", %%mm2 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2014 "movq " #c ", %%mm1 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2015 "movq %%mm0, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2016 "movq %%mm1, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2017 "movq %%mm2, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2018 "psubusb %%mm1, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2019 "psubusb %%mm2, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2020 "psubusb %%mm0, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2021 "pcmpeqb %%mm7, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2022 "pcmpeqb %%mm7, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2023 "pcmpeqb %%mm7, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2024 "movq %%mm3, %%mm6 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2025 "pxor %%mm4, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2026 "pxor %%mm5, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2027 "pxor %%mm6, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2028 "por %%mm3, %%mm1 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2029 "por %%mm4, %%mm2 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2030 "por %%mm5, %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2031 "pand %%mm2, %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2032 "pand %%mm1, %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2033 "movq %%mm0, " #b " \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2034 #define MEDIAN(a,b,c) REAL_MEDIAN(a,b,c)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2035
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2036 MEDIAN((%0), (%%REGa), (%%REGa, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2037 MEDIAN((%%REGa, %1), (%%REGa, %1, 2), (%0, %1, 4))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2038 MEDIAN((%0, %1, 4), (%%REGd), (%%REGd, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2039 MEDIAN((%%REGd, %1), (%%REGd, %1, 2), (%0, %1, 8))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2040
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2041 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2042 : "%"REG_a, "%"REG_d
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2043 );
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2044 #endif // MMX
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2045 #else
1029
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2046 int x, y;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
2047 src+= 4*stride;
1029
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2048 // FIXME - there should be a way to do a few columns in parallel like w/mmx
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2049 for(x=0; x<8; x++)
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2050 {
1029
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2051 uint8_t *colsrc = src;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2052 for (y=0; y<4; y++)
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2053 {
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2054 int a, b, c, d, e, f;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2055 a = colsrc[0 ];
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2056 b = colsrc[stride ];
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2057 c = colsrc[stride*2];
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2058 d = (a-b)>>31;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2059 e = (b-c)>>31;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2060 f = (c-a)>>31;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2061 colsrc[stride ] = (a|(d^f)) & (b|(d^e)) & (c|(e^f));
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2062 colsrc += stride*2;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2063 }
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2064 src++;
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2065 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2066 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2067 }
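The MMX2 path of deInterlaceMedian above computes each median with a pmaxub / pminub / pmaxub / pminub sequence, that is, min(max(a,b), max(min(a,b), c)). The following is a scalar sketch of the same idea under stated assumptions: the names `min_u8`, `max_u8`, `median3` and `median_deinterlace_8x8` are hypothetical, and unlike the function in the listing the sketch does not advance `src` by `4*stride` first. It uses plain comparisons rather than the `>>31` sign-mask trick of the C path, which depends on right-shifting a negative `int` behaving arithmetically.

```c
#include <stdint.h>

static uint8_t min_u8(uint8_t x, uint8_t y) { return x < y ? x : y; }
static uint8_t max_u8(uint8_t x, uint8_t y) { return x > y ? x : y; }

/* Median of three bytes, using the same min/max network as the
 * pmaxub/pminub sequence in the MMX2 path. */
static uint8_t median3(uint8_t a, uint8_t b, uint8_t c)
{
    return min_u8(max_u8(a, b), max_u8(min_u8(a, b), c));
}

/* Hypothetical scalar version: for an 8-pixel-wide block, replace rows
 * 1, 3, 5 and 7 with the median of the row above, the row itself and the
 * row below. Reads rows 0..8, so the buffer must hold at least 9 rows. */
static void median_deinterlace_8x8(uint8_t *src, int stride)
{
    for (int x = 0; x < 8; x++) {
        uint8_t *col = src + x;
        for (int y = 0; y < 4; y++) {
            col[stride] = median3(col[0], col[stride], col[2 * stride]);
            col += 2 * stride;
        }
    }
}
```

Only odd rows are written and only even rows are read as neighbours, so the results do not depend on the order in which the rows are processed.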
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2068
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
2069 #ifdef HAVE_MMX
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2070 /**
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2071 * Transposes and shifts the given 8x8 block into dst1 and dst2
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2072 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
2073 static inline void RENAME(transpose1)(uint8_t *dst1, uint8_t *dst2, uint8_t *src, int srcStride)
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2074 {
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2075 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2076 "lea (%0, %1), %%"REG_a" \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2077 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2078 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2079 "movq (%0), %%mm0 \n\t" // 12345678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2080 "movq (%%"REG_a"), %%mm1 \n\t" // abcdefgh
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2081 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2082 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2083 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2084
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2085 "movq (%%"REG_a", %1), %%mm1 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2086 "movq (%%"REG_a", %1, 2), %%mm3 \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2087 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2088 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2089 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2090
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2091 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2092 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2093 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2094 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2095 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2096 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2097
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2098 "movd %%mm0, 128(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2099 "psrlq $32, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2100 "movd %%mm0, 144(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2101 "movd %%mm3, 160(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2102 "psrlq $32, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2103 "movd %%mm3, 176(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2104 "movd %%mm3, 48(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2105 "movd %%mm2, 192(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2106 "movd %%mm2, 64(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2107 "psrlq $32, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2108 "movd %%mm2, 80(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2109 "movd %%mm1, 96(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2110 "psrlq $32, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2111 "movd %%mm1, 112(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2112
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2113 "lea (%%"REG_a", %1, 4), %%"REG_a" \n\t"
789
54079a650ba8 using fewer registers (fixes compilation bug hopefully)
michael
parents: 788
diff changeset
2114
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2115 "movq (%0, %1, 4), %%mm0 \n\t" // 12345678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2116 "movq (%%"REG_a"), %%mm1 \n\t" // abcdefgh
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2117 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2118 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2119 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2120
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2121 "movq (%%"REG_a", %1), %%mm1 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2122 "movq (%%"REG_a", %1, 2), %%mm3 \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2123 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2124 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2125 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2126
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2127 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2128 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2129 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2130 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2131 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2132 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2133
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2134 "movd %%mm0, 132(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2135 "psrlq $32, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2136 "movd %%mm0, 148(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2137 "movd %%mm3, 164(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2138 "psrlq $32, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2139 "movd %%mm3, 180(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2140 "movd %%mm3, 52(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2141 "movd %%mm2, 196(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2142 "movd %%mm2, 68(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2143 "psrlq $32, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2144 "movd %%mm2, 84(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2145 "movd %%mm1, 100(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2146 "psrlq $32, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2147 "movd %%mm1, 116(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2148
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2149
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2150 :: "r" (src), "r" ((long)srcStride), "r" (dst1), "r" (dst2)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2151 : "%"REG_a
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2152 );
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2153 }
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2154
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2155 /**
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2156 * transposes the given 8x8 block
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2157 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
2158 static inline void RENAME(transpose2)(uint8_t *dst, int dstStride, uint8_t *src)
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2159 {
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2160 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2161 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2162 "lea (%%"REG_a",%1,4), %%"REG_d"\n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2163 //  0       1       2       3       4       5       6       7       8       9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2164 //  %0      eax     eax+%1  eax+2%1 %0+4%1  edx     edx+%1  edx+2%1 %0+8%1  edx+4%1
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2165 "movq (%2), %%mm0 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2166 "movq 16(%2), %%mm1 \n\t" // abcdefgh
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2167 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2168 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2169 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2170
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2171 "movq 32(%2), %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2172 "movq 48(%2), %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2173 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2174 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2175 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2176
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2177 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2178 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2179 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2180 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2181 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2182 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2183
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2184 "movd %%mm0, (%0) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2185 "psrlq $32, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2186 "movd %%mm0, (%%"REG_a") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2187 "movd %%mm3, (%%"REG_a", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2188 "psrlq $32, %%mm3 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2189 "movd %%mm3, (%%"REG_a", %1, 2) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2190 "movd %%mm2, (%0, %1, 4) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2191 "psrlq $32, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2192 "movd %%mm2, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2193 "movd %%mm1, (%%"REG_d", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2194 "psrlq $32, %%mm1 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2195 "movd %%mm1, (%%"REG_d", %1, 2) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2196
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2197
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2198 "movq 64(%2), %%mm0 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2199 "movq 80(%2), %%mm1 \n\t" // abcdefgh
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2200 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2201 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2202 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2203
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2204 "movq 96(%2), %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2205 "movq 112(%2), %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2206 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2207 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2208 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2209
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2210 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2211 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2212 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2213 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2214 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2215 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2216
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2217 "movd %%mm0, 4(%0) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2218 "psrlq $32, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2219 "movd %%mm0, 4(%%"REG_a") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2220 "movd %%mm3, 4(%%"REG_a", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2221 "psrlq $32, %%mm3 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2222 "movd %%mm3, 4(%%"REG_a", %1, 2) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2223 "movd %%mm2, 4(%0, %1, 4) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2224 "psrlq $32, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2225 "movd %%mm2, 4(%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2226 "movd %%mm1, 4(%%"REG_d", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2227 "psrlq $32, %%mm1 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2228 "movd %%mm1, 4(%%"REG_d", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2229
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2230 :: "r" (dst), "r" ((long)dstStride), "r" (src)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2231 : "%"REG_a, "%"REG_d
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2232 );
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2233 }
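The byte shuffling in transpose2 above can be hard to follow from the punpck sequence alone, so here is a minimal scalar sketch of what the routine appears to compute. It assumes the source rows sit 16 bytes apart, as the 0, 16, 32, ... 112 load offsets in the asm suggest. The names `transpose2_ref`, `TEMP_STRIDE` and `transpose2_ref_selftest` are hypothetical and are not part of libpostproc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: source rows are 16 bytes apart, matching the
 * 0, 16, 32, ... 112 load offsets used by the MMX code. */
enum { TEMP_STRIDE = 16 };

/* Hypothetical scalar reference for an 8x8 byte transpose:
 * column y of the source becomes row y of the destination. */
static void transpose2_ref(uint8_t *dst, int dstStride, const uint8_t *src)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * dstStride + x] = src[x * TEMP_STRIDE + y];
}

/* Fills a source block with distinct values, transposes it,
 * and checks every destination byte. Returns 1 on success. */
static int transpose2_ref_selftest(void)
{
    enum { DST_STRIDE = 32 };
    uint8_t src[8 * TEMP_STRIDE];
    uint8_t dst[8 * DST_STRIDE];

    memset(src, 0, sizeof src);
    memset(dst, 0, sizeof dst);
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            src[r * TEMP_STRIDE + c] = (uint8_t)(r * 8 + c);

    transpose2_ref(dst, DST_STRIDE, src);

    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            assert(dst[c * DST_STRIDE + r] == (uint8_t)(r * 8 + c));
    return 1;
}
```

The MMX version reaches the same result in two passes of four source rows each: punpcklbw/punpckhbw interleave bytes of adjacent rows, punpcklwd/punpckhwd then interleave the resulting words, and each movd stores four transposed bytes into one destination row.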
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
2234 #endif
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2235 //static long test=0;
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2236
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2237 #ifndef HAVE_ALTIVEC
943
0566d1a8426f 10l (int i)
michael
parents: 941
diff changeset
2238 static inline void RENAME(tempNoiseReducer)(uint8_t *src, int stride,
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2239 uint8_t *tempBlured, uint32_t *tempBluredPast, int *maxNoise)
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2240 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2241 // to save a register (FIXME do this outside of the loops)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2242 tempBluredPast[127]= maxNoise[0];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2243 tempBluredPast[128]= maxNoise[1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2244 tempBluredPast[129]= maxNoise[2];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2245
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2246 #define FAST_L2_DIFF
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2247 //#define L1_DIFF //you should change the thresholds too if you try that one
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2248 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2249 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2250 "lea (%2, %2, 2), %%"REG_a" \n\t" // 3*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2251 "lea (%2, %2, 4), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2252 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2253 //  0       1       2       3       4       5       6       7       8       9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2254 //  %x      %x+%2   %x+2%2  %x+eax  %x+4%2  %x+edx  %x+2eax %x+ecx  %x+8%2
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2255 //FIXME reorder?
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2256 #ifdef L1_DIFF //needs mmx2
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2257 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2258 "psadbw (%1), %%mm0 \n\t" // |L0-R0|
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2259 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2260 "psadbw (%1, %2), %%mm1 \n\t" // |L1-R1|
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2261 "movq (%0, %2, 2), %%mm2 \n\t" // L2
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2262 "psadbw (%1, %2, 2), %%mm2 \n\t" // |L2-R2|
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2263 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2264 "psadbw (%1, %%"REG_a"), %%mm3 \n\t" // |L3-R3|
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2265
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2266 "movq (%0, %2, 4), %%mm4 \n\t" // L4
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2267 "paddw %%mm1, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2268 "psadbw (%1, %2, 4), %%mm4 \n\t" // |L4-R4|
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2269 "movq (%0, %%"REG_d"), %%mm5 \n\t" // L5
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2270 "paddw %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2271 "psadbw (%1, %%"REG_d"), %%mm5 \n\t" // |L5-R5|
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2272 "movq (%0, %%"REG_a", 2), %%mm6 \n\t" // L6
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2273 "paddw %%mm3, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2274 "psadbw (%1, %%"REG_a", 2), %%mm6 \n\t" // |L6-R6|
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2275 "movq (%0, %%"REG_c"), %%mm7 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2276 "paddw %%mm4, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2277 "psadbw (%1, %%"REG_c"), %%mm7 \n\t" // |L7-R7|
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2278 "paddw %%mm5, %%mm6 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2279 "paddw %%mm7, %%mm6 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2280 "paddw %%mm6, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2281 #else
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2282 #if defined (FAST_L2_DIFF)
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2283 "pcmpeqb %%mm7, %%mm7 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
2284 "movq "MANGLE(b80)", %%mm6 \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2285 "pxor %%mm0, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2286 #define REAL_L2_DIFF_CORE(a, b)\
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2287 "movq " #a ", %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2288 "movq " #b ", %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2289 "pxor %%mm7, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2290 PAVGB(%%mm2, %%mm5)\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2291 "paddb %%mm6, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2292 "movq %%mm5, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2293 "psllw $8, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2294 "pmaddwd %%mm5, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2295 "pmaddwd %%mm2, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2296 "paddd %%mm2, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2297 "psrld $14, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2298 "paddd %%mm5, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2299
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2300 #else
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2301 "pxor %%mm7, %%mm7 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2302 "pxor %%mm0, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2303 #define REAL_L2_DIFF_CORE(a, b)\
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2304 "movq " #a ", %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2305 "movq " #b ", %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2306 "movq %%mm5, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2307 "movq %%mm2, %%mm3 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2308 "punpcklbw %%mm7, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2309 "punpckhbw %%mm7, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2310 "punpcklbw %%mm7, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2311 "punpckhbw %%mm7, %%mm3 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2312 "psubw %%mm2, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2313 "psubw %%mm3, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2314 "pmaddwd %%mm5, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2315 "pmaddwd %%mm1, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2316 "paddd %%mm1, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2317 "paddd %%mm5, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2318
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2319 #endif
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2320
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2321 #define L2_DIFF_CORE(a, b) REAL_L2_DIFF_CORE(a, b)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2322
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2323 L2_DIFF_CORE((%0), (%1))
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2324 L2_DIFF_CORE((%0, %2), (%1, %2))
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2325 L2_DIFF_CORE((%0, %2, 2), (%1, %2, 2))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2326 L2_DIFF_CORE((%0, %%REGa), (%1, %%REGa))
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2327 L2_DIFF_CORE((%0, %2, 4), (%1, %2, 4))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2328 L2_DIFF_CORE((%0, %%REGd), (%1, %%REGd))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2329 L2_DIFF_CORE((%0, %%REGa,2), (%1, %%REGa,2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2330 L2_DIFF_CORE((%0, %%REGc), (%1, %%REGc))
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2331
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2332 #endif
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2333
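Going by the changeset description ("(a-b)^2 instead of |a-b|"), the L2_DIFF_CORE invocations above accumulate a squared-difference measure over the eight rows of the block. The macro itself is defined outside this section and may round or approximate, so the following is only a plain scalar sketch of the idea; `block_ssd_8x8`, `src` and `ref` are hypothetical names, not identifiers from the file.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squared differences over an 8x8 block of bytes.
 * src and ref are hypothetical names for the two blocks being compared;
 * stride is the distance in bytes between consecutive rows. */
static uint32_t block_ssd_8x8(const uint8_t *src, const uint8_t *ref,
                              ptrdiff_t stride)
{
    uint32_t sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int d = src[x + y * stride] - ref[x + y * stride];
            sum += (uint32_t)(d * d);   /* (a-b)^2, never negative */
        }
    }
    return sum;
}
```

Squaring penalises a few large differences more than many small ones, which is presumably why it replaced the absolute difference here.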
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2334 "movq %%mm0, %%mm4 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2335 "psrlq $32, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2336 "paddd %%mm0, %%mm4 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2337 "movd %%mm4, %%ecx \n\t"
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2338 "shll $2, %%ecx \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2339 "mov %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2340 "addl -4(%%"REG_d"), %%ecx \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2341 "addl 4(%%"REG_d"), %%ecx \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2342 "addl -1024(%%"REG_d"), %%ecx \n\t"
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2343 "addl $4, %%ecx \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2344 "addl 1024(%%"REG_d"), %%ecx \n\t"
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2345 "shrl $3, %%ecx \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2346 "movl %%ecx, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2347
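The instructions in lines 2338 to 2346 can be read as a small smoothing step on the block difference: the fresh value is weighted by four, the four neighbouring table entries are added, and the sum is rounded and divided by eight. A minimal C sketch of that arithmetic follows; the function name and the interpretation of the table as 32-bit entries with a row pitch of 256 (inferred from the 1024-byte offsets) are assumptions.

```c
#include <stdint.h>

/* Row pitch of the per-block difference table, in 32-bit entries.
 * Assumption: 256 is inferred from the +/-1024 byte offsets (1024 / 4). */
enum { DIFF_ROW_PITCH = 256 };

/* cell points at the table entry of the current block, d is the freshly
 * computed block difference. Stores and returns the smoothed value. */
static uint32_t smooth_block_diff(uint32_t *cell, uint32_t d)
{
    d = 4 * d                     /* shll $2                   */
      + cell[-1]                  /* addl -4(...)              */
      + cell[1]                   /* addl 4(...)               */
      + cell[-DIFF_ROW_PITCH]     /* addl -1024(...)           */
      + cell[DIFF_ROW_PITCH]      /* addl 1024(...)            */
      + 4;                        /* addl $4: rounding term    */
    d >>= 3;                      /* shrl $3: divide by eight  */
    *cell = d;                    /* movl %ecx, (...)          */
    return d;
}
```

The weights sum to eight, so a region with a uniform difference keeps its value while isolated spikes are damped.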
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2348 // "mov %3, %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2349 // "mov %%"REG_c", test \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2350 // "jmp 4f \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2351 "cmpl 512(%%"REG_d"), %%ecx \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2352 " jb 2f \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2353 "cmpl 516(%%"REG_d"), %%ecx \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2354 " jb 1f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2355
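The compare-and-branch sequence in lines 2351 to 2354, together with the one at label 2 further down, selects one of four code paths from the smoothed difference. The sketch below restates that decision in C; treating the words at offsets 508, 512 and 516 as three ascending thresholds is an assumption, since their initialisation is not part of this section, and `averaging_passes` is a hypothetical name.

```c
#include <stdint.h>

/* Number of PAVGB passes chosen for a smoothed block difference d.
 * t0, t1, t2 stand for the words at offsets 508, 512 and 516 (assumed
 * to be thresholds). A result of 0 means the block is copied unblended.
 * jb is an unsigned comparison, hence the unsigned types. */
static int averaging_passes(uint32_t d, uint32_t t0, uint32_t t1, uint32_t t2)
{
    if (d < t1)                 /* cmpl 512(...); jb 2f */
        return d < t0 ? 3 : 2;  /* cmpl 508(...); jb 3f */
    if (d < t2)                 /* cmpl 516(...); jb 1f */
        return 1;
    return 0;                   /* fall through: plain copy */
}
```

The smaller the difference, the more averaging passes are applied, so nearly static blocks are smoothed hardest and strongly changed blocks are left alone.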
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2356 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2357 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2358 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2359 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2360 "movq (%0, %2, 2), %%mm2 \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2361 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2362 "movq (%0, %2, 4), %%mm4 \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2363 "movq (%0, %%"REG_d"), %%mm5 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2364 "movq (%0, %%"REG_a", 2), %%mm6 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2365 "movq (%0, %%"REG_c"), %%mm7 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2366 "movq %%mm0, (%1) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2367 "movq %%mm1, (%1, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2368 "movq %%mm2, (%1, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2369 "movq %%mm3, (%1, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2370 "movq %%mm4, (%1, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2371 "movq %%mm5, (%1, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2372 "movq %%mm6, (%1, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2373 "movq %%mm7, (%1, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2374 "jmp 4f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2375
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2376 "1: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2377 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2378 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2379 "movq (%0), %%mm0 \n\t" // L0
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2380 PAVGB((%1), %%mm0) // L0
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2381 "movq (%0, %2), %%mm1 \n\t" // L1
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2382 PAVGB((%1, %2), %%mm1) // L1
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2383 "movq (%0, %2, 2), %%mm2 \n\t" // L2
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2384 PAVGB((%1, %2, 2), %%mm2) // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2385 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2386 PAVGB((%1, %%REGa), %%mm3) // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2387 "movq (%0, %2, 4), %%mm4 \n\t" // L4
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2388 PAVGB((%1, %2, 4), %%mm4) // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2389 "movq (%0, %%"REG_d"), %%mm5 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2390 PAVGB((%1, %%REGd), %%mm5) // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2391 "movq (%0, %%"REG_a", 2), %%mm6 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2392 PAVGB((%1, %%REGa, 2), %%mm6) // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2393 "movq (%0, %%"REG_c"), %%mm7 \n\t" // L7
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2394 PAVGB((%1, %%REGc), %%mm7) // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2395 "movq %%mm0, (%1) \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2396 "movq %%mm1, (%1, %2) \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2397 "movq %%mm2, (%1, %2, 2) \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2398 "movq %%mm3, (%1, %%"REG_a") \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2399 "movq %%mm4, (%1, %2, 4) \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2400 "movq %%mm5, (%1, %%"REG_d") \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2401 "movq %%mm6, (%1, %%"REG_a", 2) \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2402 "movq %%mm7, (%1, %%"REG_c") \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2403 "movq %%mm0, (%0) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2404 "movq %%mm1, (%0, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2405 "movq %%mm2, (%0, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2406 "movq %%mm3, (%0, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2407 "movq %%mm4, (%0, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2408 "movq %%mm5, (%0, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2409 "movq %%mm6, (%0, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2410 "movq %%mm7, (%0, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2411 "jmp 4f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2412
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2413 "2: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2414 "cmpl 508(%%"REG_d"), %%ecx \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2415 " jb 3f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2416
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2417 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2418 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2419 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2420 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2421 "movq (%0, %2, 2), %%mm2 \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2422 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2423 "movq (%1), %%mm4 \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2424 "movq (%1, %2), %%mm5 \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2425 "movq (%1, %2, 2), %%mm6 \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2426 "movq (%1, %%"REG_a"), %%mm7 \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2427 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2428 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2429 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2430 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2431 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2432 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2433 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2434 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2435 "movq %%mm0, (%1) \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2436 "movq %%mm1, (%1, %2) \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2437 "movq %%mm2, (%1, %2, 2) \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2438 "movq %%mm3, (%1, %%"REG_a") \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2439 "movq %%mm0, (%0) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2440 "movq %%mm1, (%0, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2441 "movq %%mm2, (%0, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2442 "movq %%mm3, (%0, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2443
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2444 "movq (%0, %2, 4), %%mm0 \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2445 "movq (%0, %%"REG_d"), %%mm1 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2446 "movq (%0, %%"REG_a", 2), %%mm2 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2447 "movq (%0, %%"REG_c"), %%mm3 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2448 "movq (%1, %2, 4), %%mm4 \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2449 "movq (%1, %%"REG_d"), %%mm5 \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2450 "movq (%1, %%"REG_a", 2), %%mm6 \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2451 "movq (%1, %%"REG_c"), %%mm7 \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2452 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2453 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2454 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2455 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2456 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2457 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2458 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2459 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2460 "movq %%mm0, (%1, %2, 4) \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2461 "movq %%mm1, (%1, %%"REG_d") \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2462 "movq %%mm2, (%1, %%"REG_a", 2) \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2463 "movq %%mm3, (%1, %%"REG_c") \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2464 "movq %%mm0, (%0, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2465 "movq %%mm1, (%0, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2466 "movq %%mm2, (%0, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2467 "movq %%mm3, (%0, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2468 "jmp 4f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2469
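The branches differ only in how many times each L register is averaged with its R counterpart. Since PAVGB computes a rounded-up byte average, repeating it moves the result progressively closer to R: roughly 1/2, 3/4 and 7/8 of R for one, two and three passes. A scalar sketch for a single byte follows; the function names are hypothetical, and which of L and R holds the current frame is not visible in this section.

```c
#include <stdint.h>

/* Rounded-up average of two bytes, the per-byte result of PAVGB. */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Average l with r the given number of times, as the repeated
 * PAVGB(R, L) lines do for each byte of a row. */
static uint8_t blend_toward_r(uint8_t l, uint8_t r, int passes)
{
    uint8_t v = l;
    for (int i = 0; i < passes; i++)
        v = avg_round_up(v, r);
    return v;
}
```

Repeating a 1:1 average gives 3:1 and 7:1 mixes without any multiplication, at the cost of rounding up once per pass.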
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2470 "3: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2471 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2472 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2473 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2474 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2475 "movq (%0, %2, 2), %%mm2 \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2476 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2477 "movq (%1), %%mm4 \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2478 "movq (%1, %2), %%mm5 \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2479 "movq (%1, %2, 2), %%mm6 \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2480 "movq (%1, %%"REG_a"), %%mm7 \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2481 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2482 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2483 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2484 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2485 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2486 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2487 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2488 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2489 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2490 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2491 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2492 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2493 "movq %%mm0, (%1) \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2494 "movq %%mm1, (%1, %2) \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2495 "movq %%mm2, (%1, %2, 2) \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2496 "movq %%mm3, (%1, %%"REG_a") \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2497 "movq %%mm0, (%0) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2498 "movq %%mm1, (%0, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2499 "movq %%mm2, (%0, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2500 "movq %%mm3, (%0, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2501
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2502 "movq (%0, %2, 4), %%mm0 \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2503 "movq (%0, %%"REG_d"), %%mm1 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2504 "movq (%0, %%"REG_a", 2), %%mm2 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2505 "movq (%0, %%"REG_c"), %%mm3 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2506 "movq (%1, %2, 4), %%mm4 \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2507 "movq (%1, %%"REG_d"), %%mm5 \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2508 "movq (%1, %%"REG_a", 2), %%mm6 \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2509 "movq (%1, %%"REG_c"), %%mm7 \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2510 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2511 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2512 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2513 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2514 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2515 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2516 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2517 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2518 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2519 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2520 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2521 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2522 "movq %%mm0, (%1, %2, 4) \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2523 "movq %%mm1, (%1, %%"REG_d") \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2524 "movq %%mm2, (%1, %%"REG_a", 2) \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2525 "movq %%mm3, (%1, %%"REG_c") \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2526 "movq %%mm0, (%0, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2527 "movq %%mm1, (%0, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2528 "movq %%mm2, (%0, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2529 "movq %%mm3, (%0, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2530
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2531 "4: \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2532
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2533 :: "r" (src), "r" (tempBlured), "r"((long)stride), "m" (tempBluredPast)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2534 : "%"REG_a, "%"REG_d, "%"REG_c, "memory"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2535 );
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2536 //printf("%d\n", test);
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2537 #else
788
425d71e81c37 fix compilation on non-x86 with gcc 2.95
colin
parents: 787
diff changeset
2538 {
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2539 int y;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2540 int d=0;
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2541 // int sysd=0;
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2542 int i;
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2543
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2544 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2545 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2546 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2547 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2548 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2549 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2550 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2551 int d1=ref - cur;
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2552 // if(x==0 || x==7) d1+= d1>>1;
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2553 // if(y==0 || y==7) d1+= d1>>1;
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2554 // d+= ABS(d1);
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2555 d+= d1*d1;
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2556 // sysd+= d1;
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2557 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2558 }
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2559 i=d;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2560 d= (
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2561 4*d
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2562 +(*(tempBluredPast-256))
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2563 +(*(tempBluredPast-1))+ (*(tempBluredPast+1))
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2564 +(*(tempBluredPast+256))
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2565 +4)>>3;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2566 *tempBluredPast=i;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2567 // ((*tempBluredPast)*3 + d + 2)>>2;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2568
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2569 //printf("%d %d %d\n", maxNoise[0], maxNoise[1], maxNoise[2]);
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2570 /*
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2571 Switch between
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2572 1 0 0 0 0 0 0 (0)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2573 64 32 16 8 4 2 1 (1)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2574 64 48 36 27 20 15 11 (33) (approx)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2575 64 56 49 43 37 33 29 (200) (approx)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2576 */
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2577 if(d > maxNoise[1])
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2578 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2579 if(d < maxNoise[2])
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2580 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2581 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2582 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2583 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2584 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2585 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2586 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2587 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2588 tempBlured[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2589 src[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2590 (ref + cur + 1)>>1;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2591 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2592 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2593 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2594 else
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2595 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2596 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2597 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2598 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2599 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2600 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2601 tempBlured[ x + y*stride ]= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2602 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2603 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2604 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2605 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2606 else
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2607 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2608 if(d < maxNoise[0])
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2609 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2610 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2611 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2612 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2613 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2614 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2615 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2616 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2617 tempBlured[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2618 src[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2619 (ref*7 + cur + 4)>>3;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2620 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2621 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2622 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2623 else
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2624 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2625 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2626 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2627 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2628 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2629 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2630 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2631 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2632 tempBlured[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2633 src[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2634 (ref*3 + cur + 2)>>2;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2635 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2636 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2637 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2638 }
788
425d71e81c37 fix compilation on non-x86 with gcc 2.95
colin
parents: 787
diff changeset
2639 }
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2640 #endif
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2641 }
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2642 #endif //HAVE_ALTIVEC
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2643
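The C fallback above can be condensed into three small steps: measure the block difference, pick a blend strength from `maxNoise`, and blend. The following is a minimal sketch of that logic, not the file's actual code. The helper names `block_ssd`, `pick_blend` and `blend_block` are hypothetical, and the sketch deliberately omits the neighbour smoothing of `d` through `tempBluredPast`.

```c
#include <stdint.h>

/* Sum of squared differences between the current 8x8 block (src) and the
 * temporally blurred reference block (ref). Hypothetical helper. */
static int block_ssd(const uint8_t *src, const uint8_t *ref, int stride)
{
    int d = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int d1 = ref[x + y * stride] - src[x + y * stride];
            d += d1 * d1;
        }
    }
    return d;
}

/* Choose the blend (ref*wref + cur + round) >> shift from the noise
 * measure d, mirroring the nested if/else in the listing.
 * wref = 0, shift = 0 means "copy the current block unchanged". */
static void pick_blend(int d, const int maxNoise[3], int *wref, int *shift)
{
    if (d > maxNoise[1]) {
        if (d < maxNoise[2]) { *wref = 1; *shift = 1; } /* (ref + cur + 1) >> 1   */
        else                 { *wref = 0; *shift = 0; } /* cur                    */
    } else {
        if (d < maxNoise[0]) { *wref = 7; *shift = 3; } /* (ref*7 + cur + 4) >> 3 */
        else                 { *wref = 3; *shift = 2; } /* (ref*3 + cur + 2) >> 2 */
    }
}

/* Blend the 8x8 block and write the result to both buffers, as the
 * listing does for tempBlured and src. */
static void blend_block(uint8_t *src, uint8_t *ref, int stride, int wref, int shift)
{
    int round = (1 << shift) >> 1;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int i = x + y * stride;
            int v = (ref[i] * wref + src[i] + round) >> shift;
            ref[i] = src[i] = (uint8_t)v;
        }
    }
}
```

A caller would compute `d = block_ssd(...)`, optionally smooth it with the neighbouring blocks' values as the listing does, then call `pick_blend` and `blend_block`. Folding the four branches into one weight/shift pair is a presentation choice for this sketch only; the original keeps four separate loops.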
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2644 #ifdef HAVE_MMX
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2645 /**
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2646 * accurate deblock filter
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2647 */
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2648 static always_inline void RENAME(do_a_deblock)(uint8_t *src, int step, int stride, PPContext *c){
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2649 int64_t dc_mask, eq_mask;
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2650 int64_t sums[10*8*2];
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2651 src+= step*3; // src points to the beginning of the 8x8 block
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2652 //START_TIMER
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2653 asm volatile(
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2654 "movq %0, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2655 "movq %1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2656 : : "m" (c->mmxDcOffset[c->nonBQP]), "m" (c->mmxDcThreshold[c->nonBQP])
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2657 );
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2658
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2659 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2660 "lea (%2, %3), %%"REG_a" \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2661 // 0 1 2 3 4 5 6 7 8 9
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2662 // %2 a a+%3 a+2%3 %2+4%3 a' a'+%3 a'+2%3 %2+8%3 a'+4%3 (a = REG_a, a' = REG_a after the second lea)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2663
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2664 "movq (%2), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2665 "movq (%%"REG_a"), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2666 "movq %%mm1, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2667 "movq %%mm1, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2668 "psubb %%mm1, %%mm0 \n\t" // mm0 = difference
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2669 "paddb %%mm7, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2670 "pcmpgtb %%mm6, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2671
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2672 "movq (%%"REG_a",%3), %%mm2 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2673 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2674 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2675 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2676 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2677 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2678 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2679
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2680 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2681 PMAXUB(%%mm1, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2682 PMINUB(%%mm1, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2683 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2684 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2685 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2686 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2687
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2688 "lea (%%"REG_a", %3, 4), %%"REG_a" \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2689
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2690 "movq (%2, %3, 4), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2691 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2692 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2693 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2694 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2695 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2696 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2697
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2698 "movq (%%"REG_a"), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2699 PMAXUB(%%mm1, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2700 PMINUB(%%mm1, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2701 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2702 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2703 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2704 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2705
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2706 "movq (%%"REG_a", %3), %%mm2 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2707 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2708 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2709 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2710 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2711 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2712 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2713
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2714 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2715 PMAXUB(%%mm1, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2716 PMINUB(%%mm1, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2717 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2718 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2719 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2720 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2721
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2722 "movq (%2, %3, 8), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2723 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2724 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2725 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2726 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2727 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2728 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2729
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2730 "movq (%%"REG_a", %3, 4), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2731 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2732 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2733 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2734 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2735 "psubusb %%mm3, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2736
2276
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2737 "pxor %%mm6, %%mm6 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2738 "movq %4, %%mm7 \n\t" // QP,..., QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2739 "paddusb %%mm7, %%mm7 \n\t" // 2QP ... 2QP
2276
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2740 "psubusb %%mm4, %%mm7 \n\t" // Diff >=2QP -> 0
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2741 "pcmpeqb %%mm6, %%mm7 \n\t" // Diff < 2QP -> 0
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2742 "pcmpeqb %%mm6, %%mm7 \n\t" // Diff < 2QP -> -1, Diff >= 2QP -> 0
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2743 "movq %%mm7, %1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2744
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2745 "movq %5, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2746 "punpcklbw %%mm7, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2747 "punpcklbw %%mm7, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2748 "punpcklbw %%mm7, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2749 "psubb %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2750 "pcmpgtb %%mm7, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2751 "movq %%mm6, %0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2752
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2753 : "=m" (eq_mask), "=m" (dc_mask)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2754 : "r" (src), "r" ((long)step), "m" (c->pQPb), "m"(c->ppMode.flatnessThreshold)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2755 : "%"REG_a
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2756 );
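The asm statement above builds its per-byte masks with saturating MMX arithmetic. As a rough scalar sketch (not part of postprocess_template.c; the helper names are hypothetical, and it assumes mm6 holds zero at the two `pcmpeqb` instructions), the `paddusb`/`psubusb`/`pcmpeqb`/`pcmpeqb` sequence and the triple `punpcklbw` broadcast could be modelled for a single byte lane as:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add, modelling paddusb on one byte lane. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

/* Unsigned saturating subtract, modelling psubusb on one byte lane. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Hypothetical helper: 0xFF where diff < 2*qp, else 0x00. */
static uint8_t below_2qp_mask(uint8_t diff, uint8_t qp)
{
    uint8_t t = sat_sub_u8(sat_add_u8(qp, qp), diff); /* 0 when diff >= 2*qp */
    uint8_t m = (t == 0) ? 0xFF : 0x00;               /* first compare with zero */
    return (m == 0) ? 0xFF : 0x00;                    /* second compare inverts */
}

/* Replicate one byte into all 8 lanes, as three punpcklbw steps do. */
static uint64_t broadcast_byte(uint8_t b)
{
    return (uint64_t)b * UINT64_C(0x0101010101010101);
}

/* Small usage check for the helpers above. */
void mask_sketch_checks(void)
{
    assert(below_2qp_mask(3, 2) == 0xFF);
    assert(below_2qp_mask(4, 2) == 0x00);
    assert(broadcast_byte(0xAB) == UINT64_C(0xABABABABABABABAB));
}
```

The double compare is what turns "zero after saturating subtract" into a mask that is set for small differences rather than large ones.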
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2757
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2758 if(dc_mask & eq_mask){
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2759 long offset= -8*step;
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2760 int64_t *temp_sums= sums;
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2761
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2762 asm volatile(
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2763 "movq %2, %%mm0 \n\t" // QP,..., QP
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2764 "pxor %%mm4, %%mm4 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2765
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2766 "movq (%0), %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2767 "movq (%0, %1), %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2768 "movq %%mm5, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2769 "movq %%mm6, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2770 "psubusb %%mm6, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2771 "psubusb %%mm1, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2772 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2773 "psubusb %%mm2, %%mm0 \n\t" // diff >= QP -> 0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2774 "pcmpeqb %%mm4, %%mm0 \n\t" // diff >= QP -> FF
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2775
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2776 "pxor %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2777 "pand %%mm0, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2778 "pxor %%mm1, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2779 // 0:QP 6:First
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2780
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2781 "movq (%0, %1, 8), %%mm5 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2782 "add %1, %0 \n\t" // %0 points to line 1 not 0
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2783 "movq (%0, %1, 8), %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2784 "movq %%mm5, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2785 "movq %%mm7, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2786 "psubusb %%mm7, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2787 "psubusb %%mm1, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2788 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2789 "movq %2, %%mm0 \n\t" // QP,..., QP
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2790 "psubusb %%mm2, %%mm0 \n\t" // diff >= QP -> 0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2791 "pcmpeqb %%mm4, %%mm0 \n\t" // diff >= QP -> FF
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2792
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2793 "pxor %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2794 "pand %%mm0, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2795 "pxor %%mm1, %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2796
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2797 "movq %%mm6, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2798 "punpckhbw %%mm4, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2799 "punpcklbw %%mm4, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2800 // 4:0 5/6:First 7:Last
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2801
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2802 "movq %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2803 "movq %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2804 "psllw $2, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2805 "psllw $2, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2806 "paddw "MANGLE(w04)", %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2807 "paddw "MANGLE(w04)", %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2808
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2809 #define NEXT\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2810 "movq (%0), %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2811 "movq (%0), %%mm3 \n\t"\
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2812 "add %1, %0 \n\t"\
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2813 "punpcklbw %%mm4, %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2814 "punpckhbw %%mm4, %%mm3 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2815 "paddw %%mm2, %%mm0 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2816 "paddw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2817
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2818 #define PREV\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2819 "movq (%0), %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2820 "movq (%0), %%mm3 \n\t"\
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2821 "add %1, %0 \n\t"\
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2822 "punpcklbw %%mm4, %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2823 "punpckhbw %%mm4, %%mm3 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2824 "psubw %%mm2, %%mm0 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2825 "psubw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2826
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2827
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2828 NEXT //0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2829 NEXT //1
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2830 NEXT //2
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2831 "movq %%mm0, (%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2832 "movq %%mm1, 8(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2833
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2834 NEXT //3
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2835 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2836 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2837 "movq %%mm0, 16(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2838 "movq %%mm1, 24(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2839
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2840 NEXT //4
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2841 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2842 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2843 "movq %%mm0, 32(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2844 "movq %%mm1, 40(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2845
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2846 NEXT //5
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2847 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2848 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2849 "movq %%mm0, 48(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2850 "movq %%mm1, 56(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2851
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2852 NEXT //6
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2853 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2854 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2855 "movq %%mm0, 64(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2856 "movq %%mm1, 72(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2857
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2858 "movq %%mm7, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2859 "punpckhbw %%mm4, %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2860 "punpcklbw %%mm4, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2861
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2862 NEXT //7
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2863 "mov %4, %0 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2864 "add %1, %0 \n\t"
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2865 PREV //0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2866 "movq %%mm0, 80(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2867 "movq %%mm1, 88(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2868
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2869 PREV //1
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2870 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2871 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2872 "movq %%mm0, 96(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2873 "movq %%mm1, 104(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2874
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2875 PREV //2
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2876 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2877 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2878 "movq %%mm0, 112(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2879 "movq %%mm1, 120(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2880
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2881 PREV //3
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2882 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2883 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2884 "movq %%mm0, 128(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2885 "movq %%mm1, 136(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2886
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2887 PREV //4
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2888 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2889 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2890 "movq %%mm0, 144(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2891 "movq %%mm1, 152(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2892
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2893 "mov %4, %0 \n\t" //FIXME
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2894
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2895 : "+&r"(src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2896 : "r" ((long)step), "m" (c->pQPb), "r"(sums), "g"(src)
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2897 );
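The statement above chooses the "First" and "Last" edge lines without branching: an absolute difference is formed from two saturating subtractions, compared against QP, and the resulting mask drives an XOR-based select. A minimal scalar sketch for one byte lane, assuming that reading of the registers (the function names are hypothetical and do not appear in the file):

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating subtract, modelling psubusb on one byte lane. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* |a - b| from two saturating subtractions OR'ed together
 * (psubusb, psubusb, por): one of the two terms is always zero. */
static uint8_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(sat_sub_u8(a, b) | sat_sub_u8(b, a));
}

/* Keep `outer` while it is within qp of `inner`; otherwise use `inner`. */
static uint8_t select_edge(uint8_t outer, uint8_t inner, uint8_t qp)
{
    uint8_t d    = abs_diff_u8(outer, inner);
    uint8_t mask = (sat_sub_u8(qp, d) == 0) ? 0xFF : 0x00; /* FF where d >= qp */
    return (uint8_t)(outer ^ ((outer ^ inner) & mask));    /* pxor, pand, pxor */
}

/* Small usage check for the helpers above. */
void select_sketch_checks(void)
{
    assert(abs_diff_u8(3, 10) == 7);
    assert(select_edge(100, 103, 5) == 100); /* close enough: keep outer */
    assert(select_edge(100, 120, 5) == 120); /* too far: fall back to inner */
}
```

The XOR form `a ^ ((a ^ b) & mask)` yields `a` where the mask is clear and `b` where it is set, which avoids needing a separate inverted mask.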
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2898
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2899 src+= step; // src points to begin of the 8x8 Block
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2900
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2901 asm volatile(
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2902 "movq %4, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2903 "pcmpeqb %%mm5, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2904 "pxor %%mm6, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2905 "pxor %%mm7, %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2906
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2907 "1: \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2908 "movq (%1), %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2909 "movq 8(%1), %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2910 "paddw 32(%1), %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2911 "paddw 40(%1), %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2912 "movq (%0, %3), %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2913 "movq %%mm2, %%mm3 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2914 "movq %%mm2, %%mm4 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2915 "punpcklbw %%mm7, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2916 "punpckhbw %%mm7, %%mm3 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2917 "paddw %%mm2, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2918 "paddw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2919 "paddw %%mm2, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2920 "paddw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2921 "psrlw $4, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2922 "psrlw $4, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2923 "packuswb %%mm1, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2924 "pand %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2925 "pand %%mm5, %%mm4 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2926 "por %%mm4, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2927 "movq %%mm0, (%0, %3) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2928 "add $16, %1 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2929 "add %2, %0 \n\t"
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2930 " js 1b \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2931
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2932 : "+r"(offset), "+r"(temp_sums)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2933 : "r" ((long)step), "r"(src - offset), "m"(dc_mask & eq_mask)
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2934 );
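The loop above adds two rows of the precomputed 16-bit sums to twice the current pixel, shifts right by 4, and stores the result only in lanes where `dc_mask & eq_mask` is set. A hedged scalar sketch of one byte lane follows; `lowpass_pixel` and its parameter names are illustrative assumptions, not identifiers from the file:

```c
#include <assert.h>
#include <stdint.h>

/* One lane of the masked lowpass write-back.
 * sum_a, sum_b: the two 16-bit partial sums read from the sums buffer
 * pixel:        the original byte at the output position
 * mask:         0xFF to take the filtered value, 0x00 to keep the original */
static uint8_t lowpass_pixel(uint16_t sum_a, uint16_t sum_b,
                             uint8_t pixel, uint8_t mask)
{
    uint16_t acc = (uint16_t)(sum_a + sum_b + 2u * pixel); /* paddw, wraps at 16 bits */
    acc >>= 4;                                             /* psrlw $4 */
    uint8_t filtered = (uint8_t)(acc > 255 ? 255 : acc);   /* packuswb saturation */
    return (uint8_t)((filtered & mask) | (pixel & (uint8_t)~mask));
}

/* Small usage check for the helper above. */
void lowpass_sketch_checks(void)
{
    assert(lowpass_pixel(400, 400, 50, 0xFF) == 56); /* (800 + 100) >> 4 */
    assert(lowpass_pixel(400, 400, 50, 0x00) == 50); /* mask clear: unchanged */
}
```

Blending with `(filtered & mask) | (pixel & ~mask)` lets all eight lanes be processed uniformly while leaving unfiltered lanes untouched.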
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2935 }else
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2936 src+= step; // src points to begin of the 8x8 Block
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2937
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2938 if(eq_mask != -1LL){
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2939 uint8_t *temp_src= src;
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2940 asm volatile(
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2941 "pxor %%mm7, %%mm7 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2942 "lea -40(%%"REG_SP"), %%"REG_c" \n\t" // make space for 4 8-byte vars
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2943 "and "ALIGN_MASK", %%"REG_c" \n\t" // align
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2944 // 0 1 2 3 4 5 6 7 8 9
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2945 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2946
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2947 "movq (%0), %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2948 "movq %%mm0, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2949 "punpcklbw %%mm7, %%mm0 \n\t" // low part of line 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2950 "punpckhbw %%mm7, %%mm1 \n\t" // high part of line 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2951
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2952 "movq (%0, %1), %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2953 "lea (%0, %1, 2), %%"REG_a" \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2954 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2955 "punpcklbw %%mm7, %%mm2 \n\t" // low part of line 1
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2956 "punpckhbw %%mm7, %%mm3 \n\t" // high part of line 1
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2957
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2958 "movq (%%"REG_a"), %%mm4 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2959 "movq %%mm4, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2960 "punpcklbw %%mm7, %%mm4 \n\t" // low part of line 2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2961 "punpckhbw %%mm7, %%mm5 \n\t" // high part of line 2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2962
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2963 "paddw %%mm0, %%mm0 \n\t" // 2L0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2964 "paddw %%mm1, %%mm1 \n\t" // 2H0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2965 "psubw %%mm4, %%mm2 \n\t" // L1 - L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2966 "psubw %%mm5, %%mm3 \n\t" // H1 - H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2967 "psubw %%mm2, %%mm0 \n\t" // 2L0 - L1 + L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2968 "psubw %%mm3, %%mm1 \n\t" // 2H0 - H1 + H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2969
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2970 "psllw $2, %%mm2 \n\t" // 4L1 - 4L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2971 "psllw $2, %%mm3 \n\t" // 4H1 - 4H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2972 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2973 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2974
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2975 "movq (%%"REG_a", %1), %%mm2 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2976 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2977 "punpcklbw %%mm7, %%mm2 \n\t" // L3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2978 "punpckhbw %%mm7, %%mm3 \n\t" // H3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2979
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2980 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - L3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2981 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - H3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2982 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2983 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2984 "movq %%mm0, (%%"REG_c") \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2985 "movq %%mm1, 8(%%"REG_c") \n\t" // 2H0 - 5H1 + 5H2 - 2H3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2986
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2987 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2988 "movq %%mm0, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2989 "punpcklbw %%mm7, %%mm0 \n\t" // L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2990 "punpckhbw %%mm7, %%mm1 \n\t" // H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2991
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2992 "psubw %%mm0, %%mm2 \n\t" // L3 - L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2993 "psubw %%mm1, %%mm3 \n\t" // H3 - H4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2994 "movq %%mm2, 16(%%"REG_c") \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2995 "movq %%mm3, 24(%%"REG_c") \n\t" // H3 - H4
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2996 "paddw %%mm4, %%mm4 \n\t" // 2L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2997 "paddw %%mm5, %%mm5 \n\t" // 2H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2998 "psubw %%mm2, %%mm4 \n\t" // 2L2 - L3 + L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2999 "psubw %%mm3, %%mm5 \n\t" // 2H2 - H3 + H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3000
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3001 "lea (%%"REG_a", %1), %0 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3002 "psllw $2, %%mm2 \n\t" // 4L3 - 4L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3003 "psllw $2, %%mm3 \n\t" // 4H3 - 4H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3004 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3005 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3006 //50 opcodes so far
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3007 "movq (%0, %1, 2), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3008 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3009 "punpcklbw %%mm7, %%mm2 \n\t" // L5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3010 "punpckhbw %%mm7, %%mm3 \n\t" // H5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3011 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - L5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3012 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - H5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3013 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - 2L5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3014 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - 2H5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3015
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3016 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3017 "punpcklbw %%mm7, %%mm6 \n\t" // L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3018 "psubw %%mm6, %%mm2 \n\t" // L5 - L6
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3019 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3020 "punpckhbw %%mm7, %%mm6 \n\t" // H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3021 "psubw %%mm6, %%mm3 \n\t" // H5 - H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3022
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3023 "paddw %%mm0, %%mm0 \n\t" // 2L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3024 "paddw %%mm1, %%mm1 \n\t" // 2H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3025 "psubw %%mm2, %%mm0 \n\t" // 2L4 - L5 + L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3026 "psubw %%mm3, %%mm1 \n\t" // 2H4 - H5 + H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3027
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3028 "psllw $2, %%mm2 \n\t" // 4L5 - 4L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3029 "psllw $2, %%mm3 \n\t" // 4H5 - 4H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3030 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3031 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3032
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3033 "movq (%0, %1, 4), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3034 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3035 "punpcklbw %%mm7, %%mm2 \n\t" // L7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3036 "punpckhbw %%mm7, %%mm3 \n\t" // H7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3037
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3038 "paddw %%mm2, %%mm2 \n\t" // 2L7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3039 "paddw %%mm3, %%mm3 \n\t" // 2H7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3040 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6 - 2L7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3041 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6 - 2H7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3042
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3043 "movq (%%"REG_c"), %%mm2 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3044 "movq 8(%%"REG_c"), %%mm3 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3045
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3046 #ifdef HAVE_MMX2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3047 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3048 "psubw %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3049 "pmaxsw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3050 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3051 "psubw %%mm1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3052 "pmaxsw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3053 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3054 "psubw %%mm2, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3055 "pmaxsw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3056 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3057 "psubw %%mm3, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3058 "pmaxsw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3059 #else
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3060 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3061 "pcmpgtw %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3062 "pxor %%mm6, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3063 "psubw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3064 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3065 "pcmpgtw %%mm1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3066 "pxor %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3067 "psubw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3068 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3069 "pcmpgtw %%mm2, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3070 "pxor %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3071 "psubw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3072 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3073 "pcmpgtw %%mm3, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3074 "pxor %%mm6, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3075 "psubw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3076 #endif
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3077
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3078 #ifdef HAVE_MMX2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3079 "pminsw %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3080 "pminsw %%mm3, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3081 #else
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3082 "movq %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3083 "psubusw %%mm2, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3084 "psubw %%mm6, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3085 "movq %%mm1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3086 "psubusw %%mm3, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3087 "psubw %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3088 #endif
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3089
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3090 "movd %2, %%mm2 \n\t" // QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3091 "punpcklbw %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3092
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3093 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3094 "pcmpgtw %%mm4, %%mm6 \n\t" // sign(2L2 - 5L3 + 5L4 - 2L5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3095 "pxor %%mm6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3096 "psubw %%mm6, %%mm4 \n\t" // |2L2 - 5L3 + 5L4 - 2L5|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3097 "pcmpgtw %%mm5, %%mm7 \n\t" // sign(2H2 - 5H3 + 5H4 - 2H5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3098 "pxor %%mm7, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3099 "psubw %%mm7, %%mm5 \n\t" // |2H2 - 5H3 + 5H4 - 2H5|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3100 // 100 opcodes
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3101 "psllw $3, %%mm2 \n\t" // 8QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3102 "movq %%mm2, %%mm3 \n\t" // 8QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3103 "pcmpgtw %%mm4, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3104 "pcmpgtw %%mm5, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3105 "pand %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3106 "pand %%mm3, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3107
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3108
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3109 "psubusw %%mm0, %%mm4 \n\t" // ld
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3110 "psubusw %%mm1, %%mm5 \n\t" // hd
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3111
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3112
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3113 "movq "MANGLE(w05)", %%mm2 \n\t" // 5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3114 "pmullw %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3115 "pmullw %%mm2, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3116 "movq "MANGLE(w20)", %%mm2 \n\t" // 32
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3117 "paddw %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3118 "paddw %%mm2, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3119 "psrlw $6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3120 "psrlw $6, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3121
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3122 "movq 16(%%"REG_c"), %%mm0 \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3123 "movq 24(%%"REG_c"), %%mm1 \n\t" // H3 - H4
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3124
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3125 "pxor %%mm2, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3126 "pxor %%mm3, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3127
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3128 "pcmpgtw %%mm0, %%mm2 \n\t" // sign (L3-L4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3129 "pcmpgtw %%mm1, %%mm3 \n\t" // sign (H3-H4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3130 "pxor %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3131 "pxor %%mm3, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3132 "psubw %%mm2, %%mm0 \n\t" // |L3-L4|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3133 "psubw %%mm3, %%mm1 \n\t" // |H3-H4|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3134 "psrlw $1, %%mm0 \n\t" // |L3 - L4|/2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3135 "psrlw $1, %%mm1 \n\t" // |H3 - H4|/2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3136
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3137 "pxor %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3138 "pxor %%mm7, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3139 "pand %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3140 "pand %%mm3, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3141
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3142 #ifdef HAVE_MMX2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3143 "pminsw %%mm0, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3144 "pminsw %%mm1, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3145 #else
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3146 "movq %%mm4, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3147 "psubusw %%mm0, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3148 "psubw %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3149 "movq %%mm5, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3150 "psubusw %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3151 "psubw %%mm2, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3152 #endif
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3153 "pxor %%mm6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3154 "pxor %%mm7, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3155 "psubw %%mm6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3156 "psubw %%mm7, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3157 "packsswb %%mm5, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3158 "movq %3, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3159 "pandn %%mm4, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3160 "movq (%0), %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3161 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3162 "movq %%mm0, (%0) \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3163 "movq (%0, %1), %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3164 "psubb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3165 "movq %%mm0, (%0, %1) \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3166
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
3167 : "+r" (temp_src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3168 : "r" ((long)step), "m" (c->pQPb), "m"(eq_mask)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3169 : "%"REG_a, "%"REG_c
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3170 );
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3171 }
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3172 /*if(step==16){
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3173 STOP_TIMER("step16")
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3174 }else{
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3175 STOP_TIMER("stepX")
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3176 }*/
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3177 }
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3178 #endif //HAVE_MMX
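For readers following the register comments in the MMX block above, here is a scalar sketch of what that instruction sequence appears to compute for a single pixel column. It is only an illustration derived from the comments and opcodes shown: the function name `deblock_column_sketch` and the 8-element array layout (`p[0]`..`p[7]` standing for L0..L7) are assumptions, and the `eq_mask` gating applied by `pandn` is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of libpostproc: scalar model of one column.
 * p[0..7] stand for L0..L7 as labelled in the asm comments. */
static void deblock_column_sketch(uint8_t p[8], int qp)
{
    int left   = 2*p[0] - 5*p[1] + 5*p[2] - 2*p[3];
    int middle = 2*p[2] - 5*p[3] + 5*p[4] - 2*p[5];
    int right  = 2*p[4] - 5*p[5] + 5*p[6] - 2*p[7];
    int diff   = p[3] - p[4];                      /* L3 - L4 */
    int min_side = abs(left) < abs(right) ? abs(left) : abs(right);
    int d, limit;

    if (abs(middle) >= 8*qp)                       /* pcmpgtw/pand against 8QP */
        return;
    d = abs(middle) - min_side;                    /* psubusw saturates at 0 */
    if (d < 0)
        d = 0;
    d = (5*d + 32) >> 6;                           /* pmullw w05, paddw w20, psrlw 6 */
    if ((diff < 0) == (middle < 0))                /* kept only where sign masks differ */
        return;
    limit = abs(diff) >> 1;                        /* |L3 - L4| / 2 */
    if (d > limit)
        d = limit;
    if (middle < 0)                                /* restore the sign of the middle term */
        d = -d;
    p[3] = (uint8_t)(p[3] + d);                    /* paddb */
    p[4] = (uint8_t)(p[4] - d);                    /* psubb */
}
```

With a clean step edge such as 10,10,10,10,20,20,20,20 and qp = 10, the sketch moves the two centre pixels two levels toward each other. With qp = 1 the 8*QP threshold rejects the edge and nothing changes.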
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3179
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3180 static void RENAME(postProcess)(uint8_t src[], int srcStride, uint8_t dst[], int dstStride, int width, int height,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3181 QP_STORE_T QPs[], int QPStride, int isColor, PPContext *c);
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3182
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3183 /**
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3184 * Copies a block from src to dst and fixes the blacklevel
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3185 * levelFix == 0 -> don't touch the brightness & contrast
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3186 */
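As a reading aid for the black-level fix described in the comment above, the following scalar sketch models what the MMX2 variant of the scaled copy appears to do to one byte. It is reconstructed from the instruction sequence only (`punpcklbw` of a register with itself, `pmulhuw`, `psubw`, `packuswb`). The function name and the separate `scale` and `offset` parameters are assumptions made for illustration; the real code reads both values from `packedOffsetAndScale`.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libpostproc: one pixel of the MMX2 path. */
static uint8_t scaled_cpy_pixel_sketch(uint8_t x, uint16_t scale, uint16_t offset)
{
    uint32_t w  = (uint32_t)x * 257u;          /* punpcklbw x,x: byte copied into both halves */
    uint32_t hi = (w * scale) >> 16;           /* pmulhuw: high 16 bits of unsigned product */
    int v = (int)((hi - offset) & 0xFFFFu);    /* psubw wraps modulo 2^16 */
    if (v >= 0x8000)                           /* reinterpret as signed 16-bit */
        v -= 0x10000;
    if (v < 0)                                 /* packuswb: unsigned saturation to 0..255 */
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}
```

In this model a scale of 0x0100 with a zero offset leaves every byte unchanged, so the scale behaves like an 8.8 fixed-point factor, and results outside 0..255 are clamped rather than wrapped.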
634
be1cb0e1f276 warning fixes by Dominik Mierzejewski <dominik@rangers.eu.org>
arpi
parents: 600
diff changeset
3187 #undef SCALED_CPY
be1cb0e1f276 warning fixes by Dominik Mierzejewski <dominik@rangers.eu.org>
arpi
parents: 600
diff changeset
3188
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3189 static inline void RENAME(blockCopy)(uint8_t dst[], int dstStride, uint8_t src[], int srcStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3190 int levelFix, int64_t *packedOffsetAndScale)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3191 {
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3192 #ifndef HAVE_MMX
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3193 int i;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3194 #endif
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3195 if(levelFix)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3196 {
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3197 #ifdef HAVE_MMX
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3198 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3199 "movq (%%"REG_a"), %%mm2 \n\t" // packedYOffset
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3200 "movq 8(%%"REG_a"), %%mm3 \n\t" // packedYScale
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3201 "lea (%2,%4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3202 "lea (%3,%5), %%"REG_d" \n\t"
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3203 "pxor %%mm4, %%mm4 \n\t"
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3204 #ifdef HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3205 #define REAL_SCALED_CPY(src1, src2, dst1, dst2) \
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3206 "movq " #src1 ", %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3207 "movq " #src1 ", %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3208 "movq " #src2 ", %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3209 "movq " #src2 ", %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3210 "punpcklbw %%mm0, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3211 "punpckhbw %%mm5, %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3212 "punpcklbw %%mm1, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3213 "punpckhbw %%mm6, %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3214 "pmulhuw %%mm3, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3215 "pmulhuw %%mm3, %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3216 "pmulhuw %%mm3, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3217 "pmulhuw %%mm3, %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3218 "psubw %%mm2, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3219 "psubw %%mm2, %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3220 "psubw %%mm2, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3221 "psubw %%mm2, %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3222 "packuswb %%mm5, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3223 "packuswb %%mm6, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3224 "movq %%mm0, " #dst1 " \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3225 "movq %%mm1, " #dst2 " \n\t"\
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3226
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3227 #else //HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3228 #define REAL_SCALED_CPY(src1, src2, dst1, dst2) \
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3229 "movq " #src1 ", %%mm0 \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3230 "movq " #src1 ", %%mm5 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3231 "punpcklbw %%mm4, %%mm0 \n\t"\
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3232 "punpckhbw %%mm4, %%mm5 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3233 "psubw %%mm2, %%mm0 \n\t"\
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3234 "psubw %%mm2, %%mm5 \n\t"\
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3235 "movq " #src2 ", %%mm1 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3236 "psllw $6, %%mm0 \n\t"\
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3237 "psllw $6, %%mm5 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3238 "pmulhw %%mm3, %%mm0 \n\t"\
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3239 "movq " #src2 ", %%mm6 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3240 "pmulhw %%mm3, %%mm5 \n\t"\
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3241 "punpcklbw %%mm4, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3242 "punpckhbw %%mm4, %%mm6 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3243 "psubw %%mm2, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3244 "psubw %%mm2, %%mm6 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3245 "psllw $6, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3246 "psllw $6, %%mm6 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3247 "pmulhw %%mm3, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3248 "pmulhw %%mm3, %%mm6 \n\t"\
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3249 "packuswb %%mm5, %%mm0 \n\t"\
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3250 "packuswb %%mm6, %%mm1 \n\t"\
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3251 "movq %%mm0, " #dst1 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3252 "movq %%mm1, " #dst2 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3253
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3254 #endif //!HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3255 #define SCALED_CPY(src1, src2, dst1, dst2)\
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3256 REAL_SCALED_CPY(src1, src2, dst1, dst2)
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3257
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3258 SCALED_CPY((%2) , (%2, %4) , (%3) , (%3, %5))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3259 SCALED_CPY((%2, %4, 2), (%%REGa, %4, 2), (%3, %5, 2), (%%REGd, %5, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3260 SCALED_CPY((%2, %4, 4), (%%REGa, %4, 4), (%3, %5, 4), (%%REGd, %5, 4))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3261 "lea (%%"REG_a",%4,4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3262 "lea (%%"REG_d",%5,4), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3263 SCALED_CPY((%%REGa, %4), (%%REGa, %4, 2), (%%REGd, %5), (%%REGd, %5, 2))
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3264
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3265
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3266 : "=&a" (packedOffsetAndScale)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3267 : "0" (packedOffsetAndScale),
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3268 "r"(src),
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3269 "r"(dst),
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3270 "r" ((long)srcStride),
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3271 "r" ((long)dstStride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3272 : "%"REG_d
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3273 );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3274 #else
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3275 for(i=0; i<8; i++)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3276 memcpy( &(dst[dstStride*i]),
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3277 &(src[srcStride*i]), BLOCK_SIZE);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3278 #endif
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3279 }
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3280 else
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3281 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3282 #ifdef HAVE_MMX
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3283 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3284 "lea (%0,%2), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3285 "lea (%1,%3), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3286
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3287 #define REAL_SIMPLE_CPY(src1, src2, dst1, dst2) \
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3288 "movq " #src1 ", %%mm0 \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3289 "movq " #src2 ", %%mm1 \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3290 "movq %%mm0, " #dst1 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3291 "movq %%mm1, " #dst2 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3292
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3293 #define SIMPLE_CPY(src1, src2, dst1, dst2)\
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3294 REAL_SIMPLE_CPY(src1, src2, dst1, dst2)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3295
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3296 SIMPLE_CPY((%0) , (%0, %2) , (%1) , (%1, %3))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3297 SIMPLE_CPY((%0, %2, 2), (%%REGa, %2, 2), (%1, %3, 2), (%%REGd, %3, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3298 SIMPLE_CPY((%0, %2, 4), (%%REGa, %2, 4), (%1, %3, 4), (%%REGd, %3, 4))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3299 "lea (%%"REG_a",%2,4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3300 "lea (%%"REG_d",%3,4), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3301 SIMPLE_CPY((%%REGa, %2), (%%REGa, %2, 2), (%%REGd, %3), (%%REGd, %3, 2))
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3302
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3303 : : "r" (src),
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3304 "r" (dst),
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3305 "r" ((long)srcStride),
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3306 "r" ((long)dstStride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3307 : "%"REG_a, "%"REG_d
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3308 );
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3309 #else
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3310 for(i=0; i<8; i++)
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3311 memcpy( &(dst[dstStride*i]),
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3312 &(src[srcStride*i]), BLOCK_SIZE);
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3313 #endif
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3314 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3315 }
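A scalar reading of the two REAL_SCALED_CPY variants above may help when checking the fixed-point arithmetic. This is only a sketch: the function names are hypothetical, and it assumes (the loads are not shown in this part of the listing) that mm2 holds the packed offset, mm3 the packed scale, and mm4 is zero. The MMX2 variant widens each byte b to the word b*257 and uses an 8.8 scale; the plain MMX variant subtracts first, shifts left by 6 and uses a scale multiplied by 1024.

```c
#include <stdint.h>

/* Wrap a 32-bit value to a signed 16-bit lane, as psubw/psllw do. */
static int16_t wrap16(int32_t v)
{
    uint32_t u = (uint32_t)v & 0xFFFFu;
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* packuswb: signed word -> unsigned byte with saturation. */
static uint8_t sat_u8(int16_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* MMX2 variant: punpcklbw reg,reg / pmulhuw / psubw / packuswb.
 * scale_8_8 is scale*256, offset is (black*scale) - minAllowedY. */
static uint8_t scaled_pixel_mmx2(uint8_t in, uint16_t scale_8_8, int16_t offset)
{
    uint32_t widened = in * 257u;                 /* byte b -> word (b<<8)|b */
    uint32_t high = (widened * scale_8_8) >> 16;  /* unsigned high word */
    return sat_u8(wrap16((int32_t)high - offset));
}

/* Plain MMX variant: punpcklbw zero / psubw / psllw 6 / pmulhw / packuswb.
 * scale_1024 is scale*1024, offset is black - minAllowedY. */
static uint8_t scaled_pixel_mmx(uint8_t in, int16_t scale_1024, int16_t offset)
{
    int16_t diff = wrap16((int32_t)in - offset);
    int16_t shifted = wrap16((int32_t)diff * 64);
    int32_t prod = (int32_t)shifted * scale_1024;
    /* pmulhw keeps floor(prod / 65536) */
    int32_t high = prod >= 0 ? (prod >> 16) : -(((-prod) + 65535) >> 16);
    return sat_u8(wrap16(high));
}
```

With a scale of 1.0 and an offset of 0, both functions return the input unchanged; results below 0 or above 255 saturate, as packuswb does.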
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3316
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3317 /**
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3318 * Duplicates the given 8 src pixels 3 times upward
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3319 */
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3320 static inline void RENAME(duplicate)(uint8_t src[], int stride)
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3321 {
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3322 #ifdef HAVE_MMX
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3323 asm volatile(
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3324 "movq (%0), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3325 "add %1, %0 \n\t"
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3326 "movq %%mm0, (%0) \n\t"
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3327 "movq %%mm0, (%0, %1) \n\t"
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3328 "movq %%mm0, (%0, %1, 2) \n\t"
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3329 : "+r" (src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3330 : "r" ((long)-stride)
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3331 );
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3332 #else
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3333 int i;
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3334 uint8_t *p=src;
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3335 for(i=0; i<3; i++)
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3336 {
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3337 p-= stride;
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3338 memcpy(p, src, 8);
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3339 }
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3340 #endif
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3341 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3342
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3343 /**
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3344 * Filters array of bytes (Y or U or V values)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3345 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3346 static void RENAME(postProcess)(uint8_t src[], int srcStride, uint8_t dst[], int dstStride, int width, int height,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3347 QP_STORE_T QPs[], int QPStride, int isColor, PPContext *c2)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3348 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3349 PPContext __attribute__((aligned(8))) c= *c2; //copy to stack for faster access
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3350 int x,y;
172
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3351 #ifdef COMPILE_TIME_MODE
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3352 const int mode= COMPILE_TIME_MODE;
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3353 #else
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3354 const int mode= isColor ? c.ppMode.chromMode : c.ppMode.lumMode;
172
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3355 #endif
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3356 int black=0, white=255; // blackest black and whitest white in the picture
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3357 int QPCorrecture= 256*256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3358
886
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3359 int copyAhead;
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3360 #ifdef HAVE_MMX
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3361 int i;
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3362 #endif
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3363
957
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3364 const int qpHShift= isColor ? 4-c.hChromaSubSample : 4;
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3365 const int qpVShift= isColor ? 4-c.vChromaSubSample : 4;
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3366
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3367 //FIXME remove
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3368 uint64_t * const yHistogram= c.yHistogram;
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3369 uint8_t * const tempSrc= srcStride > 0 ? c.tempSrc : c.tempSrc - 23*srcStride;
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3370 uint8_t * const tempDst= dstStride > 0 ? c.tempDst : c.tempDst - 23*dstStride;
2031
4225c131a2eb warning fixes by (Michael Roitzsch <mroi at users dot sourceforge dot net>)
michael
parents: 1724
diff changeset
3371 //const int mbWidth= isColor ? (width+7)>>3 : (width+15)>>4;
182
3ccd74a91074 minor brightness/contrast bugfix / moved some global vars into ppMode
michael
parents: 181
diff changeset
3372
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
3373 #ifdef HAVE_MMX
1724
ea5200a9f730 mpeg2 QP clamping fix
michael
parents: 1581
diff changeset
3374 for(i=0; i<57; i++){
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3375 int offset= ((i*c.ppMode.baseDcDiff)>>8) + 1;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3376 int threshold= offset*2 + 1;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3377 c.mmxDcOffset[i]= 0x7F - offset;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3378 c.mmxDcThreshold[i]= 0x7F - threshold;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3379 c.mmxDcOffset[i]*= 0x0101010101010101LL;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3380 c.mmxDcThreshold[i]*= 0x0101010101010101LL;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3381 }
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
3382 #endif
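The loop above fills every byte lane of a 64-bit MMX constant with the same value by multiplying with 0x0101010101010101. A minimal sketch of that idiom follows, together with the 16-bit shift-and-or form that the listing later applies to packedYOffset and packedYScale. The helper names and the example baseDcDiff value of 32 are assumptions made for illustration, and the explicit uint8_t cast is an addition that the original code does not have.

```c
#include <stdint.h>

/* Replicate one byte into all 8 byte lanes of a 64-bit word. */
static uint64_t broadcast8(uint8_t v)
{
    return (uint64_t)v * 0x0101010101010101ULL;
}

/* Replicate one 16-bit word into all 4 word lanes using two shift-or steps. */
static uint64_t broadcast16(uint16_t v)
{
    uint64_t x = v;
    x |= x << 32;
    x |= x << 16;
    return x;
}

/* Build the per-QP tables the same way as the loop in the listing:
 * offset = ((i*base_dc_diff)>>8) + 1, threshold = 2*offset + 1. */
static void build_dc_tables(int base_dc_diff, uint64_t off[57], uint64_t thr[57])
{
    int i;
    for (i = 0; i < 57; i++) {
        int offset = ((i * base_dc_diff) >> 8) + 1;
        int threshold = offset * 2 + 1;
        off[i] = broadcast8((uint8_t)(0x7F - offset));
        thr[i] = broadcast8((uint8_t)(0x7F - threshold));
    }
}
```

The multiply form only works while the value fits in one byte; the shift-or form is the equivalent for 16-bit lanes.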
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3383
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3384 if(mode & CUBIC_IPOL_DEINT_FILTER) copyAhead=16;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3385 else if( (mode & LINEAR_BLEND_DEINT_FILTER)
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3386 || (mode & FFMPEG_DEINT_FILTER)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3387 || (mode & LOWPASS5_DEINT_FILTER)) copyAhead=14;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3388 else if( (mode & V_DEBLOCK)
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3389 || (mode & LINEAR_IPOL_DEINT_FILTER)
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3390 || (mode & MEDIAN_DEINT_FILTER)
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3391 || (mode & V_A_DEBLOCK)) copyAhead=13;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3392 else if(mode & V_X1_FILTER) copyAhead=11;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3393 // else if(mode & V_RK1_FILTER) copyAhead=10;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3394 else if(mode & DERING) copyAhead=9;
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3395 else copyAhead=8;
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3396
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3397 copyAhead-= 8;
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3398
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3399 if(!isColor)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3400 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3401 uint64_t sum= 0;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3402 int i;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3403 uint64_t maxClipped;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3404 uint64_t clipped;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3405 double scale;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3406
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3407 c.frameNum++;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3408 // first frame is fscked so we ignore it
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3409 if(c.frameNum == 1) yHistogram[0]= width*height/64*15/256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3410
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3411 for(i=0; i<256; i++)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3412 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3413 sum+= yHistogram[i];
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3414 // printf("%d ", yHistogram[i]);
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3415 }
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3416 // printf("\n\n");
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3417
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3418 /* we always get a completely black picture first */
793
8e9faf69110f cleanup
michael
parents: 791
diff changeset
3419 maxClipped= (uint64_t)(sum * c.ppMode.maxClippedThreshold);
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3420
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3421 clipped= sum;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3422 for(black=255; black>0; black--)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3423 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3424 if(clipped < maxClipped) break;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3425 clipped-= yHistogram[black];
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3426 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3427
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3428 clipped= sum;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3429 for(white=0; white<256; white++)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3430 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3431 if(clipped < maxClipped) break;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3432 clipped-= yHistogram[white];
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3433 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3434
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3435 scale= (double)(c.ppMode.maxAllowedY - c.ppMode.minAllowedY) / (double)(white-black);
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3436
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3437 #ifdef HAVE_MMX2
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3438 c.packedYScale= (uint16_t)(scale*256.0 + 0.5);
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3439 c.packedYOffset= (((black*c.packedYScale)>>8) - c.ppMode.minAllowedY) & 0xFFFF;
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3440 #else
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3441 c.packedYScale= (uint16_t)(scale*1024.0 + 0.5);
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3442 c.packedYOffset= (black - c.ppMode.minAllowedY) & 0xFFFF;
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3443 #endif
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3444
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3445 c.packedYOffset|= c.packedYOffset<<32;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3446 c.packedYOffset|= c.packedYOffset<<16;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3447
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3448 c.packedYScale|= c.packedYScale<<32;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3449 c.packedYScale|= c.packedYScale<<16;
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3450
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3451 if(mode & LEVEL_FIX) QPCorrecture= (int)(scale*256*256 + 0.5);
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3452 else QPCorrecture= 256*256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3453 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3454 else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3455 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3456 c.packedYScale= 0x0100010001000100LL;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3457 c.packedYOffset= 0;
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3458 QPCorrecture= 256*256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3459 }
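The luminance branch above derives the black and white levels from the previous frame's histogram and turns them into a scale factor. The following standalone sketch restates that search outside the PPContext; the struct, function and parameter names are hypothetical. Like the original, it does not guard against white == black, in which case the division yields infinity.

```c
#include <stdint.h>

typedef struct {
    int black;     /* darkest level kept after clipping */
    int white;     /* brightest level kept after clipping */
    double scale;  /* contrast stretch factor */
} Levels;

/* hist[i] counts pixels with luma i; max_clipped_threshold is the fraction
 * of pixels that may be clipped at each end of the range. */
static Levels find_levels(const uint64_t hist[256], double max_clipped_threshold,
                          int min_allowed_y, int max_allowed_y)
{
    Levels lv;
    uint64_t sum = 0, max_clipped, clipped;
    int i;

    for (i = 0; i < 256; i++)
        sum += hist[i];
    max_clipped = (uint64_t)(sum * max_clipped_threshold);

    /* walk down from 255 until fewer than max_clipped pixels remain below */
    clipped = sum;
    for (lv.black = 255; lv.black > 0; lv.black--) {
        if (clipped < max_clipped) break;
        clipped -= hist[lv.black];
    }

    /* walk up from 0 until fewer than max_clipped pixels remain above */
    clipped = sum;
    for (lv.white = 0; lv.white < 256; lv.white++) {
        if (clipped < max_clipped) break;
        clipped -= hist[lv.white];
    }

    lv.scale = (double)(max_allowed_y - min_allowed_y)
             / (double)(lv.white - lv.black);
    return lv;
}
```

The MMX2 path in the listing then stores the scale as 8.8 fixed point, `(uint16_t)(scale*256.0 + 0.5)`, while the other path uses `scale*1024.0`.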
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3460
148
1cfc4d567c0a minor changes (fixed some warnings, added attribute aligned(8) stuff)
michael
parents: 142
diff changeset
3461 /* copy & deinterlace first row of blocks */
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3462 y=-BLOCK_SIZE;
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3463 {
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3464 uint8_t *srcBlock= &(src[y*srcStride]);
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3465 uint8_t *dstBlock= tempDst + dstStride;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3466
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3467 // From this point on it is guaranteed that we can read and write 16 lines downward
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3468 // finish 1 block before the next, otherwise we might have a problem
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3469 // with the L1 cache of the P4 ... or only a few blocks at a time or something
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3470 for(x=0; x<width; x+=BLOCK_SIZE)
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3471 {
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3472
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3473 #ifdef HAVE_MMX2
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3474 /*
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3475 prefetchnta(srcBlock + (((x>>2)&6) + 5)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3476 prefetchnta(srcBlock + (((x>>2)&6) + 6)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3477 prefetcht0(dstBlock + (((x>>2)&6) + 5)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3478 prefetcht0(dstBlock + (((x>>2)&6) + 6)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3479 */
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3480
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3481 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3482 "mov %4, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3483 "shr $2, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3484 "and $6, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3485 "add %5, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3486 "mov %%"REG_a", %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3487 "imul %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3488 "imul %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3489 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3490 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3491 "add %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3492 "add %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3493 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3494 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3495 :: "r" (srcBlock), "r" ((long)srcStride), "r" (dstBlock), "r" ((long)dstStride),
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3496 "m" ((long)x), "m" ((long)copyAhead)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3497 : "%"REG_a, "%"REG_d
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3498 );
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3499
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3500 #elif defined(HAVE_3DNOW)
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3501 //FIXME check if this is faster on a 3DNow! chip or if it's faster without the prefetch or ...
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3502 /* prefetch(srcBlock + (((x>>3)&3) + 5)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3503 prefetch(srcBlock + (((x>>3)&3) + 9)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3504 prefetchw(dstBlock + (((x>>3)&3) + 5)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3505 prefetchw(dstBlock + (((x>>3)&3) + 9)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3506 */
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3507 #endif
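The MMX2 asm block above picks which rows to prefetch from the x position: it computes ((x>>2)&6) + copyAhead, multiplies that by the stride, and prefetches at byte offset 32 from the result, then once more one stride further. A minimal sketch of just the address arithmetic, in portable C. The names prefetch_row and prefetch_offset are hypothetical and do not exist in libpostproc:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row index the asm derives from x before multiplying
   by the stride. (x>>2)&6 cycles through 0,2,4,6 as x advances in
   8-pixel steps, so four consecutive blocks touch eight different rows. */
int prefetch_row(int x, int copy_ahead)
{
    assert(x >= 0);
    return ((x >> 2) & 6) + copy_ahead;
}

/* Byte offset of the first prefetched address relative to the block
   pointer; the second prefetch in the asm is one stride further. */
ptrdiff_t prefetch_offset(int x, int copy_ahead, ptrdiff_t stride)
{
    return (ptrdiff_t)prefetch_row(x, copy_ahead) * stride + 32;
}
```

The sketch leaves out the prefetch instructions themselves, since those are not portable C.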
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3508
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3509 RENAME(blockCopy)(dstBlock + dstStride*8, dstStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3510 srcBlock + srcStride*8, srcStride, mode & LEVEL_FIX, &c.packedYOffset);
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3511
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3512 RENAME(duplicate)(dstBlock + dstStride*8, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3513
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3514 if(mode & LINEAR_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3515 RENAME(deInterlaceInterpolateLinear)(dstBlock, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3516 else if(mode & LINEAR_BLEND_DEINT_FILTER)
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
3517 RENAME(deInterlaceBlendLinear)(dstBlock, dstStride, c.deintTemp + x);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3518 else if(mode & MEDIAN_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3519 RENAME(deInterlaceMedian)(dstBlock, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3520 else if(mode & CUBIC_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3521 RENAME(deInterlaceInterpolateCubic)(dstBlock, dstStride);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3522 else if(mode & FFMPEG_DEINT_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3523 RENAME(deInterlaceFF)(dstBlock, dstStride, c.deintTemp + x);
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3524 else if(mode & LOWPASS5_DEINT_FILTER)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3525 RENAME(deInterlaceL5)(dstBlock, dstStride, c.deintTemp + x, c.deintTemp + width + x);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3526 /* else if(mode & CUBIC_BLEND_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3527 RENAME(deInterlaceBlendCubic)(dstBlock, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3528 */
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3529 dstBlock+=8;
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3530 srcBlock+=8;
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3531 }
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3532 if(width==ABS(dstStride))
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3533 linecpy(dst, tempDst + 9*dstStride, copyAhead, dstStride);
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3534 else
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3535 {
943
0566d1a8426f 10l (int i)
michael
parents: 941
diff changeset
3536 int i;
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3537 for(i=0; i<copyAhead; i++)
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3538 {
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3539 memcpy(dst + i*dstStride, tempDst + (9+i)*dstStride, width);
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3540 }
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3541 }
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3542 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3543
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3544 //printf("\n");
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3545 for(y=0; y<height; y+=BLOCK_SIZE)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3546 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3547 //1% speedup if these are here instead of the inner loop
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3548 uint8_t *srcBlock= &(src[y*srcStride]);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3549 uint8_t *dstBlock= &(dst[y*dstStride]);
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3550 #ifdef HAVE_MMX
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3551 uint8_t *tempBlock1= c.tempBlocks;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3552 uint8_t *tempBlock2= c.tempBlocks + 8;
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3553 #endif
957
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3554 int8_t *QPptr= &QPs[(y>>qpVShift)*QPStride];
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3555 int8_t *nonBQPptr= &c.nonBQPTable[(y>>qpVShift)*ABS(QPStride)];
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3556 int QP=0;
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3557 /* can we mess with an 8x16 block from srcBlock/dstBlock downwards and 1 line upwards
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3558 if not, then use a temporary buffer */
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3559 if(y+15 >= height)
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3560 {
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3561 int i;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3562 /* copy from line (copyAhead) to (copyAhead+7) of src, these will be copied with
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3563 blockcopy to dst later */
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3564 linecpy(tempSrc + srcStride*copyAhead, srcBlock + srcStride*copyAhead,
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3565 MAX(height-y-copyAhead, 0), srcStride);
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3566
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3567 /* duplicate last line of src to fill the void up to line (copyAhead+7) */
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3568 for(i=MAX(height-y, 8); i<copyAhead+8; i++)
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3569 memcpy(tempSrc + srcStride*i, src + srcStride*(height-1), ABS(srcStride));
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3570
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3571 /* copy up to (copyAhead+1) lines of dst (line -1 to (copyAhead-1))*/
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3572 linecpy(tempDst, dstBlock - dstStride, MIN(height-y+1, copyAhead+1), dstStride);
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3573
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3574 /* duplicate last line of dst to fill the void up to line (copyAhead) */
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3575 for(i=height-y+1; i<=copyAhead; i++)
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3576 memcpy(tempDst + dstStride*i, dst + dstStride*(height-1), ABS(dstStride));
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3577
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3578 dstBlock= tempDst + dstStride;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3579 srcBlock= tempSrc;
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3580 }
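The branch above handles the bottom edge of the picture: when fewer than 16 lines remain, the available lines are copied into a temporary buffer and the last valid line is repeated to fill the rest, so the filters can always read a full block. A simplified sketch of that padding idea follows. pad_rows is a hypothetical helper; unlike the original it assumes a positive stride and starts at row 0 rather than at copyAhead:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill `rows` rows of `tmp` from `src`, which has only
   `avail` valid rows. Rows past the end repeat the last valid row.
   Assumes one positive stride shared by both buffers. */
void pad_rows(uint8_t *tmp, const uint8_t *src, int stride, int width,
              int avail, int rows)
{
    int i;
    assert(avail >= 1 && stride >= width);
    for (i = 0; i < rows; i++) {
        int from = i < avail ? i : avail - 1; /* clamp to last valid row */
        memcpy(tmp + i * stride, src + from * stride, width);
    }
}
```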
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3581 //printf("\n");
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3582
112
a2c063b6ecf9 fixed a bug in the tmp buffer
michael
parents: 111
diff changeset
3583 // From this point on it is guaranteed that we can read and write 16 lines downward
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3584 // finish 1 block before the next, otherwise we might have a problem
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3585 // with the L1 cache of the P4 ... or only a few blocks at a time or something
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3586 for(x=0; x<width; x+=BLOCK_SIZE)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3587 {
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3588 const int stride= dstStride;
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3589 #ifdef HAVE_MMX
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3590 uint8_t *tmpXchg;
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3591 #endif
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3592 if(isColor)
121
3ecf2a90c65e more speed
michael
parents: 120
diff changeset
3593 {
957
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3594 QP= QPptr[x>>qpHShift];
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3595 c.nonBQP= nonBQPptr[x>>qpHShift];
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3596 }
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3597 else
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3598 {
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3599 QP= QPptr[x>>4];
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3600 QP= (QP* QPCorrecture + 256*128)>>16;
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3601 c.nonBQP= nonBQPptr[x>>4];
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3602 c.nonBQP= (c.nonBQP* QPCorrecture + 256*128)>>16;
148
1cfc4d567c0a minor changes (fixed some warnings, added attribute aligned(8) stuff)
michael
parents: 142
diff changeset
3603 yHistogram[ srcBlock[srcStride*12 + 4] ]++;
121
3ecf2a90c65e more speed
michael
parents: 120
diff changeset
3604 }
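The QP rescaling in the branch above is 16.16 fixed-point arithmetic: QPCorrecture holds the scale multiplied by 256*256, and adding 256*128 (half of 1<<16) before the shift rounds to nearest instead of truncating. A minimal standalone sketch of that arithmetic follows. The function names qp_scale_to_fixed and qp_apply_scale are hypothetical and not part of libpostproc:

```c
#include <assert.h>

/* Hypothetical helper: convert a floating-point scale to 16.16 fixed point,
   mirroring (int)(scale*256*256 + 0.5) from the listing. */
int qp_scale_to_fixed(double scale)
{
    assert(scale >= 0.0);
    return (int)(scale * 256 * 256 + 0.5);
}

/* Hypothetical helper: multiply qp by the fixed-point factor and round to
   nearest. 256*128 is half of 1<<16, so it acts as the rounding term. */
int qp_apply_scale(int qp, int fixed_scale)
{
    assert(qp >= 0);
    return (qp * fixed_scale + 256 * 128) >> 16;
}
```

With a factor of 256*256 (scale 1.0) the value passes through unchanged, which matches the case where LEVEL_FIX is not set.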
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3605 c.QP= QP;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3606 #ifdef HAVE_MMX
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3607 asm volatile(
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3608 "movd %1, %%mm7 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3609 "packuswb %%mm7, %%mm7 \n\t" // 0, 0, 0, QP, 0, 0, 0, QP
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3610 "packuswb %%mm7, %%mm7 \n\t" // 0,QP, 0, QP, 0,QP, 0, QP
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3611 "packuswb %%mm7, %%mm7 \n\t" // QP,..., QP
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3612 "movq %%mm7, %0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3613 : "=m" (c.pQPb)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3614 : "r" (QP)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3615 );
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3616 #endif
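The asm above replicates QP into all eight bytes of a 64-bit value: movd loads it into the low dword of mm7, and each packuswb of the register with itself doubles the number of copies. For values that already fit in a byte, the same result can be sketched in portable C with a multiply. broadcast_qp is a hypothetical name, and this sketch does not reproduce the saturation packuswb applies to out-of-range values:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable equivalent of the movd + 3x packuswb sequence.
   Valid only for 0 <= qp <= 255; the asm would saturate outside that range. */
uint64_t broadcast_qp(int qp)
{
    assert(qp >= 0 && qp <= 255);
    return (uint64_t)qp * 0x0101010101010101ULL;
}
```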
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3617
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3618
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3619 #ifdef HAVE_MMX2
126
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3620 /*
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3621 prefetchnta(srcBlock + (((x>>2)&6) + 5)*srcStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3622 prefetchnta(srcBlock + (((x>>2)&6) + 6)*srcStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3623 prefetcht0(dstBlock + (((x>>2)&6) + 5)*dstStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3624 prefetcht0(dstBlock + (((x>>2)&6) + 6)*dstStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3625 */
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3626
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3627 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3628 "mov %4, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3629 "shr $2, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3630 "and $6, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3631 "add %5, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3632 "mov %%"REG_a", %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3633 "imul %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3634 "imul %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3635 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3636 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3637 "add %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3638 "add %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3639 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3640 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3641 :: "r" (srcBlock), "r" ((long)srcStride), "r" (dstBlock), "r" ((long)dstStride),
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3642 "m" ((long)x), "m" ((long)copyAhead)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3643 : "%"REG_a, "%"REG_d
126
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3644 );
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3645
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3646 #elif defined(HAVE_3DNOW)
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3647 //FIXME check if this is faster on a 3DNow! chip or if it's faster without the prefetch or ...
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3648 /* prefetch(srcBlock + (((x>>3)&3) + 5)*srcStride + 32);
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3649 prefetch(srcBlock + (((x>>3)&3) + 9)*srcStride + 32);
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3650 prefetchw(dstBlock + (((x>>3)&3) + 5)*dstStride + 32);
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3651 prefetchw(dstBlock + (((x>>3)&3) + 9)*dstStride + 32);
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3652 */
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3653 #endif
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3654
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3655 RENAME(blockCopy)(dstBlock + dstStride*copyAhead, dstStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3656 srcBlock + srcStride*copyAhead, srcStride, mode & LEVEL_FIX, &c.packedYOffset);
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3657
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3658 if(mode & LINEAR_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3659 RENAME(deInterlaceInterpolateLinear)(dstBlock, dstStride);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3660 else if(mode & LINEAR_BLEND_DEINT_FILTER)
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
3661 RENAME(deInterlaceBlendLinear)(dstBlock, dstStride, c.deintTemp + x);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3662 else if(mode & MEDIAN_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3663 RENAME(deInterlaceMedian)(dstBlock, dstStride);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3664 else if(mode & CUBIC_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3665 RENAME(deInterlaceInterpolateCubic)(dstBlock, dstStride);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3666 else if(mode & FFMPEG_DEINT_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3667 RENAME(deInterlaceFF)(dstBlock, dstStride, c.deintTemp + x);
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3668 else if(mode & LOWPASS5_DEINT_FILTER)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3669 RENAME(deInterlaceL5)(dstBlock, dstStride, c.deintTemp + x, c.deintTemp + width + x);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3670 /* else if(mode & CUBIC_BLEND_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3671 RENAME(deInterlaceBlendCubic)(dstBlock, dstStride);
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
3672 */
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3673
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3674 /* only deblock if we have 2 blocks */
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3675 if(y + 8 < height)
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3676 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3677 if(mode & V_X1_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3678 RENAME(vertX1Filter)(dstBlock, stride, &c);
115
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3679 else if(mode & V_DEBLOCK)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3680 {
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3681 const int t= RENAME(vertClassify)(dstBlock, stride, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3682
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3683 if(t==1)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3684 RENAME(doVertLowPass)(dstBlock, stride, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3685 else if(t==2)
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3686 RENAME(doVertDefFilter)(dstBlock, stride, &c);
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3687 }else if(mode & V_A_DEBLOCK){
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3688 RENAME(do_a_deblock)(dstBlock, stride, 1, &c);
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3689 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3690 }
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3691
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3692 #ifdef HAVE_MMX
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3693 RENAME(transpose1)(tempBlock1, tempBlock2, dstBlock, dstStride);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3694 #endif
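In the MMX path the block is transposed into a temporary buffer so that horizontal filtering can reuse the vertical filter routines, which are then called on tempBlock1 with a stride of 16. A generic sketch of the underlying idea, an 8x8 byte transpose, follows. transpose8x8 is a hypothetical helper and does not reproduce the exact buffer layout used by transpose1 and transpose2:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic 8x8 transpose: rows of src become columns of dst.
   Not the layout used by transpose1/transpose2 in the listing. */
void transpose8x8(uint8_t *dst, int dst_stride,
                  const uint8_t *src, int src_stride)
{
    int r, c;
    assert(dst != src);
    for (r = 0; r < 8; r++)
        for (c = 0; c < 8; c++)
            dst[c * dst_stride + r] = src[r * src_stride + c];
}
```

After such a transpose, a filter that works down a column of the temporary block is effectively working along a row of the original picture.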
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3695 /* check if we have a previous block to deblock it with dstBlock */
112
a2c063b6ecf9 fixed a bug in the tmp buffer
michael
parents: 111
diff changeset
3696 if(x - 8 >= 0)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3697 {
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3698 #ifdef HAVE_MMX
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3699 if(mode & H_X1_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3700 RENAME(vertX1Filter)(tempBlock1, 16, &c);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3701 else if(mode & H_DEBLOCK)
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3702 {
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3703 //START_TIMER
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3704 const int t= RENAME(vertClassify)(tempBlock1, 16, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3705 //STOP_TIMER("dc & minmax")
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3706 if(t==1)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3707 RENAME(doVertLowPass)(tempBlock1, 16, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3708 else if(t==2)
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3709 RENAME(doVertDefFilter)(tempBlock1, 16, &c);
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3710 }else if(mode & H_A_DEBLOCK){
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3711 RENAME(do_a_deblock)(tempBlock1, 16, 1, &c);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3712 }
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3713
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3714 RENAME(transpose2)(dstBlock-4, dstStride, tempBlock1 + 4*16);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3715
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3716 #else
115
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3717 if(mode & H_X1_FILTER)
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3718 horizX1Filter(dstBlock-4, stride, QP);
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3719 else if(mode & H_DEBLOCK)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3720 {
2043
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3721 #ifdef HAVE_ALTIVEC
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3722 unsigned char __attribute__ ((aligned(16))) tempBlock[272];
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3723 transpose_16x8_char_toPackedAlign_altivec(tempBlock, dstBlock - (4 + 1), stride);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3724
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3725 const int t=vertClassify_altivec(tempBlock-48, 16, &c);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3726 if(t==1) {
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3727 doVertLowPass_altivec(tempBlock-48, 16, &c);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3728 transpose_8x16_char_fromPackedAlign_altivec(dstBlock - (4 + 1), tempBlock, stride);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3729 }
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3730 else if(t==2) {
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3731 doVertDefFilter_altivec(tempBlock-48, 16, &c);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3732 transpose_8x16_char_fromPackedAlign_altivec(dstBlock - (4 + 1), tempBlock, stride);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3733 }
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3734 #else
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3735 const int t= RENAME(horizClassify)(dstBlock-4, stride, &c);
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3736
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3737 if(t==1)
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3738 RENAME(doHorizLowPass)(dstBlock-4, stride, &c);
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3739 else if(t==2)
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3740 RENAME(doHorizDefFilter)(dstBlock-4, stride, &c);
2043
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3741 #endif
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3742 }else if(mode & H_A_DEBLOCK){
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3743 RENAME(do_a_deblock)(dstBlock-8, 1, stride, &c);
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3744 }
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3745 #endif
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3746 if(mode & DERING)
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3747 {
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3748 //FIXME filter first line
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3749 if(y>0) RENAME(dering)(dstBlock - stride - 8, stride, &c);
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3750 }
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3751
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3752 if(mode & TEMP_NOISE_FILTER)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3753 {
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3754 RENAME(tempNoiseReducer)(dstBlock-8, stride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3755 c.tempBlured[isColor] + y*dstStride + x,
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3756 c.tempBluredPast[isColor] + (y>>3)*256 + (x>>3),
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3757 c.ppMode.maxTmpNoise);
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3758 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3759 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3760
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3761 dstBlock+=8;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3762 srcBlock+=8;
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3763
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3764 #ifdef HAVE_MMX
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3765 tmpXchg= tempBlock1;
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3766 tempBlock1= tempBlock2;
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3767 tempBlock2 = tmpXchg;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3768 #endif
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3769 }
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3770
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3771 if(mode & DERING)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3772 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3773 if(y > 0) RENAME(dering)(dstBlock - dstStride - 8, dstStride, &c);
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3774 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3775
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3776 if((mode & TEMP_NOISE_FILTER))
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3777 {
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3778 RENAME(tempNoiseReducer)(dstBlock-8, dstStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3779 c.tempBlured[isColor] + y*dstStride + x,
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3780 c.tempBluredPast[isColor] + (y>>3)*256 + (x>>3),
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3781 c.ppMode.maxTmpNoise);
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3782 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3783
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3784 /* did we use a tmp buffer for the last lines*/
112
a2c063b6ecf9 fixed a bug in the tmp buffer
michael
parents: 111
diff changeset
3785 if(y+15 >= height)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3786 {
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3787 uint8_t *dstBlock= &(dst[y*dstStride]);
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3788 if(width==ABS(dstStride))
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3789 linecpy(dstBlock, tempDst + dstStride, height-y, dstStride);
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3790 else
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3791 {
944
927c246f1f6d 10l another int i missing (without ^M)
faust3
parents: 943
diff changeset
3792 int i;
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3793 for(i=0; i<height-y; i++)
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3794 {
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3795 memcpy(dstBlock + i*dstStride, tempDst + (i+1)*dstStride, width);
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3796 }
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3797 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3798 }
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3799 /*
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3800 for(x=0; x<width; x+=32)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3801 {
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3802 volatile int i;
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3803 i+= + dstBlock[x + 7*dstStride] + dstBlock[x + 8*dstStride]
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3804 + dstBlock[x + 9*dstStride] + dstBlock[x +10*dstStride]
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3805 + dstBlock[x +11*dstStride] + dstBlock[x +12*dstStride];
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3806 // + dstBlock[x +13*dstStride]
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3807 // + dstBlock[x +14*dstStride] + dstBlock[x +15*dstStride];
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3808 }*/
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3809 }
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3810 #ifdef HAVE_3DNOW
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3811 asm volatile("femms");
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3812 #elif defined (HAVE_MMX)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3813 asm volatile("emms");
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3814 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3815
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3816 #ifdef DEBUG_BRIGHTNESS
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3817 if(!isColor)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3818 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3819 int max=1;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3820 int i;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3821 for(i=0; i<256; i++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3822 if(yHistogram[i] > max) max=yHistogram[i];
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3823
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3824 for(i=1; i<256; i++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3825 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3826 int x;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3827 int start=yHistogram[i-1]/(max/256+1);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3828 int end=yHistogram[i]/(max/256+1);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3829 int inc= end > start ? 1 : -1;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3830 for(x=start; x!=end+inc; x+=inc)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3831 dst[ i*dstStride + x]+=128;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3832 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3833
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3834 for(i=0; i<100; i+=2)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3835 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3836 dst[ (white)*dstStride + i]+=128;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3837 dst[ (black)*dstStride + i]+=128;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3838 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3839
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3840 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3841 #endif
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3842
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3843 *c2= c; //copy local context back
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3844
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3845 }
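
The copy-back of the temporary buffer near the end of the function (source lines 3785-3798) takes one of two paths: a single bulk copy when `width==ABS(dstStride)`, so the rows are contiguous in memory, and otherwise a per-row `memcpy` of `width` bytes that leaves the padding between rows untouched. The sketch below illustrates that idea only. `copy_rows` is a hypothetical helper that is not part of libpostproc, and the negative-stride handling is an assumption about what a contiguous copy has to do, since the definition of `linecpy` is not shown in this listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from libpostproc): copy `lines` rows of `width`
 * bytes from src to dst. Both buffers use the same signed `stride`; a
 * negative stride means row i+1 lies at a lower address than row i. */
static void copy_rows(uint8_t *dst, const uint8_t *src,
                      int lines, int width, int stride)
{
    assert(lines >= 0 && width >= 0 && width <= abs(stride));
    if (lines == 0)
        return;

    if (width == abs(stride)) {
        /* Rows are contiguous, so one memcpy covers all of them. */
        if (stride > 0) {
            memcpy(dst, src, (size_t)lines * (size_t)stride);
        } else {
            /* Assumption: with a negative stride the last row has the
             * lowest address, so the bulk copy starts there. */
            ptrdiff_t off = (ptrdiff_t)(lines - 1) * stride;
            memcpy(dst + off, src + off, (size_t)lines * (size_t)(-stride));
        }
    } else {
        /* Rows are padded: copy only the visible bytes of each row. */
        for (int i = 0; i < lines; i++)
            memcpy(dst + (ptrdiff_t)i * stride,
                   src + (ptrdiff_t)i * stride, (size_t)width);
    }
}
```

In the listing the source pointer is additionally offset by one row (`tempDst + dstStride`, respectively `(i+1)*dstStride`), which a caller of this sketch would express by passing `src + stride`.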