Mercurial > libavcodec.hg
annotate ppc/idct_altivec.c @ 11032:01bd040f8607 libavcodec
Unroll main loop so the edge==0 case is separate.
This allows many things to be simplified away.
h264 decoder is overall 1% faster with an mbaff sample and
0.1% slower with the cathedral sample, probably because the slow loop
filter code must be loaded into the code cache for the first MB of each
row but isn't used for the following MBs.

author:   michael
date:     Thu, 28 Jan 2010 01:24:25 +0000
parents:  dd2b5e52336a
children: 50415a8f1451

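The changeset description above talks about unrolling a loop so that its edge==0 iteration runs as separate code. The h264 loop-filter code it refers to is not part of this file, so what follows is only a generic sketch of the idea (commonly called peeling the first iteration). `filter_edge`, `filter_mb_looped` and `filter_mb_peeled` are hypothetical names, and the per-edge arithmetic is a placeholder.

```c
/* Hypothetical per-edge work. Only edge 0 (the macroblock boundary) needs the
 * extra check on whether the neighbouring macroblock is available. */
static int filter_edge(int edge, int neighbour_available, int strength)
{
    if (edge == 0 && !neighbour_available)
        return 0;
    return strength * (edge + 1);
}

/* Before: one loop; the edge == 0 test is evaluated on every iteration. */
static int filter_mb_looped(int neighbour_available, int strength)
{
    int sum = 0;
    for (int edge = 0; edge < 4; edge++)
        sum += filter_edge(edge, neighbour_available, strength);
    return sum;
}

/* After: edge 0 is peeled off, so the remaining iterations can drop the
 * edge == 0 special case entirely. */
static int filter_mb_peeled(int neighbour_available, int strength)
{
    int sum = neighbour_available ? strength : 0;   /* edge 0, handled once */
    for (int edge = 1; edge < 4; edge++)
        sum += strength * (edge + 1);                /* no edge == 0 test left */
    return sum;
}
```

Both versions return the same values. The peeled one no longer tests edge == 0 inside the loop, at the cost of a second copy of the edge-0 code, which is consistent with the description's remark about extra code being loaded for the first MB of each row.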
/*
 * Copyright (c) 2001 Michel Lespinasse
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * NOTE: This code is based on GPL code from the libmpeg2 project. The
 * author, Michel Lespinasse, has given explicit permission to release
 * under LGPL as part of FFmpeg.
 */

/*
 * FFmpeg integration by Dieter Shirley
 *
 * This file is a direct copy of the AltiVec IDCT module from the libmpeg2
 * project. I've deleted all of the libmpeg2-specific code, renamed the
 * functions and reordered the function parameters. The only change to the
 * IDCT function itself was to factor out the partial transposition, and to
 * perform a full transpose at the end of the function.
 */


#include <stdlib.h> /* malloc(), free() */
#include <string.h>
#include "config.h"
#if HAVE_ALTIVEC_H
#include <altivec.h>
#endif
#include "libavcodec/dsputil.h"
#include "types_altivec.h"
#include "dsputil_ppc.h"

#define IDCT_HALF                                    \
    /* 1st stage */                                  \
    t1 = vec_mradds (a1, vx7, vx1 );                 \
    t8 = vec_mradds (a1, vx1, vec_subs (zero, vx7)); \
    t7 = vec_mradds (a2, vx5, vx3);                  \
    t3 = vec_mradds (ma2, vx3, vx5);                 \
                                                     \
    /* 2nd stage */                                  \
    t5 = vec_adds (vx0, vx4);                        \
    t0 = vec_subs (vx0, vx4);                        \
    t2 = vec_mradds (a0, vx6, vx2);                  \
    t4 = vec_mradds (a0, vx2, vec_subs (zero, vx6)); \
    t6 = vec_adds (t8, t3);                          \
    t3 = vec_subs (t8, t3);                          \
    t8 = vec_subs (t1, t7);                          \
    t1 = vec_adds (t1, t7);                          \
                                                     \
    /* 3rd stage */                                  \
    t7 = vec_adds (t5, t2);                          \
    t2 = vec_subs (t5, t2);                          \
    t5 = vec_adds (t0, t4);                          \
    t0 = vec_subs (t0, t4);                          \
    t4 = vec_subs (t8, t3);                          \
    t3 = vec_adds (t8, t3);                          \
                                                     \
    /* 4th stage */                                  \
    vy0 = vec_adds (t7, t1);                         \
    vy7 = vec_subs (t7, t1);                         \
    vy1 = vec_mradds (c4, t3, t5);                   \
    vy6 = vec_mradds (mc4, t3, t5);                  \
    vy2 = vec_mradds (c4, t4, t0);                   \
    vy5 = vec_mradds (mc4, t4, t0);                  \
    vy3 = vec_adds (t2, t6);                         \
    vy4 = vec_subs (t2, t6);

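IDCT_HALF is an 8-point one-dimensional butterfly applied independently in every 16-bit lane of vx0..vx7. A scalar model of a single lane may help when checking it by hand. In this sketch `adds`, `subs` and `mradds` are hypothetical plain-C stand-ins for vec_adds, vec_subs and vec_mradds (the last computes ((a*b + 0x4000) >> 15) + c with signed saturation), and the constant values are the ones the IDCT macro splats out of constants[0].

```c
#include <stdint.h>

/* Signed 16-bit saturation, as in the AltiVec "s" (saturating) operations. */
static int16_t sat_s16(int32_t v)
{
    return (int16_t)(v < -32768 ? -32768 : v > 32767 ? 32767 : v);
}

/* One-lane stand-ins for vec_adds and vec_subs. */
static int16_t adds(int16_t a, int16_t b) { return sat_s16((int32_t)a + b); }
static int16_t subs(int16_t a, int16_t b) { return sat_s16((int32_t)a - b); }

/* vec_mradds: ((a * b + 0x4000) >> 15) + c, saturated. Assumes >> on a
 * negative int32_t is an arithmetic shift, as with GCC and Clang. */
static int16_t mradds(int16_t a, int16_t b, int16_t c)
{
    int32_t rounded = ((int32_t)a * b + 0x4000) >> 15;
    return sat_s16(rounded + c);
}

/* Values that IDCT splats from constants[0]. */
enum { C4 = 23170, A0 = 13573, A1 = 6518, A2 = 21895 };

/* One lane of IDCT_HALF: x[i] is that lane of vx<i>, y[i] of vy<i>. */
static void idct_half_lane(const int16_t x[8], int16_t y[8])
{
    const int16_t zero = 0, mc4 = -C4, ma2 = -A2;
    int16_t t0, t1, t2, t3, t4, t5, t6, t7, t8;

    /* 1st stage */
    t1 = mradds(A1, x[7], x[1]);
    t8 = mradds(A1, x[1], subs(zero, x[7]));
    t7 = mradds(A2, x[5], x[3]);
    t3 = mradds(ma2, x[3], x[5]);

    /* 2nd stage */
    t5 = adds(x[0], x[4]);
    t0 = subs(x[0], x[4]);
    t2 = mradds(A0, x[6], x[2]);
    t4 = mradds(A0, x[2], subs(zero, x[6]));
    t6 = adds(t8, t3);
    t3 = subs(t8, t3);
    t8 = subs(t1, t7);
    t1 = adds(t1, t7);

    /* 3rd stage */
    t7 = adds(t5, t2);
    t2 = subs(t5, t2);
    t5 = adds(t0, t4);
    t0 = subs(t0, t4);
    t4 = subs(t8, t3);
    t3 = adds(t8, t3);

    /* 4th stage */
    y[0] = adds(t7, t1);
    y[7] = subs(t7, t1);
    y[1] = mradds(C4, t3, t5);
    y[6] = mradds(mc4, t3, t5);
    y[2] = mradds(C4, t4, t0);
    y[5] = mradds(mc4, t4, t0);
    y[3] = adds(t2, t6);
    y[4] = subs(t2, t6);
}
```

With only x[0] non-zero every vec_mradds term rounds to zero, so the DC value reaches all eight outputs unchanged; a lone x[4] input yields the +,-,-,+,+,-,-,+ sign pattern of the matching cosine basis function.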

#define IDCT                                                         \
    vec_s16 vx0, vx1, vx2, vx3, vx4, vx5, vx6, vx7;                  \
    vec_s16 vy0, vy1, vy2, vy3, vy4, vy5, vy6, vy7;                  \
    vec_s16 a0, a1, a2, ma2, c4, mc4, zero, bias;                    \
    vec_s16 t0, t1, t2, t3, t4, t5, t6, t7, t8;                      \
    vec_u16 shift;                                                   \
                                                                     \
    c4 = vec_splat (constants[0], 0);                                \
    a0 = vec_splat (constants[0], 1);                                \
    a1 = vec_splat (constants[0], 2);                                \
    a2 = vec_splat (constants[0], 3);                                \
    mc4 = vec_splat (constants[0], 4);                               \
    ma2 = vec_splat (constants[0], 5);                               \
    bias = (vec_s16)vec_splat ((vec_s32)constants[0], 3);            \
                                                                     \
    zero = vec_splat_s16 (0);                                        \
    shift = vec_splat_u16 (4);                                       \
                                                                     \
    vx0 = vec_mradds (vec_sl (block[0], shift), constants[1], zero); \
    vx1 = vec_mradds (vec_sl (block[1], shift), constants[2], zero); \
    vx2 = vec_mradds (vec_sl (block[2], shift), constants[3], zero); \
    vx3 = vec_mradds (vec_sl (block[3], shift), constants[4], zero); \
    vx4 = vec_mradds (vec_sl (block[4], shift), constants[1], zero); \
    vx5 = vec_mradds (vec_sl (block[5], shift), constants[4], zero); \
    vx6 = vec_mradds (vec_sl (block[6], shift), constants[3], zero); \
    vx7 = vec_mradds (vec_sl (block[7], shift), constants[2], zero); \
                                                                     \
    IDCT_HALF                                                        \
                                                                     \
    vx0 = vec_mergeh (vy0, vy4);                                     \
    vx1 = vec_mergel (vy0, vy4);                                     \
    vx2 = vec_mergeh (vy1, vy5);                                     \
    vx3 = vec_mergel (vy1, vy5);                                     \
    vx4 = vec_mergeh (vy2, vy6);                                     \
    vx5 = vec_mergel (vy2, vy6);                                     \
    vx6 = vec_mergeh (vy3, vy7);                                     \
    vx7 = vec_mergel (vy3, vy7);                                     \
                                                                     \
    vy0 = vec_mergeh (vx0, vx4);                                     \
    vy1 = vec_mergel (vx0, vx4);                                     \
    vy2 = vec_mergeh (vx1, vx5);                                     \
    vy3 = vec_mergel (vx1, vx5);                                     \
    vy4 = vec_mergeh (vx2, vx6);                                     \
    vy5 = vec_mergel (vx2, vx6);                                     \
    vy6 = vec_mergeh (vx3, vx7);                                     \
    vy7 = vec_mergel (vx3, vx7);                                     \
                                                                     \
    vx0 = vec_adds (vec_mergeh (vy0, vy4), bias);                    \
    vx1 = vec_mergel (vy0, vy4);                                     \
    vx2 = vec_mergeh (vy1, vy5);                                     \
    vx3 = vec_mergel (vy1, vy5);                                     \
    vx4 = vec_mergeh (vy2, vy6);                                     \
    vx5 = vec_mergel (vy2, vy6);                                     \
    vx6 = vec_mergeh (vy3, vy7);                                     \
    vx7 = vec_mergel (vy3, vy7);                                     \
                                                                     \
    IDCT_HALF                                                        \
                                                                     \
    shift = vec_splat_u16 (6);                                       \
    vx0 = vec_sra (vy0, shift);                                      \
    vx1 = vec_sra (vy1, shift);                                      \
    vx2 = vec_sra (vy2, shift);                                      \
    vx3 = vec_sra (vy3, shift);                                      \
    vx4 = vec_sra (vy4, shift);                                      \
    vx5 = vec_sra (vy5, shift);                                      \
    vx6 = vec_sra (vy6, shift);                                      \
    vx7 = vec_sra (vy7, shift);

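Between its two IDCT_HALF passes, the IDCT macro transposes the 8x8 block with three identical rounds of vec_mergeh/vec_mergel, each pairing vector k with vector k+4 (the last round also adds bias to vx0, which this sketch leaves out). The scalar model below covers only the data movement; `merge_hi`, `merge_lo`, `merge_round` and `transpose8x8` are hypothetical names. It checks that three rounds amount to a full transpose.

```c
#include <stdint.h>
#include <string.h>

/* vec_mergeh: interleave lanes 0-3 of a and b. */
static void merge_hi(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* vec_mergel: interleave lanes 4-7 of a and b. */
static void merge_lo(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i] = a[4 + i];
        out[2 * i + 1] = b[4 + i];
    }
}

/* One round, as written in IDCT: out[2k] = mergeh (in[k], in[k+4]),
 * out[2k+1] = mergel (in[k], in[k+4]). */
static void merge_round(int16_t in[8][8], int16_t out[8][8])
{
    for (int k = 0; k < 4; k++) {
        merge_hi(in[k], in[k + 4], out[2 * k]);
        merge_lo(in[k], in[k + 4], out[2 * k + 1]);
    }
}

/* Three rounds transpose the 8x8 block in place. */
static void transpose8x8(int16_t m[8][8])
{
    int16_t tmp[8][8];
    merge_round(m, tmp);
    merge_round(tmp, m);
    merge_round(m, tmp);
    memcpy(m, tmp, sizeof tmp);
}
```

Because every round uses the same pairing, the network needs no permute masks; after the third round row r holds what was column r.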

static const vec_s16 constants[5] = {
    {23170, 13573, 6518, 21895, -23170, -21895, 32, 31},
    {16384, 22725, 21407, 19266, 16384, 19266, 21407, 22725},
    {22725, 31521, 29692, 26722, 22725, 26722, 29692, 31521},
    {21407, 29692, 27969, 25172, 21407, 25172, 27969, 29692},
    {19266, 26722, 25172, 22654, 19266, 22654, 25172, 26722}
};
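The entries of constants[0] are not explained in the source. Numerically, the first four agree after rounding with cos(pi/4), tan(pi/8), tan(pi/16) and tan(3*pi/16) scaled by 2^15, matching their roles as c4, a0, a1 and a2 in IDCT_HALF, and entries 4 and 5 are -c4 and -a2 (mc4 and ma2). The last two entries only matter through bias: vec_splat ((vec_s32)constants[0], 3) replicates the 32-bit word formed by halfwords 6 and 7. The sketch below checks both readings; they are inferences from the numbers, the byte model assumes big-endian element order, and the helper names are hypothetical. A Taylor series stands in for <math.h> so that no libm is needed.

```c
#include <stdint.h>

#define PI 3.14159265358979323846

/* Taylor series for sin and cos; for |x| <= pi/4 ten terms are far more
 * precise than Q15 requires. */
static double series_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n < 10; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

static double series_cos(double x)
{
    double term = 1.0, sum = 1.0;
    for (int n = 1; n < 10; n++) {
        term *= -x * x / ((2.0 * n - 1.0) * (2.0 * n));
        sum += term;
    }
    return sum;
}

/* Round a non-negative value to Q15 fixed point (scale by 2^15). */
static int to_q15(double v) { return (int)(v * 32768.0 + 0.5); }

static int q15_tan(double x) { return to_q15(series_sin(x) / series_cos(x)); }

/* First row of constants[] from the listing above. */
static const int16_t row0[8] = {23170, 13573, 6518, 21895, -23170, -21895, 32, 31};

/* Lane i of bias = (vec_s16)vec_splat ((vec_s32)constants[0], 3),
 * modelled byte by byte with big-endian element order. */
static int bias_lane(int i)
{
    uint8_t bytes[16], splat[16];
    for (int k = 0; k < 8; k++) {                  /* lay out the 8 halfwords */
        bytes[2 * k] = (uint8_t)((uint16_t)row0[k] >> 8);
        bytes[2 * k + 1] = (uint8_t)((uint16_t)row0[k] & 0xFF);
    }
    for (int k = 0; k < 16; k++)                   /* splat 32-bit element 3 */
        splat[k] = bytes[12 + (k & 3)];
    return (splat[2 * i] << 8) | splat[2 * i + 1]; /* reread as halfword i */
}
```

Under these assumptions bias alternates 32, 31, 32, 31, ... across its eight lanes.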

void idct_put_altivec(uint8_t* dest, int stride, vec_s16* block)
{
    POWERPC_PERF_DECLARE(altivec_idct_put_num, 1);
    vec_u8 tmp;

#if CONFIG_POWERPC_PERF
    POWERPC_PERF_START_COUNT(altivec_idct_put_num, 1);
#endif
    IDCT

#define COPY(dest,src)                               \
    tmp = vec_packsu (src, src);                     \
    vec_ste ((vec_u32)tmp, 0, (unsigned int *)dest); \
    vec_ste ((vec_u32)tmp, 4, (unsigned int *)dest);

    COPY (dest, vx0) dest += stride;
    COPY (dest, vx1) dest += stride;
    COPY (dest, vx2) dest += stride;
    COPY (dest, vx3) dest += stride;
    COPY (dest, vx4) dest += stride;
    COPY (dest, vx5) dest += stride;
    COPY (dest, vx6) dest += stride;
    COPY (dest, vx7)

    POWERPC_PERF_STOP_COUNT(altivec_idct_put_num, 1);
}
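For a block whose only non-zero coefficient is the DC term d in block[0], the arithmetic of idct_put_altivec collapses to a few scalar steps. The prescale multiplies d << 4 by 16384 with vec_mradds, giving 8d. Each IDCT_HALF pass leaves a lone vx0 input unchanged, because every vec_mradds term then rounds to zero. The transpose spreads 8d over all lanes of vx0, where bias adds 32 or 31. The final vec_sra shifts right by 6, and vec_packsu in COPY clamps to 0..255. The hypothetical `dc_only_pixel` below follows those steps for one pixel, assuming |d| < 2048 so that d << 4 fits in 16 bits.

```c
#include <stdint.h>

/* Signed 16-bit saturation (vec_adds) and unsigned 8-bit saturation (vec_packsu). */
static int sat_s16(int v) { return v < -32768 ? -32768 : v > 32767 ? 32767 : v; }
static uint8_t sat_u8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* One lane of vec_mradds: ((a * b + 0x4000) >> 15) + c, saturated.
 * Assumes >> on a negative int is an arithmetic shift (GCC, Clang). */
static int mradds(int a, int b, int c)
{
    return sat_s16(((a * b + 0x4000) >> 15) + c);
}

/* Hypothetical helper: one output pixel of idct_put_altivec when the only
 * non-zero coefficient is dc (|dc| < 2048), in a lane whose bias is 32 or 31. */
static uint8_t dc_only_pixel(int dc, int bias)
{
    int v = mradds(dc * 16, 16384, 0); /* vec_sl by 4, prescale by constants[1][0] */
    /* Both IDCT_HALF passes pass a lone DC input straight through. */
    v = sat_s16(v + bias);             /* vec_adds (..., bias) after the transpose */
    v >>= 6;                           /* vec_sra (vy, 6) */
    return sat_u8(v);                  /* vec_packsu in COPY */
}
```

The result is d/8 rounded, which is what an 8x8 IDCT gives for a DC-only block; exact halves (for example d = 100, i.e. 12.5) come out as 13 in lanes where bias is 32 and 12 where it is 31.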

void idct_add_altivec(uint8_t* dest, int stride, vec_s16* block)
{
    POWERPC_PERF_DECLARE(altivec_idct_add_num, 1);
    vec_u8 tmp;
    vec_s16 tmp2, tmp3;
    vec_u8 perm0;
    vec_u8 perm1;
    vec_u8 p0, p1, p;

#if CONFIG_POWERPC_PERF
    POWERPC_PERF_START_COUNT(altivec_idct_add_num, 1);
#endif

    IDCT

    p0 = vec_lvsl (0, dest);
    p1 = vec_lvsl (stride, dest);
    p = vec_splat_u8 (-1);
    perm0 = vec_mergeh (p, p0);
    perm1 = vec_mergeh (p, p1);

#define ADD(dest,src,perm)                                  \
    /* *(uint64_t *)&tmp = *(uint64_t *)dest; */        \
    tmp = vec_ld (0, dest);                             \
    tmp2 = (vec_s16)vec_perm (tmp, (vec_u8)zero, perm); \
    tmp3 = vec_adds (tmp2, src);                        \
    tmp = vec_packsu (tmp3, tmp3);                      \
    vec_ste ((vec_u32)tmp, 0, (unsigned int *)dest);    \
    vec_ste ((vec_u32)tmp, 4, (unsigned int *)dest);

    ADD (dest, vx0, perm0) dest += stride;
    ADD (dest, vx1, perm1) dest += stride;
    ADD (dest, vx2, perm0) dest += stride;
    ADD (dest, vx3, perm1) dest += stride;
    ADD (dest, vx4, perm0) dest += stride;
    ADD (dest, vx5, perm1) dest += stride;
    ADD (dest, vx6, perm0) dest += stride;
    ADD (dest, vx7, perm1)

    POWERPC_PERF_STOP_COUNT(altivec_idct_add_num, 1);
}

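idct_add_altivec widens the eight destination pixels of a row to 16 bits with a single vec_perm. vec_lvsl (0, dest) produces the byte indices sh, sh+1, ..., where sh = dest & 15, and merging it with 0xFF bytes yields a mask that places a zero byte (byte 15 of zero, since 0xFF & 31 = 31) in front of each pixel. After the saturating add, vec_packsu (tmp3, tmp3) writes the result into both halves of tmp. Each vec_ste then stores the 32-bit word whose position matches the target address modulo 16, which is why the two stores land correctly whether dest sits at offset 0 or 8 of its 16-byte line (COPY uses the same packing trick). The byte-level sketch below models one ADD step, assuming dest is 8-byte aligned; `perm_bytes` and `add_row_model` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Saturate to int16_t (vec_adds) and to uint8_t (vec_packsu). */
static int sat_s16(int v) { return v < -32768 ? -32768 : v > 32767 ? 32767 : v; }
static uint8_t sat_u8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* vec_perm on bytes: out[i] = (a || b)[sel[i] & 31]. */
static void perm_bytes(const uint8_t a[16], const uint8_t b[16],
                       const uint8_t sel[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        int k = sel[i] & 31;
        out[i] = k < 16 ? a[k] : b[k - 16];
    }
}

/* One ADD step. `line` is the 16-byte aligned block that vec_ld (0, dest)
 * reads; the 8 destination pixels start at byte offset sh (0 or 8). */
static void add_row_model(uint8_t line[16], int sh, const int16_t src[8])
{
    uint8_t zero[16] = {0}, perm[16], wide[16], packed[16];

    /* perm = vec_mergeh (vec_splat_u8 (-1), vec_lvsl (0, dest)) */
    for (int j = 0; j < 8; j++) {
        perm[2 * j] = 0xFF;                  /* 0xFF & 31 = 31: a zero byte */
        perm[2 * j + 1] = (uint8_t)(sh + j); /* pixel j of the loaded line */
    }
    perm_bytes(line, zero, perm, wide);

    for (int j = 0; j < 8; j++) {
        int pix = (wide[2 * j] << 8) | wide[2 * j + 1];  /* big-endian halfword */
        uint8_t out = sat_u8(sat_s16(pix + src[j]));      /* vec_adds, vec_packsu */
        packed[j] = packed[j + 8] = out;  /* vec_packsu (tmp3, tmp3): both halves */
    }

    /* vec_ste at dest+0 and dest+4: each stores the 32-bit word of `packed`
     * whose position matches the target address modulo 16. */
    for (int off = 0; off < 8; off += 4) {
        int ea = (sh + off) & ~3;
        memcpy(&line[ea], &packed[ea], 4);
    }
}
```

Reusing perm0 and perm1 on alternate rows appears to assume that dest + 2 * stride has the same alignment as dest, which holds when stride is a multiple of 8.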