annotate ppc/idct_altivec.c @ 6323:e6da66f378c7 libavcodec
mpegvideo.h has two function declarations with the 'inline' specifier
but no definition for those functions. The C standard requires a
definition to appear in the same translation unit for any function
declared with 'inline'. Most of the files including mpegvideo.h do not
define those functions. Fix this by removing the 'inline' specifiers
from the header.
patch by Uoti Urpala
author | diego |
---|---|
date | Sun, 03 Feb 2008 17:54:30 +0000 |
parents | 33674fb857b5 |
children | f7cbb7733146 |
/*
 * Copyright (c) 2001 Michel Lespinasse
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * NOTE: This code is based on GPL code from the libmpeg2 project. The
 * author, Michel Lespinasse, has given explicit permission to release
 * under LGPL as part of FFmpeg.
 *
 */

/*
 * FFmpeg integration by Dieter Shirley
 *
 * This file is a direct copy of the altivec idct module from the libmpeg2
 * project. I've deleted all of the libmpeg2 specific code, renamed the functions and
 * re-ordered the function parameters. The only change to the IDCT function
 * itself was to factor out the partial transposition, and to perform a full
 * transpose at the end of the function.
 */

#include <stdlib.h> /* malloc(), free() */
#include <string.h>
#include "dsputil.h"

#include "gcc_fixes.h"

#include "dsputil_ppc.h"

#define vector_s16_t vector signed short
#define const_vector_s16_t const vector signed short
#define vector_u16_t vector unsigned short
#define vector_s8_t vector signed char
#define vector_u8_t vector unsigned char
#define vector_s32_t vector signed int
#define vector_u32_t vector unsigned int

/* One 1-D 8-point IDCT pass over eight lanes in parallel:
 * reads vx0..vx7 and writes vy0..vy7 (t0..t8 are scratch). */
#define IDCT_HALF                                       \
    /* 1st stage */                                     \
    t1 = vec_mradds (a1, vx7, vx1 );                    \
    t8 = vec_mradds (a1, vx1, vec_subs (zero, vx7));    \
    t7 = vec_mradds (a2, vx5, vx3);                     \
    t3 = vec_mradds (ma2, vx3, vx5);                    \
                                                        \
    /* 2nd stage */                                     \
    t5 = vec_adds (vx0, vx4);                           \
    t0 = vec_subs (vx0, vx4);                           \
    t2 = vec_mradds (a0, vx6, vx2);                     \
    t4 = vec_mradds (a0, vx2, vec_subs (zero, vx6));    \
    t6 = vec_adds (t8, t3);                             \
    t3 = vec_subs (t8, t3);                             \
    t8 = vec_subs (t1, t7);                             \
    t1 = vec_adds (t1, t7);                             \
                                                        \
    /* 3rd stage */                                     \
    t7 = vec_adds (t5, t2);                             \
    t2 = vec_subs (t5, t2);                             \
    t5 = vec_adds (t0, t4);                             \
    t0 = vec_subs (t0, t4);                             \
    t4 = vec_subs (t8, t3);                             \
    t3 = vec_adds (t8, t3);                             \
                                                        \
    /* 4th stage */                                     \
    vy0 = vec_adds (t7, t1);                            \
    vy7 = vec_subs (t7, t1);                            \
    vy1 = vec_mradds (c4, t3, t5);                      \
    vy6 = vec_mradds (mc4, t3, t5);                     \
    vy2 = vec_mradds (c4, t4, t0);                      \
    vy5 = vec_mradds (mc4, t4, t0);                     \
    vy3 = vec_adds (t2, t6);                            \
    vy4 = vec_subs (t2, t6);


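Almost every multiply in IDCT_HALF goes through vec_mradds, the AltiVec "multiply, round and add, saturated" operation on signed 16-bit lanes. As a rough aid to reading the macro, the following is a minimal scalar sketch of one lane of that operation in portable C. The helper names `sat_s16` and `mradds_lane` are made up for this illustration and are not part of the file, and the sketch assumes the compiler shifts negative values arithmetically (true of GCC and Clang, but implementation-defined in C).

```c
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit intermediate to the int16 range. */
static int16_t sat_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of one lane of vec_mradds(a, b, c):
 * the Q15 product a*b is rounded to nearest, c is added, and the
 * sum is saturated to 16 bits. */
static int16_t mradds_lane(int16_t a, int16_t b, int16_t c)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    /* Assumes an arithmetic right shift for negative values. */
    int32_t rounded = (prod + 0x4000) >> 15;
    return sat_s16(rounded + (int32_t)c);
}
```

Read this way, a line such as `t1 = vec_mradds (a1, vx7, vx1)` computes roughly `vx1 + a1 * vx7` per lane, with `a1` interpreted as a fraction scaled by 2^15.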
/* Full 2-D IDCT of block[0..7]: prescale the input, run IDCT_HALF,
 * transpose the 8x8 result with three merge rounds (adding bias to vx0),
 * run IDCT_HALF again, then shift right by 6. The result is left in
 * vx0..vx7. */
#define IDCT                                                            \
    vector_s16_t vx0, vx1, vx2, vx3, vx4, vx5, vx6, vx7;                \
    vector_s16_t vy0, vy1, vy2, vy3, vy4, vy5, vy6, vy7;                \
    vector_s16_t a0, a1, a2, ma2, c4, mc4, zero, bias;                  \
    vector_s16_t t0, t1, t2, t3, t4, t5, t6, t7, t8;                    \
    vector_u16_t shift;                                                 \
                                                                        \
    c4 = vec_splat (constants[0], 0);                                   \
    a0 = vec_splat (constants[0], 1);                                   \
    a1 = vec_splat (constants[0], 2);                                   \
    a2 = vec_splat (constants[0], 3);                                   \
    mc4 = vec_splat (constants[0], 4);                                  \
    ma2 = vec_splat (constants[0], 5);                                  \
    bias = (vector_s16_t)vec_splat ((vector_s32_t)constants[0], 3);     \
                                                                        \
    zero = vec_splat_s16 (0);                                           \
    shift = vec_splat_u16 (4);                                          \
                                                                        \
    vx0 = vec_mradds (vec_sl (block[0], shift), constants[1], zero);    \
    vx1 = vec_mradds (vec_sl (block[1], shift), constants[2], zero);    \
    vx2 = vec_mradds (vec_sl (block[2], shift), constants[3], zero);    \
    vx3 = vec_mradds (vec_sl (block[3], shift), constants[4], zero);    \
    vx4 = vec_mradds (vec_sl (block[4], shift), constants[1], zero);    \
    vx5 = vec_mradds (vec_sl (block[5], shift), constants[4], zero);    \
    vx6 = vec_mradds (vec_sl (block[6], shift), constants[3], zero);    \
    vx7 = vec_mradds (vec_sl (block[7], shift), constants[2], zero);    \
                                                                        \
    IDCT_HALF                                                           \
                                                                        \
    vx0 = vec_mergeh (vy0, vy4);                                        \
    vx1 = vec_mergel (vy0, vy4);                                        \
    vx2 = vec_mergeh (vy1, vy5);                                        \
    vx3 = vec_mergel (vy1, vy5);                                        \
    vx4 = vec_mergeh (vy2, vy6);                                        \
    vx5 = vec_mergel (vy2, vy6);                                        \
    vx6 = vec_mergeh (vy3, vy7);                                        \
    vx7 = vec_mergel (vy3, vy7);                                        \
                                                                        \
    vy0 = vec_mergeh (vx0, vx4);                                        \
    vy1 = vec_mergel (vx0, vx4);                                        \
    vy2 = vec_mergeh (vx1, vx5);                                        \
    vy3 = vec_mergel (vx1, vx5);                                        \
    vy4 = vec_mergeh (vx2, vx6);                                        \
    vy5 = vec_mergel (vx2, vx6);                                        \
    vy6 = vec_mergeh (vx3, vx7);                                        \
    vy7 = vec_mergel (vx3, vx7);                                        \
                                                                        \
    vx0 = vec_adds (vec_mergeh (vy0, vy4), bias);                       \
    vx1 = vec_mergel (vy0, vy4);                                        \
    vx2 = vec_mergeh (vy1, vy5);                                        \
    vx3 = vec_mergel (vy1, vy5);                                        \
    vx4 = vec_mergeh (vy2, vy6);                                        \
    vx5 = vec_mergel (vy2, vy6);                                        \
    vx6 = vec_mergeh (vy3, vy7);                                        \
    vx7 = vec_mergel (vy3, vy7);                                        \
                                                                        \
    IDCT_HALF                                                           \
                                                                        \
    shift = vec_splat_u16 (6);                                          \
    vx0 = vec_sra (vy0, shift);                                         \
    vx1 = vec_sra (vy1, shift);                                         \
    vx2 = vec_sra (vy2, shift);                                         \
    vx3 = vec_sra (vy3, shift);                                         \
    vx4 = vec_sra (vy4, shift);                                         \
    vx5 = vec_sra (vy5, shift);                                         \
    vx6 = vec_sra (vy6, shift);                                         \
    vx7 = vec_sra (vy7, shift);


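Between the two IDCT_HALF passes, the IDCT macro applies three rounds of vec_mergeh / vec_mergel, each pairing row k with row k+4. Three such rounds amount to a full 8x8 transpose. The sketch below models this in plain C so the claim can be checked without AltiVec hardware. The function names `mergeh8`, `mergel8`, `merge_round` and `transpose8x8` are invented for this illustration, and the bias that the macro adds to vx0 is left out.

```c
#include <stdint.h>

/* Hypothetical scalar model of vec_mergeh on eight 16-bit elements:
 * interleave the first four elements of a and b. */
static void mergeh8(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Hypothetical scalar model of vec_mergel: interleave the last four
 * elements of a and b. */
static void mergel8(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i + 4];
        out[2 * i + 1] = b[i + 4];
    }
}

/* One merge round with the same pairing as the macro:
 * rows k and k+4 produce rows 2k and 2k+1. */
static void merge_round(int16_t src[8][8], int16_t dst[8][8])
{
    for (int k = 0; k < 4; k++) {
        mergeh8(src[k], src[k + 4], dst[2 * k]);
        mergel8(src[k], src[k + 4], dst[2 * k + 1]);
    }
}

/* Three rounds in a row transpose the 8x8 matrix. */
static void transpose8x8(int16_t in[8][8], int16_t out[8][8])
{
    int16_t a[8][8], b[8][8];
    merge_round(in, a);
    merge_round(a, b);
    merge_round(b, out);
}
```

One way to see why this works: each round rotates the 6-bit element index (3 row bits, 3 column bits) left by one bit, so three rounds swap the row and column fields.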
static const_vector_s16_t constants[5] = {
    (vector_s16_t) AVV(23170, 13573, 6518, 21895, -23170, -21895, 32, 31),
    (vector_s16_t) AVV(16384, 22725, 21407, 19266, 16384, 19266, 21407, 22725),
    (vector_s16_t) AVV(22725, 31521, 29692, 26722, 22725, 26722, 29692, 31521),
    (vector_s16_t) AVV(21407, 29692, 27969, 25172, 21407, 25172, 27969, 29692),
    (vector_s16_t) AVV(19266, 26722, 25172, 22654, 19266, 22654, 25172, 26722)
};

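The first six entries of constants[0] are loaded by the IDCT macro as c4, a0, a1, a2, mc4 and ma2. They look like trigonometric values scaled by 2^15 for use with vec_mradds: cos(pi/4), tan(pi/8), tan(pi/16) and tan(3*pi/16), followed by the negations of c4 and a2. That reading is an inference from the numbers, not something the file states. The sketch below reproduces the six integers under that assumption. The names `q15` and `first_row` are hypothetical, and the trigonometric values are written as literals so that no math library is needed.

```c
#include <stdint.h>

/* Assumed real-valued constants (standard trigonometric values). */
static const double COS_PI_4   = 0.70710678118654752; /* cos(pi/4)    */
static const double TAN_PI_8   = 0.41421356237309505; /* tan(pi/8)    */
static const double TAN_PI_16  = 0.19891236737965800; /* tan(pi/16)   */
static const double TAN_3PI_16 = 0.66817863791929892; /* tan(3*pi/16) */

/* Hypothetical helper: scale a positive fraction by 2^15 and round
 * to the nearest integer. */
static int16_t q15(double x)
{
    return (int16_t)(x * 32768.0 + 0.5);
}

/* Fill out[0..5] in the order c4, a0, a1, a2, mc4, ma2. */
static void first_row(int16_t out[6])
{
    out[0] = q15(COS_PI_4);
    out[1] = q15(TAN_PI_8);
    out[2] = q15(TAN_PI_16);
    out[3] = q15(TAN_3PI_16);
    out[4] = (int16_t)-out[0];
    out[5] = (int16_t)-out[3];
}
```

The remaining two entries of that row, 32 and 31, are read together as one 32-bit element to form `bias`, which the macro adds to vx0 before the second pass and the final shift right by 6.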
void idct_put_altivec(uint8_t* dest, int stride, vector_s16_t* block)
{
    POWERPC_PERF_DECLARE(altivec_idct_put_num, 1);
    vector_u8_t tmp;

#ifdef CONFIG_POWERPC_PERF
    POWERPC_PERF_START_COUNT(altivec_idct_put_num, 1);
#endif
    IDCT

#define COPY(dest,src)                                          \
    tmp = vec_packsu (src, src);                                \
    vec_ste ((vector_u32_t)tmp, 0, (unsigned int *)dest);       \
    vec_ste ((vector_u32_t)tmp, 4, (unsigned int *)dest);

    COPY (dest, vx0)    dest += stride;
    COPY (dest, vx1)    dest += stride;
    COPY (dest, vx2)    dest += stride;
    COPY (dest, vx3)    dest += stride;
    COPY (dest, vx4)    dest += stride;
    COPY (dest, vx5)    dest += stride;
    COPY (dest, vx6)    dest += stride;
    COPY (dest, vx7)

    POWERPC_PERF_STOP_COUNT(altivec_idct_put_num, 1);
}

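The COPY step in idct_put_altivec packs each row of eight signed 16-bit results into unsigned bytes with vec_packsu, which saturates to the range 0..255, and then stores the eight bytes as two 32-bit words. A scalar sketch of the same store stage might look as follows. `clamp_u8` and `put_block_scalar` are hypothetical names, and the sketch assumes the 64 IDCT outputs are already available row by row in a plain array.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the saturation done by vec_packsu. */
static uint8_t clamp_u8(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical scalar counterpart of the eight COPY steps: write one
 * 8x8 block of IDCT results to dest, one row every stride bytes. */
static void put_block_scalar(uint8_t *dest, int stride,
                             const int16_t block[64])
{
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++)
            dest[col] = clamp_u8(block[row * 8 + col]);
        dest += stride;
    }
}
```

Only the first eight bytes of each destination row are written, which matches the two 4-byte vec_ste stores in the macro.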
void idct_add_altivec(uint8_t* dest, int stride, vector_s16_t* block)
{
    POWERPC_PERF_DECLARE(altivec_idct_add_num, 1);
    vector_u8_t tmp;
    vector_s16_t tmp2, tmp3;
    vector_u8_t perm0;
    vector_u8_t perm1;
    vector_u8_t p0, p1, p;

#ifdef CONFIG_POWERPC_PERF
    POWERPC_PERF_START_COUNT(altivec_idct_add_num, 1);
#endif

    IDCT

    p0 = vec_lvsl (0, dest);
    p1 = vec_lvsl (stride, dest);
    p = vec_splat_u8 (-1);
    perm0 = vec_mergeh (p, p0);
    perm1 = vec_mergeh (p, p1);

#define ADD(dest,src,perm)                                              \
    /* *(uint64_t *)&tmp = *(uint64_t *)dest; */                        \
    tmp = vec_ld (0, dest);                                             \
    tmp2 = (vector_s16_t)vec_perm (tmp, (vector_u8_t)zero, perm);       \
    tmp3 = vec_adds (tmp2, src);                                        \
    tmp = vec_packsu (tmp3, tmp3);                                      \
    vec_ste ((vector_u32_t)tmp, 0, (unsigned int *)dest);               \
    vec_ste ((vector_u32_t)tmp, 4, (unsigned int *)dest);

    ADD (dest, vx0, perm0)      dest += stride;
    ADD (dest, vx1, perm1)      dest += stride;
    ADD (dest, vx2, perm0)      dest += stride;
    ADD (dest, vx3, perm1)      dest += stride;
    ADD (dest, vx4, perm0)      dest += stride;
    ADD (dest, vx5, perm1)      dest += stride;
    ADD (dest, vx6, perm0)      dest += stride;
    ADD (dest, vx7, perm1)

    POWERPC_PERF_STOP_COUNT(altivec_idct_add_num, 1);
}

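The ADD step in idct_add_altivec loads the existing destination pixels, widens them from unsigned bytes to signed 16-bit values with vec_perm against a zero vector, adds the IDCT result with saturation, and packs the sum back to bytes with vec_packsu. The net effect per pixel can be sketched in scalar C as below. `clamp_sum_u8` and `add_block_scalar` are hypothetical names, and the sketch again assumes the 64 IDCT outputs are available row by row in a plain array. The alignment handling done with vec_lvsl and the alternating perm0 / perm1 vectors has no scalar counterpart and is omitted.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a widened sum to the 0..255 pixel range,
 * which is the combined effect of vec_adds followed by vec_packsu. */
static uint8_t clamp_sum_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical scalar counterpart of the eight ADD steps: add one 8x8
 * block of IDCT results to the pixels already stored in dest. */
static void add_block_scalar(uint8_t *dest, int stride,
                             const int16_t block[64])
{
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++)
            dest[col] = clamp_sum_u8((int)dest[col] + block[row * 8 + col]);
        dest += stride;
    }
}
```

Doing the sum in `int` gives the same final byte as the 16-bit saturating add in the macro, because any sum that would saturate at 16 bits is already outside 0..255.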