Mercurial > libavcodec.hg
annotate sparc/dsputil_vis.c @ 3198:6b9f0c4fbdbe

First part of a series of speed-enhancing patches.
This one sets up a snow.h and makes snow use the dsputil function pointer
framework to access the three functions that will be implemented in asm
in the other parts of the patchset.
Patch by Robert Edele < yartrebo AH earthlink POIS net>
Original thread:
Subject: [Ffmpeg-devel] [PATCH] Snow mmx+sse2 asm optimizations
Date: Sun, 05 Feb 2006 12:47:14 -0500

author:   gpoirier
date:     Thu, 16 Mar 2006 19:18:18 +0000
parents:  0b546eab515d
children: c8c591fe26f8
Annotating revisions:
  1959 55b7435c59b8 (michael): VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
  1966 e1fc7c598558 (michael): License change and cpu detection patch by (James Morrison <ja2morri at csclub dot uwaterloo dot ca>)
  2979
  3036 0b546eab515d (diego): Update licensing information: The FSF changed postal address.

/*
 * dsputil_vis.c
 * Copyright (C) 2003 David S. Miller <davem@redhat.com>
 *
 * This file is part of ffmpeg, a free MPEG-4 video stream decoder.
 * See http://ffmpeg.sourceforge.net/ for updates.
 *
 * ffmpeg is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * (at your option) any later version.
 *
 * ffmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the Lesser GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

/* The *no_round* functions have been added by James A. Morrison, 2003,2004.
   The vis code from libmpeg2 was adapted for ffmpeg by James A. Morrison.
 */

#include "config.h"

#ifdef ARCH_SPARC

#include <inttypes.h>
#include <signal.h>
#include <setjmp.h>

#include "../dsputil.h"

#include "vis.h"

/* The trick used in some of this file is the formula from the MMX
 * motion comp code, which is:
 *
 * (x+y+1)>>1 == (x|y)-((x^y)>>1)
 *
 * This allows us to average 8 bytes at a time in a 64-bit FPU reg.
 * We avoid overflows by masking before we do the shift, and we
 * implement the shift by multiplying by 1/2 using mul8x16.  So in
 * VIS this is (assume 'x' is in f0, 'y' is in f2, a repeating mask
 * of '0xfe' is in f4, a repeating mask of '0x7f' is in f6, and
 * the value 0x80808080 is in f8):
 *
 *      fxor            f0,  f2, f10
 *      fand            f10, f4, f10
 *      fmul8x16        f8,  f10, f10
 *      fand            f10, f6, f10
 *      for             f0,  f2, f12
 *      fpsub16         f12, f10, f10
 */
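The identity quoted in the comment above, (x+y+1)>>1 == (x|y)-((x^y)>>1), can be checked exhaustively for 8-bit operands. The following standalone program is an editorial sketch (not part of the original file; the helper names are invented for illustration):

```c
#include <stdint.h>

/* Rounded average, computed directly in a wider type. */
static uint8_t avg_round(uint8_t x, uint8_t y)
{
    return (uint8_t)(((unsigned)x + y + 1) >> 1);
}

/* The OR/XOR form used by the VIS code. Since x|y = (x&y) + (x^y)
 * and t - (t>>1) = (t+1)>>1, this equals (x&y) + ((x^y)+1)>>1,
 * which is the rounded average; no intermediate ever needs more
 * than 8 bits, which is why it vectorizes byte-wise. */
static uint8_t avg_trick(uint8_t x, uint8_t y)
{
    return (uint8_t)((x | y) - ((x ^ y) >> 1));
}
```

The masking with 0xfe before the halving step in the VIS version plays the role of the `>> 1` here, done per byte inside a 64-bit register.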

#define ATTR_ALIGN(alignd) __attribute__ ((aligned(alignd)))

#define DUP4(x) {x, x, x, x}
#define DUP8(x) {x, x, x, x, x, x, x, x}
static const int16_t constants1[] ATTR_ALIGN(8) = DUP4 (1);
static const int16_t constants2[] ATTR_ALIGN(8) = DUP4 (2);
static const int16_t constants3[] ATTR_ALIGN(8) = DUP4 (3);
static const int16_t constants6[] ATTR_ALIGN(8) = DUP4 (6);
static const int8_t constants_fe[] ATTR_ALIGN(8) = DUP8 (0xfe);
static const int8_t constants_7f[] ATTR_ALIGN(8) = DUP8 (0x7f);
static const int8_t constants128[] ATTR_ALIGN(8) = DUP8 (128);
static const int16_t constants256_512[] ATTR_ALIGN(8) =
        {256, 512, 256, 512};
static const int16_t constants256_1024[] ATTR_ALIGN(8) =
        {256, 1024, 256, 1024};

#define REF_0           0
#define REF_0_1         1
#define REF_2           2
#define REF_2_1         3
#define REF_4           4
#define REF_4_1         5
#define REF_6           6
#define REF_6_1         7
#define REF_S0          8
#define REF_S0_1        9
#define REF_S2          10
#define REF_S2_1        11
#define REF_S4          12
#define REF_S4_1        13
#define REF_S6          14
#define REF_S6_1        15
#define DST_0           16
#define DST_1           17
#define DST_2           18
#define DST_3           19
#define CONST_1         20
#define CONST_2         20
#define CONST_3         20
#define CONST_6         20
#define MASK_fe         20
#define CONST_128       22
#define CONST_256       22
#define CONST_512       22
#define CONST_1024      22
#define TMP0            24
#define TMP1            25
#define TMP2            26
#define TMP3            27
#define TMP4            28
#define TMP5            29
#define ZERO            30
#define MASK_7f         30

#define TMP6            32
#define TMP8            34
#define TMP10           36
#define TMP12           38
#define TMP14           40
#define TMP16           42
#define TMP18           44
#define TMP20           46
#define TMP22           48
#define TMP24           50
#define TMP26           52
#define TMP28           54
#define TMP30           56
#define TMP32           58

static void MC_put_o_16_vis (uint8_t * dest, const uint8_t * _ref,
                             const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        do {    /* 5 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64_2(ref, 8, TMP2);

                vis_ld64_2(ref, 16, TMP4);
                ref += stride;

                vis_faligndata(TMP0, TMP2, REF_0);
                vis_st64(REF_0, dest[0]);

                vis_faligndata(TMP2, TMP4, REF_2);
                vis_st64_2(REF_2, dest, 8);
                dest += stride;
        } while (--height);
}
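Stripped of the VIS alignment and instruction-scheduling machinery, the routine above is a plain 16-byte-wide block copy. A portable C sketch of its semantics (the helper name is hypothetical, not part of the original file):

```c
#include <stdint.h>
#include <string.h>

/* What MC_put_o_16_vis computes: copy `height` rows of 16 bytes
 * from ref to dest, both pointers advancing by stride per row. */
static void mc_put_o_16_c(uint8_t *dest, const uint8_t *ref,
                          int stride, int height)
{
    do {
        memcpy(dest, ref, 16);
        ref  += stride;
        dest += stride;
    } while (--height);
}
```

The VIS version earns its keep on unaligned `ref`: `vis_alignaddr`/`vis_faligndata` synthesize the misaligned 16-byte row from three aligned 64-bit loads per iteration.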

static void MC_put_o_8_vis (uint8_t * dest, const uint8_t * _ref,
                            const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        do {    /* 4 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64(ref[8], TMP2);
                ref += stride;

                /* stall */

                vis_faligndata(TMP0, TMP2, REF_0);
                vis_st64(REF_0, dest[0]);
                dest += stride;
        } while (--height);
}


static void MC_avg_o_16_vis (uint8_t * dest, const uint8_t * _ref,
                             const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        int stride_8 = stride + 8;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64(ref[8], TMP2);

        vis_ld64(ref[16], TMP4);

        vis_ld64(dest[0], DST_0);

        vis_ld64(dest[8], DST_2);

        vis_ld64(constants_fe[0], MASK_fe);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP2, TMP4, REF_2);

        vis_ld64(constants128[0], CONST_128);

        ref += stride;
        height = (height >> 1) - 1;

        do {    /* 24 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(DST_0, REF_0, TMP6);

                vis_ld64_2(ref, 8, TMP2);
                vis_and(TMP6, MASK_fe, TMP6);

                vis_ld64_2(ref, 16, TMP4);
                ref += stride;
                vis_mul8x16(CONST_128, TMP6, TMP6);
                vis_xor(DST_2, REF_2, TMP8);

                vis_and(TMP8, MASK_fe, TMP8);

                vis_or(DST_0, REF_0, TMP10);
                vis_ld64_2(dest, stride, DST_0);
                vis_mul8x16(CONST_128, TMP8, TMP8);

                vis_or(DST_2, REF_2, TMP12);
                vis_ld64_2(dest, stride_8, DST_2);

                vis_ld64(ref[0], TMP14);
                vis_and(TMP6, MASK_7f, TMP6);

                vis_and(TMP8, MASK_7f, TMP8);

                vis_psub16(TMP10, TMP6, TMP6);
                vis_st64(TMP6, dest[0]);

                vis_psub16(TMP12, TMP8, TMP8);
                vis_st64_2(TMP8, dest, 8);

                dest += stride;
                vis_ld64_2(ref, 8, TMP16);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, 16, TMP18);
                vis_faligndata(TMP2, TMP4, REF_2);
                ref += stride;

                vis_xor(DST_0, REF_0, TMP20);

                vis_and(TMP20, MASK_fe, TMP20);

                vis_xor(DST_2, REF_2, TMP22);
                vis_mul8x16(CONST_128, TMP20, TMP20);

                vis_and(TMP22, MASK_fe, TMP22);

                vis_or(DST_0, REF_0, TMP24);
                vis_mul8x16(CONST_128, TMP22, TMP22);

                vis_or(DST_2, REF_2, TMP26);

                vis_ld64_2(dest, stride, DST_0);
                vis_faligndata(TMP14, TMP16, REF_0);

                vis_ld64_2(dest, stride_8, DST_2);
                vis_faligndata(TMP16, TMP18, REF_2);

                vis_and(TMP20, MASK_7f, TMP20);

                vis_and(TMP22, MASK_7f, TMP22);

                vis_psub16(TMP24, TMP20, TMP20);
                vis_st64(TMP20, dest[0]);

                vis_psub16(TMP26, TMP22, TMP22);
                vis_st64_2(TMP22, dest, 8);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(DST_0, REF_0, TMP6);

        vis_ld64_2(ref, 8, TMP2);
        vis_and(TMP6, MASK_fe, TMP6);

        vis_ld64_2(ref, 16, TMP4);
        vis_mul8x16(CONST_128, TMP6, TMP6);
        vis_xor(DST_2, REF_2, TMP8);

        vis_and(TMP8, MASK_fe, TMP8);

        vis_or(DST_0, REF_0, TMP10);
        vis_ld64_2(dest, stride, DST_0);
        vis_mul8x16(CONST_128, TMP8, TMP8);

        vis_or(DST_2, REF_2, TMP12);
        vis_ld64_2(dest, stride_8, DST_2);

        vis_ld64(ref[0], TMP14);
        vis_and(TMP6, MASK_7f, TMP6);

        vis_and(TMP8, MASK_7f, TMP8);

        vis_psub16(TMP10, TMP6, TMP6);
        vis_st64(TMP6, dest[0]);

        vis_psub16(TMP12, TMP8, TMP8);
        vis_st64_2(TMP8, dest, 8);

        dest += stride;
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_faligndata(TMP2, TMP4, REF_2);

        vis_xor(DST_0, REF_0, TMP20);

        vis_and(TMP20, MASK_fe, TMP20);

        vis_xor(DST_2, REF_2, TMP22);
        vis_mul8x16(CONST_128, TMP20, TMP20);

        vis_and(TMP22, MASK_fe, TMP22);

        vis_or(DST_0, REF_0, TMP24);
        vis_mul8x16(CONST_128, TMP22, TMP22);

        vis_or(DST_2, REF_2, TMP26);

        vis_and(TMP20, MASK_7f, TMP20);

        vis_and(TMP22, MASK_7f, TMP22);

        vis_psub16(TMP24, TMP20, TMP20);
        vis_st64(TMP20, dest[0]);

        vis_psub16(TMP26, TMP22, TMP22);
        vis_st64_2(TMP22, dest, 8);
}
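The _avg_ variants blend the reference block into what is already in dest, using the OR/XOR averaging identity from the file's header comment. A portable C sketch of the semantics (hypothetical helper name, not part of the original file):

```c
#include <stdint.h>

/* What MC_avg_o_16_vis computes: for each byte of a 16-wide block,
 * replace dest with the rounded average of dest and ref, i.e.
 * (dest + ref + 1) >> 1, the quantity the VIS code builds with
 * vis_or/vis_xor/vis_mul8x16/vis_psub16. */
static void mc_avg_o_16_c(uint8_t *dest, const uint8_t *ref,
                          int stride, int height)
{
    do {
        for (int i = 0; i < 16; i++)
            dest[i] = (uint8_t)((dest[i] + ref[i] + 1) >> 1);
        ref  += stride;
        dest += stride;
    } while (--height);
}
```

The VIS routine also unrolls two rows per loop iteration (note `height = (height >> 1) - 1` and the peeled final pair), which the scalar sketch does not need to reproduce.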

static void MC_avg_o_8_vis (uint8_t * dest, const uint8_t * _ref,
                            const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64(ref[8], TMP2);

        vis_ld64(dest[0], DST_0);

        vis_ld64(constants_fe[0], MASK_fe);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants128[0], CONST_128);

        ref += stride;
        height = (height >> 1) - 1;

        do {    /* 12 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(DST_0, REF_0, TMP4);

                vis_ld64(ref[8], TMP2);
                vis_and(TMP4, MASK_fe, TMP4);

                vis_or(DST_0, REF_0, TMP6);
                vis_ld64_2(dest, stride, DST_0);
                ref += stride;
                vis_mul8x16(CONST_128, TMP4, TMP4);

                vis_ld64(ref[0], TMP12);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64(ref[8], TMP2);
                vis_xor(DST_0, REF_0, TMP0);
                ref += stride;

                vis_and(TMP0, MASK_fe, TMP0);

                vis_and(TMP4, MASK_7f, TMP4);

                vis_psub16(TMP6, TMP4, TMP4);
                vis_st64(TMP4, dest[0]);
                dest += stride;
                vis_mul8x16(CONST_128, TMP0, TMP0);

                vis_or(DST_0, REF_0, TMP6);
                vis_ld64_2(dest, stride, DST_0);

                vis_faligndata(TMP12, TMP2, REF_0);

                vis_and(TMP0, MASK_7f, TMP0);

                vis_psub16(TMP6, TMP0, TMP4);
                vis_st64(TMP4, dest[0]);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(DST_0, REF_0, TMP4);

        vis_ld64(ref[8], TMP2);
        vis_and(TMP4, MASK_fe, TMP4);

        vis_or(DST_0, REF_0, TMP6);
        vis_ld64_2(dest, stride, DST_0);
        vis_mul8x16(CONST_128, TMP4, TMP4);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_xor(DST_0, REF_0, TMP0);

        vis_and(TMP0, MASK_fe, TMP0);

        vis_and(TMP4, MASK_7f, TMP4);

        vis_psub16(TMP6, TMP4, TMP4);
        vis_st64(TMP4, dest[0]);
        dest += stride;
        vis_mul8x16(CONST_128, TMP0, TMP0);

        vis_or(DST_0, REF_0, TMP6);

        vis_and(TMP0, MASK_7f, TMP0);

        vis_psub16(TMP6, TMP0, TMP4);
        vis_st64(TMP4, dest[0]);
}

static void MC_put_x_16_vis (uint8_t * dest, const uint8_t * _ref,
                             const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64_2(ref, 8, TMP2);

        vis_ld64_2(ref, 16, TMP4);

        vis_ld64(constants_fe[0], MASK_fe);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants128[0], CONST_128);
        vis_faligndata(TMP2, TMP4, REF_4);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
                vis_faligndata(TMP2, TMP4, REF_6);
        } else {
                vis_src1(TMP2, REF_2);
                vis_src1(TMP4, REF_6);
        }

        ref += stride;
        height = (height >> 1) - 1;

        do {    /* 34 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP6);

                vis_ld64_2(ref, 8, TMP2);
                vis_xor(REF_4, REF_6, TMP8);

                vis_ld64_2(ref, 16, TMP4);
                vis_and(TMP6, MASK_fe, TMP6);
                ref += stride;

                vis_ld64(ref[0], TMP14);
                vis_mul8x16(CONST_128, TMP6, TMP6);
                vis_and(TMP8, MASK_fe, TMP8);

                vis_ld64_2(ref, 8, TMP16);
                vis_mul8x16(CONST_128, TMP8, TMP8);
                vis_or(REF_0, REF_2, TMP10);

                vis_ld64_2(ref, 16, TMP18);
                ref += stride;
                vis_or(REF_4, REF_6, TMP12);

                vis_alignaddr_g0((void *)off);

                vis_faligndata(TMP0, TMP2, REF_0);

                vis_faligndata(TMP2, TMP4, REF_4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                        vis_faligndata(TMP2, TMP4, REF_6);
                } else {
                        vis_src1(TMP2, REF_2);
                        vis_src1(TMP4, REF_6);
                }

                vis_and(TMP6, MASK_7f, TMP6);

                vis_and(TMP8, MASK_7f, TMP8);

                vis_psub16(TMP10, TMP6, TMP6);
                vis_st64(TMP6, dest[0]);

                vis_psub16(TMP12, TMP8, TMP8);
                vis_st64_2(TMP8, dest, 8);
                dest += stride;

                vis_xor(REF_0, REF_2, TMP6);

                vis_xor(REF_4, REF_6, TMP8);

                vis_and(TMP6, MASK_fe, TMP6);

                vis_mul8x16(CONST_128, TMP6, TMP6);
                vis_and(TMP8, MASK_fe, TMP8);

                vis_mul8x16(CONST_128, TMP8, TMP8);
                vis_or(REF_0, REF_2, TMP10);

                vis_or(REF_4, REF_6, TMP12);

                vis_alignaddr_g0((void *)off);

                vis_faligndata(TMP14, TMP16, REF_0);

                vis_faligndata(TMP16, TMP18, REF_4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP14, TMP16, REF_2);
                        vis_faligndata(TMP16, TMP18, REF_6);
                } else {
                        vis_src1(TMP16, REF_2);
                        vis_src1(TMP18, REF_6);
                }

                vis_and(TMP6, MASK_7f, TMP6);

                vis_and(TMP8, MASK_7f, TMP8);

                vis_psub16(TMP10, TMP6, TMP6);
                vis_st64(TMP6, dest[0]);

                vis_psub16(TMP12, TMP8, TMP8);
                vis_st64_2(TMP8, dest, 8);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP6);

        vis_ld64_2(ref, 8, TMP2);
        vis_xor(REF_4, REF_6, TMP8);

        vis_ld64_2(ref, 16, TMP4);
        vis_and(TMP6, MASK_fe, TMP6);

        vis_mul8x16(CONST_128, TMP6, TMP6);
        vis_and(TMP8, MASK_fe, TMP8);

        vis_mul8x16(CONST_128, TMP8, TMP8);
        vis_or(REF_0, REF_2, TMP10);

        vis_or(REF_4, REF_6, TMP12);

        vis_alignaddr_g0((void *)off);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_faligndata(TMP2, TMP4, REF_4);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
                vis_faligndata(TMP2, TMP4, REF_6);
        } else {
                vis_src1(TMP2, REF_2);
                vis_src1(TMP4, REF_6);
        }

        vis_and(TMP6, MASK_7f, TMP6);

        vis_and(TMP8, MASK_7f, TMP8);

        vis_psub16(TMP10, TMP6, TMP6);
        vis_st64(TMP6, dest[0]);

        vis_psub16(TMP12, TMP8, TMP8);
        vis_st64_2(TMP8, dest, 8);
        dest += stride;

        vis_xor(REF_0, REF_2, TMP6);

        vis_xor(REF_4, REF_6, TMP8);

        vis_and(TMP6, MASK_fe, TMP6);

        vis_mul8x16(CONST_128, TMP6, TMP6);
        vis_and(TMP8, MASK_fe, TMP8);

        vis_mul8x16(CONST_128, TMP8, TMP8);
        vis_or(REF_0, REF_2, TMP10);

        vis_or(REF_4, REF_6, TMP12);

        vis_and(TMP6, MASK_7f, TMP6);

        vis_and(TMP8, MASK_7f, TMP8);

        vis_psub16(TMP10, TMP6, TMP6);
        vis_st64(TMP6, dest[0]);

        vis_psub16(TMP12, TMP8, TMP8);
        vis_st64_2(TMP8, dest, 8);
}
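The _x_ variants do horizontal half-pel motion compensation: each output byte is the rounded average of a reference byte and its right-hand neighbour, which is why the routine above maintains both an `off`-aligned and an `off_plus_1`-aligned view of the same 64-bit loads. A portable sketch of the semantics (hypothetical helper name, not part of the original file):

```c
#include <stdint.h>

/* What MC_put_x_16_vis computes: horizontal half-pel interpolation
 * with rounding. Note each row reads 17 source bytes
 * (ref[0] .. ref[16]), matching the third 64-bit load per row in
 * the VIS version. */
static void mc_put_x_16_c(uint8_t *dest, const uint8_t *ref,
                          int stride, int height)
{
    do {
        for (int i = 0; i < 16; i++)
            dest[i] = (uint8_t)((ref[i] + ref[i + 1] + 1) >> 1);
        ref  += stride;
        dest += stride;
    } while (--height);
}
```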

static void MC_put_x_8_vis (uint8_t * dest, const uint8_t * _ref,
                            const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64(ref[8], TMP2);

        vis_ld64(constants_fe[0], MASK_fe);

        vis_ld64(constants_7f[0], MASK_7f);

        vis_ld64(constants128[0], CONST_128);
        vis_faligndata(TMP0, TMP2, REF_0);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
        } else {
                vis_src1(TMP2, REF_2);
        }

        ref += stride;
        height = (height >> 1) - 1;

        do {    /* 20 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP4);

                vis_ld64_2(ref, 8, TMP2);
                vis_and(TMP4, MASK_fe, TMP4);
                ref += stride;

                vis_ld64(ref[0], TMP8);
                vis_or(REF_0, REF_2, TMP6);
                vis_mul8x16(CONST_128, TMP4, TMP4);

                vis_alignaddr_g0((void *)off);

                vis_ld64_2(ref, 8, TMP10);
                ref += stride;
                vis_faligndata(TMP0, TMP2, REF_0);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                } else {
                        vis_src1(TMP2, REF_2);
                }

                vis_and(TMP4, MASK_7f, TMP4);

                vis_psub16(TMP6, TMP4, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;

                vis_xor(REF_0, REF_2, TMP12);

                vis_and(TMP12, MASK_fe, TMP12);

                vis_or(REF_0, REF_2, TMP14);
                vis_mul8x16(CONST_128, TMP12, TMP12);

                vis_alignaddr_g0((void *)off);
                vis_faligndata(TMP8, TMP10, REF_0);
                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP8, TMP10, REF_2);
                } else {
                        vis_src1(TMP10, REF_2);
                }

                vis_and(TMP12, MASK_7f, TMP12);

                vis_psub16(TMP14, TMP12, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP4);

        vis_ld64_2(ref, 8, TMP2);
        vis_and(TMP4, MASK_fe, TMP4);

        vis_or(REF_0, REF_2, TMP6);
        vis_mul8x16(CONST_128, TMP4, TMP4);

        vis_alignaddr_g0((void *)off);

        vis_faligndata(TMP0, TMP2, REF_0);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
        } else {
                vis_src1(TMP2, REF_2);
        }

        vis_and(TMP4, MASK_7f, TMP4);

        vis_psub16(TMP6, TMP4, DST_0);
        vis_st64(DST_0, dest[0]);
        dest += stride;

        vis_xor(REF_0, REF_2, TMP12);

        vis_and(TMP12, MASK_fe, TMP12);

        vis_or(REF_0, REF_2, TMP14);
        vis_mul8x16(CONST_128, TMP12, TMP12);

        vis_and(TMP12, MASK_7f, TMP12);

        vis_psub16(TMP14, TMP12, DST_0);
        vis_st64(DST_0, dest[0]);
        dest += stride;
}
static void MC_avg_x_16_vis (uint8_t * dest, const uint8_t * _ref,
                             const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        vis_ld64(constants3[0], CONST_3);
        vis_fzero(ZERO);
        vis_ld64(constants256_512[0], CONST_256);

        ref = vis_alignaddr(ref);
        do {    /* 26 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64(ref[8], TMP2);

                vis_alignaddr_g0((void *)off);

                vis_ld64(ref[16], TMP4);

                vis_ld64(dest[0], DST_0);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64(dest[8], DST_2);
                vis_faligndata(TMP2, TMP4, REF_4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                        vis_faligndata(TMP2, TMP4, REF_6);
                } else {
                        vis_src1(TMP2, REF_2);
                        vis_src1(TMP4, REF_6);
                }

                vis_mul8x16au(REF_0, CONST_256, TMP0);

                vis_pmerge(ZERO, REF_2, TMP4);
                vis_mul8x16au(REF_0_1, CONST_256, TMP2);

                vis_pmerge(ZERO, REF_2_1, TMP6);

                vis_padd16(TMP0, TMP4, TMP0);

                vis_mul8x16al(DST_0, CONST_512, TMP4);
                vis_padd16(TMP2, TMP6, TMP2);

                vis_mul8x16al(DST_1, CONST_512, TMP6);

                vis_mul8x16au(REF_6, CONST_256, TMP12);

                vis_padd16(TMP0, TMP4, TMP0);
                vis_mul8x16au(REF_6_1, CONST_256, TMP14);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_mul8x16au(REF_4, CONST_256, TMP16);

                vis_padd16(TMP0, CONST_3, TMP8);
                vis_mul8x16au(REF_4_1, CONST_256, TMP18);

                vis_padd16(TMP2, CONST_3, TMP10);
                vis_pack16(TMP8, DST_0);

                vis_pack16(TMP10, DST_1);
                vis_padd16(TMP16, TMP12, TMP0);

                vis_st64(DST_0, dest[0]);
                vis_mul8x16al(DST_2, CONST_512, TMP4);
                vis_padd16(TMP18, TMP14, TMP2);

                vis_mul8x16al(DST_3, CONST_512, TMP6);
                vis_padd16(TMP0, CONST_3, TMP0);

                vis_padd16(TMP2, CONST_3, TMP2);

                vis_padd16(TMP0, TMP4, TMP0);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_pack16(TMP0, DST_2);

                vis_pack16(TMP2, DST_3);
                vis_st64(DST_2, dest[8]);

                ref += stride;
                dest += stride;
        } while (--height);
}

static void MC_avg_x_8_vis (uint8_t * dest, const uint8_t * _ref,
                            const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;
        int stride_times_2 = stride << 1;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        vis_ld64(constants3[0], CONST_3);
        vis_fzero(ZERO);
        vis_ld64(constants256_512[0], CONST_256);

        ref = vis_alignaddr(ref);
        height >>= 2;
        do {    /* 47 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64_2(ref, 8, TMP2);
                ref += stride;

                vis_alignaddr_g0((void *)off);

                vis_ld64(ref[0], TMP4);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, 8, TMP6);
                ref += stride;

                vis_ld64(ref[0], TMP8);

                vis_ld64_2(ref, 8, TMP10);
                ref += stride;
                vis_faligndata(TMP4, TMP6, REF_4);

                vis_ld64(ref[0], TMP12);

                vis_ld64_2(ref, 8, TMP14);
                ref += stride;
                vis_faligndata(TMP8, TMP10, REF_S0);

                vis_faligndata(TMP12, TMP14, REF_S4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);

                        vis_ld64(dest[0], DST_0);
                        vis_faligndata(TMP0, TMP2, REF_2);

                        vis_ld64_2(dest, stride, DST_2);
                        vis_faligndata(TMP4, TMP6, REF_6);

                        vis_faligndata(TMP8, TMP10, REF_S2);

                        vis_faligndata(TMP12, TMP14, REF_S6);
                } else {
                        vis_ld64(dest[0], DST_0);
                        vis_src1(TMP2, REF_2);

                        vis_ld64_2(dest, stride, DST_2);
                        vis_src1(TMP6, REF_6);

                        vis_src1(TMP10, REF_S2);

                        vis_src1(TMP14, REF_S6);
                }

                vis_pmerge(ZERO, REF_0, TMP0);
                vis_mul8x16au(REF_0_1, CONST_256, TMP2);

                vis_pmerge(ZERO, REF_2, TMP4);
                vis_mul8x16au(REF_2_1, CONST_256, TMP6);

                vis_padd16(TMP0, CONST_3, TMP0);
                vis_mul8x16al(DST_0, CONST_512, TMP16);

                vis_padd16(TMP2, CONST_3, TMP2);
                vis_mul8x16al(DST_1, CONST_512, TMP18);

                vis_padd16(TMP0, TMP4, TMP0);
                vis_mul8x16au(REF_4, CONST_256, TMP8);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_mul8x16au(REF_4_1, CONST_256, TMP10);

                vis_padd16(TMP0, TMP16, TMP0);
                vis_mul8x16au(REF_6, CONST_256, TMP12);

                vis_padd16(TMP2, TMP18, TMP2);
                vis_mul8x16au(REF_6_1, CONST_256, TMP14);

                vis_padd16(TMP8, CONST_3, TMP8);
                vis_mul8x16al(DST_2, CONST_512, TMP16);

                vis_padd16(TMP8, TMP12, TMP8);
                vis_mul8x16al(DST_3, CONST_512, TMP18);

                vis_padd16(TMP10, TMP14, TMP10);
                vis_pack16(TMP0, DST_0);

                vis_pack16(TMP2, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;
                vis_padd16(TMP10, CONST_3, TMP10);

                vis_ld64_2(dest, stride, DST_0);
                vis_padd16(TMP8, TMP16, TMP8);

                vis_ld64_2(dest, stride_times_2, TMP4/*DST_2*/);
                vis_padd16(TMP10, TMP18, TMP10);
                vis_pack16(TMP8, DST_2);

                vis_pack16(TMP10, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;

                vis_mul8x16au(REF_S0_1, CONST_256, TMP2);
                vis_pmerge(ZERO, REF_S0, TMP0);

                vis_pmerge(ZERO, REF_S2, TMP24);
                vis_mul8x16au(REF_S2_1, CONST_256, TMP6);

                vis_padd16(TMP0, CONST_3, TMP0);
                vis_mul8x16au(REF_S4, CONST_256, TMP8);

                vis_padd16(TMP2, CONST_3, TMP2);
                vis_mul8x16au(REF_S4_1, CONST_256, TMP10);

                vis_padd16(TMP0, TMP24, TMP0);
                vis_mul8x16au(REF_S6, CONST_256, TMP12);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_mul8x16au(REF_S6_1, CONST_256, TMP14);

                vis_padd16(TMP8, CONST_3, TMP8);
                vis_mul8x16al(DST_0, CONST_512, TMP16);

                vis_padd16(TMP10, CONST_3, TMP10);
                vis_mul8x16al(DST_1, CONST_512, TMP18);

                vis_padd16(TMP8, TMP12, TMP8);
                vis_mul8x16al(TMP4/*DST_2*/, CONST_512, TMP20);

                vis_mul8x16al(TMP5/*DST_3*/, CONST_512, TMP22);
                vis_padd16(TMP0, TMP16, TMP0);

                vis_padd16(TMP2, TMP18, TMP2);
                vis_pack16(TMP0, DST_0);

                vis_padd16(TMP10, TMP14, TMP10);
                vis_pack16(TMP2, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;

                vis_padd16(TMP8, TMP20, TMP8);

                vis_padd16(TMP10, TMP22, TMP10);
                vis_pack16(TMP8, DST_2);

                vis_pack16(TMP10, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;
        } while (--height);
}

static void MC_put_y_16_vis (uint8_t * dest, const uint8_t * _ref,
                             const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        vis_ld64(ref[0], TMP0);

        vis_ld64_2(ref, 8, TMP2);

        vis_ld64_2(ref, 16, TMP4);
        ref += stride;

        vis_ld64(ref[0], TMP6);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64_2(ref, 8, TMP8);
        vis_faligndata(TMP2, TMP4, REF_4);

        vis_ld64_2(ref, 16, TMP10);
        ref += stride;

        vis_ld64(constants_fe[0], MASK_fe);
        vis_faligndata(TMP6, TMP8, REF_2);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP8, TMP10, REF_6);

        vis_ld64(constants128[0], CONST_128);
        height = (height >> 1) - 1;
        do {    /* 24 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP12);

                vis_ld64_2(ref, 8, TMP2);
                vis_xor(REF_4, REF_6, TMP16);

                vis_ld64_2(ref, 16, TMP4);
                ref += stride;
                vis_or(REF_0, REF_2, TMP14);

                vis_ld64(ref[0], TMP6);
                vis_or(REF_4, REF_6, TMP18);

                vis_ld64_2(ref, 8, TMP8);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, 16, TMP10);
                ref += stride;
                vis_faligndata(TMP2, TMP4, REF_4);

                vis_and(TMP12, MASK_fe, TMP12);

                vis_and(TMP16, MASK_fe, TMP16);
                vis_mul8x16(CONST_128, TMP12, TMP12);

                vis_mul8x16(CONST_128, TMP16, TMP16);
                vis_xor(REF_0, REF_2, TMP0);

                vis_xor(REF_4, REF_6, TMP2);

                vis_or(REF_0, REF_2, TMP20);

                vis_and(TMP12, MASK_7f, TMP12);

                vis_and(TMP16, MASK_7f, TMP16);

                vis_psub16(TMP14, TMP12, TMP12);
                vis_st64(TMP12, dest[0]);

                vis_psub16(TMP18, TMP16, TMP16);
                vis_st64_2(TMP16, dest, 8);
                dest += stride;

                vis_or(REF_4, REF_6, TMP18);

                vis_and(TMP0, MASK_fe, TMP0);

                vis_and(TMP2, MASK_fe, TMP2);
                vis_mul8x16(CONST_128, TMP0, TMP0);

                vis_faligndata(TMP6, TMP8, REF_2);
                vis_mul8x16(CONST_128, TMP2, TMP2);

                vis_faligndata(TMP8, TMP10, REF_6);

                vis_and(TMP0, MASK_7f, TMP0);

                vis_and(TMP2, MASK_7f, TMP2);

                vis_psub16(TMP20, TMP0, TMP0);
                vis_st64(TMP0, dest[0]);

                vis_psub16(TMP18, TMP2, TMP2);
                vis_st64_2(TMP2, dest, 8);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP12);

        vis_ld64_2(ref, 8, TMP2);
        vis_xor(REF_4, REF_6, TMP16);

        vis_ld64_2(ref, 16, TMP4);
        vis_or(REF_0, REF_2, TMP14);

        vis_or(REF_4, REF_6, TMP18);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_faligndata(TMP2, TMP4, REF_4);

        vis_and(TMP12, MASK_fe, TMP12);

        vis_and(TMP16, MASK_fe, TMP16);
        vis_mul8x16(CONST_128, TMP12, TMP12);

        vis_mul8x16(CONST_128, TMP16, TMP16);
        vis_xor(REF_0, REF_2, TMP0);

        vis_xor(REF_4, REF_6, TMP2);

        vis_or(REF_0, REF_2, TMP20);

        vis_and(TMP12, MASK_7f, TMP12);

        vis_and(TMP16, MASK_7f, TMP16);

        vis_psub16(TMP14, TMP12, TMP12);
        vis_st64(TMP12, dest[0]);

        vis_psub16(TMP18, TMP16, TMP16);
        vis_st64_2(TMP16, dest, 8);
        dest += stride;

        vis_or(REF_4, REF_6, TMP18);

        vis_and(TMP0, MASK_fe, TMP0);

        vis_and(TMP2, MASK_fe, TMP2);
        vis_mul8x16(CONST_128, TMP0, TMP0);

        vis_mul8x16(CONST_128, TMP2, TMP2);

        vis_and(TMP0, MASK_7f, TMP0);

        vis_and(TMP2, MASK_7f, TMP2);

        vis_psub16(TMP20, TMP0, TMP0);
        vis_st64(TMP0, dest[0]);

        vis_psub16(TMP18, TMP2, TMP2);
        vis_st64_2(TMP2, dest, 8);
}

static void MC_put_y_8_vis (uint8_t * dest, const uint8_t * _ref,
                            const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        vis_ld64(ref[0], TMP0);

        vis_ld64_2(ref, 8, TMP2);
        ref += stride;

        vis_ld64(ref[0], TMP4);

        vis_ld64_2(ref, 8, TMP6);
        ref += stride;

        vis_ld64(constants_fe[0], MASK_fe);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP4, TMP6, REF_2);

        vis_ld64(constants128[0], CONST_128);
        height = (height >> 1) - 1;
        do {    /* 12 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP4);

                vis_ld64_2(ref, 8, TMP2);
                ref += stride;
                vis_and(TMP4, MASK_fe, TMP4);

                vis_or(REF_0, REF_2, TMP6);
                vis_mul8x16(CONST_128, TMP4, TMP4);

                vis_faligndata(TMP0, TMP2, REF_0);
                vis_ld64(ref[0], TMP0);

                vis_ld64_2(ref, 8, TMP2);
                ref += stride;
                vis_xor(REF_0, REF_2, TMP12);

                vis_and(TMP4, MASK_7f, TMP4);

                vis_and(TMP12, MASK_fe, TMP12);

                vis_mul8x16(CONST_128, TMP12, TMP12);
                vis_or(REF_0, REF_2, TMP14);

                vis_psub16(TMP6, TMP4, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;

                vis_faligndata(TMP0, TMP2, REF_2);

                vis_and(TMP12, MASK_7f, TMP12);

                vis_psub16(TMP14, TMP12, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP4);

        vis_ld64_2(ref, 8, TMP2);
        vis_and(TMP4, MASK_fe, TMP4);

        vis_or(REF_0, REF_2, TMP6);
        vis_mul8x16(CONST_128, TMP4, TMP4);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_xor(REF_0, REF_2, TMP12);

        vis_and(TMP4, MASK_7f, TMP4);

        vis_and(TMP12, MASK_fe, TMP12);

        vis_mul8x16(CONST_128, TMP12, TMP12);
        vis_or(REF_0, REF_2, TMP14);

        vis_psub16(TMP6, TMP4, DST_0);
        vis_st64(DST_0, dest[0]);
        dest += stride;

        vis_and(TMP12, MASK_7f, TMP12);

        vis_psub16(TMP14, TMP12, DST_0);
        vis_st64(DST_0, dest[0]);
}

static void MC_avg_y_16_vis (uint8_t * dest, const uint8_t * _ref,
                             const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        int stride_8 = stride + 8;
        int stride_16 = stride + 16;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[ 0], TMP0);
        vis_fzero(ZERO);

        vis_ld64(ref[ 8], TMP2);

        vis_ld64(ref[16], TMP4);

        vis_ld64(constants3[0], CONST_3);
        vis_faligndata(TMP0, TMP2, REF_2);

        vis_ld64(constants256_512[0], CONST_256);
        vis_faligndata(TMP2, TMP4, REF_6);
        height >>= 1;

        do {    /* 31 cycles */
                vis_ld64_2(ref, stride, TMP0);
                vis_pmerge(ZERO, REF_2, TMP12);
                vis_mul8x16au(REF_2_1, CONST_256, TMP14);

                vis_ld64_2(ref, stride_8, TMP2);
                vis_pmerge(ZERO, REF_6, TMP16);
                vis_mul8x16au(REF_6_1, CONST_256, TMP18);

                vis_ld64_2(ref, stride_16, TMP4);
                ref += stride;

                vis_ld64(dest[0], DST_0);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(dest, 8, DST_2);
                vis_faligndata(TMP2, TMP4, REF_4);

                vis_ld64_2(ref, stride, TMP6);
                vis_pmerge(ZERO, REF_0, TMP0);
                vis_mul8x16au(REF_0_1, CONST_256, TMP2);

                vis_ld64_2(ref, stride_8, TMP8);
                vis_pmerge(ZERO, REF_4, TMP4);

                vis_ld64_2(ref, stride_16, TMP10);
                ref += stride;

                vis_ld64_2(dest, stride, REF_S0/*DST_4*/);
                vis_faligndata(TMP6, TMP8, REF_2);
                vis_mul8x16au(REF_4_1, CONST_256, TMP6);

                vis_ld64_2(dest, stride_8, REF_S2/*DST_6*/);
                vis_faligndata(TMP8, TMP10, REF_6);
                vis_mul8x16al(DST_0, CONST_512, TMP20);

                vis_padd16(TMP0, CONST_3, TMP0);
                vis_mul8x16al(DST_1, CONST_512, TMP22);

                vis_padd16(TMP2, CONST_3, TMP2);
                vis_mul8x16al(DST_2, CONST_512, TMP24);

                vis_padd16(TMP4, CONST_3, TMP4);
                vis_mul8x16al(DST_3, CONST_512, TMP26);

                vis_padd16(TMP6, CONST_3, TMP6);

                vis_padd16(TMP12, TMP20, TMP12);
                vis_mul8x16al(REF_S0, CONST_512, TMP20);

                vis_padd16(TMP14, TMP22, TMP14);
                vis_mul8x16al(REF_S0_1, CONST_512, TMP22);

                vis_padd16(TMP16, TMP24, TMP16);
                vis_mul8x16al(REF_S2, CONST_512, TMP24);

                vis_padd16(TMP18, TMP26, TMP18);
                vis_mul8x16al(REF_S2_1, CONST_512, TMP26);

                vis_padd16(TMP12, TMP0, TMP12);
                vis_mul8x16au(REF_2, CONST_256, TMP28);

                vis_padd16(TMP14, TMP2, TMP14);
                vis_mul8x16au(REF_2_1, CONST_256, TMP30);

                vis_padd16(TMP16, TMP4, TMP16);
                vis_mul8x16au(REF_6, CONST_256, REF_S4);

                vis_padd16(TMP18, TMP6, TMP18);
                vis_mul8x16au(REF_6_1, CONST_256, REF_S6);

                vis_pack16(TMP12, DST_0);
                vis_padd16(TMP28, TMP0, TMP12);

                vis_pack16(TMP14, DST_1);
                vis_st64(DST_0, dest[0]);
                vis_padd16(TMP30, TMP2, TMP14);

                vis_pack16(TMP16, DST_2);
                vis_padd16(REF_S4, TMP4, TMP16);

                vis_pack16(TMP18, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;
                vis_padd16(REF_S6, TMP6, TMP18);

                vis_padd16(TMP12, TMP20, TMP12);

                vis_padd16(TMP14, TMP22, TMP14);
                vis_pack16(TMP12, DST_0);

                vis_padd16(TMP16, TMP24, TMP16);
                vis_pack16(TMP14, DST_1);
                vis_st64(DST_0, dest[0]);

                vis_padd16(TMP18, TMP26, TMP18);
                vis_pack16(TMP16, DST_2);

                vis_pack16(TMP18, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;
        } while (--height);
}

static void MC_avg_y_8_vis (uint8_t * dest, const uint8_t * _ref,
                            const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        int stride_8 = stride + 8;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[ 0], TMP0);
        vis_fzero(ZERO);

        vis_ld64(ref[ 8], TMP2);

        vis_ld64(constants3[0], CONST_3);
        vis_faligndata(TMP0, TMP2, REF_2);

        vis_ld64(constants256_512[0], CONST_256);

        height >>= 1;
        do {    /* 20 cycles */
                vis_ld64_2(ref, stride, TMP0);
                vis_pmerge(ZERO, REF_2, TMP8);
                vis_mul8x16au(REF_2_1, CONST_256, TMP10);

                vis_ld64_2(ref, stride_8, TMP2);
                ref += stride;

                vis_ld64(dest[0], DST_0);

                vis_ld64_2(dest, stride, DST_2);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, stride, TMP4);
                vis_mul8x16al(DST_0, CONST_512, TMP16);
                vis_pmerge(ZERO, REF_0, TMP12);

                vis_ld64_2(ref, stride_8, TMP6);
                ref += stride;
                vis_mul8x16al(DST_1, CONST_512, TMP18);
                vis_pmerge(ZERO, REF_0_1, TMP14);

                vis_padd16(TMP12, CONST_3, TMP12);
                vis_mul8x16al(DST_2, CONST_512, TMP24);

                vis_padd16(TMP14, CONST_3, TMP14);
                vis_mul8x16al(DST_3, CONST_512, TMP26);

                vis_faligndata(TMP4, TMP6, REF_2);

                vis_padd16(TMP8, TMP12, TMP8);

                vis_padd16(TMP10, TMP14, TMP10);
                vis_mul8x16au(REF_2, CONST_256, TMP20);

                vis_padd16(TMP8, TMP16, TMP0);
                vis_mul8x16au(REF_2_1, CONST_256, TMP22);

                vis_padd16(TMP10, TMP18, TMP2);
                vis_pack16(TMP0, DST_0);

                vis_pack16(TMP2, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;
                vis_padd16(TMP12, TMP20, TMP12);

                vis_padd16(TMP14, TMP22, TMP14);

                vis_padd16(TMP12, TMP24, TMP0);

                vis_padd16(TMP14, TMP26, TMP2);
                vis_pack16(TMP0, DST_2);

                vis_pack16(TMP2, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;
        } while (--height);
}

static void MC_put_xy_16_vis (uint8_t * dest, const uint8_t * _ref,
                              const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;
        int stride_8 = stride + 8;
        int stride_16 = stride + 16;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[ 0], TMP0);
        vis_fzero(ZERO);

        vis_ld64(ref[ 8], TMP2);

        vis_ld64(ref[16], TMP4);

        vis_ld64(constants2[0], CONST_2);
        vis_faligndata(TMP0, TMP2, REF_S0);

        vis_ld64(constants256_512[0], CONST_256);
        vis_faligndata(TMP2, TMP4, REF_S4);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_S2);
                vis_faligndata(TMP2, TMP4, REF_S6);
        } else {
                vis_src1(TMP2, REF_S2);
                vis_src1(TMP4, REF_S6);
        }

        height >>= 1;
        do {
                vis_ld64_2(ref, stride, TMP0);
                vis_mul8x16au(REF_S0, CONST_256, TMP12);
                vis_pmerge(ZERO, REF_S0_1, TMP14);

                vis_alignaddr_g0((void *)off);

                vis_ld64_2(ref, stride_8, TMP2);
                vis_mul8x16au(REF_S2, CONST_256, TMP16);
                vis_pmerge(ZERO, REF_S2_1, TMP18);

                vis_ld64_2(ref, stride_16, TMP4);
                ref += stride;
                vis_mul8x16au(REF_S4, CONST_256, TMP20);
                vis_pmerge(ZERO, REF_S4_1, TMP22);

                vis_ld64_2(ref, stride, TMP6);
                vis_mul8x16au(REF_S6, CONST_256, TMP24);
                vis_pmerge(ZERO, REF_S6_1, TMP26);

                vis_ld64_2(ref, stride_8, TMP8);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, stride_16, TMP10);
                ref += stride;
                vis_faligndata(TMP2, TMP4, REF_4);

                vis_faligndata(TMP6, TMP8, REF_S0);

                vis_faligndata(TMP8, TMP10, REF_S4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                        vis_faligndata(TMP2, TMP4, REF_6);
                        vis_faligndata(TMP6, TMP8, REF_S2);
                        vis_faligndata(TMP8, TMP10, REF_S6);
                } else {
                        vis_src1(TMP2, REF_2);
                        vis_src1(TMP4, REF_6);
                        vis_src1(TMP8, REF_S2);
                        vis_src1(TMP10, REF_S6);
                }

                vis_mul8x16au(REF_0, CONST_256, TMP0);
                vis_pmerge(ZERO, REF_0_1, TMP2);

                vis_mul8x16au(REF_2, CONST_256, TMP4);
                vis_pmerge(ZERO, REF_2_1, TMP6);

                vis_padd16(TMP0, CONST_2, TMP8);
                vis_mul8x16au(REF_4, CONST_256, TMP0);

                vis_padd16(TMP2, CONST_2, TMP10);
                vis_mul8x16au(REF_4_1, CONST_256, TMP2);

                vis_padd16(TMP8, TMP4, TMP8);
                vis_mul8x16au(REF_6, CONST_256, TMP4);

                vis_padd16(TMP10, TMP6, TMP10);
                vis_mul8x16au(REF_6_1, CONST_256, TMP6);

                vis_padd16(TMP12, TMP8, TMP12);

                vis_padd16(TMP14, TMP10, TMP14);

                vis_padd16(TMP12, TMP16, TMP12);

                vis_padd16(TMP14, TMP18, TMP14);
                vis_pack16(TMP12, DST_0);

                vis_pack16(TMP14, DST_1);
                vis_st64(DST_0, dest[0]);
                vis_padd16(TMP0, CONST_2, TMP12);

                vis_mul8x16au(REF_S0, CONST_256, TMP0);
                vis_padd16(TMP2, CONST_2, TMP14);

                vis_mul8x16au(REF_S0_1, CONST_256, TMP2);
                vis_padd16(TMP12, TMP4, TMP12);

                vis_mul8x16au(REF_S2, CONST_256, TMP4);
                vis_padd16(TMP14, TMP6, TMP14);

                vis_mul8x16au(REF_S2_1, CONST_256, TMP6);
                vis_padd16(TMP20, TMP12, TMP20);

                vis_padd16(TMP22, TMP14, TMP22);

                vis_padd16(TMP20, TMP24, TMP20);

                vis_padd16(TMP22, TMP26, TMP22);
                vis_pack16(TMP20, DST_2);

                vis_pack16(TMP22, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;
                vis_padd16(TMP0, TMP4, TMP24);

                vis_mul8x16au(REF_S4, CONST_256, TMP0);
                vis_padd16(TMP2, TMP6, TMP26);

                vis_mul8x16au(REF_S4_1, CONST_256, TMP2);
                vis_padd16(TMP24, TMP8, TMP24);

                vis_padd16(TMP26, TMP10, TMP26);
                vis_pack16(TMP24, DST_0);

                vis_pack16(TMP26, DST_1);
                vis_st64(DST_0, dest[0]);
                vis_pmerge(ZERO, REF_S6, TMP4);

                vis_pmerge(ZERO, REF_S6_1, TMP6);

                vis_padd16(TMP0, TMP4, TMP0);

                vis_padd16(TMP2, TMP6, TMP2);

                vis_padd16(TMP0, TMP12, TMP0);

                vis_padd16(TMP2, TMP14, TMP2);
                vis_pack16(TMP0, DST_2);

                vis_pack16(TMP2, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;
        } while (--height);
}

1623 static void MC_put_xy_8_vis (uint8_t * dest, const uint8_t * _ref, |
2979 | 1624 const int stride, int height) |
1959
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
1625 { |
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;
        int stride_8 = stride + 8;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[ 0], TMP0);
        vis_fzero(ZERO);

        vis_ld64(ref[ 8], TMP2);

        vis_ld64(constants2[0], CONST_2);

        vis_ld64(constants256_512[0], CONST_256);
        vis_faligndata(TMP0, TMP2, REF_S0);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_S2);
        } else {
                vis_src1(TMP2, REF_S2);
        }

        height >>= 1;
        do {    /* 26 cycles */
                vis_ld64_2(ref, stride, TMP0);
                vis_mul8x16au(REF_S0, CONST_256, TMP8);
                vis_pmerge(ZERO, REF_S2, TMP12);

                vis_alignaddr_g0((void *)off);

                vis_ld64_2(ref, stride_8, TMP2);
                ref += stride;
                vis_mul8x16au(REF_S0_1, CONST_256, TMP10);
                vis_pmerge(ZERO, REF_S2_1, TMP14);

                vis_ld64_2(ref, stride, TMP4);

                vis_ld64_2(ref, stride_8, TMP6);
                ref += stride;
                vis_faligndata(TMP0, TMP2, REF_S4);

                vis_pmerge(ZERO, REF_S4, TMP18);

                vis_pmerge(ZERO, REF_S4_1, TMP20);

                vis_faligndata(TMP4, TMP6, REF_S0);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_S6);
                        vis_faligndata(TMP4, TMP6, REF_S2);
                } else {
                        vis_src1(TMP2, REF_S6);
                        vis_src1(TMP6, REF_S2);
                }

                vis_padd16(TMP18, CONST_2, TMP18);
                vis_mul8x16au(REF_S6, CONST_256, TMP22);

                vis_padd16(TMP20, CONST_2, TMP20);
                vis_mul8x16au(REF_S6_1, CONST_256, TMP24);

                vis_mul8x16au(REF_S0, CONST_256, TMP26);
                vis_pmerge(ZERO, REF_S0_1, TMP28);

                vis_mul8x16au(REF_S2, CONST_256, TMP30);
                vis_padd16(TMP18, TMP22, TMP18);

                vis_mul8x16au(REF_S2_1, CONST_256, TMP32);
                vis_padd16(TMP20, TMP24, TMP20);

                vis_padd16(TMP8, TMP18, TMP8);

                vis_padd16(TMP10, TMP20, TMP10);

                vis_padd16(TMP8, TMP12, TMP8);

                vis_padd16(TMP10, TMP14, TMP10);
                vis_pack16(TMP8, DST_0);

                vis_pack16(TMP10, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;
                vis_padd16(TMP18, TMP26, TMP18);

                vis_padd16(TMP20, TMP28, TMP20);

                vis_padd16(TMP18, TMP30, TMP18);

                vis_padd16(TMP20, TMP32, TMP20);
                vis_pack16(TMP18, DST_2);

                vis_pack16(TMP20, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;
        } while (--height);
}

static void MC_avg_xy_16_vis (uint8_t * dest, const uint8_t * _ref,
                              const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;
        int stride_8 = stride + 8;
        int stride_16 = stride + 16;

        vis_set_gsr(4 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[ 0], TMP0);
        vis_fzero(ZERO);

        vis_ld64(ref[ 8], TMP2);

        vis_ld64(ref[16], TMP4);

        vis_ld64(constants6[0], CONST_6);
        vis_faligndata(TMP0, TMP2, REF_S0);

        vis_ld64(constants256_1024[0], CONST_256);
        vis_faligndata(TMP2, TMP4, REF_S4);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_S2);
                vis_faligndata(TMP2, TMP4, REF_S6);
        } else {
                vis_src1(TMP2, REF_S2);
                vis_src1(TMP4, REF_S6);
        }

        height >>= 1;
        do {    /* 55 cycles */
                vis_ld64_2(ref, stride, TMP0);
                vis_mul8x16au(REF_S0, CONST_256, TMP12);
                vis_pmerge(ZERO, REF_S0_1, TMP14);

                vis_alignaddr_g0((void *)off);

                vis_ld64_2(ref, stride_8, TMP2);
                vis_mul8x16au(REF_S2, CONST_256, TMP16);
                vis_pmerge(ZERO, REF_S2_1, TMP18);

                vis_ld64_2(ref, stride_16, TMP4);
                ref += stride;
                vis_mul8x16au(REF_S4, CONST_256, TMP20);
                vis_pmerge(ZERO, REF_S4_1, TMP22);

                vis_ld64_2(ref, stride, TMP6);
                vis_mul8x16au(REF_S6, CONST_256, TMP24);
                vis_pmerge(ZERO, REF_S6_1, TMP26);

                vis_ld64_2(ref, stride_8, TMP8);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, stride_16, TMP10);
                ref += stride;
                vis_faligndata(TMP2, TMP4, REF_4);

                vis_ld64(dest[0], DST_0);
                vis_faligndata(TMP6, TMP8, REF_S0);

                vis_ld64_2(dest, 8, DST_2);
                vis_faligndata(TMP8, TMP10, REF_S4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                        vis_faligndata(TMP2, TMP4, REF_6);
                        vis_faligndata(TMP6, TMP8, REF_S2);
                        vis_faligndata(TMP8, TMP10, REF_S6);
                } else {
                        vis_src1(TMP2, REF_2);
                        vis_src1(TMP4, REF_6);
                        vis_src1(TMP8, REF_S2);
                        vis_src1(TMP10, REF_S6);
                }

                vis_mul8x16al(DST_0, CONST_1024, TMP30);
                vis_pmerge(ZERO, REF_0, TMP0);

                vis_mul8x16al(DST_1, CONST_1024, TMP32);
                vis_pmerge(ZERO, REF_0_1, TMP2);

                vis_mul8x16au(REF_2, CONST_256, TMP4);
                vis_pmerge(ZERO, REF_2_1, TMP6);

                vis_mul8x16al(DST_2, CONST_1024, REF_0);
                vis_padd16(TMP0, CONST_6, TMP0);

                vis_mul8x16al(DST_3, CONST_1024, REF_2);
                vis_padd16(TMP2, CONST_6, TMP2);

                vis_padd16(TMP0, TMP4, TMP0);
                vis_mul8x16au(REF_4, CONST_256, TMP4);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_mul8x16au(REF_4_1, CONST_256, TMP6);

                vis_padd16(TMP12, TMP0, TMP12);
                vis_mul8x16au(REF_6, CONST_256, TMP8);

                vis_padd16(TMP14, TMP2, TMP14);
                vis_mul8x16au(REF_6_1, CONST_256, TMP10);

                vis_padd16(TMP12, TMP16, TMP12);
                vis_mul8x16au(REF_S0, CONST_256, REF_4);

                vis_padd16(TMP14, TMP18, TMP14);
                vis_mul8x16au(REF_S0_1, CONST_256, REF_6);

                vis_padd16(TMP12, TMP30, TMP12);

                vis_padd16(TMP14, TMP32, TMP14);
                vis_pack16(TMP12, DST_0);

                vis_pack16(TMP14, DST_1);
                vis_st64(DST_0, dest[0]);
                vis_padd16(TMP4, CONST_6, TMP4);

                vis_ld64_2(dest, stride, DST_0);
                vis_padd16(TMP6, CONST_6, TMP6);
                vis_mul8x16au(REF_S2, CONST_256, TMP12);

                vis_padd16(TMP4, TMP8, TMP4);
                vis_mul8x16au(REF_S2_1, CONST_256, TMP14);

                vis_padd16(TMP6, TMP10, TMP6);

                vis_padd16(TMP20, TMP4, TMP20);

                vis_padd16(TMP22, TMP6, TMP22);

                vis_padd16(TMP20, TMP24, TMP20);

                vis_padd16(TMP22, TMP26, TMP22);

                vis_padd16(TMP20, REF_0, TMP20);
                vis_mul8x16au(REF_S4, CONST_256, REF_0);

                vis_padd16(TMP22, REF_2, TMP22);
                vis_pack16(TMP20, DST_2);

                vis_pack16(TMP22, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;

                vis_ld64_2(dest, 8, DST_2);
                vis_mul8x16al(DST_0, CONST_1024, TMP30);
                vis_pmerge(ZERO, REF_S4_1, REF_2);

                vis_mul8x16al(DST_1, CONST_1024, TMP32);
                vis_padd16(REF_4, TMP0, TMP8);

                vis_mul8x16au(REF_S6, CONST_256, REF_4);
                vis_padd16(REF_6, TMP2, TMP10);

                vis_mul8x16au(REF_S6_1, CONST_256, REF_6);
                vis_padd16(TMP8, TMP12, TMP8);

                vis_padd16(TMP10, TMP14, TMP10);

                vis_padd16(TMP8, TMP30, TMP8);

                vis_padd16(TMP10, TMP32, TMP10);
                vis_pack16(TMP8, DST_0);

                vis_pack16(TMP10, DST_1);
                vis_st64(DST_0, dest[0]);

                vis_padd16(REF_0, TMP4, REF_0);

                vis_mul8x16al(DST_2, CONST_1024, TMP30);
                vis_padd16(REF_2, TMP6, REF_2);

                vis_mul8x16al(DST_3, CONST_1024, TMP32);
                vis_padd16(REF_0, REF_4, REF_0);

                vis_padd16(REF_2, REF_6, REF_2);

                vis_padd16(REF_0, TMP30, REF_0);

                /* stall */

                vis_padd16(REF_2, TMP32, REF_2);
                vis_pack16(REF_0, DST_2);

                vis_pack16(REF_2, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;
        } while (--height);
}

static void MC_avg_xy_8_vis (uint8_t * dest, const uint8_t * _ref,
                             const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;
        int stride_8 = stride + 8;

        vis_set_gsr(4 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);
        vis_fzero(ZERO);

        vis_ld64_2(ref, 8, TMP2);

        vis_ld64(constants6[0], CONST_6);

        vis_ld64(constants256_1024[0], CONST_256);
        vis_faligndata(TMP0, TMP2, REF_S0);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_S2);
        } else {
                vis_src1(TMP2, REF_S2);
        }

        height >>= 1;
        do {    /* 31 cycles */
                vis_ld64_2(ref, stride, TMP0);
                vis_mul8x16au(REF_S0, CONST_256, TMP8);
                vis_pmerge(ZERO, REF_S0_1, TMP10);

                vis_ld64_2(ref, stride_8, TMP2);
                ref += stride;
                vis_mul8x16au(REF_S2, CONST_256, TMP12);
                vis_pmerge(ZERO, REF_S2_1, TMP14);

                vis_alignaddr_g0((void *)off);

                vis_ld64_2(ref, stride, TMP4);
                vis_faligndata(TMP0, TMP2, REF_S4);

                vis_ld64_2(ref, stride_8, TMP6);
                ref += stride;

                vis_ld64(dest[0], DST_0);
                vis_faligndata(TMP4, TMP6, REF_S0);

                vis_ld64_2(dest, stride, DST_2);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_S6);
                        vis_faligndata(TMP4, TMP6, REF_S2);
                } else {
                        vis_src1(TMP2, REF_S6);
                        vis_src1(TMP6, REF_S2);
                }

                vis_mul8x16al(DST_0, CONST_1024, TMP30);
                vis_pmerge(ZERO, REF_S4, TMP22);

                vis_mul8x16al(DST_1, CONST_1024, TMP32);
                vis_pmerge(ZERO, REF_S4_1, TMP24);

                vis_mul8x16au(REF_S6, CONST_256, TMP26);
                vis_pmerge(ZERO, REF_S6_1, TMP28);

                vis_mul8x16au(REF_S0, CONST_256, REF_S4);
                vis_padd16(TMP22, CONST_6, TMP22);

                vis_mul8x16au(REF_S0_1, CONST_256, REF_S6);
                vis_padd16(TMP24, CONST_6, TMP24);

                vis_mul8x16al(DST_2, CONST_1024, REF_0);
                vis_padd16(TMP22, TMP26, TMP22);

                vis_mul8x16al(DST_3, CONST_1024, REF_2);
                vis_padd16(TMP24, TMP28, TMP24);

                vis_mul8x16au(REF_S2, CONST_256, TMP26);
                vis_padd16(TMP8, TMP22, TMP8);

                vis_mul8x16au(REF_S2_1, CONST_256, TMP28);
                vis_padd16(TMP10, TMP24, TMP10);

                vis_padd16(TMP8, TMP12, TMP8);

                vis_padd16(TMP10, TMP14, TMP10);

                vis_padd16(TMP8, TMP30, TMP8);

                vis_padd16(TMP10, TMP32, TMP10);
                vis_pack16(TMP8, DST_0);

                vis_pack16(TMP10, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;

                vis_padd16(REF_S4, TMP22, TMP12);

                vis_padd16(REF_S6, TMP24, TMP14);

                vis_padd16(TMP12, TMP26, TMP12);

                vis_padd16(TMP14, TMP28, TMP14);

                vis_padd16(TMP12, REF_0, TMP12);

                vis_padd16(TMP14, REF_2, TMP14);
                vis_pack16(TMP12, DST_2);

                vis_pack16(TMP14, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;
        } while (--height);
}

/* End of rounding code */

/* Start of no rounding code */
/* The trick used in some of this file is the formula from the MMX
 * motion comp code, which is:
 *
 * (x+y)>>1 == (x&y)+((x^y)>>1)
 *
 * This allows us to average 8 bytes at a time in a 64-bit FPU reg.
 * We avoid overflows by masking before we do the shift, and we
 * implement the shift by multiplying by 1/2 using mul8x16.  So in
 * VIS this is (assume 'x' is in f0, 'y' is in f2, a repeating mask
 * of '0xfe' is in f4, a repeating mask of '0x7f' is in f6, and
 * the value 0x80808080 is in f8):
 *
 *      fxor            f0,   f2, f10
 *      fand            f10,  f4, f10
 *      fmul8x16        f8,  f10, f10
 *      fand            f10,  f6, f10
 *      fand            f0,   f2, f12
 *      fpadd16         f12, f10, f10
 */

static void MC_put_no_round_o_16_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        do {    /* 5 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64_2(ref, 8, TMP2);

                vis_ld64_2(ref, 16, TMP4);
                ref += stride;

                vis_faligndata(TMP0, TMP2, REF_0);
                vis_st64(REF_0, dest[0]);

                vis_faligndata(TMP2, TMP4, REF_2);
                vis_st64_2(REF_2, dest, 8);
                dest += stride;
        } while (--height);
}

static void MC_put_no_round_o_8_vis (uint8_t * dest, const uint8_t * _ref,
                                     const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        do {    /* 4 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64(ref[8], TMP2);
                ref += stride;

                /* stall */

                vis_faligndata(TMP0, TMP2, REF_0);
                vis_st64(REF_0, dest[0]);
                dest += stride;
        } while (--height);
}


static void MC_avg_no_round_o_16_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        int stride_8 = stride + 8;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64(ref[8], TMP2);

        vis_ld64(ref[16], TMP4);

        vis_ld64(dest[0], DST_0);

        vis_ld64(dest[8], DST_2);

        vis_ld64(constants_fe[0], MASK_fe);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP2, TMP4, REF_2);

        vis_ld64(constants128[0], CONST_128);

        ref += stride;
        height = (height >> 1) - 1;

        do {    /* 24 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(DST_0, REF_0, TMP6);

                vis_ld64_2(ref, 8, TMP2);
                vis_and(TMP6, MASK_fe, TMP6);

                vis_ld64_2(ref, 16, TMP4);
                ref += stride;
                vis_mul8x16(CONST_128, TMP6, TMP6);
                vis_xor(DST_2, REF_2, TMP8);

                vis_and(TMP8, MASK_fe, TMP8);

                vis_and(DST_0, REF_0, TMP10);
                vis_ld64_2(dest, stride, DST_0);
                vis_mul8x16(CONST_128, TMP8, TMP8);

                vis_and(DST_2, REF_2, TMP12);
                vis_ld64_2(dest, stride_8, DST_2);

                vis_ld64(ref[0], TMP14);
                vis_and(TMP6, MASK_7f, TMP6);

                vis_and(TMP8, MASK_7f, TMP8);

                vis_padd16(TMP10, TMP6, TMP6);
                vis_st64(TMP6, dest[0]);

                vis_padd16(TMP12, TMP8, TMP8);
                vis_st64_2(TMP8, dest, 8);

                dest += stride;
                vis_ld64_2(ref, 8, TMP16);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, 16, TMP18);
                vis_faligndata(TMP2, TMP4, REF_2);
                ref += stride;

                vis_xor(DST_0, REF_0, TMP20);

                vis_and(TMP20, MASK_fe, TMP20);

                vis_xor(DST_2, REF_2, TMP22);
                vis_mul8x16(CONST_128, TMP20, TMP20);

                vis_and(TMP22, MASK_fe, TMP22);

                vis_and(DST_0, REF_0, TMP24);
                vis_mul8x16(CONST_128, TMP22, TMP22);

                vis_and(DST_2, REF_2, TMP26);

                vis_ld64_2(dest, stride, DST_0);
                vis_faligndata(TMP14, TMP16, REF_0);

                vis_ld64_2(dest, stride_8, DST_2);
                vis_faligndata(TMP16, TMP18, REF_2);

                vis_and(TMP20, MASK_7f, TMP20);

                vis_and(TMP22, MASK_7f, TMP22);

                vis_padd16(TMP24, TMP20, TMP20);
                vis_st64(TMP20, dest[0]);

                vis_padd16(TMP26, TMP22, TMP22);
                vis_st64_2(TMP22, dest, 8);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(DST_0, REF_0, TMP6);

        vis_ld64_2(ref, 8, TMP2);
        vis_and(TMP6, MASK_fe, TMP6);

        vis_ld64_2(ref, 16, TMP4);
        vis_mul8x16(CONST_128, TMP6, TMP6);
        vis_xor(DST_2, REF_2, TMP8);

        vis_and(TMP8, MASK_fe, TMP8);

        vis_and(DST_0, REF_0, TMP10);
        vis_ld64_2(dest, stride, DST_0);
        vis_mul8x16(CONST_128, TMP8, TMP8);

        vis_and(DST_2, REF_2, TMP12);
        vis_ld64_2(dest, stride_8, DST_2);

        vis_ld64(ref[0], TMP14);
        vis_and(TMP6, MASK_7f, TMP6);

        vis_and(TMP8, MASK_7f, TMP8);

        vis_padd16(TMP10, TMP6, TMP6);
        vis_st64(TMP6, dest[0]);

        vis_padd16(TMP12, TMP8, TMP8);
        vis_st64_2(TMP8, dest, 8);

        dest += stride;
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_faligndata(TMP2, TMP4, REF_2);

        vis_xor(DST_0, REF_0, TMP20);

        vis_and(TMP20, MASK_fe, TMP20);

        vis_xor(DST_2, REF_2, TMP22);
        vis_mul8x16(CONST_128, TMP20, TMP20);

        vis_and(TMP22, MASK_fe, TMP22);

        vis_and(DST_0, REF_0, TMP24);
        vis_mul8x16(CONST_128, TMP22, TMP22);

        vis_and(DST_2, REF_2, TMP26);

        vis_and(TMP20, MASK_7f, TMP20);

        vis_and(TMP22, MASK_7f, TMP22);

        vis_padd16(TMP24, TMP20, TMP20);
        vis_st64(TMP20, dest[0]);

        vis_padd16(TMP26, TMP22, TMP22);
        vis_st64_2(TMP22, dest, 8);
}

static void MC_avg_no_round_o_8_vis (uint8_t * dest, const uint8_t * _ref,
                                     const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64(ref[8], TMP2);

        vis_ld64(dest[0], DST_0);

        vis_ld64(constants_fe[0], MASK_fe);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants128[0], CONST_128);

        ref += stride;
        height = (height >> 1) - 1;

        do {    /* 12 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(DST_0, REF_0, TMP4);

                vis_ld64(ref[8], TMP2);
                vis_and(TMP4, MASK_fe, TMP4);

                vis_and(DST_0, REF_0, TMP6);
                vis_ld64_2(dest, stride, DST_0);
                ref += stride;
                vis_mul8x16(CONST_128, TMP4, TMP4);

                vis_ld64(ref[0], TMP12);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64(ref[8], TMP2);
                vis_xor(DST_0, REF_0, TMP0);
                ref += stride;

                vis_and(TMP0, MASK_fe, TMP0);

                vis_and(TMP4, MASK_7f, TMP4);

                vis_padd16(TMP6, TMP4, TMP4);
                vis_st64(TMP4, dest[0]);
                dest += stride;
                vis_mul8x16(CONST_128, TMP0, TMP0);

                vis_and(DST_0, REF_0, TMP6);
                vis_ld64_2(dest, stride, DST_0);

                vis_faligndata(TMP12, TMP2, REF_0);

                vis_and(TMP0, MASK_7f, TMP0);

                vis_padd16(TMP6, TMP0, TMP4);
                vis_st64(TMP4, dest[0]);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(DST_0, REF_0, TMP4);

        vis_ld64(ref[8], TMP2);
        vis_and(TMP4, MASK_fe, TMP4);

        vis_and(DST_0, REF_0, TMP6);
        vis_ld64_2(dest, stride, DST_0);
        vis_mul8x16(CONST_128, TMP4, TMP4);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_xor(DST_0, REF_0, TMP0);

        vis_and(TMP0, MASK_fe, TMP0);

        vis_and(TMP4, MASK_7f, TMP4);

        vis_padd16(TMP6, TMP4, TMP4);
        vis_st64(TMP4, dest[0]);
        dest += stride;
        vis_mul8x16(CONST_128, TMP0, TMP0);

        vis_and(DST_0, REF_0, TMP6);

        vis_and(TMP0, MASK_7f, TMP0);

        vis_padd16(TMP6, TMP0, TMP4);
        vis_st64(TMP4, dest[0]);
}

static void MC_put_no_round_x_16_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64_2(ref, 8, TMP2);

        vis_ld64_2(ref, 16, TMP4);

        vis_ld64(constants_fe[0], MASK_fe);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants128[0], CONST_128);
        vis_faligndata(TMP2, TMP4, REF_4);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
                vis_faligndata(TMP2, TMP4, REF_6);
        } else {
                vis_src1(TMP2, REF_2);
                vis_src1(TMP4, REF_6);
        }

        ref += stride;
        height = (height >> 1) - 1;

        do {    /* 34 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP6);

                vis_ld64_2(ref, 8, TMP2);
                vis_xor(REF_4, REF_6, TMP8);

                vis_ld64_2(ref, 16, TMP4);
                vis_and(TMP6, MASK_fe, TMP6);
                ref += stride;

                vis_ld64(ref[0], TMP14);
                vis_mul8x16(CONST_128, TMP6, TMP6);
                vis_and(TMP8, MASK_fe, TMP8);

                vis_ld64_2(ref, 8, TMP16);
                vis_mul8x16(CONST_128, TMP8, TMP8);
                vis_and(REF_0, REF_2, TMP10);

                vis_ld64_2(ref, 16, TMP18);
                ref += stride;
                vis_and(REF_4, REF_6, TMP12);

                vis_alignaddr_g0((void *)off);

                vis_faligndata(TMP0, TMP2, REF_0);

                vis_faligndata(TMP2, TMP4, REF_4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                        vis_faligndata(TMP2, TMP4, REF_6);
                } else {
                        vis_src1(TMP2, REF_2);
                        vis_src1(TMP4, REF_6);
                }

                vis_and(TMP6, MASK_7f, TMP6);

                vis_and(TMP8, MASK_7f, TMP8);

                vis_padd16(TMP10, TMP6, TMP6);
                vis_st64(TMP6, dest[0]);

                vis_padd16(TMP12, TMP8, TMP8);
                vis_st64_2(TMP8, dest, 8);
                dest += stride;

                vis_xor(REF_0, REF_2, TMP6);

                vis_xor(REF_4, REF_6, TMP8);

                vis_and(TMP6, MASK_fe, TMP6);

                vis_mul8x16(CONST_128, TMP6, TMP6);
                vis_and(TMP8, MASK_fe, TMP8);

                vis_mul8x16(CONST_128, TMP8, TMP8);
                vis_and(REF_0, REF_2, TMP10);

                vis_and(REF_4, REF_6, TMP12);

                vis_alignaddr_g0((void *)off);

                vis_faligndata(TMP14, TMP16, REF_0);

                vis_faligndata(TMP16, TMP18, REF_4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP14, TMP16, REF_2);
                        vis_faligndata(TMP16, TMP18, REF_6);
                } else {
                        vis_src1(TMP16, REF_2);
                        vis_src1(TMP18, REF_6);
                }

                vis_and(TMP6, MASK_7f, TMP6);

                vis_and(TMP8, MASK_7f, TMP8);

                vis_padd16(TMP10, TMP6, TMP6);
                vis_st64(TMP6, dest[0]);

                vis_padd16(TMP12, TMP8, TMP8);
                vis_st64_2(TMP8, dest, 8);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP6);

        vis_ld64_2(ref, 8, TMP2);
        vis_xor(REF_4, REF_6, TMP8);

        vis_ld64_2(ref, 16, TMP4);
        vis_and(TMP6, MASK_fe, TMP6);

        vis_mul8x16(CONST_128, TMP6, TMP6);
        vis_and(TMP8, MASK_fe, TMP8);

        vis_mul8x16(CONST_128, TMP8, TMP8);
        vis_and(REF_0, REF_2, TMP10);

        vis_and(REF_4, REF_6, TMP12);

        vis_alignaddr_g0((void *)off);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_faligndata(TMP2, TMP4, REF_4);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
                vis_faligndata(TMP2, TMP4, REF_6);
        } else {
                vis_src1(TMP2, REF_2);
                vis_src1(TMP4, REF_6);
        }

        vis_and(TMP6, MASK_7f, TMP6);

        vis_and(TMP8, MASK_7f, TMP8);

        vis_padd16(TMP10, TMP6, TMP6);
        vis_st64(TMP6, dest[0]);

        vis_padd16(TMP12, TMP8, TMP8);
        vis_st64_2(TMP8, dest, 8);
        dest += stride;

        vis_xor(REF_0, REF_2, TMP6);

        vis_xor(REF_4, REF_6, TMP8);

        vis_and(TMP6, MASK_fe, TMP6);

        vis_mul8x16(CONST_128, TMP6, TMP6);
        vis_and(TMP8, MASK_fe, TMP8);

        vis_mul8x16(CONST_128, TMP8, TMP8);
        vis_and(REF_0, REF_2, TMP10);

        vis_and(REF_4, REF_6, TMP12);

        vis_and(TMP6, MASK_7f, TMP6);

        vis_and(TMP8, MASK_7f, TMP8);

        vis_padd16(TMP10, TMP6, TMP6);
        vis_st64(TMP6, dest[0]);

        vis_padd16(TMP12, TMP8, TMP8);
        vis_st64_2(TMP8, dest, 8);
1959
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
2559 } |
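/* The xor/and/mul8x16/mask sequence in the kernels above implements a
 * truncating ("no round") byte average without widening to 16 bits:
 * (a & b) keeps the common bits, ((a ^ b) & 0xfe) >> 1 adds half of the
 * differing bits, and the 0xfe/0x7f masks stop carries from crossing
 * byte lanes, so eight bytes fit in one 64-bit register.  A scalar model
 * of the identity (the helper name is illustrative, not from this file):
 */

```c
#include <stdint.h>

/* Scalar model of the VIS no-round average trick:
 *   (a & b) + (((a ^ b) & 0xfe) >> 1) == (a + b) >> 1  (truncating).
 * In the VIS code, vis_and/vis_xor form a&b and a^b, MASK_fe drops the
 * bit that would shift across a byte boundary, vis_mul8x16 with
 * CONST_128 halves each byte, MASK_7f masks the halves, and vis_padd16
 * adds them back to a&b. */
static inline uint8_t no_round_avg(uint8_t a, uint8_t b)
{
        return (a & b) + (((a ^ b) & 0xfe) >> 1);
}
```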

static void MC_put_no_round_x_8_vis (uint8_t * dest, const uint8_t * _ref,
                                     const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;

        ref = vis_alignaddr(ref);

        vis_ld64(ref[0], TMP0);

        vis_ld64(ref[8], TMP2);

        vis_ld64(constants_fe[0], MASK_fe);

        vis_ld64(constants_7f[0], MASK_7f);

        vis_ld64(constants128[0], CONST_128);
        vis_faligndata(TMP0, TMP2, REF_0);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
        } else {
                vis_src1(TMP2, REF_2);
        }

        ref += stride;
        height = (height >> 1) - 1;

        do { /* 20 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP4);

                vis_ld64_2(ref, 8, TMP2);
                vis_and(TMP4, MASK_fe, TMP4);
                ref += stride;

                vis_ld64(ref[0], TMP8);
                vis_and(REF_0, REF_2, TMP6);
                vis_mul8x16(CONST_128, TMP4, TMP4);

                vis_alignaddr_g0((void *)off);

                vis_ld64_2(ref, 8, TMP10);
                ref += stride;
                vis_faligndata(TMP0, TMP2, REF_0);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                } else {
                        vis_src1(TMP2, REF_2);
                }

                vis_and(TMP4, MASK_7f, TMP4);

                vis_padd16(TMP6, TMP4, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;

                vis_xor(REF_0, REF_2, TMP12);

                vis_and(TMP12, MASK_fe, TMP12);

                vis_and(REF_0, REF_2, TMP14);
                vis_mul8x16(CONST_128, TMP12, TMP12);

                vis_alignaddr_g0((void *)off);
                vis_faligndata(TMP8, TMP10, REF_0);
                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP8, TMP10, REF_2);
                } else {
                        vis_src1(TMP10, REF_2);
                }

                vis_and(TMP12, MASK_7f, TMP12);

                vis_padd16(TMP14, TMP12, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP4);

        vis_ld64_2(ref, 8, TMP2);
        vis_and(TMP4, MASK_fe, TMP4);

        vis_and(REF_0, REF_2, TMP6);
        vis_mul8x16(CONST_128, TMP4, TMP4);

        vis_alignaddr_g0((void *)off);

        vis_faligndata(TMP0, TMP2, REF_0);

        if (off != 0x7) {
                vis_alignaddr_g0((void *)off_plus_1);
                vis_faligndata(TMP0, TMP2, REF_2);
        } else {
                vis_src1(TMP2, REF_2);
        }

        vis_and(TMP4, MASK_7f, TMP4);

        vis_padd16(TMP6, TMP4, DST_0);
        vis_st64(DST_0, dest[0]);
        dest += stride;

        vis_xor(REF_0, REF_2, TMP12);

        vis_and(TMP12, MASK_fe, TMP12);

        vis_and(REF_0, REF_2, TMP14);
        vis_mul8x16(CONST_128, TMP12, TMP12);

        vis_and(TMP12, MASK_7f, TMP12);

        vis_padd16(TMP14, TMP12, DST_0);
        vis_st64(DST_0, dest[0]);
        dest += stride;
}
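/* What MC_put_no_round_x_8_vis computes, as plain C: each output byte
 * is the truncating average of two horizontally adjacent source bytes
 * of an 8-pixel-wide block.  A hypothetical scalar reference (not part
 * of this file) for cross-checking the vectorized path:
 */

```c
#include <stdint.h>

/* Scalar model of the horizontal no-round half-pel kernel:
 * dest[x] = (ref[x] + ref[x + 1]) >> 1 for an 8-wide block. */
static void put_no_round_x_8_c(uint8_t *dest, const uint8_t *ref,
                               int stride, int height)
{
        int x, y;

        for (y = 0; y < height; y++) {
                for (x = 0; x < 8; x++)
                        dest[x] = (ref[x] + ref[x + 1]) >> 1;
                dest += stride;
                ref  += stride;
        }
}
```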

static void MC_avg_no_round_x_16_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        vis_ld64(constants3[0], CONST_3);
        vis_fzero(ZERO);
        vis_ld64(constants256_512[0], CONST_256);

        ref = vis_alignaddr(ref);
        do { /* 26 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64(ref[8], TMP2);

                vis_alignaddr_g0((void *)off);

                vis_ld64(ref[16], TMP4);

                vis_ld64(dest[0], DST_0);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64(dest[8], DST_2);
                vis_faligndata(TMP2, TMP4, REF_4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);
                        vis_faligndata(TMP0, TMP2, REF_2);
                        vis_faligndata(TMP2, TMP4, REF_6);
                } else {
                        vis_src1(TMP2, REF_2);
                        vis_src1(TMP4, REF_6);
                }

                vis_mul8x16au(REF_0, CONST_256, TMP0);

                vis_pmerge(ZERO, REF_2, TMP4);
                vis_mul8x16au(REF_0_1, CONST_256, TMP2);

                vis_pmerge(ZERO, REF_2_1, TMP6);

                vis_padd16(TMP0, TMP4, TMP0);

                vis_mul8x16al(DST_0, CONST_512, TMP4);
                vis_padd16(TMP2, TMP6, TMP2);

                vis_mul8x16al(DST_1, CONST_512, TMP6);

                vis_mul8x16au(REF_6, CONST_256, TMP12);

                vis_padd16(TMP0, TMP4, TMP0);
                vis_mul8x16au(REF_6_1, CONST_256, TMP14);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_mul8x16au(REF_4, CONST_256, TMP16);

                vis_padd16(TMP0, CONST_3, TMP8);
                vis_mul8x16au(REF_4_1, CONST_256, TMP18);

                vis_padd16(TMP2, CONST_3, TMP10);
                vis_pack16(TMP8, DST_0);

                vis_pack16(TMP10, DST_1);
                vis_padd16(TMP16, TMP12, TMP0);

                vis_st64(DST_0, dest[0]);
                vis_mul8x16al(DST_2, CONST_512, TMP4);
                vis_padd16(TMP18, TMP14, TMP2);

                vis_mul8x16al(DST_3, CONST_512, TMP6);
                vis_padd16(TMP0, CONST_3, TMP0);

                vis_padd16(TMP2, CONST_3, TMP2);

                vis_padd16(TMP0, TMP4, TMP0);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_pack16(TMP0, DST_2);

                vis_pack16(TMP2, DST_3);
                vis_st64(DST_2, dest[8]);

                ref += stride;
                dest += stride;
        } while (--height);
}

static void MC_avg_no_round_x_8_vis (uint8_t * dest, const uint8_t * _ref,
                                     const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        unsigned long off = (unsigned long) ref & 0x7;
        unsigned long off_plus_1 = off + 1;
        int stride_times_2 = stride << 1;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        vis_ld64(constants3[0], CONST_3);
        vis_fzero(ZERO);
        vis_ld64(constants256_512[0], CONST_256);

        ref = vis_alignaddr(ref);
        height >>= 2;
        do { /* 47 cycles */
                vis_ld64(ref[0], TMP0);

                vis_ld64_2(ref, 8, TMP2);
                ref += stride;

                vis_alignaddr_g0((void *)off);

                vis_ld64(ref[0], TMP4);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, 8, TMP6);
                ref += stride;

                vis_ld64(ref[0], TMP8);

                vis_ld64_2(ref, 8, TMP10);
                ref += stride;
                vis_faligndata(TMP4, TMP6, REF_4);

                vis_ld64(ref[0], TMP12);

                vis_ld64_2(ref, 8, TMP14);
                ref += stride;
                vis_faligndata(TMP8, TMP10, REF_S0);

                vis_faligndata(TMP12, TMP14, REF_S4);

                if (off != 0x7) {
                        vis_alignaddr_g0((void *)off_plus_1);

                        vis_ld64(dest[0], DST_0);
                        vis_faligndata(TMP0, TMP2, REF_2);

                        vis_ld64_2(dest, stride, DST_2);
                        vis_faligndata(TMP4, TMP6, REF_6);

                        vis_faligndata(TMP8, TMP10, REF_S2);

                        vis_faligndata(TMP12, TMP14, REF_S6);
                } else {
                        vis_ld64(dest[0], DST_0);
                        vis_src1(TMP2, REF_2);

                        vis_ld64_2(dest, stride, DST_2);
                        vis_src1(TMP6, REF_6);

                        vis_src1(TMP10, REF_S2);

                        vis_src1(TMP14, REF_S6);
                }

                vis_pmerge(ZERO, REF_0, TMP0);
                vis_mul8x16au(REF_0_1, CONST_256, TMP2);

                vis_pmerge(ZERO, REF_2, TMP4);
                vis_mul8x16au(REF_2_1, CONST_256, TMP6);

                vis_padd16(TMP0, CONST_3, TMP0);
                vis_mul8x16al(DST_0, CONST_512, TMP16);

                vis_padd16(TMP2, CONST_3, TMP2);
                vis_mul8x16al(DST_1, CONST_512, TMP18);

                vis_padd16(TMP0, TMP4, TMP0);
                vis_mul8x16au(REF_4, CONST_256, TMP8);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_mul8x16au(REF_4_1, CONST_256, TMP10);

                vis_padd16(TMP0, TMP16, TMP0);
                vis_mul8x16au(REF_6, CONST_256, TMP12);

                vis_padd16(TMP2, TMP18, TMP2);
                vis_mul8x16au(REF_6_1, CONST_256, TMP14);

                vis_padd16(TMP8, CONST_3, TMP8);
                vis_mul8x16al(DST_2, CONST_512, TMP16);

                vis_padd16(TMP8, TMP12, TMP8);
                vis_mul8x16al(DST_3, CONST_512, TMP18);

                vis_padd16(TMP10, TMP14, TMP10);
                vis_pack16(TMP0, DST_0);

                vis_pack16(TMP2, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;
                vis_padd16(TMP10, CONST_3, TMP10);

                vis_ld64_2(dest, stride, DST_0);
                vis_padd16(TMP8, TMP16, TMP8);

                vis_ld64_2(dest, stride_times_2, TMP4/*DST_2*/);
                vis_padd16(TMP10, TMP18, TMP10);
                vis_pack16(TMP8, DST_2);

                vis_pack16(TMP10, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;

                vis_mul8x16au(REF_S0_1, CONST_256, TMP2);
                vis_pmerge(ZERO, REF_S0, TMP0);

                vis_pmerge(ZERO, REF_S2, TMP24);
                vis_mul8x16au(REF_S2_1, CONST_256, TMP6);

                vis_padd16(TMP0, CONST_3, TMP0);
                vis_mul8x16au(REF_S4, CONST_256, TMP8);

                vis_padd16(TMP2, CONST_3, TMP2);
                vis_mul8x16au(REF_S4_1, CONST_256, TMP10);

                vis_padd16(TMP0, TMP24, TMP0);
                vis_mul8x16au(REF_S6, CONST_256, TMP12);

                vis_padd16(TMP2, TMP6, TMP2);
                vis_mul8x16au(REF_S6_1, CONST_256, TMP14);

                vis_padd16(TMP8, CONST_3, TMP8);
                vis_mul8x16al(DST_0, CONST_512, TMP16);

                vis_padd16(TMP10, CONST_3, TMP10);
                vis_mul8x16al(DST_1, CONST_512, TMP18);

                vis_padd16(TMP8, TMP12, TMP8);
                vis_mul8x16al(TMP4/*DST_2*/, CONST_512, TMP20);

                vis_mul8x16al(TMP5/*DST_3*/, CONST_512, TMP22);
                vis_padd16(TMP0, TMP16, TMP0);

                vis_padd16(TMP2, TMP18, TMP2);
                vis_pack16(TMP0, DST_0);

                vis_padd16(TMP10, TMP14, TMP10);
                vis_pack16(TMP2, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;

                vis_padd16(TMP8, TMP20, TMP8);

                vis_padd16(TMP10, TMP22, TMP10);
                vis_pack16(TMP8, DST_2);

                vis_pack16(TMP10, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;
        } while (--height);
}

static void MC_put_no_round_y_16_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        vis_ld64(ref[0], TMP0);

        vis_ld64_2(ref, 8, TMP2);

        vis_ld64_2(ref, 16, TMP4);
        ref += stride;

        vis_ld64(ref[0], TMP6);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64_2(ref, 8, TMP8);
        vis_faligndata(TMP2, TMP4, REF_4);

        vis_ld64_2(ref, 16, TMP10);
        ref += stride;

        vis_ld64(constants_fe[0], MASK_fe);
        vis_faligndata(TMP6, TMP8, REF_2);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP8, TMP10, REF_6);

        vis_ld64(constants128[0], CONST_128);
        height = (height >> 1) - 1;
        do { /* 24 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP12);

                vis_ld64_2(ref, 8, TMP2);
                vis_xor(REF_4, REF_6, TMP16);

                vis_ld64_2(ref, 16, TMP4);
                ref += stride;
                vis_and(REF_0, REF_2, TMP14);

                vis_ld64(ref[0], TMP6);
                vis_and(REF_4, REF_6, TMP18);

                vis_ld64_2(ref, 8, TMP8);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, 16, TMP10);
                ref += stride;
                vis_faligndata(TMP2, TMP4, REF_4);

                vis_and(TMP12, MASK_fe, TMP12);

                vis_and(TMP16, MASK_fe, TMP16);
                vis_mul8x16(CONST_128, TMP12, TMP12);

                vis_mul8x16(CONST_128, TMP16, TMP16);
                vis_xor(REF_0, REF_2, TMP0);

                vis_xor(REF_4, REF_6, TMP2);

                vis_and(REF_0, REF_2, TMP20);

                vis_and(TMP12, MASK_7f, TMP12);

                vis_and(TMP16, MASK_7f, TMP16);

                vis_padd16(TMP14, TMP12, TMP12);
                vis_st64(TMP12, dest[0]);

                vis_padd16(TMP18, TMP16, TMP16);
                vis_st64_2(TMP16, dest, 8);
                dest += stride;

                vis_and(REF_4, REF_6, TMP18);

                vis_and(TMP0, MASK_fe, TMP0);

                vis_and(TMP2, MASK_fe, TMP2);
                vis_mul8x16(CONST_128, TMP0, TMP0);

                vis_faligndata(TMP6, TMP8, REF_2);
                vis_mul8x16(CONST_128, TMP2, TMP2);

                vis_faligndata(TMP8, TMP10, REF_6);

                vis_and(TMP0, MASK_7f, TMP0);

                vis_and(TMP2, MASK_7f, TMP2);

                vis_padd16(TMP20, TMP0, TMP0);
                vis_st64(TMP0, dest[0]);

                vis_padd16(TMP18, TMP2, TMP2);
                vis_st64_2(TMP2, dest, 8);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP12);

        vis_ld64_2(ref, 8, TMP2);
        vis_xor(REF_4, REF_6, TMP16);

        vis_ld64_2(ref, 16, TMP4);
        vis_and(REF_0, REF_2, TMP14);

        vis_and(REF_4, REF_6, TMP18);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_faligndata(TMP2, TMP4, REF_4);

        vis_and(TMP12, MASK_fe, TMP12);

        vis_and(TMP16, MASK_fe, TMP16);
        vis_mul8x16(CONST_128, TMP12, TMP12);

        vis_mul8x16(CONST_128, TMP16, TMP16);
        vis_xor(REF_0, REF_2, TMP0);

        vis_xor(REF_4, REF_6, TMP2);

        vis_and(REF_0, REF_2, TMP20);

        vis_and(TMP12, MASK_7f, TMP12);

        vis_and(TMP16, MASK_7f, TMP16);

        vis_padd16(TMP14, TMP12, TMP12);
        vis_st64(TMP12, dest[0]);

        vis_padd16(TMP18, TMP16, TMP16);
        vis_st64_2(TMP16, dest, 8);
        dest += stride;

        vis_and(REF_4, REF_6, TMP18);

        vis_and(TMP0, MASK_fe, TMP0);

        vis_and(TMP2, MASK_fe, TMP2);
        vis_mul8x16(CONST_128, TMP0, TMP0);

        vis_mul8x16(CONST_128, TMP2, TMP2);

        vis_and(TMP0, MASK_7f, TMP0);

        vis_and(TMP2, MASK_7f, TMP2);

        vis_padd16(TMP20, TMP0, TMP0);
        vis_st64(TMP0, dest[0]);

        vis_padd16(TMP18, TMP2, TMP2);
        vis_st64_2(TMP2, dest, 8);
}

static void MC_put_no_round_y_8_vis (uint8_t * dest, const uint8_t * _ref,
                                     const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;

        ref = vis_alignaddr(ref);
        vis_ld64(ref[0], TMP0);

        vis_ld64_2(ref, 8, TMP2);
        ref += stride;

        vis_ld64(ref[0], TMP4);

        vis_ld64_2(ref, 8, TMP6);
        ref += stride;

        vis_ld64(constants_fe[0], MASK_fe);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64(constants_7f[0], MASK_7f);
        vis_faligndata(TMP4, TMP6, REF_2);

        vis_ld64(constants128[0], CONST_128);
        height = (height >> 1) - 1;
        do { /* 12 cycles */
                vis_ld64(ref[0], TMP0);
                vis_xor(REF_0, REF_2, TMP4);

                vis_ld64_2(ref, 8, TMP2);
                ref += stride;
                vis_and(TMP4, MASK_fe, TMP4);

                vis_and(REF_0, REF_2, TMP6);
                vis_mul8x16(CONST_128, TMP4, TMP4);

                vis_faligndata(TMP0, TMP2, REF_0);
                vis_ld64(ref[0], TMP0);

                vis_ld64_2(ref, 8, TMP2);
                ref += stride;
                vis_xor(REF_0, REF_2, TMP12);

                vis_and(TMP4, MASK_7f, TMP4);

                vis_and(TMP12, MASK_fe, TMP12);

                vis_mul8x16(CONST_128, TMP12, TMP12);
                vis_and(REF_0, REF_2, TMP14);

                vis_padd16(TMP6, TMP4, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;

                vis_faligndata(TMP0, TMP2, REF_2);

                vis_and(TMP12, MASK_7f, TMP12);

                vis_padd16(TMP14, TMP12, DST_0);
                vis_st64(DST_0, dest[0]);
                dest += stride;
        } while (--height);

        vis_ld64(ref[0], TMP0);
        vis_xor(REF_0, REF_2, TMP4);

        vis_ld64_2(ref, 8, TMP2);
        vis_and(TMP4, MASK_fe, TMP4);

        vis_and(REF_0, REF_2, TMP6);
        vis_mul8x16(CONST_128, TMP4, TMP4);

        vis_faligndata(TMP0, TMP2, REF_0);

        vis_xor(REF_0, REF_2, TMP12);

        vis_and(TMP4, MASK_7f, TMP4);

        vis_and(TMP12, MASK_fe, TMP12);

        vis_mul8x16(CONST_128, TMP12, TMP12);
        vis_and(REF_0, REF_2, TMP14);

        vis_padd16(TMP6, TMP4, DST_0);
        vis_st64(DST_0, dest[0]);
        dest += stride;

        vis_and(TMP12, MASK_7f, TMP12);

        vis_padd16(TMP14, TMP12, DST_0);
        vis_st64(DST_0, dest[0]);
}
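/* The vertical kernels reuse the same truncating-average trick, but on
 * vertically adjacent rows instead of horizontally adjacent bytes.  A
 * hypothetical scalar reference (not part of this file) for checking
 * MC_put_no_round_y_8_vis:
 */

```c
#include <stdint.h>

/* Scalar model of the vertical no-round half-pel kernel:
 * dest[x] = (ref[x] + ref[x + stride]) >> 1 for an 8-wide block. */
static void put_no_round_y_8_c(uint8_t *dest, const uint8_t *ref,
                               int stride, int height)
{
        int x, y;

        for (y = 0; y < height; y++) {
                for (x = 0; x < 8; x++)
                        dest[x] = (ref[x] + ref[x + stride]) >> 1;
                dest += stride;
                ref  += stride;
        }
}
```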

static void MC_avg_no_round_y_16_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        int stride_8 = stride + 8;
        int stride_16 = stride + 16;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[ 0], TMP0);
        vis_fzero(ZERO);

        vis_ld64(ref[ 8], TMP2);

        vis_ld64(ref[16], TMP4);

        vis_ld64(constants3[0], CONST_3);
        vis_faligndata(TMP0, TMP2, REF_2);

        vis_ld64(constants256_512[0], CONST_256);
        vis_faligndata(TMP2, TMP4, REF_6);
        height >>= 1;

        do { /* 31 cycles */
                vis_ld64_2(ref, stride, TMP0);
                vis_pmerge(ZERO, REF_2, TMP12);
                vis_mul8x16au(REF_2_1, CONST_256, TMP14);

                vis_ld64_2(ref, stride_8, TMP2);
                vis_pmerge(ZERO, REF_6, TMP16);
                vis_mul8x16au(REF_6_1, CONST_256, TMP18);

                vis_ld64_2(ref, stride_16, TMP4);
                ref += stride;

                vis_ld64(dest[0], DST_0);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(dest, 8, DST_2);
                vis_faligndata(TMP2, TMP4, REF_4);

                vis_ld64_2(ref, stride, TMP6);
                vis_pmerge(ZERO, REF_0, TMP0);
                vis_mul8x16au(REF_0_1, CONST_256, TMP2);

                vis_ld64_2(ref, stride_8, TMP8);
                vis_pmerge(ZERO, REF_4, TMP4);

                vis_ld64_2(ref, stride_16, TMP10);
                ref += stride;

                vis_ld64_2(dest, stride, REF_S0/*DST_4*/);
                vis_faligndata(TMP6, TMP8, REF_2);
                vis_mul8x16au(REF_4_1, CONST_256, TMP6);

                vis_ld64_2(dest, stride_8, REF_S2/*DST_6*/);
                vis_faligndata(TMP8, TMP10, REF_6);
                vis_mul8x16al(DST_0, CONST_512, TMP20);

                vis_padd16(TMP0, CONST_3, TMP0);
                vis_mul8x16al(DST_1, CONST_512, TMP22);

                vis_padd16(TMP2, CONST_3, TMP2);
                vis_mul8x16al(DST_2, CONST_512, TMP24);

                vis_padd16(TMP4, CONST_3, TMP4);
                vis_mul8x16al(DST_3, CONST_512, TMP26);

                vis_padd16(TMP6, CONST_3, TMP6);

                vis_padd16(TMP12, TMP20, TMP12);
                vis_mul8x16al(REF_S0, CONST_512, TMP20);

                vis_padd16(TMP14, TMP22, TMP14);
                vis_mul8x16al(REF_S0_1, CONST_512, TMP22);

                vis_padd16(TMP16, TMP24, TMP16);
                vis_mul8x16al(REF_S2, CONST_512, TMP24);

                vis_padd16(TMP18, TMP26, TMP18);
                vis_mul8x16al(REF_S2_1, CONST_512, TMP26);

                vis_padd16(TMP12, TMP0, TMP12);
                vis_mul8x16au(REF_2, CONST_256, TMP28);

                vis_padd16(TMP14, TMP2, TMP14);
                vis_mul8x16au(REF_2_1, CONST_256, TMP30);

                vis_padd16(TMP16, TMP4, TMP16);
                vis_mul8x16au(REF_6, CONST_256, REF_S4);

                vis_padd16(TMP18, TMP6, TMP18);
                vis_mul8x16au(REF_6_1, CONST_256, REF_S6);

                vis_pack16(TMP12, DST_0);
                vis_padd16(TMP28, TMP0, TMP12);

                vis_pack16(TMP14, DST_1);
                vis_st64(DST_0, dest[0]);
                vis_padd16(TMP30, TMP2, TMP14);

                vis_pack16(TMP16, DST_2);
                vis_padd16(REF_S4, TMP4, TMP16);

                vis_pack16(TMP18, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;
                vis_padd16(REF_S6, TMP6, TMP18);

                vis_padd16(TMP12, TMP20, TMP12);

                vis_padd16(TMP14, TMP22, TMP14);
                vis_pack16(TMP12, DST_0);

                vis_padd16(TMP16, TMP24, TMP16);
                vis_pack16(TMP14, DST_1);
                vis_st64(DST_0, dest[0]);

                vis_padd16(TMP18, TMP26, TMP18);
                vis_pack16(TMP16, DST_2);

                vis_pack16(TMP18, DST_3);
                vis_st64_2(DST_2, dest, 8);
                dest += stride;
        } while (--height);
}

static void MC_avg_no_round_y_8_vis (uint8_t * dest, const uint8_t * _ref,
                                     const int stride, int height)
{
        uint8_t *ref = (uint8_t *) _ref;
        int stride_8 = stride + 8;

        vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

        ref = vis_alignaddr(ref);

        vis_ld64(ref[ 0], TMP0);
        vis_fzero(ZERO);

        vis_ld64(ref[ 8], TMP2);

        vis_ld64(constants3[0], CONST_3);
        vis_faligndata(TMP0, TMP2, REF_2);

        vis_ld64(constants256_512[0], CONST_256);

        height >>= 1;
        do { /* 20 cycles */
                vis_ld64_2(ref, stride, TMP0);
                vis_pmerge(ZERO, REF_2, TMP8);
                vis_mul8x16au(REF_2_1, CONST_256, TMP10);

                vis_ld64_2(ref, stride_8, TMP2);
                ref += stride;

                vis_ld64(dest[0], DST_0);

                vis_ld64_2(dest, stride, DST_2);
                vis_faligndata(TMP0, TMP2, REF_0);

                vis_ld64_2(ref, stride, TMP4);
                vis_mul8x16al(DST_0, CONST_512, TMP16);
                vis_pmerge(ZERO, REF_0, TMP12);

                vis_ld64_2(ref, stride_8, TMP6);
                ref += stride;
                vis_mul8x16al(DST_1, CONST_512, TMP18);
                vis_pmerge(ZERO, REF_0_1, TMP14);

                vis_padd16(TMP12, CONST_3, TMP12);
                vis_mul8x16al(DST_2, CONST_512, TMP24);

                vis_padd16(TMP14, CONST_3, TMP14);
                vis_mul8x16al(DST_3, CONST_512, TMP26);

                vis_faligndata(TMP4, TMP6, REF_2);

                vis_padd16(TMP8, TMP12, TMP8);

                vis_padd16(TMP10, TMP14, TMP10);
                vis_mul8x16au(REF_2, CONST_256, TMP20);

                vis_padd16(TMP8, TMP16, TMP0);
                vis_mul8x16au(REF_2_1, CONST_256, TMP22);

                vis_padd16(TMP10, TMP18, TMP2);
                vis_pack16(TMP0, DST_0);

                vis_pack16(TMP2, DST_1);
                vis_st64(DST_0, dest[0]);
                dest += stride;
                vis_padd16(TMP12, TMP20, TMP12);

                vis_padd16(TMP14, TMP22, TMP14);

                vis_padd16(TMP12, TMP24, TMP0);

                vis_padd16(TMP14, TMP26, TMP2);
                vis_pack16(TMP0, DST_2);

                vis_pack16(TMP2, DST_3);
                vis_st64(DST_2, dest[0]);
                dest += stride;
        } while (--height);
}
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
3398 |
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
3399 static void MC_put_no_round_xy_16_vis (uint8_t * dest, const uint8_t * _ref, |
2979 | 3400 const int stride, int height) |
1959
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
{
    uint8_t *ref = (uint8_t *) _ref;
    unsigned long off = (unsigned long) ref & 0x7;
    unsigned long off_plus_1 = off + 1;
    int stride_8 = stride + 8;
    int stride_16 = stride + 16;

    vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

    ref = vis_alignaddr(ref);

    vis_ld64(ref[ 0], TMP0);
    vis_fzero(ZERO);

    vis_ld64(ref[ 8], TMP2);

    vis_ld64(ref[16], TMP4);

    vis_ld64(constants1[0], CONST_1);
    vis_faligndata(TMP0, TMP2, REF_S0);

    vis_ld64(constants256_512[0], CONST_256);
    vis_faligndata(TMP2, TMP4, REF_S4);

    if (off != 0x7) {
        vis_alignaddr_g0((void *)off_plus_1);
        vis_faligndata(TMP0, TMP2, REF_S2);
        vis_faligndata(TMP2, TMP4, REF_S6);
    } else {
        vis_src1(TMP2, REF_S2);
        vis_src1(TMP4, REF_S6);
    }

    height >>= 1;
    do {
        vis_ld64_2(ref, stride, TMP0);
        vis_mul8x16au(REF_S0, CONST_256, TMP12);
        vis_pmerge(ZERO, REF_S0_1, TMP14);

        vis_alignaddr_g0((void *)off);

        vis_ld64_2(ref, stride_8, TMP2);
        vis_mul8x16au(REF_S2, CONST_256, TMP16);
        vis_pmerge(ZERO, REF_S2_1, TMP18);

        vis_ld64_2(ref, stride_16, TMP4);
        ref += stride;
        vis_mul8x16au(REF_S4, CONST_256, TMP20);
        vis_pmerge(ZERO, REF_S4_1, TMP22);

        vis_ld64_2(ref, stride, TMP6);
        vis_mul8x16au(REF_S6, CONST_256, TMP24);
        vis_pmerge(ZERO, REF_S6_1, TMP26);

        vis_ld64_2(ref, stride_8, TMP8);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64_2(ref, stride_16, TMP10);
        ref += stride;
        vis_faligndata(TMP2, TMP4, REF_4);

        vis_faligndata(TMP6, TMP8, REF_S0);

        vis_faligndata(TMP8, TMP10, REF_S4);

        if (off != 0x7) {
            vis_alignaddr_g0((void *)off_plus_1);
            vis_faligndata(TMP0, TMP2, REF_2);
            vis_faligndata(TMP2, TMP4, REF_6);
            vis_faligndata(TMP6, TMP8, REF_S2);
            vis_faligndata(TMP8, TMP10, REF_S6);
        } else {
            vis_src1(TMP2, REF_2);
            vis_src1(TMP4, REF_6);
            vis_src1(TMP8, REF_S2);
            vis_src1(TMP10, REF_S6);
        }

        vis_mul8x16au(REF_0, CONST_256, TMP0);
        vis_pmerge(ZERO, REF_0_1, TMP2);

        vis_mul8x16au(REF_2, CONST_256, TMP4);
        vis_pmerge(ZERO, REF_2_1, TMP6);

        vis_padd16(TMP0, CONST_2, TMP8);
        vis_mul8x16au(REF_4, CONST_256, TMP0);

        vis_padd16(TMP2, CONST_1, TMP10);
        vis_mul8x16au(REF_4_1, CONST_256, TMP2);

        vis_padd16(TMP8, TMP4, TMP8);
        vis_mul8x16au(REF_6, CONST_256, TMP4);

        vis_padd16(TMP10, TMP6, TMP10);
        vis_mul8x16au(REF_6_1, CONST_256, TMP6);

        vis_padd16(TMP12, TMP8, TMP12);

        vis_padd16(TMP14, TMP10, TMP14);

        vis_padd16(TMP12, TMP16, TMP12);

        vis_padd16(TMP14, TMP18, TMP14);
        vis_pack16(TMP12, DST_0);

        vis_pack16(TMP14, DST_1);
        vis_st64(DST_0, dest[0]);
        vis_padd16(TMP0, CONST_1, TMP12);

        vis_mul8x16au(REF_S0, CONST_256, TMP0);
        vis_padd16(TMP2, CONST_1, TMP14);

        vis_mul8x16au(REF_S0_1, CONST_256, TMP2);
        vis_padd16(TMP12, TMP4, TMP12);

        vis_mul8x16au(REF_S2, CONST_256, TMP4);
        vis_padd16(TMP14, TMP6, TMP14);

        vis_mul8x16au(REF_S2_1, CONST_256, TMP6);
        vis_padd16(TMP20, TMP12, TMP20);

        vis_padd16(TMP22, TMP14, TMP22);

        vis_padd16(TMP20, TMP24, TMP20);

        vis_padd16(TMP22, TMP26, TMP22);
        vis_pack16(TMP20, DST_2);

        vis_pack16(TMP22, DST_3);
        vis_st64_2(DST_2, dest, 8);
        dest += stride;
        vis_padd16(TMP0, TMP4, TMP24);

        vis_mul8x16au(REF_S4, CONST_256, TMP0);
        vis_padd16(TMP2, TMP6, TMP26);

        vis_mul8x16au(REF_S4_1, CONST_256, TMP2);
        vis_padd16(TMP24, TMP8, TMP24);

        vis_padd16(TMP26, TMP10, TMP26);
        vis_pack16(TMP24, DST_0);

        vis_pack16(TMP26, DST_1);
        vis_st64(DST_0, dest[0]);
        vis_pmerge(ZERO, REF_S6, TMP4);

        vis_pmerge(ZERO, REF_S6_1, TMP6);

        vis_padd16(TMP0, TMP4, TMP0);

        vis_padd16(TMP2, TMP6, TMP2);

        vis_padd16(TMP0, TMP12, TMP0);

        vis_padd16(TMP2, TMP14, TMP2);
        vis_pack16(TMP0, DST_2);

        vis_pack16(TMP2, DST_3);
        vis_st64_2(DST_2, dest, 8);
        dest += stride;
    } while (--height);
}
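The no-rounding functions above all vectorize the same 2x2 ("xy") half-pel filter: each output pixel averages four neighbouring source pixels. As a point of reference, a minimal scalar sketch of that per-pixel arithmetic, assuming the conventional dsputil rounding constants (+2 for the rounded variant, +1 for the no-round variant) rather than deriving them from the VIS constant tables used here:

```c
#include <stdint.h>
#include <assert.h>

/* Rounded 2x2 half-pel average: round-half-up before the >>2. */
static inline uint8_t halfpel_xy_rnd(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)((a + b + c + d + 2) >> 2);
}

/* No-round variant: a smaller bias, so ties round down instead of up. */
static inline uint8_t halfpel_xy_no_rnd(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)((a + b + c + d + 1) >> 2);
}
```

The VIS code reaches the same kind of result eight pixels at a time via mul8x16/padd16, with the final shift effectively folded into vis_pack16 through the GSR scale factor set at the top of each function.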

static void MC_put_no_round_xy_8_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
    uint8_t *ref = (uint8_t *) _ref;
    unsigned long off = (unsigned long) ref & 0x7;
    unsigned long off_plus_1 = off + 1;
    int stride_8 = stride + 8;

    vis_set_gsr(5 << VIS_GSR_SCALEFACT_SHIFT);

    ref = vis_alignaddr(ref);

    vis_ld64(ref[ 0], TMP0);
    vis_fzero(ZERO);

    vis_ld64(ref[ 8], TMP2);

    vis_ld64(constants1[0], CONST_1);

    vis_ld64(constants256_512[0], CONST_256);
    vis_faligndata(TMP0, TMP2, REF_S0);

    if (off != 0x7) {
        vis_alignaddr_g0((void *)off_plus_1);
        vis_faligndata(TMP0, TMP2, REF_S2);
    } else {
        vis_src1(TMP2, REF_S2);
    }

    height >>= 1;
    do { /* 26 cycles */
        vis_ld64_2(ref, stride, TMP0);
        vis_mul8x16au(REF_S0, CONST_256, TMP8);
        vis_pmerge(ZERO, REF_S2, TMP12);

        vis_alignaddr_g0((void *)off);

        vis_ld64_2(ref, stride_8, TMP2);
        ref += stride;
        vis_mul8x16au(REF_S0_1, CONST_256, TMP10);
        vis_pmerge(ZERO, REF_S2_1, TMP14);

        vis_ld64_2(ref, stride, TMP4);

        vis_ld64_2(ref, stride_8, TMP6);
        ref += stride;
        vis_faligndata(TMP0, TMP2, REF_S4);

        vis_pmerge(ZERO, REF_S4, TMP18);

        vis_pmerge(ZERO, REF_S4_1, TMP20);

        vis_faligndata(TMP4, TMP6, REF_S0);

        if (off != 0x7) {
            vis_alignaddr_g0((void *)off_plus_1);
            vis_faligndata(TMP0, TMP2, REF_S6);
            vis_faligndata(TMP4, TMP6, REF_S2);
        } else {
            vis_src1(TMP2, REF_S6);
            vis_src1(TMP6, REF_S2);
        }

        vis_padd16(TMP18, CONST_1, TMP18);
        vis_mul8x16au(REF_S6, CONST_256, TMP22);

        vis_padd16(TMP20, CONST_1, TMP20);
        vis_mul8x16au(REF_S6_1, CONST_256, TMP24);

        vis_mul8x16au(REF_S0, CONST_256, TMP26);
        vis_pmerge(ZERO, REF_S0_1, TMP28);

        vis_mul8x16au(REF_S2, CONST_256, TMP30);
        vis_padd16(TMP18, TMP22, TMP18);

        vis_mul8x16au(REF_S2_1, CONST_256, TMP32);
        vis_padd16(TMP20, TMP24, TMP20);

        vis_padd16(TMP8, TMP18, TMP8);

        vis_padd16(TMP10, TMP20, TMP10);

        vis_padd16(TMP8, TMP12, TMP8);

        vis_padd16(TMP10, TMP14, TMP10);
        vis_pack16(TMP8, DST_0);

        vis_pack16(TMP10, DST_1);
        vis_st64(DST_0, dest[0]);
        dest += stride;
        vis_padd16(TMP18, TMP26, TMP18);

        vis_padd16(TMP20, TMP28, TMP20);

        vis_padd16(TMP18, TMP30, TMP18);

        vis_padd16(TMP20, TMP32, TMP20);
        vis_pack16(TMP18, DST_2);

        vis_pack16(TMP20, DST_3);
        vis_st64(DST_2, dest[0]);
        dest += stride;
    } while (--height);
}

static void MC_avg_no_round_xy_16_vis (uint8_t * dest, const uint8_t * _ref,
                                       const int stride, int height)
{
    uint8_t *ref = (uint8_t *) _ref;
    unsigned long off = (unsigned long) ref & 0x7;
    unsigned long off_plus_1 = off + 1;
    int stride_8 = stride + 8;
    int stride_16 = stride + 16;

    vis_set_gsr(4 << VIS_GSR_SCALEFACT_SHIFT);

    ref = vis_alignaddr(ref);

    vis_ld64(ref[ 0], TMP0);
    vis_fzero(ZERO);

    vis_ld64(ref[ 8], TMP2);

    vis_ld64(ref[16], TMP4);

    vis_ld64(constants6[0], CONST_6);
    vis_faligndata(TMP0, TMP2, REF_S0);

    vis_ld64(constants256_1024[0], CONST_256);
    vis_faligndata(TMP2, TMP4, REF_S4);

    if (off != 0x7) {
        vis_alignaddr_g0((void *)off_plus_1);
        vis_faligndata(TMP0, TMP2, REF_S2);
        vis_faligndata(TMP2, TMP4, REF_S6);
    } else {
        vis_src1(TMP2, REF_S2);
        vis_src1(TMP4, REF_S6);
    }

    height >>= 1;
    do { /* 55 cycles */
        vis_ld64_2(ref, stride, TMP0);
        vis_mul8x16au(REF_S0, CONST_256, TMP12);
        vis_pmerge(ZERO, REF_S0_1, TMP14);

        vis_alignaddr_g0((void *)off);

        vis_ld64_2(ref, stride_8, TMP2);
        vis_mul8x16au(REF_S2, CONST_256, TMP16);
        vis_pmerge(ZERO, REF_S2_1, TMP18);

        vis_ld64_2(ref, stride_16, TMP4);
        ref += stride;
        vis_mul8x16au(REF_S4, CONST_256, TMP20);
        vis_pmerge(ZERO, REF_S4_1, TMP22);

        vis_ld64_2(ref, stride, TMP6);
        vis_mul8x16au(REF_S6, CONST_256, TMP24);
        vis_pmerge(ZERO, REF_S6_1, TMP26);

        vis_ld64_2(ref, stride_8, TMP8);
        vis_faligndata(TMP0, TMP2, REF_0);

        vis_ld64_2(ref, stride_16, TMP10);
        ref += stride;
        vis_faligndata(TMP2, TMP4, REF_4);

        vis_ld64(dest[0], DST_0);
        vis_faligndata(TMP6, TMP8, REF_S0);

        vis_ld64_2(dest, 8, DST_2);
        vis_faligndata(TMP8, TMP10, REF_S4);

        if (off != 0x7) {
            vis_alignaddr_g0((void *)off_plus_1);
            vis_faligndata(TMP0, TMP2, REF_2);
            vis_faligndata(TMP2, TMP4, REF_6);
            vis_faligndata(TMP6, TMP8, REF_S2);
            vis_faligndata(TMP8, TMP10, REF_S6);
        } else {
            vis_src1(TMP2, REF_2);
            vis_src1(TMP4, REF_6);
            vis_src1(TMP8, REF_S2);
            vis_src1(TMP10, REF_S6);
        }

        vis_mul8x16al(DST_0, CONST_1024, TMP30);
        vis_pmerge(ZERO, REF_0, TMP0);

        vis_mul8x16al(DST_1, CONST_1024, TMP32);
        vis_pmerge(ZERO, REF_0_1, TMP2);

        vis_mul8x16au(REF_2, CONST_256, TMP4);
        vis_pmerge(ZERO, REF_2_1, TMP6);

        vis_mul8x16al(DST_2, CONST_1024, REF_0);
        vis_padd16(TMP0, CONST_6, TMP0);

        vis_mul8x16al(DST_3, CONST_1024, REF_2);
        vis_padd16(TMP2, CONST_6, TMP2);

        vis_padd16(TMP0, TMP4, TMP0);
        vis_mul8x16au(REF_4, CONST_256, TMP4);

        vis_padd16(TMP2, TMP6, TMP2);
        vis_mul8x16au(REF_4_1, CONST_256, TMP6);

        vis_padd16(TMP12, TMP0, TMP12);
        vis_mul8x16au(REF_6, CONST_256, TMP8);

        vis_padd16(TMP14, TMP2, TMP14);
        vis_mul8x16au(REF_6_1, CONST_256, TMP10);

        vis_padd16(TMP12, TMP16, TMP12);
        vis_mul8x16au(REF_S0, CONST_256, REF_4);

        vis_padd16(TMP14, TMP18, TMP14);
        vis_mul8x16au(REF_S0_1, CONST_256, REF_6);

        vis_padd16(TMP12, TMP30, TMP12);

        vis_padd16(TMP14, TMP32, TMP14);
        vis_pack16(TMP12, DST_0);

        vis_pack16(TMP14, DST_1);
        vis_st64(DST_0, dest[0]);
        vis_padd16(TMP4, CONST_6, TMP4);

        vis_ld64_2(dest, stride, DST_0);
        vis_padd16(TMP6, CONST_6, TMP6);
        vis_mul8x16au(REF_S2, CONST_256, TMP12);

        vis_padd16(TMP4, TMP8, TMP4);
        vis_mul8x16au(REF_S2_1, CONST_256, TMP14);

        vis_padd16(TMP6, TMP10, TMP6);

        vis_padd16(TMP20, TMP4, TMP20);

        vis_padd16(TMP22, TMP6, TMP22);

        vis_padd16(TMP20, TMP24, TMP20);

        vis_padd16(TMP22, TMP26, TMP22);

        vis_padd16(TMP20, REF_0, TMP20);
        vis_mul8x16au(REF_S4, CONST_256, REF_0);

        vis_padd16(TMP22, REF_2, TMP22);
        vis_pack16(TMP20, DST_2);

        vis_pack16(TMP22, DST_3);
        vis_st64_2(DST_2, dest, 8);
        dest += stride;

        vis_ld64_2(dest, 8, DST_2);
        vis_mul8x16al(DST_0, CONST_1024, TMP30);
        vis_pmerge(ZERO, REF_S4_1, REF_2);

        vis_mul8x16al(DST_1, CONST_1024, TMP32);
        vis_padd16(REF_4, TMP0, TMP8);

        vis_mul8x16au(REF_S6, CONST_256, REF_4);
        vis_padd16(REF_6, TMP2, TMP10);

        vis_mul8x16au(REF_S6_1, CONST_256, REF_6);
        vis_padd16(TMP8, TMP12, TMP8);

        vis_padd16(TMP10, TMP14, TMP10);

        vis_padd16(TMP8, TMP30, TMP8);

        vis_padd16(TMP10, TMP32, TMP10);
        vis_pack16(TMP8, DST_0);

        vis_pack16(TMP10, DST_1);
        vis_st64(DST_0, dest[0]);

        vis_padd16(REF_0, TMP4, REF_0);

        vis_mul8x16al(DST_2, CONST_1024, TMP30);
        vis_padd16(REF_2, TMP6, REF_2);

        vis_mul8x16al(DST_3, CONST_1024, TMP32);
        vis_padd16(REF_0, REF_4, REF_0);

        vis_padd16(REF_2, REF_6, REF_2);

        vis_padd16(REF_0, TMP30, REF_0);

        /* stall */

        vis_padd16(REF_2, TMP32, REF_2);
        vis_pack16(REF_0, DST_2);

        vis_pack16(REF_2, DST_3);
        vis_st64_2(DST_2, dest, 8);
        dest += stride;
    } while (--height);
}

static void MC_avg_no_round_xy_8_vis (uint8_t * dest, const uint8_t * _ref,
                                      const int stride, int height)
{
    uint8_t *ref = (uint8_t *) _ref;
    unsigned long off = (unsigned long) ref & 0x7;
    unsigned long off_plus_1 = off + 1;
    int stride_8 = stride + 8;

    vis_set_gsr(4 << VIS_GSR_SCALEFACT_SHIFT);

    ref = vis_alignaddr(ref);

    vis_ld64(ref[0], TMP0);
    vis_fzero(ZERO);

    vis_ld64_2(ref, 8, TMP2);

    vis_ld64(constants6[0], CONST_6);

    vis_ld64(constants256_1024[0], CONST_256);
    vis_faligndata(TMP0, TMP2, REF_S0);

    if (off != 0x7) {
        vis_alignaddr_g0((void *)off_plus_1);
        vis_faligndata(TMP0, TMP2, REF_S2);
    } else {
        vis_src1(TMP2, REF_S2);
    }

    height >>= 1;
    do { /* 31 cycles */
        vis_ld64_2(ref, stride, TMP0);
        vis_mul8x16au(REF_S0, CONST_256, TMP8);
        vis_pmerge(ZERO, REF_S0_1, TMP10);

        vis_ld64_2(ref, stride_8, TMP2);
        ref += stride;
        vis_mul8x16au(REF_S2, CONST_256, TMP12);
        vis_pmerge(ZERO, REF_S2_1, TMP14);

        vis_alignaddr_g0((void *)off);

        vis_ld64_2(ref, stride, TMP4);
        vis_faligndata(TMP0, TMP2, REF_S4);

        vis_ld64_2(ref, stride_8, TMP6);
        ref += stride;

        vis_ld64(dest[0], DST_0);
        vis_faligndata(TMP4, TMP6, REF_S0);

        vis_ld64_2(dest, stride, DST_2);

        if (off != 0x7) {
            vis_alignaddr_g0((void *)off_plus_1);
            vis_faligndata(TMP0, TMP2, REF_S6);
            vis_faligndata(TMP4, TMP6, REF_S2);
        } else {
            vis_src1(TMP2, REF_S6);
            vis_src1(TMP6, REF_S2);
        }

        vis_mul8x16al(DST_0, CONST_1024, TMP30);
        vis_pmerge(ZERO, REF_S4, TMP22);

        vis_mul8x16al(DST_1, CONST_1024, TMP32);
        vis_pmerge(ZERO, REF_S4_1, TMP24);

        vis_mul8x16au(REF_S6, CONST_256, TMP26);
        vis_pmerge(ZERO, REF_S6_1, TMP28);

        vis_mul8x16au(REF_S0, CONST_256, REF_S4);
        vis_padd16(TMP22, CONST_6, TMP22);

        vis_mul8x16au(REF_S0_1, CONST_256, REF_S6);
        vis_padd16(TMP24, CONST_6, TMP24);

        vis_mul8x16al(DST_2, CONST_1024, REF_0);
        vis_padd16(TMP22, TMP26, TMP22);

        vis_mul8x16al(DST_3, CONST_1024, REF_2);
        vis_padd16(TMP24, TMP28, TMP24);

        vis_mul8x16au(REF_S2, CONST_256, TMP26);
        vis_padd16(TMP8, TMP22, TMP8);

        vis_mul8x16au(REF_S2_1, CONST_256, TMP28);
        vis_padd16(TMP10, TMP24, TMP10);

        vis_padd16(TMP8, TMP12, TMP8);

        vis_padd16(TMP10, TMP14, TMP10);

        vis_padd16(TMP8, TMP30, TMP8);

        vis_padd16(TMP10, TMP32, TMP10);
        vis_pack16(TMP8, DST_0);

        vis_pack16(TMP10, DST_1);
        vis_st64(DST_0, dest[0]);
        dest += stride;

        vis_padd16(REF_S4, TMP22, TMP12);

        vis_padd16(REF_S6, TMP24, TMP14);

        vis_padd16(TMP12, TMP26, TMP12);

        vis_padd16(TMP14, TMP28, TMP14);

        vis_padd16(TMP12, REF_0, TMP12);

        vis_padd16(TMP14, REF_2, TMP14);
        vis_pack16(TMP12, DST_2);

        vis_pack16(TMP14, DST_3);
        vis_st64(DST_2, dest[0]);
        dest += stride;
    } while (--height);
}

/* End of no rounding code */
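The difference between the MC_put_* and MC_avg_* families is only the final write: put stores the computed prediction, while avg blends it with the pixel already in dest. A minimal scalar sketch of the two store operators, assuming the standard dsputil semantics (a rounded halving average for avg) rather than taking them from this file:

```c
#include <stdint.h>
#include <assert.h>

/* "put": overwrite the destination with the prediction. */
static inline uint8_t op_put(uint8_t dst, uint8_t pred)
{
    (void) dst; /* old destination value is discarded */
    return pred;
}

/* "avg": rounded average of the old destination pixel and the prediction. */
static inline uint8_t op_avg(uint8_t dst, uint8_t pred)
{
    return (uint8_t)((dst + pred + 1) >> 1);
}
```

In the VIS avg functions this blend shows up as the extra vis_ld64 loads from dest and the mul8x16al(DST_n, CONST_1024, ...) terms folded into the sums.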

static sigjmp_buf jmpbuf;
static volatile sig_atomic_t canjump = 0;

static void sigill_handler (int sig)
{
    if (!canjump) {
        signal (sig, SIG_DFL);
        raise (sig);
    }

    canjump = 0;
    siglongjmp (jmpbuf, 1);
}

#define ACCEL_SPARC_VIS 1
#define ACCEL_SPARC_VIS2 2

static int vis_level ()
{
    int accel = 0;

    signal (SIGILL, sigill_handler);
    if (sigsetjmp (jmpbuf, 1)) {
        signal (SIGILL, SIG_DFL);
        return accel;
    }

    canjump = 1;

    /* pdist %f0, %f0, %f0 */
    __asm__ __volatile__(".word\t0x81b007c0");

    canjump = 0;
    accel |= ACCEL_SPARC_VIS;

    if (sigsetjmp (jmpbuf, 1)) {
        signal (SIGILL, SIG_DFL);
        return accel;
    }

    canjump = 1;

    /* edge8n %g0, %g0, %g0 */
    __asm__ __volatile__(".word\t0x81b00020");

    canjump = 0;
    accel |= ACCEL_SPARC_VIS2;

    signal (SIGILL, SIG_DFL);

    return accel;
}

/* libavcodec initialization code */
void dsputil_init_vis(DSPContext* c, AVCodecContext *avctx)
{
    /* VIS specific optimisations */
    int accel = vis_level ();

    if (accel & ACCEL_SPARC_VIS) {
        c->put_pixels_tab[0][0] = MC_put_o_16_vis;
        c->put_pixels_tab[0][1] = MC_put_x_16_vis;
        c->put_pixels_tab[0][2] = MC_put_y_16_vis;
        c->put_pixels_tab[0][3] = MC_put_xy_16_vis;

        c->put_pixels_tab[1][0] = MC_put_o_8_vis;
        c->put_pixels_tab[1][1] = MC_put_x_8_vis;
        c->put_pixels_tab[1][2] = MC_put_y_8_vis;
        c->put_pixels_tab[1][3] = MC_put_xy_8_vis;

        c->avg_pixels_tab[0][0] = MC_avg_o_16_vis;
        c->avg_pixels_tab[0][1] = MC_avg_x_16_vis;
        c->avg_pixels_tab[0][2] = MC_avg_y_16_vis;
        c->avg_pixels_tab[0][3] = MC_avg_xy_16_vis;

        c->avg_pixels_tab[1][0] = MC_avg_o_8_vis;
        c->avg_pixels_tab[1][1] = MC_avg_x_8_vis;
        c->avg_pixels_tab[1][2] = MC_avg_y_8_vis;
        c->avg_pixels_tab[1][3] = MC_avg_xy_8_vis;

        c->put_no_rnd_pixels_tab[0][0] = MC_put_no_round_o_16_vis;
        c->put_no_rnd_pixels_tab[0][1] = MC_put_no_round_x_16_vis;
        c->put_no_rnd_pixels_tab[0][2] = MC_put_no_round_y_16_vis;
        c->put_no_rnd_pixels_tab[0][3] = MC_put_no_round_xy_16_vis;

        c->put_no_rnd_pixels_tab[1][0] = MC_put_no_round_o_8_vis;
        c->put_no_rnd_pixels_tab[1][1] = MC_put_no_round_x_8_vis;
        c->put_no_rnd_pixels_tab[1][2] = MC_put_no_round_y_8_vis;
        c->put_no_rnd_pixels_tab[1][3] = MC_put_no_round_xy_8_vis;

        c->avg_no_rnd_pixels_tab[0][0] = MC_avg_no_round_o_16_vis;
        c->avg_no_rnd_pixels_tab[0][1] = MC_avg_no_round_x_16_vis;
        c->avg_no_rnd_pixels_tab[0][2] = MC_avg_no_round_y_16_vis;
License change and cpu detection patch by (James Morrison <ja2morri at csclub dot uwaterloo dot ca>)
michael
parents:
1959
diff
changeset
|
4082 c->avg_no_rnd_pixels_tab[0][3] = MC_avg_no_round_xy_16_vis; |
2967 | 4083 |
1966
e1fc7c598558
License change and cpu detection patch by (James Morrison <ja2morri at csclub dot uwaterloo dot ca>)
michael
parents:
1959
diff
changeset
|
4084 c->avg_no_rnd_pixels_tab[1][0] = MC_avg_no_round_o_8_vis; |
e1fc7c598558
License change and cpu detection patch by (James Morrison <ja2morri at csclub dot uwaterloo dot ca>)
michael
parents:
1959
diff
changeset
|
4085 c->avg_no_rnd_pixels_tab[1][1] = MC_avg_no_round_x_8_vis; |
e1fc7c598558
License change and cpu detection patch by (James Morrison <ja2morri at csclub dot uwaterloo dot ca>)
michael
parents:
1959
diff
changeset
|
4086 c->avg_no_rnd_pixels_tab[1][2] = MC_avg_no_round_y_8_vis; |
e1fc7c598558
License change and cpu detection patch by (James Morrison <ja2morri at csclub dot uwaterloo dot ca>)
michael
parents:
1959
diff
changeset
|
4087 c->avg_no_rnd_pixels_tab[1][3] = MC_avg_no_round_xy_8_vis; |
e1fc7c598558
License change and cpu detection patch by (James Morrison <ja2morri at csclub dot uwaterloo dot ca>)
michael
parents:
1959
diff
changeset
|
4088 } |
1959
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
4089 } |
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
4090 |
55b7435c59b8
VIS optimized motion compensation code. by (David S. Miller <davem at redhat dot com>)
michael
parents:
diff
changeset
|
4091 #endif /* !(ARCH_SPARC) */ |