annotate ppc/dsputil_altivec.c @ 10952:ea8f891d997d libavcodec
H264 DXVA2 implementation
It allows VLD H264 decoding using DXVA2 (the GPU-assisted decoding API on
Windows Vista and Windows 7).
It is implemented using the AVHWAccel API and has been tested successfully
for some time in VLC with an NVIDIA card on Windows 7.
To compile it, you need the system header dxva2api.h (either from
Microsoft or from http://downloads.videolan.org/pub/videolan/testing/contrib/dxva2api.h).
The generated libavcodec.dll does not depend directly on any new library, as
the necessary objects are supplied by the application using FFmpeg.
author | fenrir |
---|---|
date | Wed, 20 Jan 2010 18:54:51 +0000 |
parents | 9f4b529bd5c0 |
children | 50415a8f1451 |
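
For orientation, the three functions in the annotated listing (`sad16_x2_altivec`, `sad16_y2_altivec`, `sad16_xy2_altivec`) each accumulate absolute differences between a 16-pixel-wide block and a half-pixel-interpolated reference. The following is a minimal scalar sketch of that arithmetic, not code from the repository. The names `avg2`, `avg4` and the `*_ref` functions are hypothetical, the unused `void *v` argument is dropped, and the two-way average is assumed to round up as the listing's comment says `vec_avg` does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Two-way average that rounds up (assumed to mirror vec_avg per element). */
static int avg2(int a, int b) { return (a + b + 1) >> 1; }

/* Four-way rounded average computed in wider precision,
 * i.e. (a + b + c + d + 2) >> 2, as the xy2 loop does with shorts. */
static int avg4(int a, int b, int c, int d) { return (a + b + c + d + 2) >> 2; }

/* Hypothetical scalar counterpart of sad16_x2_altivec:
 * reference is the average of each pixel and its right neighbour. */
static int sad16_x2_ref(const uint8_t *pix1, const uint8_t *pix2,
                        int line_size, int h)
{
    int s = 0;
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 16; j++)
            s += abs(pix1[j] - avg2(pix2[j], pix2[j + 1]));
        pix1 += line_size;
        pix2 += line_size;
    }
    return s;
}

/* Hypothetical scalar counterpart of sad16_y2_altivec:
 * reference is the average of each pixel and the one below it. */
static int sad16_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                        int line_size, int h)
{
    const uint8_t *pix3 = pix2 + line_size; /* next row of the reference */
    int s = 0;
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 16; j++)
            s += abs(pix1[j] - avg2(pix2[j], pix3[j]));
        pix1 += line_size;
        pix2 += line_size;
        pix3 += line_size;
    }
    return s;
}

/* Hypothetical scalar counterpart of sad16_xy2_altivec:
 * reference is the rounded average of a 2x2 neighbourhood. */
static int sad16_xy2_ref(const uint8_t *pix1, const uint8_t *pix2,
                         int line_size, int h)
{
    const uint8_t *pix3 = pix2 + line_size;
    int s = 0;
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 16; j++)
            s += abs(pix1[j] - avg4(pix2[j], pix2[j + 1], pix3[j], pix3[j + 1]));
        pix1 += line_size;
        pix2 += line_size;
        pix3 += line_size;
    }
    return s;
}
```

With these helpers, `avg2(avg2(3, 0), avg2(0, 1))` evaluates to 2 while `avg4(3, 0, 0, 1)` evaluates to 1, which is the rounding discrepancy that leads the xy2 variant to widen pixels to shorts instead of nesting two `vec_avg` calls. Note that the x2 and xy2 variants read one pixel past the 16-wide block, and the y2 and xy2 variants read one row past `h`.
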
rev | line source |
---|---|
828 | 1 /* |
828 | 2 * Copyright (c) 2002 Brian Foley |
828 | 3 * Copyright (c) 2002 Dieter Shirley |
1949 | 4 * Copyright (c) 2003-2004 Romain Dolbeau <romain@dolbeau.org> |
828 | 5 * |
3947 | 6 * This file is part of FFmpeg. |
3947 | 7 * |
3947 | 8 * FFmpeg is free software; you can redistribute it and/or |
828 | 9 * modify it under the terms of the GNU Lesser General Public |
828 | 10 * License as published by the Free Software Foundation; either |
3947 | 11 * version 2.1 of the License, or (at your option) any later version. |
828 | 12 * |
3947 | 13 * FFmpeg is distributed in the hope that it will be useful, |
828 | 14 * but WITHOUT ANY WARRANTY; without even the implied warranty of |
828 | 15 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
828 | 16 * Lesser General Public License for more details. |
828 | 17 * |
828 | 18 * You should have received a copy of the GNU Lesser General Public |
3947 | 19 * License along with FFmpeg; if not, write to the Free Software |
3036 | 20 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
828 | 21 */ |
2967 | 22 |
9421 | 23 #include "config.h" |
9421 | 24 #if HAVE_ALTIVEC_H |
9421 | 25 #include <altivec.h> |
9421 | 26 #endif |
6763 | 27 #include "libavcodec/dsputil.h" |
6105 | 28 #include "dsputil_ppc.h" |
5750 | 29 #include "util_altivec.h" |
8307 | 30 #include "types_altivec.h" |
623 | 31 |
1708 | 32 int sad16_x2_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h) |
878 | 33 { |
981 | 34 int i; |
10082 | 35 int s; |
5746 | 36 const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0); |
981 | 37 vector unsigned char *tv; |
878 | 38 vector unsigned char pix1v, pix2v, pix2iv, avgv, t5; |
878 | 39 vector unsigned int sad; |
878 | 40 vector signed int sumdiffs; |
878 | 41 |
878 | 42 s = 0; |
1033 | 43 sad = (vector unsigned int)vec_splat_u32(0); |
7335 | 44 for (i = 0; i < h; i++) { |
7335 | 45 /* Read unaligned pixels into our vectors. The vectors are as follows: |
878 | 46 pix1v: pix1[0]-pix1[15] |
7335 | 47 pix2v: pix2[0]-pix2[15] pix2iv: pix2[1]-pix2[16] */ |
878 | 48 tv = (vector unsigned char *) pix1; |
878 | 49 pix1v = vec_perm(tv[0], tv[1], vec_lvsl(0, pix1)); |
2967 | 50 |
878 | 51 tv = (vector unsigned char *) &pix2[0]; |
878 | 52 pix2v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[0])); |
878 | 53 |
878 | 54 tv = (vector unsigned char *) &pix2[1]; |
878 | 55 pix2iv = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[1])); |
878 | 56 |
878 | 57 /* Calculate the average vector */ |
878 | 58 avgv = vec_avg(pix2v, pix2iv); |
878 | 59 |
878 | 60 /* Calculate a sum of abs differences vector */ |
878 | 61 t5 = vec_sub(vec_max(pix1v, avgv), vec_min(pix1v, avgv)); |
878 | 62 |
878 | 63 /* Add each 4 pixel group together and put 4 results into sad */ |
878 | 64 sad = vec_sum4s(t5, sad); |
2967 | 65 |
878 | 66 pix1 += line_size; |
878 | 67 pix2 += line_size; |
878 | 68 } |
878 | 69 /* Sum up the four partial sums, and put the result into s */ |
878 | 70 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero); |
878 | 71 sumdiffs = vec_splat(sumdiffs, 3); |
878 | 72 vec_ste(sumdiffs, 0, &s); |
878 | 73 |
878 | 74 return s; |
878 | 75 } |
878 | 76 |
1708 | 77 int sad16_y2_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h) |
878 | 78 { |
981 | 79 int i; |
10082 | 80 int s; |
5746 | 81 const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0); |
981 | 82 vector unsigned char *tv; |
878 | 83 vector unsigned char pix1v, pix2v, pix3v, avgv, t5; |
878 | 84 vector unsigned int sad; |
878 | 85 vector signed int sumdiffs; |
878 | 86 uint8_t *pix3 = pix2 + line_size; |
878 | 87 |
878 | 88 s = 0; |
1033 | 89 sad = (vector unsigned int)vec_splat_u32(0); |
878 | 90 |
7335 | 91 /* Due to the fact that pix3 = pix2 + line_size, the pix3 of one |
878 | 92 iteration becomes pix2 in the next iteration. We can use this |
878 | 93 fact to avoid a potentially expensive unaligned read, each |
878 | 94 time around the loop. |
878 | 95 Read unaligned pixels into our vectors. The vectors are as follows: |
878 | 96 pix2v: pix2[0]-pix2[15] |
7335 | 97 Split the pixel vectors into shorts */ |
878 | 98 tv = (vector unsigned char *) &pix2[0]; |
878 | 99 pix2v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[0])); |
2967 | 100 |
7335 | 101 for (i = 0; i < h; i++) { |
7335 | 102 /* Read unaligned pixels into our vectors. The vectors are as follows: |
878 | 103 pix1v: pix1[0]-pix1[15] |
7335 | 104 pix3v: pix3[0]-pix3[15] */ |
878 | 105 tv = (vector unsigned char *) pix1; |
878 | 106 pix1v = vec_perm(tv[0], tv[1], vec_lvsl(0, pix1)); |
878 | 107 |
878 | 108 tv = (vector unsigned char *) &pix3[0]; |
878 | 109 pix3v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix3[0])); |
878 | 110 |
878 | 111 /* Calculate the average vector */ |
878 | 112 avgv = vec_avg(pix2v, pix3v); |
878 | 113 |
878 | 114 /* Calculate a sum of abs differences vector */ |
878 | 115 t5 = vec_sub(vec_max(pix1v, avgv), vec_min(pix1v, avgv)); |
878 | 116 |
878 | 117 /* Add each 4 pixel group together and put 4 results into sad */ |
878 | 118 sad = vec_sum4s(t5, sad); |
2967 | 119 |
878 | 120 pix1 += line_size; |
878 | 121 pix2v = pix3v; |
878 | 122 pix3 += line_size; |
2967 | 123 |
878 | 124 } |
2967 | 125 |
878 | 126 /* Sum up the four partial sums, and put the result into s */ |
878 | 127 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero); |
878 | 128 sumdiffs = vec_splat(sumdiffs, 3); |
878 | 129 vec_ste(sumdiffs, 0, &s); |
2967 | 130 return s; |
878 | 131 } |
878 | 132 |
1708 | 133 int sad16_xy2_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h) |
878 | 134 { |
981 | 135 int i; |
10082 | 136 int s; |
878 | 137 uint8_t *pix3 = pix2 + line_size; |
5746 | 138 const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0); |
5746 | 139 const vector unsigned short two = (const vector unsigned short)vec_splat_u16(2); |
981 | 140 vector unsigned char *tv, avgv, t5; |
878 | 141 vector unsigned char pix1v, pix2v, pix3v, pix2iv, pix3iv; |
878 | 142 vector unsigned short pix2lv, pix2hv, pix2ilv, pix2ihv; |
878 | 143 vector unsigned short pix3lv, pix3hv, pix3ilv, pix3ihv; |
981 | 144 vector unsigned short avghv, avglv; |
878 | 145 vector unsigned short t1, t2, t3, t4; |
878 | 146 vector unsigned int sad; |
878 | 147 vector signed int sumdiffs; |
878 | 148 |
1033 | 149 sad = (vector unsigned int)vec_splat_u32(0); |
2967 | 150 |
878 | 151 s = 0; |
878 | 152 |
7335 | 153 /* Due to the fact that pix3 = pix2 + line_size, the pix3 of one |
878 | 154 iteration becomes pix2 in the next iteration. We can use this |
878 | 155 fact to avoid a potentially expensive unaligned read, as well |
878 | 156 as some splitting, and vector addition each time around the loop. |
878 | 157 Read unaligned pixels into our vectors. The vectors are as follows: |
2979 | 158 pix2v: pix2[0]-pix2[15] pix2iv: pix2[1]-pix2[16] |
7335 | 159 Split the pixel vectors into shorts */ |
878 | 160 tv = (vector unsigned char *) &pix2[0]; |
878 | 161 pix2v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[0])); |
878 | 162 |
878 | 163 tv = (vector unsigned char *) &pix2[1]; |
878 | 164 pix2iv = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[1])); |
878 | 165 |
7335 | 166 pix2hv = (vector unsigned short) vec_mergeh(zero, pix2v); |
7335 | 167 pix2lv = (vector unsigned short) vec_mergel(zero, pix2v); |
878 | 168 pix2ihv = (vector unsigned short) vec_mergeh(zero, pix2iv); |
878 | 169 pix2ilv = (vector unsigned short) vec_mergel(zero, pix2iv); |
878 | 170 t1 = vec_add(pix2hv, pix2ihv); |
878 | 171 t2 = vec_add(pix2lv, pix2ilv); |
2967 | 172 |
7335 | 173 for (i = 0; i < h; i++) { |
7335 | 174 /* Read unaligned pixels into our vectors. The vectors are as follows: |
878 | 175 pix1v: pix1[0]-pix1[15] |
7335 | 176 pix3v: pix3[0]-pix3[15] pix3iv: pix3[1]-pix3[16] */ |
878 | 177 tv = (vector unsigned char *) pix1; |
878 | 178 pix1v = vec_perm(tv[0], tv[1], vec_lvsl(0, pix1)); |
878 | 179 |
878 | 180 tv = (vector unsigned char *) &pix3[0]; |
878 | 181 pix3v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix3[0])); |
878 | 182 |
878 | 183 tv = (vector unsigned char *) &pix3[1]; |
878 | 184 pix3iv = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix3[1])); |
878 | 185 |
7335 | 186 /* Note that AltiVec does have vec_avg, but this works on vector pairs |
7335 | 187 and rounds up. We could do avg(avg(a,b),avg(c,d)), but the rounding |
7335 | 188 would mean that, for example, avg(3,0,0,1) = 2, when it should be 1. |
7335 | 189 Instead, we have to split the pixel vectors into vectors of shorts, |
7335 | 190 and do the averaging by hand. */ |
878 | 191 |
878 | 192 /* Split the pixel vectors into shorts */ |
7335 | 193 pix3hv = (vector unsigned short) vec_mergeh(zero, pix3v); |
7335 | 194 pix3lv = (vector unsigned short) vec_mergel(zero, pix3v); |
878 | 195 pix3ihv = (vector unsigned short) vec_mergeh(zero, pix3iv); |
878 | 196 pix3ilv = (vector unsigned short) vec_mergel(zero, pix3iv); |
878 | 197 |
878 | 198 /* Do the averaging on them */ |
878 | 199 t3 = vec_add(pix3hv, pix3ihv); |
878 | 200 t4 = vec_add(pix3lv, pix3ilv); |
878 | 201 |
884 | 202 avghv = vec_sr(vec_add(vec_add(t1, t3), two), two); |
884 | 203 avglv = vec_sr(vec_add(vec_add(t2, t4), two), two); |
878 | 204 |
878 | 205 /* Pack the shorts back into a result */ |
878 | 206 avgv = vec_pack(avghv, avglv); |
878 | 207 |
878 | 208 /* Calculate a sum of abs differences vector */ |
878 | 209 t5 = vec_sub(vec_max(pix1v, avgv), vec_min(pix1v, avgv)); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
210 |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
211 /* Add each 4 pixel group together and put 4 results into sad */ |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
212 sad = vec_sum4s(t5, sad); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
213 |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
214 pix1 += line_size; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
215 pix3 += line_size; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
216 /* Transfer the calculated values for pix3 into pix2 */ |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
217 t1 = t3; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
218 t2 = t4; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
219 } |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
220 /* Sum up the four partial sums, and put the result into s */ |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
221 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
222 sumdiffs = vec_splat(sumdiffs, 3); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
223 vec_ste(sumdiffs, 0, &s); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
224 |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
225 return s; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
226 } |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
227 |
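The rounding remark in the comment above can be checked with a small scalar model. This is only a sketch: `avg2_round`, `avg4_nested`, `avg4_exact` and `avg4_selfcheck` are hypothetical helper names that do not appear in dsputil_altivec.c, and plain C integers stand in for the AltiVec vector lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Two-way average that rounds up, modelling one lane of vec_avg. */
static uint8_t avg2_round(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Nested two-way averages: the rounding is applied twice. */
static uint8_t avg4_nested(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return avg2_round(avg2_round(a, b), avg2_round(c, d));
}

/* Widen to 16 bits, add all four values, round once: (a+b+c+d+2) >> 2.
   This mirrors the vec_add / vec_sr sequence on vectors of shorts. */
static uint8_t avg4_exact(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint16_t sum = (uint16_t)(a + b + c + d + 2);
    return (uint8_t)(sum >> 2);
}

/* The example from the source comment: avg(3,0,0,1). */
static void avg4_selfcheck(void)
{
    assert(avg4_nested(3, 0, 0, 1) == 2); /* double rounding overshoots */
    assert(avg4_exact(3, 0, 0, 1) == 1);  /* single rounding is correct */
}
```

Widening to 16 bits is what makes the single rounding step possible: the sum of four bytes plus 2 can reach 1022, which does not fit in an unsigned char.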
1708 | 228 int sad16_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h) |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
229 { |
981 | 230 int i; |
10082 | 231 int s; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
232 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0); |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
233 vector unsigned char perm1, perm2, *pix1v, *pix2v; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
234 vector unsigned char t1, t2, t3, t4, t5; |
981 | 235 vector unsigned int sad; |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
236 vector signed int sumdiffs; |
2967 | 237 |
1033
b4172ff70d27
Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents:
1024
diff
changeset
|
238 sad = (vector unsigned int)vec_splat_u32(0); |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
239 |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
240 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
241 for (i = 0; i < h; i++) { |
2979 | 242 /* Read potentially unaligned pixels into t1 and t2 */ |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
243 perm1 = vec_lvsl(0, pix1); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
244 pix1v = (vector unsigned char *) pix1; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
245 perm2 = vec_lvsl(0, pix2); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
246 pix2v = (vector unsigned char *) pix2; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
247 t1 = vec_perm(pix1v[0], pix1v[1], perm1); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
248 t2 = vec_perm(pix2v[0], pix2v[1], perm2); |
2967 | 249 |
2979 | 250 /* Calculate a sum of abs differences vector */ |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
251 t3 = vec_max(t1, t2); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
252 t4 = vec_min(t1, t2); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
253 t5 = vec_sub(t3, t4); |
2967 | 254 |
2979 | 255 /* Add each 4 pixel group together and put 4 results into sad */ |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
256 sad = vec_sum4s(t5, sad); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
257 |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
258 pix1 += line_size; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
259 pix2 += line_size; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
260 } |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
261 |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
262 /* Sum up the four partial sums, and put the result into s */ |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
263 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
264 sumdiffs = vec_splat(sumdiffs, 3); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
265 vec_ste(sumdiffs, 0, &s); |
2967 | 266 |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
267 return s; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
268 } |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
269 |
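For comparison with sad16_altivec, a plain scalar version of the same sum of absolute differences might look like the sketch below. The name `sad_ref` and the extra width parameter `w` are assumptions made for illustration; they are not part of the original file.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sum of absolute differences over a w x h block.
   Uses max - min, as the vector code does, so the subtraction
   never underflows in unsigned 8-bit arithmetic. */
static int sad_ref(const uint8_t *pix1, const uint8_t *pix2,
                   int line_size, int w, int h)
{
    int s = 0;

    assert(w > 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            uint8_t hi = pix1[x] > pix2[x] ? pix1[x] : pix2[x];
            uint8_t lo = pix1[x] > pix2[x] ? pix2[x] : pix1[x];
            s += hi - lo;
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    return s;
}
```

Calling it with `w = 16` corresponds to what sad16_altivec computes and `w = 8` to sad8_altivec, which makes it usable as a reference when testing the vector routines.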
1708 | 270 int sad8_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h) |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
271 { |
981 | 272 int i; |
10082 | 273 int s; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
274 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0); |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
275 vector unsigned char perm1, perm2, permclear, *pix1v, *pix2v; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
276 vector unsigned char t1, t2, t3, t4, t5; |
981 | 277 vector unsigned int sad; |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
278 vector signed int sumdiffs; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
279 |
1033
b4172ff70d27
Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents:
1024
diff
changeset
|
280 sad = (vector unsigned int)vec_splat_u32(0); |
1277
f3152eb76f1a
altivec gcc-3 fixes by (Magnus Damm <damm at opensource dot se>)
michaelni
parents:
1064
diff
changeset
|
281 |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
282 permclear = (vector unsigned char){255,255,255,255,255,255,255,255,0,0,0,0,0,0,0,0}; |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
283 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
284 for (i = 0; i < h; i++) { |
2979 | 285 /* Read potentially unaligned pixels into t1 and t2. |
286 Since we're reading 16 pixels, and actually only want 8, | |
287 mask out the last 8 pixels. The 0s don't change the sum. */ | |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
288 perm1 = vec_lvsl(0, pix1); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
289 pix1v = (vector unsigned char *) pix1; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
290 perm2 = vec_lvsl(0, pix2); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
291 pix2v = (vector unsigned char *) pix2; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
292 t1 = vec_and(vec_perm(pix1v[0], pix1v[1], perm1), permclear); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
293 t2 = vec_and(vec_perm(pix2v[0], pix2v[1], perm2), permclear); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
294 |
2979 | 295 /* Calculate a sum of abs differences vector */ |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
296 t3 = vec_max(t1, t2); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
297 t4 = vec_min(t1, t2); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
298 t5 = vec_sub(t3, t4); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
299 |
2979 | 300 /* Add each 4 pixel group together and put 4 results into sad */ |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
301 sad = vec_sum4s(t5, sad); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
302 |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
303 pix1 += line_size; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
304 pix2 += line_size; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
305 } |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
306 |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
307 /* Sum up the four partial sums, and put the result into s */ |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
308 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
309 sumdiffs = vec_splat(sumdiffs, 3); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
310 vec_ste(sumdiffs, 0, &s); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
311 |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
312 return s; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
313 } |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
314 |
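The masking step in sad8_altivec can be modelled per row in scalar C. In this sketch a 16-byte array stands in for one vector register, and `sad8_masked_row` is a hypothetical name; only the `permclear` constant is taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* One row of the 8-pixel SAD: 16 bytes are loaded, but the last 8
   are ANDed with 0 in both inputs, so they contribute |0 - 0| = 0. */
static int sad8_masked_row(const uint8_t *row1, const uint8_t *row2)
{
    static const uint8_t permclear[16] = {
        255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0
    };
    int s = 0;

    assert(row1 && row2);
    for (int i = 0; i < 16; i++) {
        uint8_t t1 = row1[i] & permclear[i];
        uint8_t t2 = row2[i] & permclear[i];
        s += t1 > t2 ? t1 - t2 : t2 - t1;
    }
    return s;
}
```

Masking both operands, rather than the difference, keeps the rest of the loop identical to the 16-pixel version.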
878
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
315 int pix_norm1_altivec(uint8_t *pix, int line_size) |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
316 { |
981 | 317 int i; |
10082 | 318 int s; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
319 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0); |
981 | 320 vector unsigned char *tv; |
878
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
321 vector unsigned char pixv; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
322 vector unsigned int sv; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
323 vector signed int sum; |
2967 | 324 |
1033
b4172ff70d27
Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents:
1024
diff
changeset
|
325 sv = (vector unsigned int)vec_splat_u32(0); |
2967 | 326 |
878
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
327 s = 0; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
328 for (i = 0; i < 16; i++) { |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
329 /* Read in the potentially unaligned pixels */ |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
330 tv = (vector unsigned char *) pix; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
331 pixv = vec_perm(tv[0], tv[1], vec_lvsl(0, pix)); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
332 |
884 | 333 /* Square the values, and add them to our sum */ |
334 sv = vec_msum(pixv, pixv, sv); | |
878
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
335 |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
336 pix += line_size; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
337 } |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
338 /* Sum up the four partial sums, and put the result into s */ |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
339 sum = vec_sums((vector signed int) sv, (vector signed int) zero); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
340 sum = vec_splat(sum, 3); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
341 vec_ste(sum, 0, &s); |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
342 |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
343 return s; |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
344 } |
6ea69518e5f7
altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
828
diff
changeset
|
345 |
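The accumulate-then-reduce pattern in pix_norm1_altivec can be sketched in scalar C as follows. The four-element array models the four 32-bit lanes of `sv`; `pix_norm1_ref` is a hypothetical name, and saturation in vec_sums is ignored because the total for a 16x16 block of bytes stays well below the 32-bit limit.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of pix_norm1: sum of squared pixel values over 16x16. */
static int pix_norm1_ref(const uint8_t *pix, int line_size)
{
    uint32_t sv[4] = { 0, 0, 0, 0 };

    assert(pix);
    for (int i = 0; i < 16; i++) {
        /* vec_msum: each 32-bit lane accumulates the products
           of its own group of 4 adjacent bytes. */
        for (int x = 0; x < 16; x++)
            sv[x / 4] += (uint32_t)pix[x] * pix[x];
        pix += line_size;
    }
    /* vec_sums adds the four lanes into the last element, which the
       source then broadcasts with vec_splat(sum, 3) and stores. */
    return (int)(sv[0] + sv[1] + sv[2] + sv[3]);
}
```

The worst case is 256 pixels of value 255, giving 256 * 65025 = 16646400, so a signed `int` result is safe.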
981 | 346 /** |
347 * Sum of Squared Errors for a 8x8 block. | |
348 * AltiVec-enhanced. | |
1708 | 349 * It's the sad8_altivec code above w/ squaring added. |
981 | 350 */ |
1708 | 351 int sse8_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h) |
981 | 352 { |
353 int i; | |
10082 | 354 int s; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
355 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0); |
981 | 356 vector unsigned char perm1, perm2, permclear, *pix1v, *pix2v; |
357 vector unsigned char t1, t2, t3, t4, t5; | |
358 vector unsigned int sum; | |
359 vector signed int sumsqr; | |
2967 | 360 |
1033
b4172ff70d27
Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents:
1024
diff
changeset
|
361 sum = (vector unsigned int)vec_splat_u32(0); |
1277
f3152eb76f1a
altivec gcc-3 fixes by (Magnus Damm <damm at opensource dot se>)
michaelni
parents:
1064
diff
changeset
|
362 |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
363 permclear = (vector unsigned char){255,255,255,255,255,255,255,255,0,0,0,0,0,0,0,0}; |
1277
f3152eb76f1a
altivec gcc-3 fixes by (Magnus Damm <damm at opensource dot se>)
michaelni
parents:
1064
diff
changeset
|
364 |
2967 | 365 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
366 for (i = 0; i < h; i++) { |
2979 | 367 /* Read potentially unaligned pixels into t1 and t2. |
368 Since we're reading 16 pixels, and actually only want 8, | |
369 mask out the last 8 pixels. The 0s don't change the sum. */ | |
981 | 370 perm1 = vec_lvsl(0, pix1); |
371 pix1v = (vector unsigned char *) pix1; | |
372 perm2 = vec_lvsl(0, pix2); | |
373 pix2v = (vector unsigned char *) pix2; | |
374 t1 = vec_and(vec_perm(pix1v[0], pix1v[1], perm1), permclear); | |
375 t2 = vec_and(vec_perm(pix2v[0], pix2v[1], perm2), permclear); | |
376 | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
377 /* Since we want to use unsigned chars, we can take advantage |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
378 of the fact that abs(a-b)^2 = (a-b)^2. */ |
2967 | 379 |
2979 | 380 /* Calculate abs differences vector */ |
981 | 381 t3 = vec_max(t1, t2); |
382 t4 = vec_min(t1, t2); | |
383 t5 = vec_sub(t3, t4); | |
2967 | 384 |
981 | 385 /* Square the values and add them to our sum */ |
386 sum = vec_msum(t5, t5, sum); | |
2967 | 387 |
981 | 388 pix1 += line_size; |
389 pix2 += line_size; | |
390 } | |
2967 | 391 |
981 | 392 /* Sum up the four partial sums, and put the result into s */ |
393 sumsqr = vec_sums((vector signed int) sum, (vector signed int) zero); | |
394 sumsqr = vec_splat(sumsqr, 3); | |
395 vec_ste(sumsqr, 0, &s); | |
2967 | 396 |
981 | 397 return s; |
398 } | |
399 | |
400 /** | |
401 * Sum of Squared Errors for a 16x16 block. | |
402 * AltiVec-enhanced. | |
1708 | 403 * It's the sad16_altivec code above w/ squaring added. |
981 | 404 */ |
1708 | 405 int sse16_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h) |
981 | 406 { |
407 int i; | |
10082 | 408 int s; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
409 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0); |
981 | 410 vector unsigned char perm1, perm2, *pix1v, *pix2v; |
411 vector unsigned char t1, t2, t3, t4, t5; | |
412 vector unsigned int sum; | |
413 vector signed int sumsqr; | |
2967 | 414 |
1033
b4172ff70d27
Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents:
1024
diff
changeset
|
415 sum = (vector unsigned int)vec_splat_u32(0); |
2967 | 416 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
417 for (i = 0; i < h; i++) { |
2979 | 418 /* Read potentially unaligned pixels into t1 and t2 */ |
981 | 419 perm1 = vec_lvsl(0, pix1); |
420 pix1v = (vector unsigned char *) pix1; | |
421 perm2 = vec_lvsl(0, pix2); | |
422 pix2v = (vector unsigned char *) pix2; | |
423 t1 = vec_perm(pix1v[0], pix1v[1], perm1); | |
424 t2 = vec_perm(pix2v[0], pix2v[1], perm2); | |
425 | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
426 /* Since we want to use unsigned chars, we can take advantage |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
427 of the fact that abs(a-b)^2 = (a-b)^2. */ |
2967 | 428 |
2979 | 429 /* Calculate abs differences vector */ |
981 | 430 t3 = vec_max(t1, t2); |
431 t4 = vec_min(t1, t2); | |
432 t5 = vec_sub(t3, t4); | |
2967 | 433 |
981 | 434 /* Square the values and add them to our sum */ |
435 sum = vec_msum(t5, t5, sum); | |
2967 | 436 |
981 | 437 pix1 += line_size; |
438 pix2 += line_size; | |
439 } | |
2967 | 440 |
981 | 441 /* Sum up the four partial sums, and put the result into s */ |
442 sumsqr = vec_sums((vector signed int) sum, (vector signed int) zero); | |
443 sumsqr = vec_splat(sumsqr, 3); | |
444 vec_ste(sumsqr, 0, &s); | |
2967 | 445 |
981 | 446 return s; |
447 } | |
448 | |
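A scalar reference for the two SSE routines above could be written as below. This is a sketch under assumptions: `sse_ref` and its width parameter `w` are illustrative names that are not in the source. It computes the signed difference directly, which gives the same square as the max/min absolute difference used in the vector code.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sum of squared errors over a w x h block.
   (a - b)^2 == abs(a - b)^2, so the sign of the difference is irrelevant. */
static int sse_ref(const uint8_t *pix1, const uint8_t *pix2,
                   int line_size, int w, int h)
{
    int s = 0;

    assert(w > 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = pix1[x] - pix2[x]; /* promoted to int, may be negative */
            s += d * d;
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    return s;
}
```

With `w = 8` this corresponds to sse8_altivec and with `w = 16` to sse16_altivec. The vector code cannot take the signed shortcut because its lanes are unsigned chars, hence the max/min subtraction there.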
1064 | 449 int pix_sum_altivec(uint8_t * pix, int line_size) |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
450 { |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
451 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0); |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
452 vector unsigned char perm, *pixv; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
453 vector unsigned char t1; |
981 | 454 vector unsigned int sad; |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
455 vector signed int sumdiffs; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
456 |
981 | 457 int i; |
10082 | 458 int s; |
2967 | 459 |
1033
b4172ff70d27
Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents:
1024
diff
changeset
|
460 sad = (vector unsigned int)vec_splat_u32(0); |
2967 | 461 |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
462 for (i = 0; i < 16; i++) { |
2979 | 463 /* Read the potentially unaligned 16 pixels into t1 */ |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
464 perm = vec_lvsl(0, pix); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
465 pixv = (vector unsigned char *) pix; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
466 t1 = vec_perm(pixv[0], pixv[1], perm); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
467 |
2979 | 468 /* Add each 4 pixel group together and put 4 results into sad */ |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
469 sad = vec_sum4s(t1, sad); |
2967 | 470 |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
471 pix += line_size; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
472 } |
2967 | 473 |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
474 /* Sum up the four partial sums, and put the result into s */ |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
475 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
476 sumdiffs = vec_splat(sumdiffs, 3); |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
477 vec_ste(sumdiffs, 0, &s); |
2967 | 478 |
623
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
479 return s; |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
480 } |
92e99e506920
first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff
changeset
|
481 |
1064 | 482 void get_pixels_altivec(DCTELEM *restrict block, const uint8_t *pixels, int line_size) |
828
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
483 { |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
484 int i; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
485 vector unsigned char perm, bytes, *pixv; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
486 const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0); |
828
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
487 vector signed short shorts; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
488 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
489 for (i = 0; i < 8; i++) { |
828
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
490 // Read potentially unaligned pixels. |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
491 // We're reading 16 pixels, and actually only want 8, |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
492 // but we simply ignore the extras. |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
493 perm = vec_lvsl(0, pixels); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
494 pixv = (vector unsigned char *) pixels; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
495 bytes = vec_perm(pixv[0], pixv[1], perm); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
496 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
497 // convert the bytes into shorts |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
498 shorts = (vector signed short)vec_mergeh(zero, bytes); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
499 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
500 // save the data to the block, we assume the block is 16-byte aligned |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
501 vec_st(shorts, i*16, (vector signed short*)block); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
502 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
503 pixels += line_size; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
504 } |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
505 } |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
506 |
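For readers without AltiVec hardware, the effect of get_pixels_altivec above can be sketched in portable C. This is only a hypothetical scalar reference, not code from this file: the name get_pixels_ref is invented here, and int16_t stands in for DCTELEM on the assumption that DCTELEM is a 16-bit signed type. It shows the same zero-extension of 8 pixels per row into 64 contiguous coefficients that the vec_mergeh(zero, bytes) and vec_st pair performs.

```c
#include <stdint.h>

/* Hypothetical scalar reference (not part of dsputil_altivec.c).
 * Copies an 8x8 block of 8-bit pixels into 64 contiguous 16-bit
 * values, zero-extending each byte. int16_t is assumed to match
 * DCTELEM. */
static void get_pixels_ref(int16_t *block, const uint8_t *pixels,
                           int line_size)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            block[i * 8 + j] = pixels[j];  /* widen byte to short */
        pixels += line_size;               /* next source row */
    }
}
```

Unlike the vector version, this sketch never reads more than 8 bytes per row and has no alignment requirement on block.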
1064 | 507 void diff_pixels_altivec(DCTELEM *restrict block, const uint8_t *s1, |
508 const uint8_t *s2, int stride) | |
828
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
509 { |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
510 int i; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
511 vector unsigned char perm, bytes, *pixv; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
512 const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0); |
828
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
513 vector signed short shorts1, shorts2; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
514 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
515 for (i = 0; i < 4; i++) { |
828
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
516 // Read potentially unaligned pixels |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
517 // We're reading 16 pixels, and actually only want 8, |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
518 // but we simply ignore the extras. |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
519 perm = vec_lvsl(0, s1); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
520 pixv = (vector unsigned char *) s1; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
521 bytes = vec_perm(pixv[0], pixv[1], perm); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
522 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
523 // convert the bytes into shorts |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
524 shorts1 = (vector signed short)vec_mergeh(zero, bytes); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
525 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
526 // Do the same for the second block of pixels |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
527 perm = vec_lvsl(0, s2); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
528 pixv = (vector unsigned char *) s2; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
529 bytes = vec_perm(pixv[0], pixv[1], perm); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
530 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
531 // convert the bytes into shorts |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
532 shorts2 = (vector signed short)vec_mergeh(zero, bytes); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
533 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
534 // Do the subtraction |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
535 shorts1 = vec_sub(shorts1, shorts2); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
536 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
537 // save the data to the block, we assume the block is 16-byte aligned |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
538 vec_st(shorts1, 0, (vector signed short*)block); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
539 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
540 s1 += stride; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
541 s2 += stride; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
542 block += 8; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
543 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
544 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
545 // The code below is a copy of the code above... This is a manual |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
546 // unroll. |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
547 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
548 // Read potentially unaligned pixels |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
549 // We're reading 16 pixels, and actually only want 8, |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
550 // but we simply ignore the extras. |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
551 perm = vec_lvsl(0, s1); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
552 pixv = (vector unsigned char *) s1; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
553 bytes = vec_perm(pixv[0], pixv[1], perm); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
554 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
555 // convert the bytes into shorts |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
556 shorts1 = (vector signed short)vec_mergeh(zero, bytes); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
557 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
558 // Do the same for the second block of pixels |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
559 perm = vec_lvsl(0, s2); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
560 pixv = (vector unsigned char *) s2; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
561 bytes = vec_perm(pixv[0], pixv[1], perm); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
562 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
563 // convert the bytes into shorts |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
564 shorts2 = (vector signed short)vec_mergeh(zero, bytes); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
565 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
566 // Do the subtraction |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
567 shorts1 = vec_sub(shorts1, shorts2); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
568 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
569 // save the data to the block, we assume the block is 16-byte aligned |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
570 vec_st(shorts1, 0, (vector signed short*)block); |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
571 |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
572 s1 += stride; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
573 s2 += stride; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
574 block += 8; |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
575 } |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
576 } |
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
577 |
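The result of diff_pixels_altivec above can likewise be sketched as a scalar loop. The function name diff_pixels_ref is hypothetical, and int16_t is again assumed to be equivalent to DCTELEM. The sketch makes explicit that the subtraction happens after widening, so differences in the range -255..255 are preserved.

```c
#include <stdint.h>

/* Hypothetical scalar reference (not part of dsputil_altivec.c).
 * Writes the per-pixel difference s1 - s2 of two 8x8 blocks into
 * 64 contiguous 16-bit values. int16_t is assumed to match DCTELEM. */
static void diff_pixels_ref(int16_t *block, const uint8_t *s1,
                            const uint8_t *s2, int stride)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            block[i * 8 + j] = (int16_t)(s1[j] - s2[j]); /* widened before subtracting */
        s1 += stride;
        s2 += stride;
    }
}
```

The AltiVec version covers the same 8 rows with 4 loop iterations because each iteration is manually unrolled to handle 2 rows.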
8307 | 578 |
579 static void clear_block_altivec(DCTELEM *block) { | |
580 LOAD_ZERO; | |
581 vec_st(zero_s16v, 0, block); | |
582 vec_st(zero_s16v, 16, block); | |
583 vec_st(zero_s16v, 32, block); | |
584 vec_st(zero_s16v, 48, block); | |
585 vec_st(zero_s16v, 64, block); | |
586 vec_st(zero_s16v, 80, block); | |
587 vec_st(zero_s16v, 96, block); | |
588 vec_st(zero_s16v, 112, block); | |
589 } | |
590 | |
591 | |
995
edc10966b081
altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents:
981
diff
changeset
|
592 void add_bytes_altivec(uint8_t *dst, uint8_t *src, int w) { |
edc10966b081
altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents:
981
diff
changeset
|
593 register int i; |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
594 register vector unsigned char vdst, vsrc; |
2967 | 595 |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
596 /* dst and src are 16 bytes-aligned (guaranteed) */ |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
597 for (i = 0 ; (i + 15) < w ; i+=16) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
598 vdst = vec_ld(i, (unsigned char*)dst); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
599 vsrc = vec_ld(i, (unsigned char*)src); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
600 vdst = vec_add(vsrc, vdst); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
601 vec_st(vdst, i, (unsigned char*)dst); |
995
edc10966b081
altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents:
981
diff
changeset
|
602 } |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
603 /* if w is not a multiple of 16 */ |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
604 for (; (i < w) ; i++) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
605 dst[i] += src[i]; |
995
edc10966b081
altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents:
981
diff
changeset
|
606 } |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
607 } |
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
608 |
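As a point of comparison for add_bytes_altivec above, a byte-wise add can be sketched in plain C. The name add_bytes_ref is hypothetical. The sketch assumes the intended semantics are a modular (wrapping) add of src into dst for all w bytes, which is what vec_add does on vector unsigned char for the 16-byte chunks.

```c
#include <stdint.h>

/* Hypothetical scalar reference (not part of dsputil_altivec.c).
 * Adds src into dst byte by byte; sums wrap modulo 256, matching
 * the behavior of vec_add on unsigned chars. */
static void add_bytes_ref(uint8_t *dst, const uint8_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```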
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
609 /* next one assumes that ((line_size % 16) == 0) */ |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
610 void put_pixels16_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h) |
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
611 { |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
612 POWERPC_PERF_DECLARE(altivec_put_pixels16_num, 1); |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
613 register vector unsigned char pixelsv1, pixelsv2; |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
614 register vector unsigned char pixelsv1B, pixelsv2B; |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
615 register vector unsigned char pixelsv1C, pixelsv2C; |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
616 register vector unsigned char pixelsv1D, pixelsv2D; |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
617 |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
618 register vector unsigned char perm = vec_lvsl(0, pixels); |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
619 int i; |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
620 register int line_size_2 = line_size << 1; |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
621 register int line_size_3 = line_size + line_size_2; |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
622 register int line_size_4 = line_size << 2; |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
623 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
624 POWERPC_PERF_START_COUNT(altivec_put_pixels16_num, 1); |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
625 // hand-unrolling the loop by 4 gains about 15% |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
626 // minimum execution time goes from 74 to 60 cycles |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
627 // it's faster than -funroll-loops, but using |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
628 // -funroll-loops w/ this is bad - 74 cycles again. |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
629 // all this is on a 7450, tuning for the 7450 |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
630 #if 0 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
631 for (i = 0; i < h; i++) { |
9668 | 632 pixelsv1 = vec_ld(0, pixels); |
633 pixelsv2 = vec_ld(16, pixels); | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
634 vec_st(vec_perm(pixelsv1, pixelsv2, perm), |
9668 | 635 0, block); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
636 pixels+=line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
637 block +=line_size; |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
638 } |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
639 #else |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
640 for (i = 0; i < h; i += 4) { |
9668 | 641 pixelsv1 = vec_ld( 0, pixels); |
642 pixelsv2 = vec_ld(15, pixels); | |
643 pixelsv1B = vec_ld(line_size, pixels); | |
644 pixelsv2B = vec_ld(15 + line_size, pixels); | |
645 pixelsv1C = vec_ld(line_size_2, pixels); | |
646 pixelsv2C = vec_ld(15 + line_size_2, pixels); | |
647 pixelsv1D = vec_ld(line_size_3, pixels); | |
648 pixelsv2D = vec_ld(15 + line_size_3, pixels); | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
649 vec_st(vec_perm(pixelsv1, pixelsv2, perm), |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
650 0, (unsigned char*)block); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
651 vec_st(vec_perm(pixelsv1B, pixelsv2B, perm), |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
652 line_size, (unsigned char*)block); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
653 vec_st(vec_perm(pixelsv1C, pixelsv2C, perm), |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
654 line_size_2, (unsigned char*)block); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
655 vec_st(vec_perm(pixelsv1D, pixelsv2D, perm), |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
656 line_size_3, (unsigned char*)block); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
657 pixels+=line_size_4; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
658 block +=line_size_4; |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
659 } |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
660 #endif |
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
661 POWERPC_PERF_STOP_COUNT(altivec_put_pixels16_num, 1); |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
662 } |
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
663 |
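The unaligned-load idiom used in put_pixels16_altivec above (two vec_ld calls combined through vec_perm with a vec_lvsl mask) can be modeled in portable C. This is a rough sketch under stated assumptions, not a drop-in replacement: the function name load_unaligned16_ref is invented, base is assumed to point at a 16-byte-aligned block, and offset (0..15) plays the role of the low four address bits of pixels. The second load is modeled on the vec_ld(15, pixels) form, which falls in the same aligned block as the first load when offset is 0.

```c
#include <stdint.h>

/* Hypothetical model (not part of dsputil_altivec.c) of reading 16
 * bytes from a misaligned address using only aligned 16-byte loads.
 * base:   start of the aligned block containing the first byte
 * offset: misalignment of the wanted data within that block, 0..15 */
static void load_unaligned16_ref(uint8_t out[16], const uint8_t *base,
                                 unsigned offset)
{
    const uint8_t *v0 = base;                             /* like vec_ld(0, p)  */
    const uint8_t *v1 = base + 16 * ((offset + 15) / 16); /* like vec_ld(15, p) */

    for (unsigned i = 0; i < 16; i++) {
        unsigned sel = offset + i;          /* like the vec_lvsl mask */
        out[i] = sel < 16 ? v0[sel]         /* like vec_perm picking  */
                          : v1[sel - 16];   /* from v0 or v1          */
    }
}
```

With offset 0 every selected byte comes from v0, so the model never touches the following block in that case.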
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
664 /* next one assumes that ((line_size % 16) == 0) */ |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
665 #define op_avg(a,b) a = ( ((a)|(b)) - ((((a)^(b))&0xFEFEFEFEUL)>>1) ) |
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
666 void avg_pixels16_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h) |
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
667 { |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
668 POWERPC_PERF_DECLARE(altivec_avg_pixels16_num, 1); |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
669 register vector unsigned char pixelsv1, pixelsv2, pixelsv, blockv; |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
670 register vector unsigned char perm = vec_lvsl(0, pixels); |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
671 int i; |
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
672 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
673 POWERPC_PERF_START_COUNT(altivec_avg_pixels16_num, 1); |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
674 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
675 for (i = 0; i < h; i++) { |
9668 | 676 pixelsv1 = vec_ld( 0, pixels); |
677 pixelsv2 = vec_ld(16,pixels); | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
678 blockv = vec_ld(0, block); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
679 pixelsv = vec_perm(pixelsv1, pixelsv2, perm); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
680 blockv = vec_avg(blockv,pixelsv); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
681 vec_st(blockv, 0, (unsigned char*)block); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
682 pixels+=line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
683 block +=line_size; |
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
684 } |
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
685 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
686 POWERPC_PERF_STOP_COUNT(altivec_avg_pixels16_num, 1); |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
687 } |
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
688 |
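The averaging done by avg_pixels16_altivec above relies on vec_avg, which for unsigned bytes computes a rounded-up mean. The op_avg macro defined before the function expresses the same operation on four bytes packed in a 32-bit word. A small sketch of both forms follows; the helper names avg_round_up and avg4_packed are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of dsputil_altivec.c). */

/* Rounded-up mean of two bytes: (a + b + 1) >> 1. */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Same mean applied to four packed bytes at once, using the
 * expression from the op_avg macro. Since a + b = 2*(a|b) - (a^b),
 * the rounded-up mean is (a|b) - ((a^b) >> 1); the 0xFE mask keeps
 * the shift from leaking bits into the neighboring byte. */
static uint32_t avg4_packed(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEUL) >> 1);
}
```

For example, averaging the byte pairs (1, 2) and (3, 4) gives 2 and 4 in both forms, since halves round up.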
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
689 /* next one assumes that ((line_size % 8) == 0) */ |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
690 void avg_pixels8_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h) |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
691 { |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
692 POWERPC_PERF_DECLARE(altivec_avg_pixels8_num, 1); |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
693 register vector unsigned char pixelsv1, pixelsv2, pixelsv, blockv; |
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
694 int i; |
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
695 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
696 POWERPC_PERF_START_COUNT(altivec_avg_pixels8_num, 1); |
2967 | 697 |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
698 for (i = 0; i < h; i++) { |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
699 /* block is 8-byte aligned, so it lies in either the |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
700 left half (16-byte aligned) or the right half (not) of a 16-byte vector */ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
701 int rightside = ((unsigned long)block & 0x0000000F); |
2967 | 702 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
703 blockv = vec_ld(0, block); |
9668 | 704 pixelsv1 = vec_ld( 0, pixels); |
705 pixelsv2 = vec_ld(16, pixels); | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
706 pixelsv = vec_perm(pixelsv1, pixelsv2, vec_lvsl(0, pixels)); |
2967 | 707 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
708 if (rightside) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
709 pixelsv = vec_perm(blockv, pixelsv, vcprm(0,1,s0,s1)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
710 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
711 pixelsv = vec_perm(blockv, pixelsv, vcprm(s0,s1,2,3)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
712 } |
2967 | 713 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
714 blockv = vec_avg(blockv, pixelsv); |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
715 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
716 vec_st(blockv, 0, block); |
2967 | 717 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
718 pixels += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
719 block += line_size; |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
720 } |
2967 | 721 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
722 POWERPC_PERF_STOP_COUNT(altivec_avg_pixels8_num, 1); |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
723 } |
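For readers without AltiVec hardware, the effect of avg_pixels8_altivec above can be sketched in scalar C. This is a hypothetical reference (the name avg_pixels8_ref is not in the source). It assumes the rounding-up average that vec_avg performs on unsigned bytes, (a + b + 1) >> 1. The vector code gets the same result by merging the 8 new pixels into a copy of blockv, so the untouched half of the 16-byte vector is averaged with itself and stored back unchanged.

```c
#include <stdint.h>

/* Hypothetical scalar reference, not part of dsputil_altivec.c.
 * Averages an 8-pixel-wide block in place with the source pixels. */
static void avg_pixels8_ref(uint8_t *block, const uint8_t *pixels,
                            int line_size, int h)
{
    int i, x;

    for (i = 0; i < h; i++) {
        for (x = 0; x < 8; x++) {
            /* same rounding as vec_avg on unsigned bytes */
            block[x] = (uint8_t)((block[x] + pixels[x] + 1) >> 1);
        }
        pixels += line_size;
        block  += line_size;
    }
}
```

A reference like this is mainly useful for checking the vector routine against known inputs; it makes no alignment assumptions.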
1009
3b7cc8e4b83f
AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
995
diff
changeset
|
724 |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
725 /* next one assumes that ((line_size % 8) == 0) */ |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
726 void put_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h) |
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
727 { |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
728 POWERPC_PERF_DECLARE(altivec_put_pixels8_xy2_num, 1); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
729 register int i; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
730 register vector unsigned char pixelsv1, pixelsv2, pixelsavg; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
731 register vector unsigned char blockv, temp1, temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
732 register vector unsigned short pixelssum1, pixelssum2, temp3; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
733 register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
734 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2); |
2967 | 735 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
736 temp1 = vec_ld(0, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
737 temp2 = vec_ld(16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
738 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
739 if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
740 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
741 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
742 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
743 } |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
744 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
745 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
746 pixelssum1 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
747 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
748 pixelssum1 = vec_add(pixelssum1, vctwo); |
2967 | 749 |
750 POWERPC_PERF_START_COUNT(altivec_put_pixels8_xy2_num, 1); | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
751 for (i = 0; i < h ; i++) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
752 int rightside = ((unsigned long)block & 0x0000000F); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
753 blockv = vec_ld(0, block); |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
754 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
755 temp1 = vec_ld(line_size, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
756 temp2 = vec_ld(line_size + 16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
757 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
758 if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
759 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
760 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
761 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
762 } |
1015
35cf2f4a0f8c
PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1009
diff
changeset
|
763 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
764 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
765 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
766 pixelssum2 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
767 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
768 temp3 = vec_add(pixelssum1, pixelssum2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
769 temp3 = vec_sra(temp3, vctwo); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
770 pixelssum1 = vec_add(pixelssum2, vctwo); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
771 pixelsavg = vec_packsu(temp3, (vector unsigned short) vczero); |
2967 | 772 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
773 if (rightside) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
774 blockv = vec_perm(blockv, pixelsavg, vcprm(0, 1, s0, s1)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
775 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
776 blockv = vec_perm(blockv, pixelsavg, vcprm(s0, s1, 2, 3)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
777 } |
2967 | 778 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
779 vec_st(blockv, 0, block); |
2967 | 780 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
781 block += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
782 pixels += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
783 } |
2967 | 784 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
785 POWERPC_PERF_STOP_COUNT(altivec_put_pixels8_xy2_num, 1); |
995
edc10966b081
altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents:
981
diff
changeset
|
786 } |
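The arithmetic of put_pixels8_xy2_altivec above can be sketched in scalar C as a four-tap average with a rounding bias of 2. This is a hypothetical reference (put_pixels8_xy2_ref is not a name from the source). It recomputes both row sums on every iteration, whereas the vector code carries the lower row's horizontal sum (plus the bias) into the next iteration as pixelssum1.

```c
#include <stdint.h>

/* Hypothetical scalar reference, not part of dsputil_altivec.c.
 * Each output pixel is the rounded mean of a 2x2 neighbourhood,
 * so the source must provide 9 bytes per row and h + 1 rows. */
static void put_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                int line_size, int h)
{
    int i, x;

    for (i = 0; i < h; i++) {
        const uint8_t *next = pixels + line_size;

        for (x = 0; x < 8; x++) {
            block[x] = (uint8_t)((pixels[x] + pixels[x + 1] +
                                  next[x]   + next[x + 1] + 2) >> 2);
        }
        block  += line_size;
        pixels += line_size;
    }
}
```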
828
ace3ccd18dd2
Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents:
638
diff
changeset
|
787 |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
788 /* next one assumes that ((line_size % 8) == 0) */ |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
789 void put_no_rnd_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h) |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
790 { |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
791 POWERPC_PERF_DECLARE(altivec_put_no_rnd_pixels8_xy2_num, 1); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
792 register int i; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
793 register vector unsigned char pixelsv1, pixelsv2, pixelsavg; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
794 register vector unsigned char blockv, temp1, temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
795 register vector unsigned short pixelssum1, pixelssum2, temp3; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
796 register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
797 register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
798 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2); |
2967 | 799 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
800 temp1 = vec_ld(0, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
801 temp2 = vec_ld(16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
802 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
803 if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
804 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
805 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
806 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
807 } |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
808 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
809 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
810 pixelssum1 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
811 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
812 pixelssum1 = vec_add(pixelssum1, vcone); |
2967 | 813 |
814 POWERPC_PERF_START_COUNT(altivec_put_no_rnd_pixels8_xy2_num, 1); | |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
815 for (i = 0; i < h ; i++) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
816 int rightside = ((unsigned long)block & 0x0000000F); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
817 blockv = vec_ld(0, block); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
818 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
819 temp1 = vec_ld(line_size, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
820 temp2 = vec_ld(line_size + 16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
821 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
822 if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
823 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
824 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
825 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
826 } |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
827 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
828 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
829 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
830 pixelssum2 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
831 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
832 temp3 = vec_add(pixelssum1, pixelssum2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
833 temp3 = vec_sra(temp3, vctwo); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
834 pixelssum1 = vec_add(pixelssum2, vcone); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
835 pixelsavg = vec_packsu(temp3, (vector unsigned short) vczero); |
2967 | 836 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
837 if (rightside) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
838 blockv = vec_perm(blockv, pixelsavg, vcprm(0, 1, s0, s1)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
839 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
840 blockv = vec_perm(blockv, pixelsavg, vcprm(s0, s1, 2, 3)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
841 } |
2967 | 842 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
843 vec_st(blockv, 0, block); |
2967 | 844 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
845 block += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
846 pixels += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
847 } |
2967 | 848 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
849 POWERPC_PERF_STOP_COUNT(altivec_put_no_rnd_pixels8_xy2_num, 1); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
850 } |
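The only arithmetic difference between the rounding and no-rounding xy2 routines above is the bias added before the shift: vctwo (2) in put_pixels8_xy2_altivec, vcone (1) in put_no_rnd_pixels8_xy2_altivec. A minimal scalar sketch, using a hypothetical helper xy2_sample that is not in the source, makes the difference visible for a single output pixel.

```c
#include <stdint.h>

/* Hypothetical helper, not part of dsputil_altivec.c.
 * bias = 2 mirrors the rounding variant, bias = 1 the no_rnd variant. */
static uint8_t xy2_sample(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                          int bias)
{
    return (uint8_t)((a + b + c + d + bias) >> 2);
}
```

For the neighbourhood (0, 0, 1, 1) the rounding variant yields 1 and the no-rounding variant yields 0; for most inputs the two agree.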
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
851 |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
852 /* next one assumes that ((line_size % 16) == 0) */ |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
853 void put_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h) |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
854 { |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
855 POWERPC_PERF_DECLARE(altivec_put_pixels16_xy2_num, 1); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
856 register int i; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
857 register vector unsigned char pixelsv1, pixelsv2, pixelsv3, pixelsv4; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
858 register vector unsigned char blockv, temp1, temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
859 register vector unsigned short temp3, temp4, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
860 pixelssum1, pixelssum2, pixelssum3, pixelssum4; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
861 register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
862 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2); |
1340
09b8fe0f0139
PPC fixes & clean-up patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1277
diff
changeset
|
863 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
864 POWERPC_PERF_START_COUNT(altivec_put_pixels16_xy2_num, 1); |
2967 | 865 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
866 temp1 = vec_ld(0, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
867 temp2 = vec_ld(16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
868 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
869 if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
870 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
871 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
872 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
873 } |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
874 pixelsv3 = vec_mergel(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
875 pixelsv4 = vec_mergel(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
876 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
877 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
878 pixelssum3 = vec_add((vector unsigned short)pixelsv3, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
879 (vector unsigned short)pixelsv4); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
880 pixelssum3 = vec_add(pixelssum3, vctwo); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
881 pixelssum1 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
882 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
883 pixelssum1 = vec_add(pixelssum1, vctwo); |
2967 | 884 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
885 for (i = 0; i < h ; i++) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
886 blockv = vec_ld(0, block); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
887 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
888 temp1 = vec_ld(line_size, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
889 temp2 = vec_ld(line_size + 16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
890 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
891 if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
892 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
893 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
894 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
895 } |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
896 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
897 pixelsv3 = vec_mergel(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
898 pixelsv4 = vec_mergel(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
899 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
900 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
2967 | 901 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
902 pixelssum4 = vec_add((vector unsigned short)pixelsv3, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
903 (vector unsigned short)pixelsv4); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
904 pixelssum2 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
905 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
906 temp4 = vec_add(pixelssum3, pixelssum4); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
907 temp4 = vec_sra(temp4, vctwo); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
908 temp3 = vec_add(pixelssum1, pixelssum2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
909 temp3 = vec_sra(temp3, vctwo); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
910 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
911 pixelssum3 = vec_add(pixelssum4, vctwo); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
912 pixelssum1 = vec_add(pixelssum2, vctwo); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
913 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
914 blockv = vec_packsu(temp3, temp4); |
2967 | 915 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
916 vec_st(blockv, 0, block); |
2967 | 917 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
918 block += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
919 pixels += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
920 } |
2967 | 921 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
922 POWERPC_PERF_STOP_COUNT(altivec_put_pixels16_xy2_num, 1); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
923 } |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
924 |
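The loop above assembles two 16-byte rows from a possibly unaligned address with `vec_ld`, `vec_perm` and `vec_lvsl`, and special-cases an address whose low four bits are `0xF`. The following is a minimal scalar sketch of that pattern, assuming memory is modeled as a byte array indexed by address. The helper names `load_aligned`, `perm_lvsl`, `load_pair` and `count_mismatches` are hypothetical and do not appear in dsputil_altivec.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for vec_ld: the low 4 address bits are ignored, so the load
 * always starts at the enclosing 16-byte boundary. */
void load_aligned(uint8_t out[16], const uint8_t *mem, size_t addr)
{
    memcpy(out, mem + (addr & ~(size_t)15), 16);
}

/* Stand-in for vec_perm(a, b, vec_lvsl(0, addr)): take 16 bytes from the
 * 32-byte concatenation a:b, starting at shift = addr & 15. */
void perm_lvsl(uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
               size_t addr)
{
    size_t shift = addr & 15;
    for (int k = 0; k < 16; k++) {
        size_t idx = shift + (size_t)k;
        out[k] = idx < 16 ? a[idx] : b[idx - 16];
    }
}

/* Produces the 16 bytes at addr (row) and the 16 bytes at addr + 1 (row1),
 * following the structure of the vector code in the listing. */
void load_pair(uint8_t row[16], uint8_t row1[16], const uint8_t *mem,
               size_t addr)
{
    uint8_t t1[16], t2[16];

    load_aligned(t1, mem, addr);
    load_aligned(t2, mem, addr + 16);
    perm_lvsl(row, t1, t2, addr);

    if ((addr & 15) == 15) {
        /* addr + 1 is 16-byte aligned: the shift would wrap to 0 and select
         * t1, but the wanted bytes are exactly t2. */
        memcpy(row1, t2, 16);
    } else {
        perm_lvsl(row1, t1, t2, addr + 1);
    }
}

/* Compares the emulated loads with direct byte copies for addresses 0..n-1.
 * mem must hold at least n + 32 bytes. Returns the number of mismatches. */
int count_mismatches(const uint8_t *mem, size_t n)
{
    int bad = 0;
    for (size_t addr = 0; addr < n; addr++) {
        uint8_t row[16], row1[16];
        load_pair(row, row1, mem, addr);
        if (memcmp(row, mem + addr, 16) != 0)
            bad++;
        if (memcmp(row1, mem + addr + 1, 16) != 0)
            bad++;
    }
    return bad;
}
```

Without the `0xF` branch, the second row would be taken from the first aligned block instead of the second, which is why the listing tests for that case explicitly.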
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
925 /* next one assumes that ((line_size % 16) == 0) */ |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
926 void put_no_rnd_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h) |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
927 { |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
928 POWERPC_PERF_DECLARE(altivec_put_no_rnd_pixels16_xy2_num, 1); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
929 register int i; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
930 register vector unsigned char pixelsv1, pixelsv2, pixelsv3, pixelsv4; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
931 register vector unsigned char blockv, temp1, temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
932 register vector unsigned short temp3, temp4, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
933 pixelssum1, pixelssum2, pixelssum3, pixelssum4; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
934 register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
935 register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
936 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2); |
1340
09b8fe0f0139
PPC fixes & clean-up patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1277
diff
changeset
|
937 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
938 POWERPC_PERF_START_COUNT(altivec_put_no_rnd_pixels16_xy2_num, 1); |
2967 | 939 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
940 temp1 = vec_ld(0, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
941 temp2 = vec_ld(16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
942 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
943 if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
944 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
945 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
946 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
947 } |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
948 pixelsv3 = vec_mergel(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
949 pixelsv4 = vec_mergel(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
950 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
951 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
952 pixelssum3 = vec_add((vector unsigned short)pixelsv3, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
953 (vector unsigned short)pixelsv4); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
954 pixelssum3 = vec_add(pixelssum3, vcone); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
955 pixelssum1 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
956 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
957 pixelssum1 = vec_add(pixelssum1, vcone); |
2967 | 958 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
959 for (i = 0; i < h ; i++) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
960 blockv = vec_ld(0, block); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
961 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
962 temp1 = vec_ld(line_size, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
963 temp2 = vec_ld(line_size + 16, pixels); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
964 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
965 if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F) { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
966 pixelsv2 = temp2; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
967 } else { |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
968 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels)); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
969 } |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
970 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
971 pixelsv3 = vec_mergel(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
972 pixelsv4 = vec_mergel(vczero, pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
973 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
974 pixelsv2 = vec_mergeh(vczero, pixelsv2); |
2967 | 975 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
976 pixelssum4 = vec_add((vector unsigned short)pixelsv3, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
977 (vector unsigned short)pixelsv4); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
978 pixelssum2 = vec_add((vector unsigned short)pixelsv1, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
979 (vector unsigned short)pixelsv2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
980 temp4 = vec_add(pixelssum3, pixelssum4); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
981 temp4 = vec_sra(temp4, vctwo); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
982 temp3 = vec_add(pixelssum1, pixelssum2); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
983 temp3 = vec_sra(temp3, vctwo); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
984 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
985 pixelssum3 = vec_add(pixelssum4, vcone); |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
986 pixelssum1 = vec_add(pixelssum2, vcone); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
987 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
988 blockv = vec_packsu(temp3, temp4); |
2967 | 989 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
990 vec_st(blockv, 0, block); |
2967 | 991 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
992 block += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
993 pixels += line_size; |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
994 } |
2967 | 995 |
1352
e8ff4783f188
1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents:
1340
diff
changeset
|
996 POWERPC_PERF_STOP_COUNT(altivec_put_no_rnd_pixels16_xy2_num, 1); |
1024
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
997 } |
9cc1031e1864
More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents:
1015
diff
changeset
|
998 |
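The two xy2 loops above differ only in the constant added before the shift by two: `vctwo` in the rounding variant and `vcone` in the `no_rnd` variant. A scalar sketch of the same arithmetic is given here for comparison. The function name `put_pixels16_xy2_ref` and its `bias` parameter are assumptions introduced for illustration and are not part of the file.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for a 16-pixel-wide xy2 (half-pel in x and y) copy.
 * Each output pixel is the average of a 2x2 neighbourhood:
 *   bias = 2 corresponds to the rounding variant (vctwo),
 *   bias = 1 corresponds to the no_rnd variant (vcone).
 * Reads h + 1 source rows and 17 bytes per row. */
void put_pixels16_xy2_ref(uint8_t *block, const uint8_t *pixels,
                          int line_size, int h, int bias)
{
    assert(bias == 1 || bias == 2);

    for (int i = 0; i < h; i++) {
        for (int x = 0; x < 16; x++) {
            int sum = pixels[x] + pixels[x + 1]
                    + pixels[x + line_size] + pixels[x + line_size + 1];
            block[x] = (uint8_t)((sum + bias) >> 2);
        }
        block  += line_size;
        pixels += line_size;
    }
}
```

The vector code avoids recomputing the upper row on every iteration by carrying the previous row's horizontal sum (plus the bias) in `pixelssum1` and `pixelssum3`; the scalar version recomputes it for clarity.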
1949
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
999 int hadamard8_diff8x8_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h){ |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1000 POWERPC_PERF_DECLARE(altivec_hadamard8_diff8x8_num, 1); |
3554 | 1001 int sum; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1002 register const vector unsigned char vzero = |
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1003 (const vector unsigned char)vec_splat_u8(0); |
3554 | 1004 register vector signed short temp0, temp1, temp2, temp3, temp4, |
1005 temp5, temp6, temp7; |
3346
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1006 POWERPC_PERF_START_COUNT(altivec_hadamard8_diff8x8_num, 1); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1007 { |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1008 register const vector signed short vprod1 =(const vector signed short) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1009 { 1,-1, 1,-1, 1,-1, 1,-1 }; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1010 register const vector signed short vprod2 =(const vector signed short) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1011 { 1, 1,-1,-1, 1, 1,-1,-1 }; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1012 register const vector signed short vprod3 =(const vector signed short) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1013 { 1, 1, 1, 1,-1,-1,-1,-1 }; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1014 register const vector unsigned char perm1 = (const vector unsigned char) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1015 {0x02, 0x03, 0x00, 0x01, 0x06, 0x07, 0x04, 0x05, |
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1016 0x0A, 0x0B, 0x08, 0x09, 0x0E, 0x0F, 0x0C, 0x0D}; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1017 register const vector unsigned char perm2 = (const vector unsigned char) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1018 {0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03, |
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1019 0x0C, 0x0D, 0x0E, 0x0F, 0x08, 0x09, 0x0A, 0x0B}; |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1020 register const vector unsigned char perm3 = (const vector unsigned char) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1021 {0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, |
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1022 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07}; |
1949
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1023 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1024 #define ONEITERBUTTERFLY(i, res) \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1025 { \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1026 register vector unsigned char src1, src2, srcO; \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1027 register vector unsigned char dst1, dst2, dstO; \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1028 register vector signed short srcV, dstV; \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1029 register vector signed short but0, but1, but2, op1, op2, op3; \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1030 src1 = vec_ld(stride * i, src); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1031 src2 = vec_ld((stride * i) + 15, src); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1032 srcO = vec_perm(src1, src2, vec_lvsl(stride * i, src)); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1033 dst1 = vec_ld(stride * i, dst); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1034 dst2 = vec_ld((stride * i) + 15, dst); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1035 dstO = vec_perm(dst1, dst2, vec_lvsl(stride * i, dst)); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1036 /* promote the unsigned chars to signed shorts */ \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1037 /* we're in the 8x8 function, we only care for the first 8 */ \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1038 srcV = (vector signed short)vec_mergeh((vector signed char)vzero, \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1039 (vector signed char)srcO); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1040 dstV = (vector signed short)vec_mergeh((vector signed char)vzero, \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1041 (vector signed char)dstO); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1042 /* subtractions inside the first butterfly */ \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1043 but0 = vec_sub(srcV, dstV); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1044 op1 = vec_perm(but0, but0, perm1); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1045 but1 = vec_mladd(but0, vprod1, op1); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1046 op2 = vec_perm(but1, but1, perm2); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1047 but2 = vec_mladd(but1, vprod2, op2); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1048 op3 = vec_perm(but2, but2, perm3); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1049 res = vec_mladd(but2, vprod3, op3); \ |
1949
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1050 } |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1051 ONEITERBUTTERFLY(0, temp0); |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1052 ONEITERBUTTERFLY(1, temp1); |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1053 ONEITERBUTTERFLY(2, temp2); |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1054 ONEITERBUTTERFLY(3, temp3); |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1055 ONEITERBUTTERFLY(4, temp4); |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1056 ONEITERBUTTERFLY(5, temp5); |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1057 ONEITERBUTTERFLY(6, temp6); |
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1058 ONEITERBUTTERFLY(7, temp7); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1059 } |
1949
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1060 #undef ONEITERBUTTERFLY |
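The `ONEITERBUTTERFLY` macro above performs three butterfly stages per row with `vec_perm` and `vec_mladd`: `perm1`, `perm2` and `perm3` swap lanes at distance 1, 2 and 4, and `vprod1`, `vprod2` and `vprod3` supply the matching signs. A scalar sketch of that idea follows, together with the column stages and absolute-value sum that the listing goes on to accumulate in `vsum`. The names `butterfly_stage` and `hadamard8_diff8x8_ref` are hypothetical, and the final reduction to a single integer is an assumption, since that part of the function lies outside this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One butterfly stage on 8 lanes. Lane k is combined with lane k ^ dist,
 * mirroring vec_mladd(v, vprod, vec_perm(v, v, perm)) for dist = 1, 2, 4. */
void butterfly_stage(int16_t v[8], int dist)
{
    int16_t out[8];

    for (int k = 0; k < 8; k++) {
        int sign = (k & dist) ? -1 : 1;        /* lane k of vprod1/2/3 */
        out[k] = (int16_t)(v[k] * sign + v[k ^ dist]);
    }
    for (int k = 0; k < 8; k++)
        v[k] = out[k];
}

/* Sum of absolute values of the 8x8 Hadamard transform of (src - dst). */
int hadamard8_diff8x8_ref(const uint8_t *dst, const uint8_t *src, int stride)
{
    int16_t t[8][8];
    int sum = 0;

    /* Row transform: difference, then three butterfly stages. */
    for (int i = 0; i < 8; i++) {
        for (int k = 0; k < 8; k++)
            t[i][k] = (int16_t)(src[i * stride + k] - dst[i * stride + k]);
        butterfly_stage(t[i], 1);
        butterfly_stage(t[i], 2);
        butterfly_stage(t[i], 4);
    }

    /* Column transform: add/sub pairs of rows at distance 1, 2, 4,
     * in the same order as the line*, line*B and line*C variables. */
    for (int dist = 1; dist < 8; dist <<= 1) {
        for (int i = 0; i < 8; i++) {
            if (i & dist)
                continue;
            for (int k = 0; k < 8; k++) {
                int16_t a = t[i][k];
                int16_t b = t[i + dist][k];
                t[i][k]        = (int16_t)(a + b);
                t[i + dist][k] = (int16_t)(a - b);
            }
        }
    }

    for (int i = 0; i < 8; i++)
        for (int k = 0; k < 8; k++)
            sum += abs(t[i][k]);

    return sum;
}
```

With 8-bit inputs each coefficient stays within 64 * 255 in magnitude, so 16-bit lanes are sufficient in this sketch.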
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1061 { |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1062 register vector signed int vsum; |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1063 register vector signed short line0 = vec_add(temp0, temp1); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1064 register vector signed short line1 = vec_sub(temp0, temp1); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1065 register vector signed short line2 = vec_add(temp2, temp3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1066 register vector signed short line3 = vec_sub(temp2, temp3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1067 register vector signed short line4 = vec_add(temp4, temp5); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1068 register vector signed short line5 = vec_sub(temp4, temp5); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1069 register vector signed short line6 = vec_add(temp6, temp7); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1070 register vector signed short line7 = vec_sub(temp6, temp7); |
2967 | 1071 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1072 register vector signed short line0B = vec_add(line0, line2); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1073 register vector signed short line2B = vec_sub(line0, line2); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1074 register vector signed short line1B = vec_add(line1, line3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1075 register vector signed short line3B = vec_sub(line1, line3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1076 register vector signed short line4B = vec_add(line4, line6); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1077 register vector signed short line6B = vec_sub(line4, line6); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1078 register vector signed short line5B = vec_add(line5, line7); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1079 register vector signed short line7B = vec_sub(line5, line7); |
2967 | 1080 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1081 register vector signed short line0C = vec_add(line0B, line4B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1082 register vector signed short line4C = vec_sub(line0B, line4B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1083 register vector signed short line1C = vec_add(line1B, line5B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1084 register vector signed short line5C = vec_sub(line1B, line5B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1085 register vector signed short line2C = vec_add(line2B, line6B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1086 register vector signed short line6C = vec_sub(line2B, line6B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1087 register vector signed short line3C = vec_add(line3B, line7B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1088 register vector signed short line7C = vec_sub(line3B, line7B); |
2967 | 1089 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1090 vsum = vec_sum4s(vec_abs(line0C), vec_splat_s32(0)); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1091 vsum = vec_sum4s(vec_abs(line1C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1092 vsum = vec_sum4s(vec_abs(line2C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1093 vsum = vec_sum4s(vec_abs(line3C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1094 vsum = vec_sum4s(vec_abs(line4C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1095 vsum = vec_sum4s(vec_abs(line5C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1096 vsum = vec_sum4s(vec_abs(line6C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1097 vsum = vec_sum4s(vec_abs(line7C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1098 vsum = vec_sums(vsum, (vector signed int)vzero); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1099 vsum = vec_splat(vsum, 3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1100 vec_ste(vsum, 0, &sum); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1101 } |
1949
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1102 POWERPC_PERF_STOP_COUNT(altivec_hadamard8_diff8x8_num, 1); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1103 return sum; |
1949
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1104 } |
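For readers following the vector code above, a plain scalar sketch of the same quantity may help: an 8x8 block of src/dst differences, a Walsh-Hadamard transform along rows and then columns, and the sum of absolute coefficients. This is not the code from this file; the names `wht8` and `hadamard8_diff8x8_sketch` are hypothetical. It also assumes plain `int` arithmetic (the AltiVec version works on 16-bit lanes), and it produces coefficients in a different order, which does not matter for a sum of absolute values.

```c
#include <stdint.h>
#include <stdlib.h>

/* In-place 8-point Walsh-Hadamard transform made of three butterfly
 * stages. 'step' is the distance between consecutive elements, so the
 * same helper serves rows (step 1) and columns (step 8). */
static void wht8(int *v, int step)
{
    for (int half = 1; half < 8; half <<= 1) {
        for (int base = 0; base < 8; base += 2 * half) {
            for (int k = base; k < base + half; k++) {
                int a = v[k * step];
                int b = v[(k + half) * step];
                v[k * step]          = a + b;
                v[(k + half) * step] = a - b;
            }
        }
    }
}

/* Hypothetical scalar reference: sum of absolute values of the 2-D
 * Hadamard transform of (src - dst) over one 8x8 block. */
static int hadamard8_diff8x8_sketch(const uint8_t *dst, const uint8_t *src,
                                    int stride)
{
    int block[64];
    int sum = 0;

    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            block[8 * y + x] = src[y * stride + x] - dst[y * stride + x];

    for (int y = 0; y < 8; y++)
        wht8(block + 8 * y, 1);   /* horizontal pass */
    for (int x = 0; x < 8; x++)
        wht8(block + x, 8);       /* vertical pass */

    for (int i = 0; i < 64; i++)
        sum += abs(block[i]);
    return sum;
}
```

With two identical blocks the result is 0; with a constant difference d at every pixel only the DC coefficient survives, so the result is 64 * |d|.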
66215baae7b9
hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1839
diff
changeset
|
1105 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1106 /* |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1107 16x8 works with 16 elements; this avoids replicating loads and |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1108 gives the compiler more room for scheduling. It's only used from |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1109 inside hadamard8_diff16_altivec. |
2967 | 1110 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1111 Unfortunately, it seems gcc-3.3 is a bit dumb, and the compiled code has a LOT |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1112 of spill code; it seems gcc (unlike xlc) cannot keep everything in registers |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1113 by itself. The following code includes hand-made register allocation. It's not |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1114 clean, but on a 7450 the resulting code is much faster (best case falls from |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1115 700+ cycles to 550). |
2967 | 1116 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1117 xlc doesn't add spill code, but it doesn't know how to schedule for the 7450, |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1118 and its code isn't much faster than gcc-3.3 on the 7450 (but uses 25% fewer |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1119 instructions...) |
2967 | 1120 |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1121 On the 970, the hand-made RA is still a win (around 690 vs. around 780), but |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1122 xlc goes to around 660 on the regular C code... |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1123 */ |
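The routine that follows builds each horizontal butterfly stage from a permute and a multiply-add: `vec_perm` swaps each element with its partner, and `vec_mladd` multiplies the original by a vector of +1/-1 and adds the permuted copy. A scalar sketch of that idea, assuming 8 lanes of `int16_t` and using the hypothetical names `butterfly_stage` and `wht8_perm_mladd`, could look like this (it models the arithmetic only, not the hand-made register allocation):

```c
#include <stdint.h>

/* One stage written as "permute, then multiply-add":
 *   out[j] = in[j] * sign[j] + in[j ^ dist]
 * dist = 1, 2, 4 corresponds to perm1/vprod1, perm2/vprod2 and
 * perm3/vprod3 in the vector code: the partner lane is j ^ dist and the
 * sign is -1 exactly for the lanes where bit 'dist' of j is set. */
static void butterfly_stage(const int16_t in[8], int16_t out[8], int dist)
{
    for (int j = 0; j < 8; j++) {
        int sign = (j & dist) ? -1 : 1;
        out[j] = (int16_t)(in[j] * sign + in[j ^ dist]);
    }
}

/* Three chained stages give an 8-point Walsh-Hadamard transform. */
static void wht8_perm_mladd(const int16_t in[8], int16_t out[8])
{
    int16_t t1[8], t2[8];

    butterfly_stage(in, t1, 1);
    butterfly_stage(t1, t2, 2);
    butterfly_stage(t2, out, 4);
}
```

For the even lane of a pair this yields a + b, and for the odd lane -b + a, so one permute plus one multiply-add produces both the sum and the difference of every pair without separate add and subtract instructions.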
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1124 |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1125 static int hadamard8_diff16x8_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h) { |
3554 | 1126 int sum; |
1127 register vector signed short | |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1128 temp0 __asm__ ("v0"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1129 temp1 __asm__ ("v1"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1130 temp2 __asm__ ("v2"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1131 temp3 __asm__ ("v3"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1132 temp4 __asm__ ("v4"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1133 temp5 __asm__ ("v5"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1134 temp6 __asm__ ("v6"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1135 temp7 __asm__ ("v7"); |
3554 | 1136 register vector signed short |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1137 temp0S __asm__ ("v8"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1138 temp1S __asm__ ("v9"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1139 temp2S __asm__ ("v10"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1140 temp3S __asm__ ("v11"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1141 temp4S __asm__ ("v12"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1142 temp5S __asm__ ("v13"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1143 temp6S __asm__ ("v14"), |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1144 temp7S __asm__ ("v15"); |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1145 register const vector unsigned char vzero __asm__ ("v31") = |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1146 (const vector unsigned char)vec_splat_u8(0); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1147 { |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1148 register const vector signed short vprod1 __asm__ ("v16") = |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1149 (const vector signed short){ 1,-1, 1,-1, 1,-1, 1,-1 }; |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1150 register const vector signed short vprod2 __asm__ ("v17") = |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1151 (const vector signed short){ 1, 1,-1,-1, 1, 1,-1,-1 }; |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1152 register const vector signed short vprod3 __asm__ ("v18") = |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1153 (const vector signed short){ 1, 1, 1, 1,-1,-1,-1,-1 }; |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1154 register const vector unsigned char perm1 __asm__ ("v19") = |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1155 (const vector unsigned char) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1156 {0x02, 0x03, 0x00, 0x01, 0x06, 0x07, 0x04, 0x05, |
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1157 0x0A, 0x0B, 0x08, 0x09, 0x0E, 0x0F, 0x0C, 0x0D}; |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1158 register const vector unsigned char perm2 __asm__ ("v20") = |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1159 (const vector unsigned char) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1160 {0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03, |
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1161 0x0C, 0x0D, 0x0E, 0x0F, 0x08, 0x09, 0x0A, 0x0B}; |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1162 register const vector unsigned char perm3 __asm__ ("v21") = |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1163 (const vector unsigned char) |
7373
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1164 {0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, |
266d4949aa15
Remove AltiVec vector declaration compiler compatibility macros.
diego
parents:
7335
diff
changeset
|
1165 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07}; |
1980 | 1166 |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1167 #define ONEITERBUTTERFLY(i, res1, res2) \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1168 { \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1169 register vector unsigned char src1 __asm__ ("v22"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1170 src2 __asm__ ("v23"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1171 dst1 __asm__ ("v24"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1172 dst2 __asm__ ("v25"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1173 srcO __asm__ ("v22"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1174 dstO __asm__ ("v23"); \ |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1175 \ |
9421
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1176 register vector signed short srcV __asm__ ("v24"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1177 dstV __asm__ ("v25"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1178 srcW __asm__ ("v26"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1179 dstW __asm__ ("v27"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1180 but0 __asm__ ("v28"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1181 but0S __asm__ ("v29"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1182 op1 __asm__ ("v30"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1183 but1 __asm__ ("v22"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1184 op1S __asm__ ("v23"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1185 but1S __asm__ ("v24"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1186 op2 __asm__ ("v25"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1187 but2 __asm__ ("v26"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1188 op2S __asm__ ("v27"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1189 but2S __asm__ ("v28"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1190 op3 __asm__ ("v29"), \ |
dd2b5e52336a
Remove gcc_fixes.h. It only contains workarounds for unsupported gcc versions.
diego
parents:
8596
diff
changeset
|
1191 op3S __asm__ ("v30"); \ |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1192 \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1193 src1 = vec_ld(stride * i, src); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1194 src2 = vec_ld((stride * i) + 16, src); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1195 srcO = vec_perm(src1, src2, vec_lvsl(stride * i, src)); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1196 dst1 = vec_ld(stride * i, dst); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1197 dst2 = vec_ld((stride * i) + 16, dst); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1198 dstO = vec_perm(dst1, dst2, vec_lvsl(stride * i, dst)); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1199 /* promote the unsigned chars to signed shorts */ \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1200 srcV = (vector signed short)vec_mergeh((vector signed char)vzero, \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1201 (vector signed char)srcO); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1202 dstV = (vector signed short)vec_mergeh((vector signed char)vzero, \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1203 (vector signed char)dstO); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1204 srcW = (vector signed short)vec_mergel((vector signed char)vzero, \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1205 (vector signed char)srcO); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1206 dstW = (vector signed short)vec_mergel((vector signed char)vzero, \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1207 (vector signed char)dstO); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1208 /* subtractions inside the first butterfly */ \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1209 but0 = vec_sub(srcV, dstV); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1210 but0S = vec_sub(srcW, dstW); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1211 op1 = vec_perm(but0, but0, perm1); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1212 but1 = vec_mladd(but0, vprod1, op1); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1213 op1S = vec_perm(but0S, but0S, perm1); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1214 but1S = vec_mladd(but0S, vprod1, op1S); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1215 op2 = vec_perm(but1, but1, perm2); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1216 but2 = vec_mladd(but1, vprod2, op2); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1217 op2S = vec_perm(but1S, but1S, perm2); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1218 but2S = vec_mladd(but1S, vprod2, op2S); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1219 op3 = vec_perm(but2, but2, perm3); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1220 res1 = vec_mladd(but2, vprod3, op3); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1221 op3S = vec_perm(but2S, but2S, perm3); \ |
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1222 res2 = vec_mladd(but2S, vprod3, op3S); \ |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1223 } |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1224 ONEITERBUTTERFLY(0, temp0, temp0S); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1225 ONEITERBUTTERFLY(1, temp1, temp1S); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1226 ONEITERBUTTERFLY(2, temp2, temp2S); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1227 ONEITERBUTTERFLY(3, temp3, temp3S); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1228 ONEITERBUTTERFLY(4, temp4, temp4S); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1229 ONEITERBUTTERFLY(5, temp5, temp5S); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1230 ONEITERBUTTERFLY(6, temp6, temp6S); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1231 ONEITERBUTTERFLY(7, temp7, temp7S); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1232 } |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1233 #undef ONEITERBUTTERFLY |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1234 { |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1235 register vector signed int vsum; |
3346
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1236 register vector signed short line0S, line1S, line2S, line3S, line4S, |
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1237 line5S, line6S, line7S, line0BS,line2BS, |
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1238 line1BS,line3BS,line4BS,line6BS,line5BS, |
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1239 line7BS,line0CS,line4CS,line1CS,line5CS, |
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1240 line2CS,line6CS,line3CS,line7CS; |
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1241 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1242 register vector signed short line0 = vec_add(temp0, temp1); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1243 register vector signed short line1 = vec_sub(temp0, temp1); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1244 register vector signed short line2 = vec_add(temp2, temp3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1245 register vector signed short line3 = vec_sub(temp2, temp3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1246 register vector signed short line4 = vec_add(temp4, temp5); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1247 register vector signed short line5 = vec_sub(temp4, temp5); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1248 register vector signed short line6 = vec_add(temp6, temp7); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1249 register vector signed short line7 = vec_sub(temp6, temp7); |
2967 | 1250 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1251 register vector signed short line0B = vec_add(line0, line2); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1252 register vector signed short line2B = vec_sub(line0, line2); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1253 register vector signed short line1B = vec_add(line1, line3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1254 register vector signed short line3B = vec_sub(line1, line3); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1255 register vector signed short line4B = vec_add(line4, line6); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1256 register vector signed short line6B = vec_sub(line4, line6); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1257 register vector signed short line5B = vec_add(line5, line7); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1258 register vector signed short line7B = vec_sub(line5, line7); |
2967 | 1259 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1260 register vector signed short line0C = vec_add(line0B, line4B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1261 register vector signed short line4C = vec_sub(line0B, line4B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1262 register vector signed short line1C = vec_add(line1B, line5B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1263 register vector signed short line5C = vec_sub(line1B, line5B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1264 register vector signed short line2C = vec_add(line2B, line6B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1265 register vector signed short line6C = vec_sub(line2B, line6B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1266 register vector signed short line3C = vec_add(line3B, line7B); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1267 register vector signed short line7C = vec_sub(line3B, line7B); |
2967 | 1268 |
1951
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1269 vsum = vec_sum4s(vec_abs(line0C), vec_splat_s32(0)); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1270 vsum = vec_sum4s(vec_abs(line1C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1271 vsum = vec_sum4s(vec_abs(line2C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1272 vsum = vec_sum4s(vec_abs(line3C), vsum); |
2599b8444831
better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
1949
diff
changeset
|
1273 vsum = vec_sum4s(vec_abs(line4C), vsum); |
1274 vsum = vec_sum4s(vec_abs(line5C), vsum); |
1275 vsum = vec_sum4s(vec_abs(line6C), vsum); |
1276 vsum = vec_sum4s(vec_abs(line7C), vsum); |
1277 |
3346
052765f11f1c
Cosmetics: should not hurt performance, scream if are
lu_zero
parents:
3252
diff
changeset
|
1278 line0S = vec_add(temp0S, temp1S); |
1279 line1S = vec_sub(temp0S, temp1S); |
1280 line2S = vec_add(temp2S, temp3S); |
1281 line3S = vec_sub(temp2S, temp3S); |
1282 line4S = vec_add(temp4S, temp5S); |
1283 line5S = vec_sub(temp4S, temp5S); |
1284 line6S = vec_add(temp6S, temp7S); |
1285 line7S = vec_sub(temp6S, temp7S); |
1951 | 1286 |
3346 | 1287 line0BS = vec_add(line0S, line2S); |
1288 line2BS = vec_sub(line0S, line2S); |
1289 line1BS = vec_add(line1S, line3S); |
1290 line3BS = vec_sub(line1S, line3S); |
1291 line4BS = vec_add(line4S, line6S); |
1292 line6BS = vec_sub(line4S, line6S); |
1293 line5BS = vec_add(line5S, line7S); |
1294 line7BS = vec_sub(line5S, line7S); |
1951 | 1295 |
3346 | 1296 line0CS = vec_add(line0BS, line4BS); |
1297 line4CS = vec_sub(line0BS, line4BS); |
1298 line1CS = vec_add(line1BS, line5BS); |
1299 line5CS = vec_sub(line1BS, line5BS); |
1300 line2CS = vec_add(line2BS, line6BS); |
1301 line6CS = vec_sub(line2BS, line6BS); |
1302 line3CS = vec_add(line3BS, line7BS); |
1303 line7CS = vec_sub(line3BS, line7BS); |
1951 | 1304 |
1305 vsum = vec_sum4s(vec_abs(line0CS), vsum); |
1306 vsum = vec_sum4s(vec_abs(line1CS), vsum); |
1307 vsum = vec_sum4s(vec_abs(line2CS), vsum); |
1308 vsum = vec_sum4s(vec_abs(line3CS), vsum); |
1309 vsum = vec_sum4s(vec_abs(line4CS), vsum); |
1310 vsum = vec_sum4s(vec_abs(line5CS), vsum); |
1311 vsum = vec_sum4s(vec_abs(line6CS), vsum); |
1312 vsum = vec_sum4s(vec_abs(line7CS), vsum); |
1313 vsum = vec_sums(vsum, (vector signed int)vzero); |
1314 vsum = vec_splat(vsum, 3); |
1315 vec_ste(vsum, 0, &sum); |
7335
d463d8ee7755
cosmetics: Make libavcodec/ppc/dsputil_altivec.c conform to style guidelines.
diego
parents:
6763
diff
changeset
|
1316 } |
1317 return sum; |
1951 | 1318 } |
1319 |
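The listing above applies three add/subtract butterfly stages to eight row vectors and then accumulates absolute values with `vec_sum4s`. The same data flow can be sketched in plain C for a single column of eight values. This is only an illustration: the name `hadamard8_abs_sum_scalar` is hypothetical, it uses `int` rather than the 16-bit lanes of the AltiVec code, and it covers just the stages visible in this part of the file.

```c
#include <stdlib.h>

/* Hypothetical scalar helper (not part of dsputil_altivec.c):
 * three butterfly stages over 8 values, then the sum of the
 * absolute values of the results. */
static int hadamard8_abs_sum_scalar(const int in[8])
{
    int a[8], b[8], c[8];
    int i, sum = 0;

    /* stage 1: neighbouring pairs (0,1), (2,3), (4,5), (6,7) */
    for (i = 0; i < 8; i += 2) {
        a[i]     = in[i] + in[i + 1];
        a[i + 1] = in[i] - in[i + 1];
    }
    /* stage 2: pairs at distance 2 inside each group of four */
    for (i = 0; i < 8; i += 4) {
        b[i]     = a[i]     + a[i + 2];
        b[i + 2] = a[i]     - a[i + 2];
        b[i + 1] = a[i + 1] + a[i + 3];
        b[i + 3] = a[i + 1] - a[i + 3];
    }
    /* stage 3: pairs at distance 4 */
    for (i = 0; i < 4; i++) {
        c[i]     = b[i] + b[i + 4];
        c[i + 4] = b[i] - b[i + 4];
    }
    /* accumulate absolute values, as vec_abs + vec_sum4s do per lane */
    for (i = 0; i < 8; i++)
        sum += abs(c[i]);
    return sum;
}
```

A constant input concentrates all energy in the first output, while a single impulse spreads evenly over all eight outputs, which is a quick sanity check for the stage ordering.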
1320 int hadamard8_diff16_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h){ |
1321 POWERPC_PERF_DECLARE(altivec_hadamard8_diff16_num, 1); |
3554 | 1322 int score; |
1951 | 1323 POWERPC_PERF_START_COUNT(altivec_hadamard8_diff16_num, 1); |
3554 | 1324 score = hadamard8_diff16x8_altivec(s, dst, src, stride, 8); |
1325 if (h==16) { | |
1326 dst += 8*stride; | |
1327 src += 8*stride; | |
1328 score += hadamard8_diff16x8_altivec(s, dst, src, stride, 8); | |
1329 } | |
1951 | 1330 POWERPC_PERF_STOP_COUNT(altivec_hadamard8_diff16_num, 1); |
3554 | 1331 return score; |
1951 | 1332 } |
1333 |
3543 | 1334 static void vorbis_inverse_coupling_altivec(float *mag, float *ang, |
1335 int blocksize) | |
1336 { | |
1337 int i; | |
3546
5f97ba9a4eaa
Almost cosmetic changes in dsputil_init_ppc and vorbis_inverse_coupling_altivec:
lu_zero
parents:
3545
diff
changeset
|
1338 vector float m, a; |
3543 | 1339 vector bool int t0, t1; |
1340 const vector unsigned int v_31 = //XXX | |
1341 vec_add(vec_add(vec_splat_u32(15),vec_splat_u32(15)),vec_splat_u32(1)); | |
7335 | 1342 for (i = 0; i < blocksize; i += 4) { |
3543 | 1343 m = vec_ld(0, mag+i); |
1344 a = vec_ld(0, ang+i); | |
1345 t0 = vec_cmple(m, (vector float)vec_splat_u32(0)); | |
1346 t1 = vec_cmple(a, (vector float)vec_splat_u32(0)); | |
3545 | 1347 a = vec_xor(a, (vector float) vec_sl((vector unsigned int)t0, v_31)); |
3546 | 1348 t0 = (vector bool int)vec_and(a, t1); |
1349 t1 = (vector bool int)vec_andc(a, t1); |
3550 | 1350 a = vec_sub(m, (vector float)t1); |
1351 m = vec_add(m, (vector float)t0); | |
3549
7b4e34f1ff1f
Fix a stupid typo and another error, thanks to Emanuele Giaquinta <exg@gentoo.org> for pointing out the issue and the patch
lu_zero
parents:
3546
diff
changeset
|
1352 vec_stl(a, 0, ang+i); |
1353 vec_stl(m, 0, mag+i); |
3543 | 1354 } |
1355 } | |
1356 | |
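The mask arithmetic in `vorbis_inverse_coupling_altivec` is easier to follow next to a branch-based version. The sketch below is derived by hand from the vector code above, one element at a time; the name `inverse_coupling_scalar` is hypothetical and the function is not taken from the FFmpeg sources.

```c
/* Hypothetical scalar counterpart of the vector loop above.
 * t0/t1 in the AltiVec code are the "mag <= 0" and "ang <= 0" masks. */
static void inverse_coupling_scalar(float *mag, float *ang, int blocksize)
{
    int i;

    for (i = 0; i < blocksize; i++) {
        float m = mag[i];
        /* vec_xor step: flip the sign of ang where mag <= 0 */
        float a = (m <= 0.0f) ? -ang[i] : ang[i];

        if (ang[i] <= 0.0f) {
            /* t1 set: the signed angle is added to the magnitude */
            ang[i] = m;
            mag[i] = m + a;
        } else {
            /* t1 clear: the signed angle is subtracted; mag is unchanged */
            ang[i] = m - a;
        }
    }
}
```

Unlike the vector loop, which handles four floats per iteration with 16-byte aligned loads and stores, this version has no alignment or length requirements.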
2057
4c663228e020
avg_pixels8_xy2_altivec in AltiVec, enabling avg_pixels8_altivec, hadamard fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
2056
diff
changeset
|
1357 /* next one assumes that ((line_size % 8) == 0) */ |
1358 void avg_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h) |
1359 { |
1360 POWERPC_PERF_DECLARE(altivec_avg_pixels8_xy2_num, 1); |
3554 | 1361 register int i; |
1362 register vector unsigned char pixelsv1, pixelsv2, pixelsavg; | |
1363 register vector unsigned char blockv, temp1, temp2, blocktemp; | |
1364 register vector unsigned short pixelssum1, pixelssum2, temp3; | |
1365 | |
5746
55ed6dc5d476
Remove const vector macro indirection that is useless and obfuscating
diego
parents:
5609
diff
changeset
|
1366 register const vector unsigned char vczero = (const vector unsigned char) |
3554 | 1367 vec_splat_u8(0); |
5746 | 1368 register const vector unsigned short vctwo = (const vector unsigned short) |
3554 | 1369 vec_splat_u16(2); |
2967 | 1370 |
3554 | 1371 temp1 = vec_ld(0, pixels); |
1372 temp2 = vec_ld(16, pixels); | |
1373 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels)); | |
1374 if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F) { | |
1375 pixelsv2 = temp2; | |
1376 } else { | |
1377 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels)); | |
1378 } | |
1379 pixelsv1 = vec_mergeh(vczero, pixelsv1); | |
1380 pixelsv2 = vec_mergeh(vczero, pixelsv2); | |
1381 pixelssum1 = vec_add((vector unsigned short)pixelsv1, | |
1382 (vector unsigned short)pixelsv2); | |
1383 pixelssum1 = vec_add(pixelssum1, vctwo); | |
2967 | 1384 |
1385 POWERPC_PERF_START_COUNT(altivec_avg_pixels8_xy2_num, 1); | |
3554 | 1386 for (i = 0; i < h ; i++) { |
1387 int rightside = ((unsigned long)block & 0x0000000F); | |
1388 blockv = vec_ld(0, block); | |
2057 | 1389 |
3554 | 1390 temp1 = vec_ld(line_size, pixels); |
1391 temp2 = vec_ld(line_size + 16, pixels); | |
1392 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels)); | |
7335 | 1393 if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F) { |
3554 | 1394 pixelsv2 = temp2; |
1395 } else { | |
1396 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels)); | |
1397 } | |
2057 | 1398 |
3554 | 1399 pixelsv1 = vec_mergeh(vczero, pixelsv1); |
1400 pixelsv2 = vec_mergeh(vczero, pixelsv2); | |
1401 pixelssum2 = vec_add((vector unsigned short)pixelsv1, | |
1402 (vector unsigned short)pixelsv2); | |
1403 temp3 = vec_add(pixelssum1, pixelssum2); | |
1404 temp3 = vec_sra(temp3, vctwo); | |
1405 pixelssum1 = vec_add(pixelssum2, vctwo); | |
1406 pixelsavg = vec_packsu(temp3, (vector unsigned short) vczero); | |
2967 | 1407 |
3554 | 1408 if (rightside) { |
1409 blocktemp = vec_perm(blockv, pixelsavg, vcprm(0, 1, s0, s1)); | |
1410 } else { | |
1411 blocktemp = vec_perm(blockv, pixelsavg, vcprm(s0, s1, 2, 3)); | |
1412 } | |
2967 | 1413 |
3554 | 1414 blockv = vec_avg(blocktemp, blockv); |
1415 vec_st(blockv, 0, block); | |
2967 | 1416 |
3554 | 1417 block += line_size; |
1418 pixels += line_size; | |
1419 } | |
2967 | 1420 |
2057 | 1421 POWERPC_PERF_STOP_COUNT(altivec_avg_pixels8_xy2_num, 1); |
1422 } |
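The arithmetic performed by `avg_pixels8_xy2_altivec` can be written per pixel as a rounded 2x2 average followed by a rounding-up average with the destination (the `vec_avg` step). The following scalar sketch is an illustration under that reading of the vector code; the name `avg_pixels8_xy2_scalar` is hypothetical and it makes no alignment assumptions.

```c
#include <stdint.h>

/* Hypothetical scalar reference for the vector routine above.
 * Reads 9 bytes from each of h + 1 source rows. */
static void avg_pixels8_xy2_scalar(uint8_t *block, const uint8_t *pixels,
                                   int line_size, int h)
{
    int i, j;

    for (i = 0; i < h; i++) {
        for (j = 0; j < 8; j++) {
            /* sum of the 2x2 neighbourhood, +2 for rounding, divide by 4 */
            int p = (pixels[j] + pixels[j + 1] +
                     pixels[line_size + j] + pixels[line_size + j + 1] + 2) >> 2;
            /* average with the existing destination pixel, rounding up */
            block[j] = (uint8_t)((block[j] + p + 1) >> 1);
        }
        block  += line_size;
        pixels += line_size;
    }
}
```

The vector version avoids recomputing the upper row sum by carrying `pixelssum2 + 2` into the next iteration as `pixelssum1`; the scalar sketch recomputes it for clarity.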
3546 | 1423 |
1424 void dsputil_init_altivec(DSPContext* c, AVCodecContext *avctx) |
1425 { |
1426 c->pix_abs[0][1] = sad16_x2_altivec; |
1427 c->pix_abs[0][2] = sad16_y2_altivec; |
1428 c->pix_abs[0][3] = sad16_xy2_altivec; |
1429 c->pix_abs[0][0] = sad16_altivec; |
1430 c->pix_abs[1][0] = sad8_altivec; |
1431 c->sad[0]= sad16_altivec; |
1432 c->sad[1]= sad8_altivec; |
1433 c->pix_norm1 = pix_norm1_altivec; |
1434 c->sse[1]= sse8_altivec; |
1435 c->sse[0]= sse16_altivec; |
1436 c->pix_sum = pix_sum_altivec; |
1437 c->diff_pixels = diff_pixels_altivec; |
1438 c->get_pixels = get_pixels_altivec; |
8307 | 1439 c->clear_block = clear_block_altivec; |
3546 | 1440 c->add_bytes= add_bytes_altivec; |
1441 c->put_pixels_tab[0][0] = put_pixels16_altivec; |
1442 /* the two functions do the same thing, so use the same code */ |
1443 c->put_no_rnd_pixels_tab[0][0] = put_pixels16_altivec; |
1444 c->avg_pixels_tab[0][0] = avg_pixels16_altivec; |
1445 c->avg_pixels_tab[1][0] = avg_pixels8_altivec; |
1446 c->avg_pixels_tab[1][3] = avg_pixels8_xy2_altivec; |
1447 c->put_pixels_tab[1][3] = put_pixels8_xy2_altivec; |
1448 c->put_no_rnd_pixels_tab[1][3] = put_no_rnd_pixels8_xy2_altivec; |
1449 c->put_pixels_tab[0][3] = put_pixels16_xy2_altivec; |
1450 c->put_no_rnd_pixels_tab[0][3] = put_no_rnd_pixels16_xy2_altivec; |
1451 |
1452 c->hadamard8_diff[0] = hadamard8_diff16_altivec; |
1453 c->hadamard8_diff[1] = hadamard8_diff8x8_altivec; |
8596
68e959302527
replace all occurrence of ENABLE_ by the corresponding CONFIG_, HAVE_ or ARCH_
aurel
parents:
8307
diff
changeset
|
1454 if (CONFIG_VORBIS_DECODER) |
5753 | 1455 c->vorbis_inverse_coupling = vorbis_inverse_coupling_altivec; |
3546 | 1456 } |
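`dsputil_init_altivec` works by overwriting function pointers in a context structure so that callers pick up the optimized routines without knowing about them. The pattern can be sketched with a deliberately tiny stand-in. Everything here (`DemoDSPContext`, `demo_sum_c`, `demo_sum_unrolled`, `demo_init`) is hypothetical and only illustrates the dispatch idea; it is not the real `DSPContext` layout or FFmpeg's init sequence.

```c
/* Hypothetical, simplified stand-ins; not the real DSPContext. */
typedef int (*demo_sum_fn)(const int *v, int n);

typedef struct DemoDSPContext {
    demo_sum_fn sum;   /* selected implementation */
} DemoDSPContext;

/* portable reference implementation */
static int demo_sum_c(const int *v, int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* stand-in for a platform-optimized variant: same result, unrolled by 4 */
static int demo_sum_unrolled(const int *v, int n)
{
    int i = 0, s = 0;
    for (; i + 4 <= n; i += 4)
        s += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        s += v[i];
    return s;
}

/* install the default first, then override when the fast path is available */
static void demo_init(DemoDSPContext *c, int have_fast_path)
{
    c->sum = demo_sum_c;
    if (have_fast_path)
        c->sum = demo_sum_unrolled;
}
```

Callers then invoke `c->sum(...)` without caring which variant was installed, which is the same reason one function can be assigned to two table slots when both slots need identical behavior.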