annotate ppc/dsputil_altivec.c @ 6323:e6da66f378c7 libavcodec

mpegvideo.h has two function declarations with the 'inline' specifier but no definition for those functions. The C standard requires a definition to appear in the same translation unit for any function declared with 'inline'. Most of the files including mpegvideo.h do not define those functions. Fix this by removing the 'inline' specifiers from the header. Patch by Uoti Urpala.
author diego
date Sun, 03 Feb 2008 17:54:30 +0000
parents 33674fb857b5
children f7cbb7733146
/*
 * Copyright (c) 2002 Brian Foley
 * Copyright (c) 2002 Dieter Shirley
 * Copyright (c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#include "dsputil.h"

#include "gcc_fixes.h"

#include "dsputil_ppc.h"
#include "util_altivec.h"

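/*
 * Illustrative sketch (not part of the original file): a scalar model of
 * the unaligned-load idiom used by the functions below,
 *     tv = (vector unsigned char *) p;
 *     v  = vec_perm(tv[0], tv[1], vec_lvsl(0, p));
 * lvx ignores the low four address bits, so tv[0] and tv[1] are the two
 * aligned 16-byte blocks covering p. vec_lvsl(0, p) yields the byte indices
 * sh, sh+1, ..., sh+15 with sh = p & 15, and vec_perm picks those bytes from
 * the 32-byte concatenation tv[0]||tv[1], giving p[0]..p[15].
 * model_unaligned_load16() is a hypothetical helper, for illustration only.
 */
#include <stdint.h>
#include <string.h>

static void model_unaligned_load16(const uint8_t *p, uint8_t out[16])
{
    /* address that lvx would actually load from */
    const uint8_t *aligned = (const uint8_t *)((uintptr_t)p & ~(uintptr_t)15);
    /* the shift amount encoded by vec_lvsl(0, p) */
    unsigned sh = (unsigned)((uintptr_t)p & 15);
    uint8_t concat[32];
    int i;

    memcpy(concat, aligned, 32);      /* tv[0] followed by tv[1] */
    for (i = 0; i < 16; i++)
        out[i] = concat[sh + i];      /* vec_perm with indices sh + i */
}
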
int sad16_x2_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
{
    int i;
    DECLARE_ALIGNED_16(int, s);
    const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0);
    vector unsigned char *tv;
    vector unsigned char pix1v, pix2v, pix2iv, avgv, t5;
    vector unsigned int sad;
    vector signed int sumdiffs;

    s = 0;
    sad = (vector unsigned int)vec_splat_u32(0);
    for(i=0;i<h;i++) {
        /*
           Read unaligned pixels into our vectors. The vectors are as follows:
           pix1v: pix1[0]-pix1[15]
           pix2v: pix2[0]-pix2[15]      pix2iv: pix2[1]-pix2[16]
        */
        tv = (vector unsigned char *) pix1;
        pix1v = vec_perm(tv[0], tv[1], vec_lvsl(0, pix1));

        tv = (vector unsigned char *) &pix2[0];
        pix2v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[0]));

        tv = (vector unsigned char *) &pix2[1];
        pix2iv = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[1]));

        /* Calculate the average vector (vec_avg rounds up) */
        avgv = vec_avg(pix2v, pix2iv);

        /* Calculate the per-byte absolute differences |pix1v - avgv| */
        t5 = vec_sub(vec_max(pix1v, avgv), vec_min(pix1v, avgv));

        /* Add each group of 4 differences together and accumulate the 4 results into sad */
        sad = vec_sum4s(t5, sad);

        pix1 += line_size;
        pix2 += line_size;
    }
    /* Sum up the four partial sums, and put the result into s */
    sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero);
    sumdiffs = vec_splat(sumdiffs, 3);
    vec_ste(sumdiffs, 0, &s);

    return s;
}

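/*
 * Illustrative sketch (not part of the original file): a scalar reference
 * for sad16_x2_altivec() above. vec_avg on unsigned bytes rounds up, i.e.
 * computes (a + b + 1) >> 1, so each of the 16 pixels per row of pix1 is
 * compared with the rounded average of pix2[j] and pix2[j + 1].
 * ref_sad16_x2() is a hypothetical name, for illustration only.
 */
#include <stdint.h>
#include <stdlib.h>

static int ref_sad16_x2(const uint8_t *pix1, const uint8_t *pix2,
                        int line_size, int h)
{
    int i, j, s = 0;

    for (i = 0; i < h; i++) {
        for (j = 0; j < 16; j++) {
            int avg = (pix2[j] + pix2[j + 1] + 1) >> 1;  /* like vec_avg */
            s += abs(pix1[j] - avg);                     /* like vec_max - vec_min */
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    return s;
}
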
int sad16_y2_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
{
    int i;
    DECLARE_ALIGNED_16(int, s);
    const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0);
    vector unsigned char *tv;
    vector unsigned char pix1v, pix2v, pix3v, avgv, t5;
    vector unsigned int sad;
    vector signed int sumdiffs;
    uint8_t *pix3 = pix2 + line_size;

    s = 0;
    sad = (vector unsigned int)vec_splat_u32(0);

    /*
       Because pix3 = pix2 + line_size, the pix3 of one iteration
       becomes pix2 in the next iteration. We can use this fact to
       avoid a potentially expensive unaligned read each time around
       the loop, so only the first pix2 row is loaded here.
       Read unaligned pixels into our vectors. The vectors are as follows:
       pix2v: pix2[0]-pix2[15]
    */
    tv = (vector unsigned char *) &pix2[0];
    pix2v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[0]));

    for(i=0;i<h;i++) {
        /*
           Read unaligned pixels into our vectors. The vectors are as follows:
           pix1v: pix1[0]-pix1[15]
           pix3v: pix3[0]-pix3[15]
        */
        tv = (vector unsigned char *) pix1;
        pix1v = vec_perm(tv[0], tv[1], vec_lvsl(0, pix1));

        tv = (vector unsigned char *) &pix3[0];
        pix3v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix3[0]));

        /* Calculate the average vector (vec_avg rounds up) */
        avgv = vec_avg(pix2v, pix3v);

        /* Calculate the per-byte absolute differences |pix1v - avgv| */
        t5 = vec_sub(vec_max(pix1v, avgv), vec_min(pix1v, avgv));

        /* Add each group of 4 differences together and accumulate the 4 results into sad */
        sad = vec_sum4s(t5, sad);

        pix1 += line_size;
        pix2v = pix3v;
        pix3 += line_size;
    }

    /* Sum up the four partial sums, and put the result into s */
    sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero);
    sumdiffs = vec_splat(sumdiffs, 3);
    vec_ste(sumdiffs, 0, &s);
    return s;
}
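
/*
 * Illustrative sketch (not part of the original file): a scalar reference
 * for sad16_y2_altivec() above. Each pixel of pix1 is compared with the
 * rounded-up average of the pixel in pix2 and the one directly below it
 * (pix3 = pix2 + line_size). As in the vector version, the row read as pix3
 * in one iteration is reused as pix2 in the next.
 * ref_sad16_y2() is a hypothetical name, for illustration only.
 */
#include <stdint.h>
#include <stdlib.h>

static int ref_sad16_y2(const uint8_t *pix1, const uint8_t *pix2,
                        int line_size, int h)
{
    const uint8_t *pix3 = pix2 + line_size;
    int i, j, s = 0;

    for (i = 0; i < h; i++) {
        for (j = 0; j < 16; j++) {
            int avg = (pix2[j] + pix3[j] + 1) >> 1;  /* vec_avg rounds up */
            s += abs(pix1[j] - avg);
        }
        pix1 += line_size;
        pix2  = pix3;                                /* mirrors pix2v = pix3v */
        pix3 += line_size;
    }
    return s;
}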
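
/*
 * Illustrative sketch (not part of the original file): a scalar reference
 * for sad16_xy2_altivec() below, plus the rounding pitfall its comment
 * describes. The 2x2 average is computed as (a + b + c + d + 2) >> 2; this
 * assumes the vector code, whose loop continues past this excerpt, finishes
 * the widened-short sum by adding the splatted constant `two` and shifting
 * right by two. nested_avg4() shows the rejected shortcut
 * avg(avg(a, b), avg(c, d)). All names here are hypothetical.
 */
#include <stdint.h>
#include <stdlib.h>

/* Rounded-up pairwise average, as vec_avg computes it for unsigned bytes. */
static int avg2_round_up(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* The shortcut the comment rejects: averaging two pairwise averages. */
static int nested_avg4(int a, int b, int c, int d)
{
    return avg2_round_up(avg2_round_up(a, b), avg2_round_up(c, d));
}

static int ref_sad16_xy2(const uint8_t *pix1, const uint8_t *pix2,
                         int line_size, int h)
{
    const uint8_t *pix3 = pix2 + line_size;
    int i, j, s = 0;

    for (i = 0; i < h; i++) {
        for (j = 0; j < 16; j++) {
            /* single rounding of the full four-pixel sum */
            int avg = (pix2[j] + pix2[j + 1] + pix3[j] + pix3[j + 1] + 2) >> 2;
            s += abs(pix1[j] - avg);
        }
        pix1 += line_size;
        pix2 += line_size;
        pix3 += line_size;
    }
    return s;
}
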
int sad16_xy2_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
{
    int i;
    DECLARE_ALIGNED_16(int, s);
    uint8_t *pix3 = pix2 + line_size;
    const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0);
    const vector unsigned short two = (const vector unsigned short)vec_splat_u16(2);
    vector unsigned char *tv, avgv, t5;
    vector unsigned char pix1v, pix2v, pix3v, pix2iv, pix3iv;
    vector unsigned short pix2lv, pix2hv, pix2ilv, pix2ihv;
    vector unsigned short pix3lv, pix3hv, pix3ilv, pix3ihv;
    vector unsigned short avghv, avglv;
    vector unsigned short t1, t2, t3, t4;
    vector unsigned int sad;
    vector signed int sumdiffs;

    sad = (vector unsigned int)vec_splat_u32(0);

    s = 0;

    /*
       Because pix3 = pix2 + line_size, the pix3 of one iteration
       becomes pix2 in the next iteration. We can use this fact to
       avoid a potentially expensive unaligned read, as well as some
       splitting and vector addition, each time around the loop.
       Read unaligned pixels into our vectors. The vectors are as follows:
       pix2v: pix2[0]-pix2[15]      pix2iv: pix2[1]-pix2[16]
       Then split (zero-extend) the pixel vectors into shorts and
       precompute the horizontal sums t1 and t2.
    */
    tv = (vector unsigned char *) &pix2[0];
    pix2v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[0]));

    tv = (vector unsigned char *) &pix2[1];
    pix2iv = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix2[1]));

    pix2hv = (vector unsigned short) vec_mergeh(zero, pix2v);
    pix2lv = (vector unsigned short) vec_mergel(zero, pix2v);
    pix2ihv = (vector unsigned short) vec_mergeh(zero, pix2iv);
    pix2ilv = (vector unsigned short) vec_mergel(zero, pix2iv);
    t1 = vec_add(pix2hv, pix2ihv);
    t2 = vec_add(pix2lv, pix2ilv);

    for(i=0;i<h;i++) {
        /*
           Read unaligned pixels into our vectors. The vectors are as follows:
           pix1v: pix1[0]-pix1[15]
           pix3v: pix3[0]-pix3[15]      pix3iv: pix3[1]-pix3[16]
        */
        tv = (vector unsigned char *) pix1;
        pix1v = vec_perm(tv[0], tv[1], vec_lvsl(0, pix1));

        tv = (vector unsigned char *) &pix3[0];
        pix3v = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix3[0]));

        tv = (vector unsigned char *) &pix3[1];
        pix3iv = vec_perm(tv[0], tv[1], vec_lvsl(0, &pix3[1]));

        /*
           Note that AltiVec does have vec_avg, but it averages only two
           vectors at a time and rounds up. We could compute
           avg(avg(a,b),avg(c,d)), but the double rounding would mean that,
           for example, avg(3,0,0,1) = avg(2,1) = 2, when the exact average
           (3+0+0+1)/4 is 1.
           Instead, we have to split the pixel vectors into vectors of shorts
           and do the averaging by hand.
        */

        /* Split the pixel vectors into shorts */
        pix3hv = (vector unsigned short) vec_mergeh(zero, pix3v);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
204 pix3lv = (vector unsigned short) vec_mergel(zero, pix3v);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
205 pix3ihv = (vector unsigned short) vec_mergeh(zero, pix3iv);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
206 pix3ilv = (vector unsigned short) vec_mergel(zero, pix3iv);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
207
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
208 /* Do the averaging on them */
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
209 t3 = vec_add(pix3hv, pix3ihv);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
210 t4 = vec_add(pix3lv, pix3ilv);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
211
884
2cef5c4c0ca6 * altivec and pix_norm patch by Brian Foley
kabi
parents: 878
diff changeset
212 avghv = vec_sr(vec_add(vec_add(t1, t3), two), two);
2cef5c4c0ca6 * altivec and pix_norm patch by Brian Foley
kabi
parents: 878
diff changeset
213 avglv = vec_sr(vec_add(vec_add(t2, t4), two), two);
878
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
214
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
215 /* Pack the shorts back into a result */
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
216 avgv = vec_pack(avghv, avglv);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
217
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
218 /* Calculate a sum of abs differences vector */
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
219 t5 = vec_sub(vec_max(pix1v, avgv), vec_min(pix1v, avgv));
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
220
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
221 /* Add each 4 pixel group together and put 4 results into sad */
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
222 sad = vec_sum4s(t5, sad);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
223
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
224 pix1 += line_size;
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
225 pix3 += line_size;
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
226 /* Transfer the calculated values for pix3 into pix2 */
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
227 t1 = t3;
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
228 t2 = t4;
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
229 }
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
230 /* Sum up the four partial sums, and put the result into s */
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
231 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
232 sumdiffs = vec_splat(sumdiffs, 3);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
233 vec_ste(sumdiffs, 0, &s);
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
234
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
235 return s;
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
diff changeset
236 }
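The comment at lines 194-200 explains why sad16_xy2_altivec widens the pixels to shorts instead of chaining vec_avg. A minimal scalar sketch of that rounding argument follows. It assumes vec_avg computes (a + b + 1) >> 1 per element; the helper names are hypothetical. avg4_exact mirrors the (t1 + t3 + two) >> two expression at line 212.

```c
/* Per-element behaviour assumed for AltiVec vec_avg: average of two values, rounding up. */
static unsigned avg2_round_up(unsigned a, unsigned b)
{
    return (a + b + 1) >> 1;
}

/* What a vec_avg-only implementation would compute: avg(avg(a,b),avg(c,d)). */
static unsigned avg4_nested(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return avg2_round_up(avg2_round_up(a, b), avg2_round_up(c, d));
}

/* Correctly rounded four-way average: add all four, add 2, shift right by 2. */
static unsigned avg4_exact(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (a + b + c + d + 2) >> 2;
}
```

With the values from the comment, avg4_nested(3, 0, 0, 1) yields 2 (the two rounding-up steps compound), while avg4_exact(3, 0, 0, 1) yields 1.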
237
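The loop advances pix3 one row at a time and copies t3/t4 into t1/t2, so each source row is loaded and widened only once. Every output pixel still uses the 2x2 neighbourhood spanning two consecutive rows. A scalar reference that recomputes both rows on every iteration can help when checking the vector code. This is a sketch that assumes pix3 points one row below pix2 (pix2 + line_size), which is what the row-by-row transfer implies. sad16_xy2_ref is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for the half-pel (x + 1/2, y + 1/2) SAD.
 * Each reference pixel is the rounded average of a 2x2 neighbourhood of pix2,
 * so pix2 must provide 17 columns and h + 1 rows. */
static int sad16_xy2_ref(const uint8_t *pix1, const uint8_t *pix2,
                         int line_size, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        const uint8_t *row2 = pix2 + y * line_size;  /* the "pix2" row */
        const uint8_t *row3 = row2 + line_size;      /* the "pix3" row below it */
        for (int x = 0; x < 16; x++) {
            int avg = (row2[x] + row2[x + 1] + row3[x] + row3[x + 1] + 2) >> 2;
            sum += abs(pix1[y * line_size + x] - avg);
        }
    }
    return sum;
}
```

The AltiVec version avoids the second load and widening of each row by carrying t3/t4 into the next iteration. The scalar version trades that for simplicity.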
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
238 int sad16_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
239 {
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
240     int i;
5019
41cabe79ba25 use macro Use DECLARE_ALIGNED_16 to align stack-allocated variables
gpoirier
parents: 5010
241     DECLARE_ALIGNED_16(int, s);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
242     const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0);
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
243     vector unsigned char perm1, perm2, *pix1v, *pix2v;
244     vector unsigned char t1, t2, t3, t4, t5;
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
245     vector unsigned int sad;
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
246     vector signed int sumdiffs;
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
247
1033
b4172ff70d27 Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents: 1024
248     sad = (vector unsigned int)vec_splat_u32(0);
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
249
250
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
251     for(i=0;i<h;i++) {
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
252         /* Read potentially unaligned pixels into t1 and t2 */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
253         perm1 = vec_lvsl(0, pix1);
254         pix1v = (vector unsigned char *) pix1;
255         perm2 = vec_lvsl(0, pix2);
256         pix2v = (vector unsigned char *) pix2;
257         t1 = vec_perm(pix1v[0], pix1v[1], perm1);
258         t2 = vec_perm(pix2v[0], pix2v[1], perm2);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
259
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
260         /* Calculate a sum of abs differences vector */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
261         t3 = vec_max(t1, t2);
262         t4 = vec_min(t1, t2);
263         t5 = vec_sub(t3, t4);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
264
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
265         /* Add each 4 pixel group together and put 4 results into sad */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
266         sad = vec_sum4s(t5, sad);
267
268         pix1 += line_size;
269         pix2 += line_size;
270     }
271
272     /* Sum up the four partial sums, and put the result into s */
273     sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero);
274     sumdiffs = vec_splat(sumdiffs, 3);
275     vec_ste(sumdiffs, 0, &s);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
276
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
277     return s;
278 }
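Each function here loads a possibly unaligned row with vec_perm(v[0], v[1], vec_lvsl(0, p)). The two aligned 16-byte loads cover the address, and the lvsl control vector selects the 16 bytes that start at p. Below is a scalar model of that idiom. It assumes big-endian lvsl semantics (control bytes sh, sh + 1, ..., sh + 15 with sh = p & 15). Addresses are represented as offsets into a buffer that is treated as 16-byte aligned. vec16 and all helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } vec16;   /* stand-in for vector unsigned char */

/* lvx behaviour: load the 16-byte block at addr rounded down to a multiple of 16. */
static vec16 load_aligned(const uint8_t *base, size_t addr)
{
    vec16 v;
    memcpy(v.b, base + (addr & ~(size_t)15), 16);
    return v;
}

/* vec_lvsl(0, p): permute control {sh, sh + 1, ..., sh + 15} with sh = p & 15. */
static vec16 lvsl(size_t addr)
{
    vec16 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)((addr & 15) + i);
    return v;
}

/* vec_perm(a, b, c): result byte i is byte (c[i] & 31) of the 32-byte concatenation a||b. */
static vec16 perm(vec16 a, vec16 b, vec16 c)
{
    uint8_t cat[32];
    vec16 r;
    memcpy(cat, a.b, 16);
    memcpy(cat + 16, b.b, 16);
    for (int i = 0; i < 16; i++)
        r.b[i] = cat[c.b[i] & 31];
    return r;
}

/* The unaligned-load idiom used above: perm(tv[0], tv[1], lvsl(p)). */
static vec16 load_unaligned(const uint8_t *base, size_t addr)
{
    return perm(load_aligned(base, addr), load_aligned(base, addr + 16), lvsl(addr));
}
```

The model also shows that the second aligned block is always read, even when p is already aligned. The real loops therefore touch up to 16 bytes past the 16 they use.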
279
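sad16_altivec keeps four 32-bit partial sums and reduces them at the end with vec_sums, vec_splat and vec_ste. The other functions here use the same reduction. Below is a scalar model of that lane arithmetic, assuming the AltiVec PIM semantics:

- vec_sum4s adds each group of four bytes into one word lane.
- vec_sums leaves the saturated total, plus element 3 of its second operand, in element 3 and zeros elsewhere.
- vec_ste stores the word selected by the low address bits. For the 16-byte-aligned s, that is element 0, which is why the total is splatted first.

All helper names are hypothetical.

```c
#include <stdint.h>

/* vec_sum4s(a, acc), unsigned char form: 32-bit lane i adds bytes 4i..4i+3 of a.
 * Saturation at UINT32_MAX is not modelled. */
static void sum4s(const uint8_t a[16], uint32_t acc[4])
{
    for (int i = 0; i < 4; i++)
        acc[i] += (uint32_t)a[4 * i] + a[4 * i + 1] + a[4 * i + 2] + a[4 * i + 3];
}

/* vec_sums(a, b): saturated a[0] + a[1] + a[2] + a[3] + b[3] in lane 3, zeros elsewhere. */
static void sums(const int32_t a[4], const int32_t b[4], int32_t r[4])
{
    int64_t t = (int64_t)a[0] + a[1] + a[2] + a[3] + b[3];
    if (t > INT32_MAX) t = INT32_MAX;
    if (t < INT32_MIN) t = INT32_MIN;
    r[0] = r[1] = r[2] = 0;
    r[3] = (int32_t)t;
}

/* vec_splat(v, 3), then vec_ste(v, 0, &s) with s 16-byte aligned: lane 0 is stored,
 * and it only holds the total because of the splat. */
static int32_t splat3_store(const int32_t v[4])
{
    int32_t splatted[4] = { v[3], v[3], v[3], v[3] };
    return splatted[0];
}

/* Hypothetical driver: SAD of one 16-byte row pair, reduced the same way. */
static int32_t reduce_absdiff(const uint8_t *a, const uint8_t *b)
{
    uint8_t d[16];
    uint32_t acc[4] = { 0, 0, 0, 0 };
    int32_t in[4], zero[4] = { 0, 0, 0, 0 }, out[4];

    for (int i = 0; i < 16; i++)        /* vec_sub(vec_max(a, b), vec_min(a, b)) */
        d[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    sum4s(d, acc);
    for (int i = 0; i < 4; i++)
        in[i] = (int32_t)acc[i];        /* (vector signed int) sad */
    sums(in, zero, out);
    return splat3_store(out);
}
```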
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
280 int sad8_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
281 {
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
282     int i;
5584
e831a81fb25e Fix trivial mixed declarations and code warning caused by a double semicolon.
diego
parents: 5573
283     DECLARE_ALIGNED_16(int, s);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
284     const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0);
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
285     vector unsigned char perm1, perm2, permclear, *pix1v, *pix2v;
286     vector unsigned char t1, t2, t3, t4, t5;
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
287     vector unsigned int sad;
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
288     vector signed int sumdiffs;
289
1033
b4172ff70d27 Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents: 1024
290     sad = (vector unsigned int)vec_splat_u32(0);
1277
f3152eb76f1a altivec gcc-3 fixes by (Magnus Damm <damm at opensource dot se>)
michaelni
parents: 1064
291
292     permclear = (vector unsigned char)AVV(255,255,255,255,255,255,255,255,0,0,0,0,0,0,0,0);
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
293
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
294     for(i=0;i<h;i++) {
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
295         /* Read potentially unaligned pixels into t1 and t2
296            Since we're reading 16 pixels, and actually only want 8,
297            mask out the last 8 pixels. The 0s don't change the sum. */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
298         perm1 = vec_lvsl(0, pix1);
299         pix1v = (vector unsigned char *) pix1;
300         perm2 = vec_lvsl(0, pix2);
301         pix2v = (vector unsigned char *) pix2;
302         t1 = vec_and(vec_perm(pix1v[0], pix1v[1], perm1), permclear);
303         t2 = vec_and(vec_perm(pix2v[0], pix2v[1], perm2), permclear);
304
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
305         /* Calculate a sum of abs differences vector */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
306         t3 = vec_max(t1, t2);
307         t4 = vec_min(t1, t2);
308         t5 = vec_sub(t3, t4);
309
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
310         /* Add each 4 pixel group together and put 4 results into sad */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
311         sad = vec_sum4s(t5, sad);
312
313         pix1 += line_size;
314         pix2 += line_size;
315     }
316
317     /* Sum up the four partial sums, and put the result into s */
318     sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero);
319     sumdiffs = vec_splat(sumdiffs, 3);
320     vec_ste(sumdiffs, 0, &s);
321
322     return s;
323 }
324
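sad8_altivec reads 16 pixels per row but ANDs both rows with permclear (0xFF in bytes 0-7, 0x00 in bytes 8-15). Each masked lane therefore contributes |0 - 0| = 0, as the comment at lines 295-297 says. A scalar sketch of that argument compares the masked 16-lane sum with a direct 8-wide SAD. Both function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of sad8_altivec: 16 lanes per row, both inputs masked with
 * permclear before the absolute difference is taken. */
static int sad8_masked16(const uint8_t *pix1, const uint8_t *pix2,
                         int line_size, int h)
{
    static const uint8_t permclear[16] = {
        255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0
    };
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int a = pix1[y * line_size + x] & permclear[x];
            int b = pix2[y * line_size + x] & permclear[x];
            sum += abs(a - b);
        }
    }
    return sum;
}

/* Direct 8-pixel-wide SAD for comparison. */
static int sad8_ref(const uint8_t *pix1, const uint8_t *pix2, int line_size, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[y * line_size + x] - pix2[y * line_size + x]);
    return sum;
}
```

Masking both operands is what makes this work. Masking only one would leave |0 - b| terms in lanes 8-15.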
878
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
325 int pix_norm1_altivec(uint8_t *pix, int line_size)
326 {
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
327     int i;
5019
41cabe79ba25 use macro Use DECLARE_ALIGNED_16 to align stack-allocated variables
gpoirier
parents: 5010
328     DECLARE_ALIGNED_16(int, s);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
329     const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0);
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
330     vector unsigned char *tv;
878
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
331     vector unsigned char pixv;
332     vector unsigned int sv;
333     vector signed int sum;
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
334
1033
b4172ff70d27 Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents: 1024
335     sv = (vector unsigned int)vec_splat_u32(0);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
336
878
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
337     s = 0;
338     for (i = 0; i < 16; i++) {
339         /* Read in the potentially unaligned pixels */
340         tv = (vector unsigned char *) pix;
341         pixv = vec_perm(tv[0], tv[1], vec_lvsl(0, pix));
342
884
2cef5c4c0ca6 * altivec and pix_norm patch by Brian Foley
kabi
parents: 878
343         /* Square the values, and add them to our sum */
344         sv = vec_msum(pixv, pixv, sv);
878
6ea69518e5f7 altivec optimizations patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents: 828
345
346         pix += line_size;
347     }
348     /* Sum up the four partial sums, and put the result into s */
349     sum = vec_sums((vector signed int) sv, (vector signed int) zero);
350     sum = vec_splat(sum, 3);
351     vec_ste(sum, 0, &s);
352
353     return s;
354 }
355
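pix_norm1_altivec squares each pixel with vec_msum(pixv, pixv, sv). The unsigned-char form of vec_msum multiplies byte pairs and adds each group of four products into one 32-bit lane. A scalar sketch follows, assuming the modulo (non-saturating) semantics of that form. msum_u8 and pix_norm1_ref are hypothetical names. The worst case, 256 * 255^2 = 16,646,400, fits easily in the int result.

```c
#include <stdint.h>

/* vec_msum(a, b, acc) for unsigned chars: lane i adds the four products
 * a[4i + j] * b[4i + j] (j = 0..3), modulo 2^32. */
static void msum_u8(const uint8_t a[16], const uint8_t b[16], uint32_t acc[4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc[i] += (uint32_t)a[4 * i + j] * b[4 * i + j];
}

/* Hypothetical scalar model of pix_norm1_altivec: sum of squares over a 16x16
 * block, kept in four partial sums and folded at the end. */
static int pix_norm1_ref(const uint8_t *pix, int line_size)
{
    uint32_t acc[4] = { 0, 0, 0, 0 };
    for (int y = 0; y < 16; y++)
        msum_u8(pix + y * line_size, pix + y * line_size, acc);
    return (int)(acc[0] + acc[1] + acc[2] + acc[3]);
}
```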
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
356 /**
357  * Sum of Squared Errors for an 8-pixel-wide block of h rows.
358  * AltiVec-enhanced.
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
359  * It is the sad8_altivec code above with squaring added.
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
360  */
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
diff changeset
361 int sse8_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
362 {
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
363 int i;
5019
41cabe79ba25 use macro Use DECLARE_ALIGNED_16 to align stack-allocated variables
gpoirier
parents: 5010
diff changeset
364 DECLARE_ALIGNED_16(int, s);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
365 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0);
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
366 vector unsigned char perm1, perm2, permclear, *pix1v, *pix2v;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
367 vector unsigned char t1, t2, t3,t4, t5;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
368 vector unsigned int sum;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
369 vector signed int sumsqr;
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
370
1033
b4172ff70d27 Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents: 1024
diff changeset
371 sum = (vector unsigned int)vec_splat_u32(0);
1277
f3152eb76f1a altivec gcc-3 fixes by (Magnus Damm <damm at opensource dot se>)
michaelni
parents: 1064
diff changeset
372
f3152eb76f1a altivec gcc-3 fixes by (Magnus Damm <damm at opensource dot se>)
michaelni
parents: 1064
diff changeset
373 permclear = (vector unsigned char)AVV(255,255,255,255,255,255,255,255,0,0,0,0,0,0,0,0);
f3152eb76f1a altivec gcc-3 fixes by (Magnus Damm <damm at opensource dot se>)
michaelni
parents: 1064
diff changeset
374
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
375
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
diff changeset
376 for(i=0;i<h;i++) {
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
377 /* Read potentially unaligned pixels into t1 and t2
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
378 Since we're reading 16 pixels, and actually only want 8,
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
379 mask out the last 8 pixels. The 0s don't change the sum. */
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
380 perm1 = vec_lvsl(0, pix1);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
381 pix1v = (vector unsigned char *) pix1;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
382 perm2 = vec_lvsl(0, pix2);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
383 pix2v = (vector unsigned char *) pix2;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
384 t1 = vec_and(vec_perm(pix1v[0], pix1v[1], perm1), permclear);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
385 t2 = vec_and(vec_perm(pix2v[0], pix2v[1], perm2), permclear);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
386
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
387 /*
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
388 Since we want to use unsigned chars, we can take advantage
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
389 of the fact that abs(a-b)^2 = (a-b)^2.
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
390 */
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
391
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
392 /* Calculate abs differences vector */
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
393 t3 = vec_max(t1, t2);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
394 t4 = vec_min(t1, t2);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
395 t5 = vec_sub(t3, t4);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
396
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
397 /* Square the values and add them to our sum */
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
398 sum = vec_msum(t5, t5, sum);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
399
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
400 pix1 += line_size;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
401 pix2 += line_size;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
402 }
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
403
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
404 /* Sum up the four partial sums, and put the result into s */
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
405 sumsqr = vec_sums((vector signed int) sum, (vector signed int) zero);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
406 sumsqr = vec_splat(sumsqr, 3);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
407 vec_ste(sumsqr, 0, &s);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
408
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
409 return s;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
410 }
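For reference, the computation that sse8_altivec vectorizes can be written in plain C. The sketch below is a hypothetical scalar model; the names absdiff_u8 and sse8_ref are illustrative and are not FFmpeg functions. It keeps only the first 8 bytes of each row, as the permclear mask does. It also computes |a-b| as max(a,b) - min(a,b), mirroring the vec_max/vec_min/vec_sub sequence, so the unsigned 8-bit subtraction never wraps.

```c
#include <stdint.h>

/* |a - b| for unsigned bytes without leaving the unsigned domain:
 * max(a,b) - min(a,b) never wraps, whereas (uint8_t)(a - b) does when a < b. */
static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    uint8_t hi = a > b ? a : b;
    uint8_t lo = a > b ? b : a;
    return (uint8_t)(hi - lo);
}

/* Hypothetical scalar model of an 8-pixel-wide, h-row sum of squared errors.
 * Only bytes 0..7 of each row contribute, matching the AltiVec code that
 * loads 16 pixels and zeroes the last 8 before summing. */
static int sse8_ref(const uint8_t *pix1, const uint8_t *pix2,
                    int line_size, int h)
{
    int sum = 0;
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 8; j++) {
            int d = absdiff_u8(pix1[j], pix2[j]);
            sum += d * d;   /* |a-b|^2 == (a-b)^2 */
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    return sum;
}
```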
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
411
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
412 /**
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
413 * Sum of Squared Errors for a block 16 pixels wide and h rows high.
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
414 * AltiVec-enhanced.
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
diff changeset
415 * It is the sad16_altivec code above with squaring added.
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
416 */
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
diff changeset
417 int sse16_altivec(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
418 {
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
419 int i;
5019
41cabe79ba25 use macro Use DECLARE_ALIGNED_16 to align stack-allocated variables
gpoirier
parents: 5010
diff changeset
420 DECLARE_ALIGNED_16(int, s);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
421 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0);
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
422 vector unsigned char perm1, perm2, *pix1v, *pix2v;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
423 vector unsigned char t1, t2, t3, t4, t5;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
424 vector unsigned int sum;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
425 vector signed int sumsqr;
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
426
1033
b4172ff70d27 Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents: 1024
diff changeset
427 sum = (vector unsigned int)vec_splat_u32(0);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
428
1708
dea5b2946999 interlaced motion estimation
michael
parents: 1352
diff changeset
429 for(i=0;i<h;i++) {
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
430 /* Read potentially unaligned pixels into t1 and t2 */
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
431 perm1 = vec_lvsl(0, pix1);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
432 pix1v = (vector unsigned char *) pix1;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
433 perm2 = vec_lvsl(0, pix2);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
434 pix2v = (vector unsigned char *) pix2;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
435 t1 = vec_perm(pix1v[0], pix1v[1], perm1);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
436 t2 = vec_perm(pix2v[0], pix2v[1], perm2);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
437
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
438 /*
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
439 Since we want to use unsigned chars, we can take advantage
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
440 of the fact that abs(a-b)^2 = (a-b)^2.
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
441 */
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
442
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
443 /* Calculate the vector of absolute differences */
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
444 t3 = vec_max(t1, t2);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
445 t4 = vec_min(t1, t2);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
446 t5 = vec_sub(t3, t4);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
447
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
448 /* Square the values and add them to our sum */
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
449 sum = vec_msum(t5, t5, sum);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
450
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
451 pix1 += line_size;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
452 pix2 += line_size;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
453 }
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
454
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
455 /* Sum up the four partial sums, and put the result into s */
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
456 sumsqr = vec_sums((vector signed int) sum, (vector signed int) zero);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
457 sumsqr = vec_splat(sumsqr, 3);
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
458 vec_ste(sumsqr, 0, &s);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
459
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
460 return s;
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
461 }
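sse16_altivec accumulates its result in four 32-bit lanes and only reduces them at the end. Below is a rough scalar model of that bookkeeping. It assumes the usual AltiVec semantics: vec_msum on unsigned bytes adds the products of bytes 4i..4i+3 into word lane i, and vec_sums adds all four lanes into lane 3. The name sse16_lanes_ref is hypothetical, and vec_sums saturation is not modelled, since a 16x16 total stays far below INT_MAX.

```c
#include <stdint.h>

/* Hypothetical scalar model of the sse16_altivec accumulation:
 *  - vec_msum(t5, t5, sum): word lane i gets the squares of bytes 4i..4i+3;
 *  - vec_sums(sum, zero):   the four lanes are added together (into lane 3);
 *  - vec_splat(..., 3):     lane 3 is copied into every lane before vec_ste. */
static int sse16_lanes_ref(const uint8_t *pix1, const uint8_t *pix2,
                           int line_size, int h)
{
    uint32_t lanes[4] = {0, 0, 0, 0};
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 16; j++) {
            uint32_t d = pix1[j] > pix2[j] ? (uint32_t)(pix1[j] - pix2[j])
                                           : (uint32_t)(pix2[j] - pix1[j]);
            lanes[j / 4] += d * d;   /* 4 byte products per word lane */
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    /* horizontal add of the four partial sums (saturation not modelled) */
    uint32_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    return (int)total;
}
```

The splat is there because vec_ste stores a single element selected by the low bits of the target address. With lane 3 copied everywhere, the value stored into the 16-byte-aligned s is the total, whichever element that is.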
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
462
1064
b32afefe7d33 * UINTX -> uintx_t INTX -> intx_t
kabi
parents: 1033
diff changeset
463 int pix_sum_altivec(uint8_t * pix, int line_size)
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
464 {
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
465 const vector unsigned int zero = (const vector unsigned int)vec_splat_u32(0);
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
466 vector unsigned char perm, *pixv;
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
467 vector unsigned char t1;
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
468 vector unsigned int sad;
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
469 vector signed int sumdiffs;
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
470
981
8bec850dc9c7 altivec patches by Romain Dolbeau
bellard
parents: 978
diff changeset
471 int i;
5019
41cabe79ba25 use macro Use DECLARE_ALIGNED_16 to align stack-allocated variables
gpoirier
parents: 5010
diff changeset
472 DECLARE_ALIGNED_16(int, s);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
473
1033
b4172ff70d27 Altivec on non darwin systems patch by Romain Dolbeau
bellard
parents: 1024
diff changeset
474 sad = (vector unsigned int)vec_splat_u32(0);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
475
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
476 for (i = 0; i < 16; i++) {
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
477 /* Read the potentially unaligned 16 pixels into t1 */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
478 perm = vec_lvsl(0, pix);
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
479 pixv = (vector unsigned char *) pix;
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
480 t1 = vec_perm(pixv[0], pixv[1], perm);
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
481
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
482 /* Sum each group of 4 pixels and accumulate the 4 results into sad */
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
483 sad = vec_sum4s(t1, sad);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
484
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
485 pix += line_size;
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
486 }
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
487
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
488 /* Sum up the four partial sums, and put the result into s */
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
489 sumdiffs = vec_sums((vector signed int) sad, (vector signed int) zero);
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
490 sumdiffs = vec_splat(sumdiffs, 3);
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
491 vec_ste(sumdiffs, 0, &s);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
492
623
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
493 return s;
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
494 }
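Every function in this file loads unaligned pixels with the vec_lvsl / vec_perm pair. The sketch below is a scalar model of that idiom, plus a pix_sum built on it. It assumes the usual behaviour that a load through a misaligned vector pointer compiles to lvx, which ignores the low 4 address bits. It also assumes that the 32 bytes starting at the aligned base are readable, as the AltiVec code does. The names load16_model and pix_sum_model are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of
 *     perm = vec_lvsl(0, pix);
 *     pixv = (vector unsigned char *) pix;
 *     t1   = vec_perm(pixv[0], pixv[1], perm);
 * The two loads fetch the aligned 16-byte blocks at and after pix;
 * vec_lvsl(0, pix) is {s, s+1, ..., s+15} with s = pix & 15, and vec_perm
 * picks those bytes from the 32-byte pair. The second block is read even
 * when pix is already aligned. */
static void load16_model(const uint8_t *pix, uint8_t out[16])
{
    const uint8_t *base = (const uint8_t *)((uintptr_t)pix & ~(uintptr_t)15);
    unsigned s = (unsigned)((uintptr_t)pix & 15);
    uint8_t pair[32];
    memcpy(pair, base, sizeof(pair));   /* pixv[0] and pixv[1] */
    for (int k = 0; k < 16; k++)
        out[k] = pair[s + k];           /* vec_perm with perm = lvsl(0, pix) */
}

/* Scalar model of pix_sum_altivec: the sum of a 16x16 block of bytes. */
static int pix_sum_model(const uint8_t *pix, int line_size)
{
    int s = 0;
    for (int i = 0; i < 16; i++) {
        uint8_t row[16];
        load16_model(pix, row);
        for (int k = 0; k < 16; k++)
            s += row[k];
        pix += line_size;
    }
    return s;
}
```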
92e99e506920 first cut at altivec support on darwin patch by (Brian Foley <bfoley at compsoc dot nuigalway dot ie>)
michaelni
parents:
diff changeset
495
1064
b32afefe7d33 * UINTX -> uintx_t INTX -> intx_t
kabi
parents: 1033
diff changeset
496 void get_pixels_altivec(DCTELEM *restrict block, const uint8_t *pixels, int line_size)
828
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
497 {
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
498 int i;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
499 vector unsigned char perm, bytes, *pixv;
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
500 const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0);
828
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
501 vector signed short shorts;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
502
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
503 for(i=0;i<8;i++)
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
504 {
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
505 // Read potentially unaligned pixels.
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
506 // We're reading 16 pixels, and actually only want 8,
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
507 // but we simply ignore the extras.
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
508 perm = vec_lvsl(0, pixels);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
509 pixv = (vector unsigned char *) pixels;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
510 bytes = vec_perm(pixv[0], pixv[1], perm);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
511
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
512 // convert the bytes into shorts
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
513 shorts = (vector signed short)vec_mergeh(zero, bytes);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
514
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
515 // save the data to the block; we assume the block is 16-byte aligned
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
516 vec_st(shorts, i*16, (vector signed short*)block);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
517
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
518 pixels += line_size;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
519 }
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
520 }
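In get_pixels_altivec, vec_mergeh(zero, bytes) is what widens the pixels. On a big-endian AltiVec target it interleaves {0, b0, 0, b1, ..., 0, b7}, and read as 16-bit lanes that is simply b0..b7 zero-extended. The scalar model below assumes DCTELEM is a 16-bit signed type, which is consistent with the code storing 8 elements per 16-byte vec_st at offset i*16. The names mergeh_zero_model and get_pixels_model are hypothetical. Unlike the AltiVec code, the model reads only the 8 bytes it needs per row.

```c
#include <stdint.h>

/* Scalar model of vec_mergeh(zero, bytes) reinterpreted as 16-bit lanes
 * on a big-endian target: each lane is {high byte 0, low byte b[k]}. */
static void mergeh_zero_model(const uint8_t bytes[16], int16_t out[8])
{
    uint8_t merged[16];
    for (int k = 0; k < 8; k++) {
        merged[2 * k] = 0;             /* element taken from the zero vector */
        merged[2 * k + 1] = bytes[k];  /* element taken from bytes */
    }
    for (int k = 0; k < 8; k++)        /* big-endian byte pair -> 16-bit lane */
        out[k] = (int16_t)((merged[2 * k] << 8) | merged[2 * k + 1]);
}

/* Scalar model of get_pixels_altivec: an 8x8 block of bytes becomes
 * 64 16-bit coefficients, one row of 8 per 16-byte store. */
static void get_pixels_model(int16_t *block, const uint8_t *pixels, int line_size)
{
    for (int i = 0; i < 8; i++) {
        uint8_t row[16] = {0};
        for (int k = 0; k < 8; k++)
            row[k] = pixels[k];        /* only the first 8 bytes are used */
        mergeh_zero_model(row, block + i * 8);
        pixels += line_size;
    }
}
```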
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
521
1064
b32afefe7d33 * UINTX -> uintx_t INTX -> intx_t
kabi
parents: 1033
diff changeset
522 void diff_pixels_altivec(DCTELEM *restrict block, const uint8_t *s1,
b32afefe7d33 * UINTX -> uintx_t INTX -> intx_t
kabi
parents: 1033
diff changeset
523 const uint8_t *s2, int stride)
828
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
524 {
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
525 int i;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
526 vector unsigned char perm, bytes, *pixv;
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
527 const vector unsigned char zero = (const vector unsigned char)vec_splat_u8(0);
828
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
528 vector signed short shorts1, shorts2;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
529
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
530 for(i=0;i<4;i++)
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
531 {
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
532 // Read potentially unaligned pixels
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
533 // We're reading 16 pixels, and actually only want 8,
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
534 // but we simply ignore the extras.
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
535 perm = vec_lvsl(0, s1);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
536 pixv = (vector unsigned char *) s1;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
537 bytes = vec_perm(pixv[0], pixv[1], perm);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
538
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
539 // convert the bytes into shorts
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
540 shorts1 = (vector signed short)vec_mergeh(zero, bytes);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
541
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
542 // Do the same for the second block of pixels
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
543 perm = vec_lvsl(0, s2);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
544 pixv = (vector unsigned char *) s2;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
545 bytes = vec_perm(pixv[0], pixv[1], perm);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
546
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
547 // convert the bytes into shorts
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
548 shorts2 = (vector signed short)vec_mergeh(zero, bytes);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
549
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
550 // Do the subtraction
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
551 shorts1 = vec_sub(shorts1, shorts2);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
552
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
553 // save the data to the block; we assume the block is 16-byte aligned
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
554 vec_st(shorts1, 0, (vector signed short*)block);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
555
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
556 s1 += stride;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
557 s2 += stride;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
558 block += 8;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
559
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
560
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
561 // The code below is a copy of the code above... This is a manual
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
562 // unroll.
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
563
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
564 // Read potentially unaligned pixels
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
565 // We're reading 16 pixels, and actually only want 8,
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
566 // but we simply ignore the extras.
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
567 perm = vec_lvsl(0, s1);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
568 pixv = (vector unsigned char *) s1;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
569 bytes = vec_perm(pixv[0], pixv[1], perm);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
570
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
571 // convert the bytes into shorts
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
572 shorts1 = (vector signed short)vec_mergeh(zero, bytes);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
573
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
574 // Do the same for the second block of pixels
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
575 perm = vec_lvsl(0, s2);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
576 pixv = (vector unsigned char *) s2;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
577 bytes = vec_perm(pixv[0], pixv[1], perm);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
578
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
579 // convert the bytes into shorts
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
580 shorts2 = (vector signed short)vec_mergeh(zero, bytes);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
581
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
582 // Do the subtraction
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
583 shorts1 = vec_sub(shorts1, shorts2);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
584
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
585 // save the data to the block; we assume the block is 16-byte aligned
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
586 vec_st(shorts1, 0, (vector signed short*)block);
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
587
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
588 s1 += stride;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
589 s2 += stride;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
590 block += 8;
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
591 }
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
592 }
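diff_pixels_altivec runs its loop 4 times with the body written out twice, so it covers 8 rows. Both operands are zero-extended to 16 bits before vec_sub, so each difference lies in [-255, 255] and fits a signed 16-bit element without wrapping. A hypothetical scalar model follows (diff_pixels_model is an illustrative name). As above, it assumes DCTELEM is a 16-bit signed type.

```c
#include <stdint.h>

/* Scalar model of diff_pixels_altivec: block = s1 - s2 over an 8x8 area,
 * stored row by row as 16-bit values (8 elements, i.e. 16 bytes, per row). */
static void diff_pixels_model(int16_t *block, const uint8_t *s1,
                              const uint8_t *s2, int stride)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            block[j] = (int16_t)(s1[j] - s2[j]);   /* computed in int, no wrap */
        s1 += stride;
        s2 += stride;
        block += 8;
    }
}
```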
ace3ccd18dd2 Altivec Patch (Mark III) by (Dieter Shirley <dieters at schemasoft dot com>)
michaelni
parents: 638
diff changeset
593
995
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
594 void add_bytes_altivec(uint8_t *dst, uint8_t *src, int w) {
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
595 register int i;
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
596 register vector unsigned char vdst, vsrc;
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
597
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
598 /* dst and src are 16-byte aligned (guaranteed) */
3968
c86c7a54ba92 add_bytes passes tests
lu_zero
parents: 3947
diff changeset
599 for(i = 0 ; (i + 15) < w ; i+=16)
995
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
600 {
3968
c86c7a54ba92 add_bytes passes tests
lu_zero
parents: 3947
diff changeset
601 vdst = vec_ld(i, (unsigned char*)dst);
c86c7a54ba92 add_bytes passes tests
lu_zero
parents: 3947
diff changeset
602 vsrc = vec_ld(i, (unsigned char*)src);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
603 vdst = vec_add(vsrc, vdst);
3968
c86c7a54ba92 add_bytes passes tests
lu_zero
parents: 3947
diff changeset
604 vec_st(vdst, i, (unsigned char*)dst);
995
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
605 }
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
606 /* handle the last w % 16 bytes when w is not a multiple of 16 */
995
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
607 for (; (i < w) ; i++)
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
608 {
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
609 dst[i] += src[i];
edc10966b081 altivec jumbo patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michaelni
parents: 981
diff changeset
610 }
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
611 }
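add_bytes_altivec computes dst[i] += src[i] for 0 <= i < w. Whole 16-byte chunks go through vec_ld/vec_add/vec_st, and vec_add on unsigned chars wraps modulo 256. The remaining bytes have to receive the same modular addition; a plain copy (dst[i] = src[i]) in the tail would disagree with the vector part. A minimal scalar sketch of that split follows (add_bytes_model is a hypothetical name).

```c
#include <stdint.h>

/* Scalar model of add_bytes_altivec: dst[i] += src[i], modulo 256. */
static void add_bytes_model(uint8_t *dst, const uint8_t *src, int w)
{
    int i;
    for (i = 0; i + 15 < w; i += 16)          /* one vector add per chunk */
        for (int k = 0; k < 16; k++)
            dst[i + k] = (uint8_t)(dst[i + k] + src[i + k]);
    for (; i < w; i++)                         /* tail: the same += */
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```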
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
612
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
613 /* the next function assumes that ((line_size % 16) == 0) */
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
614 void put_pixels16_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
615 {
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
616 POWERPC_PERF_DECLARE(altivec_put_pixels16_num, 1);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
617 register vector unsigned char pixelsv1, pixelsv2;
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
618 register vector unsigned char pixelsv1B, pixelsv2B;
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
619 register vector unsigned char pixelsv1C, pixelsv2C;
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
620 register vector unsigned char pixelsv1D, pixelsv2D;
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
621
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
622 register vector unsigned char perm = vec_lvsl(0, pixels);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
623 int i;
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
624 register int line_size_2 = line_size << 1;
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
625 register int line_size_3 = line_size + line_size_2;
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
626 register int line_size_4 = line_size << 2;
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
627
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
628 POWERPC_PERF_START_COUNT(altivec_put_pixels16_num, 1);
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
629 // hand-unrolling the loop by 4 gains about 15%
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
630 // minimum execution time goes from 74 to 60 cycles
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
631 // it's faster than -funroll-loops, but using
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
632 // -funroll-loops with this is bad (74 cycles again).
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
633 // all this is on a 7450, tuning for the 7450
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
634 #if 0
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
635 for(i=0; i<h; i++) {
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
636 pixelsv1 = vec_ld(0, (unsigned char*)pixels);
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
637 pixelsv2 = vec_ld(16, (unsigned char*)pixels);
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
638 vec_st(vec_perm(pixelsv1, pixelsv2, perm),
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
639 0, (unsigned char*)block);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
640 pixels+=line_size;
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
641 block +=line_size;
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
642 }
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
643 #else
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
644 for(i=0; i<h; i+=4) {
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
645 pixelsv1 = vec_ld(0, (unsigned char*)pixels);
3533
0937cc91b574 avoid possible segfault situations
lu_zero
parents: 3346
diff changeset
646 pixelsv2 = vec_ld(15, (unsigned char*)pixels);
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
647 pixelsv1B = vec_ld(line_size, (unsigned char*)pixels);
3533
0937cc91b574 avoid possible segfault situations
lu_zero
parents: 3346
diff changeset
648 pixelsv2B = vec_ld(15 + line_size, (unsigned char*)pixels);
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
649 pixelsv1C = vec_ld(line_size_2, (unsigned char*)pixels);
3533
0937cc91b574 avoid possible segfault situations
lu_zero
parents: 3346
diff changeset
650 pixelsv2C = vec_ld(15 + line_size_2, (unsigned char*)pixels);
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
651 pixelsv1D = vec_ld(line_size_3, (unsigned char*)pixels);
3533
0937cc91b574 avoid possible segfault situations
lu_zero
parents: 3346
diff changeset
652 pixelsv2D = vec_ld(15 + line_size_3, (unsigned char*)pixels);
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
653 vec_st(vec_perm(pixelsv1, pixelsv2, perm),
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
654 0, (unsigned char*)block);
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
655 vec_st(vec_perm(pixelsv1B, pixelsv2B, perm),
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
656 line_size, (unsigned char*)block);
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
657 vec_st(vec_perm(pixelsv1C, pixelsv2C, perm),
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
658 line_size_2, (unsigned char*)block);
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
659 vec_st(vec_perm(pixelsv1D, pixelsv2D, perm),
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
660 line_size_3, (unsigned char*)block);
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
661 pixels+=line_size_4;
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
662 block +=line_size_4;
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
663 }
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
664 #endif
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
665 POWERPC_PERF_STOP_COUNT(altivec_put_pixels16_num, 1);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
666 }
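put_pixels16_altivec reads an unaligned source row with two aligned vec_ld loads and a vec_perm driven by vec_lvsl. The sketch below models those three primitives in plain C, using big-endian byte numbering. It shows why the second load uses offset 15: when the address is already 16-byte aligned, addr + 15 stays inside the same block, so nothing past the row is touched.

vec16, model_vec_ld, model_vec_lvsl, model_vec_perm and load_unaligned are hypothetical helpers. Addresses are indices into a simulated memory that is assumed to start on a 16-byte boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a vector unsigned char */
typedef struct { uint8_t b[16]; } vec16;

/* vec_ld model: the low 4 bits of the effective address are ignored */
static vec16 model_vec_ld(const uint8_t *mem, size_t addr)
{
    vec16 v;
    memcpy(v.b, mem + (addr & ~(size_t)15), 16);
    return v;
}

/* vec_lvsl model: {sh, sh+1, ..., sh+15} with sh = addr & 15 */
static vec16 model_vec_lvsl(size_t addr)
{
    vec16 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)((addr & 15) + i);
    return v;
}

/* vec_perm model: pick bytes from the 32-byte concatenation a||b */
static vec16 model_vec_perm(vec16 a, vec16 b, vec16 c)
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        int k = c.b[i] & 31;
        r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
    return r;
}

/* Unaligned 16-byte load in the style of put_pixels16_altivec: the second
 * load uses offset 15, so an aligned address never reaches the next block */
static vec16 load_unaligned(const uint8_t *mem, size_t addr)
{
    vec16 lo = model_vec_ld(mem, addr);
    vec16 hi = model_vec_ld(mem, addr + 15);
    return model_vec_perm(lo, hi, model_vec_lvsl(addr));
}
```

Take a 64-byte simulated memory. load_unaligned(mem, 48) only touches bytes 48..63. A second load at offset 16 would instead address the block starting at 64, which lies beyond the buffer.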
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
667
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
668 /* next one assumes that ((line_size % 16) == 0) */
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
669 #define op_avg(a,b) a = ( ((a)|(b)) - ((((a)^(b))&0xFEFEFEFEUL)>>1) )
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
670 void avg_pixels16_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
671 {
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
672 POWERPC_PERF_DECLARE(altivec_avg_pixels16_num, 1);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
673 register vector unsigned char pixelsv1, pixelsv2, pixelsv, blockv;
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
674 register vector unsigned char perm = vec_lvsl(0, pixels);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
675 int i;
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
676
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
677 POWERPC_PERF_START_COUNT(altivec_avg_pixels16_num, 1);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
678
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
679 for(i=0; i<h; i++) {
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
680 pixelsv1 = vec_ld(0, (unsigned char*)pixels);
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
681 pixelsv2 = vec_ld(15, (unsigned char*)pixels);
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
682 blockv = vec_ld(0, block);
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
683 pixelsv = vec_perm(pixelsv1, pixelsv2, perm);
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
684 blockv = vec_avg(blockv,pixelsv);
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
685 vec_st(blockv, 0, (unsigned char*)block);
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
686 pixels+=line_size;
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
687 block +=line_size;
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
688 }
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
689
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
690 POWERPC_PERF_STOP_COUNT(altivec_avg_pixels16_num, 1);
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
691 }
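avg_pixels16_altivec relies on vec_avg, which for unsigned bytes computes (a + b + 1) >> 1. The op_avg macro defined just above it is the 32-bit SWAR form of the same rounding-up average. Since a + b = 2(a & b) + (a ^ b), the average equals (a | b) - ((a ^ b) >> 1). The 0xFE mask stops each byte's low bit from being shifted into its neighbour.

The following is a hedged scalar sketch of both forms. avg_round_up, swar_avg4 and avg_pixels16_ref are hypothetical names.

```c
#include <stdint.h>

/* Per-byte model of vec_avg on vector unsigned char: rounds up */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b + 1) >> 1);
}

/* Same formula as op_avg, applied to four packed bytes:
 * (a|b) - ((a^b) >> 1) per byte; masking with 0xFE clears each byte's
 * low bit so the shift cannot leak a bit into the neighbouring byte */
static uint32_t swar_avg4(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Hypothetical scalar reference for avg_pixels16:
 * block = avg(block, pixels) over 16 x h pixels with stride line_size */
static void avg_pixels16_ref(uint8_t *block, const uint8_t *pixels,
                             int line_size, int h)
{
    for (int i = 0; i < h; i++) {
        for (int x = 0; x < 16; x++)
            block[x] = avg_round_up(block[x], pixels[x]);
        pixels += line_size;
        block  += line_size;
    }
}
```

Here is a worked example with four different bytes. Take a = 0x01FF0300 and b = 0x0200FF01. Then a | b = 0x03FFFF01 and ((a ^ b) & 0xFEFEFEFE) >> 1 = 0x017F7E00. The result is 0x02808101, the per-byte rounding-up average.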
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
692
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
693 /* next one assumes that ((line_size % 8) == 0) */
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
694 void avg_pixels8_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h)
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
695 {
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
696 POWERPC_PERF_DECLARE(altivec_avg_pixels8_num, 1);
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
697 register vector unsigned char pixelsv1, pixelsv2, pixelsv, blockv;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
698 int i;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
699
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
700 POWERPC_PERF_START_COUNT(altivec_avg_pixels8_num, 1);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
701
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
702 for (i = 0; i < h; i++) {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
703 /*
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
704 block is 8-byte aligned, so we're either in the
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
705 left half of a 16-byte vector (16-byte aligned) or in the right half (not)
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
706 */
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
707 int rightside = ((unsigned long)block & 0x0000000F);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
708
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
709 blockv = vec_ld(0, block);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
710 pixelsv1 = vec_ld(0, (unsigned char*)pixels);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
711 pixelsv2 = vec_ld(16, (unsigned char*)pixels);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
712 pixelsv = vec_perm(pixelsv1, pixelsv2, vec_lvsl(0, pixels));
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
713
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
714 if (rightside)
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
715 {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
716 pixelsv = vec_perm(blockv, pixelsv, vcprm(0,1,s0,s1));
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
717 }
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
718 else
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
719 {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
720 pixelsv = vec_perm(blockv, pixelsv, vcprm(s0,s1,2,3));
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
721 }
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
722
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
723 blockv = vec_avg(blockv, pixelsv);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
724
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
725 vec_st(blockv, 0, block);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
726
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
727 pixels += line_size;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
728 block += line_size;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
729 }
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
730
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
731 POWERPC_PERF_STOP_COUNT(altivec_avg_pixels8_num, 1);
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
732 }
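In avg_pixels8_altivec, block is only 8-byte aligned. The aligned vector loaded from it therefore holds the 8 target bytes in either its left or its right half. The vcprm(0,1,s0,s1) and vcprm(s0,s1,2,3) permutes keep the other half of blockv and place the first 8 source pixels in the target half. vec_avg then leaves the kept half unchanged, because avg(x, x) == x.

Below is a scalar model of one row. It assumes that vcprm builds a word permute in which plain indices select 32-bit words of the first operand and s-prefixed indices select words of the second. avg8_row_model is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scalar model of one avg_pixels8_altivec row.
 * blockv is the aligned 16-byte vector containing the 8-byte target;
 * rightside != 0 means the target occupies bytes 8..15. */
static void avg8_row_model(uint8_t blockv[16], const uint8_t pix[8],
                           int rightside)
{
    uint8_t merged[16];
    int off = rightside ? 8 : 0;

    /* vcprm(0,1,s0,s1) / vcprm(s0,s1,2,3): keep the other half of blockv
     * and put the 8 source pixels into the target half */
    for (int i = 0; i < 16; i++)
        merged[i] = blockv[i];
    for (int i = 0; i < 8; i++)
        merged[off + i] = pix[i];

    /* vec_avg: averaging the kept half with itself leaves it unchanged */
    for (int i = 0; i < 16; i++)
        blockv[i] = (uint8_t)((blockv[i] + merged[i] + 1) >> 1);
}
```

Because the whole 16-byte vector is stored back with vec_st, this merge is what keeps the neighbouring 8 bytes of the destination intact.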
1009
3b7cc8e4b83f AltiVec perf (take 2), plus a couple AltiVec functions by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 995
diff changeset
733
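put_pixels8_xy2_altivec, which follows, computes each output pixel as (a + b + c + d + 2) >> 2 over a 2x2 neighbourhood. It carries the previous row's horizontal pair sums, plus the rounding bias of 2, in pixelssum1. Each row therefore needs only one new set of pair sums.

The special case (pixels & 15) == 15 is needed because vec_lvsl(1, pixels) then wraps to a zero shift and would select temp1. Since pixels + 1 is aligned in that case, temp2 already holds the right bytes.

The scalar sketch below reproduces the rolling-sum structure. put_pixels8_xy2_ref is a hypothetical name. It reads 9 bytes from each of h + 1 source rows.

```c
#include <stdint.h>

/* Hypothetical scalar reference for put_pixels8_xy2 with the same
 * rolling-sum structure as the AltiVec code */
static void put_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                int line_size, int h)
{
    uint16_t sum1[8], sum2[8];

    /* first source row: horizontal pair sums plus the rounding bias of 2 */
    for (int x = 0; x < 8; x++)
        sum1[x] = (uint16_t)(pixels[x] + pixels[x + 1] + 2);

    for (int i = 0; i < h; i++) {
        const uint8_t *next = pixels + line_size;
        for (int x = 0; x < 8; x++) {
            sum2[x] = (uint16_t)(next[x] + next[x + 1]);
            block[x] = (uint8_t)((sum1[x] + sum2[x]) >> 2);
            /* this row's pair sums become the upper row of the next output */
            sum1[x] = (uint16_t)(sum2[x] + 2);
        }
        pixels += line_size;
        block  += line_size;
    }
}
```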
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
734 /* next one assumes that ((line_size % 8) == 0) */
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
735 void put_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
736 {
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
737 POWERPC_PERF_DECLARE(altivec_put_pixels8_xy2_num, 1);
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
738 register int i;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
739 register vector unsigned char
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
740 pixelsv1, pixelsv2,
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
741 pixelsavg;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
742 register vector unsigned char
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
743 blockv, temp1, temp2;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
744 register vector unsigned short
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
745 pixelssum1, pixelssum2, temp3;
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
746 register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
747 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
748
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
749 temp1 = vec_ld(0, pixels);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
750 temp2 = vec_ld(16, pixels);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
751 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels));
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
752 if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F)
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
753 {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
754 pixelsv2 = temp2;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
755 }
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
756 else
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
757 {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
758 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels));
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
759 }
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
760 pixelsv1 = vec_mergeh(vczero, pixelsv1);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
761 pixelsv2 = vec_mergeh(vczero, pixelsv2);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
762 pixelssum1 = vec_add((vector unsigned short)pixelsv1,
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
763 (vector unsigned short)pixelsv2);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
764 pixelssum1 = vec_add(pixelssum1, vctwo);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
765
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
766 POWERPC_PERF_START_COUNT(altivec_put_pixels8_xy2_num, 1);
1015
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
767 for (i = 0; i < h ; i++) {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
768 int rightside = ((unsigned long)block & 0x0000000F);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
769 blockv = vec_ld(0, block);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
770
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
771 temp1 = vec_ld(line_size, pixels);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
772 temp2 = vec_ld(line_size + 16, pixels);
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
773 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels));
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
774 if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F)
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
775 {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
776 pixelsv2 = temp2;
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
777 }
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
778 else
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
779 {
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
780 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels));
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
781 }
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
782
35cf2f4a0f8c PPC perf, PPC clear_block, AltiVec put_pixels8_xy2 patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1009
diff changeset
        /* zero-extend the first 8 bytes of each row to 16-bit lanes */
        pixelsv1 = vec_mergeh(vczero, pixelsv1);
        pixelsv2 = vec_mergeh(vczero, pixelsv2);
        pixelssum2 = vec_add((vector unsigned short)pixelsv1,
                             (vector unsigned short)pixelsv2);
        /* ((upper pair sum + 2) + lower pair sum) >> 2 */
        temp3 = vec_add(pixelssum1, pixelssum2);
        temp3 = vec_sra(temp3, vctwo);
        /* this row's pair sum becomes the next iteration's upper sum */
        pixelssum1 = vec_add(pixelssum2, vctwo);
        pixelsavg = vec_packsu(temp3, (vector unsigned short) vczero);

        /* vec_st() writes a full aligned 16-byte vector, so the 8 result
         * bytes are spliced into the half of the line that block points
         * to, leaving the other half unchanged */
        if (rightside)
        {
            blockv = vec_perm(blockv, pixelsavg, vcprm(0, 1, s0, s1));
        }
        else
        {
            blockv = vec_perm(blockv, pixelsavg, vcprm(s0, s1, 2, 3));
        }

        vec_st(blockv, 0, block);

        block += line_size;
        pixels += line_size;
    }

POWERPC_PERF_STOP_COUNT(altivec_put_pixels8_xy2_num, 1);
}
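```c
/*
 * Illustrative sketch (hypothetical, not part of FFmpeg): a plain C
 * reference for what the 8-wide xy2 kernels appear to compute, useful for
 * checking the AltiVec versions. Each output byte is
 * (a + b + c + d + bias) >> 2 over a 2x2 source neighbourhood, with
 * bias = 2 for put_pixels8_xy2_altivec (vctwo) and bias = 1 for
 * put_no_rnd_pixels8_xy2_altivec (vcone). As in the vector loop, each
 * source row's pair sum is computed once and carried into the next
 * iteration as the upper sum, so every row is summed horizontally only
 * once. Reads h + 1 source rows of 9 bytes each.
 */
#include <assert.h>
#include <stdint.h>

static void pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                            int line_size, int h, int bias)
{
    unsigned top[8];   /* pixels[x] + pixels[x + 1] + bias of the upper row */
    int i, x;

    assert(bias == 1 || bias == 2);
    for (x = 0; x < 8; x++)
        top[x] = pixels[x] + pixels[x + 1] + bias;

    for (i = 0; i < h; i++) {
        const uint8_t *below = pixels + line_size;
        for (x = 0; x < 8; x++) {
            unsigned bottom = below[x] + below[x + 1];
            block[x] = (uint8_t)((top[x] + bottom) >> 2);
            top[x] = bottom + bias;   /* reused for the next output row */
        }
        block  += line_size;
        pixels += line_size;
    }
}
```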

/* The next function assumes that ((line_size % 8) == 0) and that block is
 * 8-byte aligned; pixels may be unaligned. */
void put_no_rnd_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
POWERPC_PERF_DECLARE(altivec_put_no_rnd_pixels8_xy2_num, 1);
    register int i;
    register vector unsigned char
        pixelsv1, pixelsv2,
        pixelsavg;
    register vector unsigned char
        blockv, temp1, temp2;
    register vector unsigned short
        pixelssum1, pixelssum2, temp3;
    register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
    register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1);
    register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);

    /* pixelsv1 = pixels[0..15], pixelsv2 = pixels[1..16], built from two
     * aligned loads; when pixels sits at offset 15 of a 16-byte line,
     * vec_lvsl(1, pixels) would select temp1, so temp2 is used directly */
    temp1 = vec_ld(0, pixels);
    temp2 = vec_ld(16, pixels);
    pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels));
    if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F)
    {
        pixelsv2 = temp2;
    }
    else
    {
        pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels));
    }
    pixelsv1 = vec_mergeh(vczero, pixelsv1);
    pixelsv2 = vec_mergeh(vczero, pixelsv2);
    pixelssum1 = vec_add((vector unsigned short)pixelsv1,
                         (vector unsigned short)pixelsv2);
    /* no_rnd: bias by 1 instead of 2 before the final >> 2 */
    pixelssum1 = vec_add(pixelssum1, vcone);

POWERPC_PERF_START_COUNT(altivec_put_no_rnd_pixels8_xy2_num, 1);
    for (i = 0; i < h ; i++) {
        int rightside = ((unsigned long)block & 0x0000000F);
        blockv = vec_ld(0, block);

        temp1 = vec_ld(line_size, pixels);
        temp2 = vec_ld(line_size + 16, pixels);
        pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels));
        if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F)
        {
            pixelsv2 = temp2;
        }
        else
        {
            pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels));
        }

        pixelsv1 = vec_mergeh(vczero, pixelsv1);
        pixelsv2 = vec_mergeh(vczero, pixelsv2);
        pixelssum2 = vec_add((vector unsigned short)pixelsv1,
                             (vector unsigned short)pixelsv2);
        temp3 = vec_add(pixelssum1, pixelssum2);
        temp3 = vec_sra(temp3, vctwo);
        pixelssum1 = vec_add(pixelssum2, vcone);
        pixelsavg = vec_packsu(temp3, (vector unsigned short) vczero);

        if (rightside)
        {
            blockv = vec_perm(blockv, pixelsavg, vcprm(0, 1, s0, s1));
        }
        else
        {
            blockv = vec_perm(blockv, pixelsavg, vcprm(s0, s1, 2, 3));
        }

        vec_st(blockv, 0, block);

        block += line_size;
        pixels += line_size;
    }

POWERPC_PERF_STOP_COUNT(altivec_put_no_rnd_pixels8_xy2_num, 1);
}
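```c
/*
 * Illustrative sketch (hypothetical helpers, not FFmpeg API) of the store
 * step in the 8-wide kernels: the aligned 16-byte line containing block is
 * loaded, the first 8 bytes of pixelsavg are spliced into one half, and
 * the whole line is stored back. The word selectors assume that
 * vcprm(a, b, c, d) builds a vec_perm() control in which 0-3 pick 32-bit
 * words of the first operand and s0-s3 pick words of the second; the real
 * macro is defined outside this excerpt.
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SEL_0, SEL_1, SEL_2, SEL_3, SEL_S0, SEL_S1, SEL_S2, SEL_S3 };

/* Scalar stand-in for vec_perm(a, b, vcprm(sel[0], sel[1], sel[2], sel[3])). */
static void perm_words(uint8_t out[16], const uint8_t a[16],
                       const uint8_t b[16], const int sel[4])
{
    int k;
    for (k = 0; k < 4; k++) {
        const uint8_t *src = sel[k] < 4 ? a + 4 * sel[k] : b + 4 * (sel[k] - 4);
        memcpy(out + 4 * k, src, 4);
    }
}

/* line: the aligned 16 bytes containing block; offset: block & 15, which
 * is 0 or 8 for an 8-byte-aligned block; avg: only bytes 0-7 are results. */
static void splice_avg(uint8_t line[16], const uint8_t avg[16], unsigned offset)
{
    static const int right[4] = { SEL_0, SEL_1, SEL_S0, SEL_S1 }; /* vcprm(0, 1, s0, s1) */
    static const int left[4]  = { SEL_S0, SEL_S1, SEL_2, SEL_3 }; /* vcprm(s0, s1, 2, 3) */
    uint8_t out[16];

    assert(offset == 0 || offset == 8);
    perm_words(out, line, avg, offset ? right : left);
    memcpy(line, out, 16);   /* plays the role of vec_st() */
}
```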

/* The next function assumes that ((line_size % 16) == 0) and that block is
 * 16-byte aligned; pixels may be unaligned. */
void put_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h)
{
POWERPC_PERF_DECLARE(altivec_put_pixels16_xy2_num, 1);
    register int i;
    register vector unsigned char
        pixelsv1, pixelsv2, pixelsv3, pixelsv4;
    register vector unsigned char
        blockv, temp1, temp2;
    register vector unsigned short
        pixelssum1, pixelssum2, temp3,
        pixelssum3, pixelssum4, temp4;
    register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
    register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);

POWERPC_PERF_START_COUNT(altivec_put_pixels16_xy2_num, 1);

    temp1 = vec_ld(0, pixels);
    temp2 = vec_ld(16, pixels);
    pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels));
    if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F)
    {
        pixelsv2 = temp2;
    }
    else
    {
        pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels));
    }
    /* widen to 16 bits: bytes 8-15 via vec_mergel into pixelsv3/pixelsv4,
     * bytes 0-7 via vec_mergeh into pixelsv1/pixelsv2 */
    pixelsv3 = vec_mergel(vczero, pixelsv1);
    pixelsv4 = vec_mergel(vczero, pixelsv2);
    pixelsv1 = vec_mergeh(vczero, pixelsv1);
    pixelsv2 = vec_mergeh(vczero, pixelsv2);
    pixelssum3 = vec_add((vector unsigned short)pixelsv3,
                         (vector unsigned short)pixelsv4);
    pixelssum3 = vec_add(pixelssum3, vctwo);
    pixelssum1 = vec_add((vector unsigned short)pixelsv1,
                         (vector unsigned short)pixelsv2);
    pixelssum1 = vec_add(pixelssum1, vctwo);

    for (i = 0; i < h ; i++) {
        blockv = vec_ld(0, block);   /* overwritten by vec_packsu() below before use */

        temp1 = vec_ld(line_size, pixels);
        temp2 = vec_ld(line_size + 16, pixels);
        pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels));
        if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F)
        {
            pixelsv2 = temp2;
        }
        else
        {
            pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels));
        }

        pixelsv3 = vec_mergel(vczero, pixelsv1);
        pixelsv4 = vec_mergel(vczero, pixelsv2);
        pixelsv1 = vec_mergeh(vczero, pixelsv1);
        pixelsv2 = vec_mergeh(vczero, pixelsv2);

        pixelssum4 = vec_add((vector unsigned short)pixelsv3,
                             (vector unsigned short)pixelsv4);
        pixelssum2 = vec_add((vector unsigned short)pixelsv1,
                             (vector unsigned short)pixelsv2);
        temp4 = vec_add(pixelssum3, pixelssum4);
        temp4 = vec_sra(temp4, vctwo);
        temp3 = vec_add(pixelssum1, pixelssum2);
        temp3 = vec_sra(temp3, vctwo);

        pixelssum3 = vec_add(pixelssum4, vctwo);
        pixelssum1 = vec_add(pixelssum2, vctwo);

        /* bytes 0-7 come from temp3, bytes 8-15 from temp4 */
        blockv = vec_packsu(temp3, temp4);

        vec_st(blockv, 0, block);

        block += line_size;
        pixels += line_size;
    }

POWERPC_PERF_STOP_COUNT(altivec_put_pixels16_xy2_num, 1);
}
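```c
/*
 * Illustrative scalar model (hypothetical helpers, not the real
 * intrinsics) of the unaligned-load idiom in these functions: two aligned
 * vec_ld() loads merged with vec_perm() under a vec_lvsl() shift mask give
 * pixels[0..15] and pixels[1..16]. "Addresses" are indices into mem[],
 * whose index 0 is assumed to be 16-byte aligned. The model also shows why
 * the (pixels & 0x0000000F) == 0x0000000F case takes temp2 directly.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } vec16;

/* vec_ld(): the low 4 address bits are ignored, so the aligned 16 bytes
 * containing ea are returned. */
static vec16 model_ld(const uint8_t *mem, size_t ea)
{
    vec16 v;
    memcpy(v.b, mem + (ea & ~(size_t)15), 16);
    return v;
}

/* vec_lvsl(): permute control {sh, sh + 1, ..., sh + 15}, sh = ea & 15. */
static vec16 model_lvsl(size_t ea)
{
    vec16 v;
    int k;
    for (k = 0; k < 16; k++)
        v.b[k] = (uint8_t)((ea & 15) + k);
    return v;
}

/* vec_perm(): each control byte (low 5 bits) indexes the 32 bytes a || b. */
static vec16 model_perm(vec16 a, vec16 b, vec16 ctl)
{
    vec16 r;
    int k;
    for (k = 0; k < 16; k++) {
        int idx = ctl.b[k] & 31;
        r.b[k] = idx < 16 ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Same steps as the kernels: v1 = mem[p..p+15], v2 = mem[p+1..p+16]. */
static void load_pair(const uint8_t *mem, size_t p, vec16 *v1, vec16 *v2)
{
    vec16 t1 = model_ld(mem, p);
    vec16 t2 = model_ld(mem, p + 16);

    *v1 = model_perm(t1, t2, model_lvsl(p));
    if ((p & 15) == 15)
        *v2 = t2;   /* lvsl(p + 1) has shift 0 here and would pick t1 */
    else
        *v2 = model_perm(t1, t2, model_lvsl(p + 1));
}
```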

/* The next function assumes that ((line_size % 16) == 0) and that block is
 * 16-byte aligned; pixels may be unaligned. */
void put_no_rnd_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h)
{
POWERPC_PERF_DECLARE(altivec_put_no_rnd_pixels16_xy2_num, 1);
    register int i;
    register vector unsigned char
        pixelsv1, pixelsv2, pixelsv3, pixelsv4;
    register vector unsigned char
        blockv, temp1, temp2;
    register vector unsigned short
        pixelssum1, pixelssum2, temp3,
        pixelssum3, pixelssum4, temp4;
    register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
    register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1);
    register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);

POWERPC_PERF_START_COUNT(altivec_put_no_rnd_pixels16_xy2_num, 1);

    temp1 = vec_ld(0, pixels);
    temp2 = vec_ld(16, pixels);
    pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels));
    if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F)
    {
        pixelsv2 = temp2;
    }
    else
    {
        pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels));
    }
    pixelsv3 = vec_mergel(vczero, pixelsv1);
    pixelsv4 = vec_mergel(vczero, pixelsv2);
    pixelsv1 = vec_mergeh(vczero, pixelsv1);
    pixelsv2 = vec_mergeh(vczero, pixelsv2);
    pixelssum3 = vec_add((vector unsigned short)pixelsv3,
                         (vector unsigned short)pixelsv4);
michaelni
parents: 1015
diff changeset
1004 pixelssum3 = vec_add(pixelssum3, vcone);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1005 pixelssum1 = vec_add((vector unsigned short)pixelsv1,
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1006 (vector unsigned short)pixelsv2);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1007 pixelssum1 = vec_add(pixelssum1, vcone);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1008
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1009 for (i = 0; i < h ; i++) {
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1010 blockv = vec_ld(0, block);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1011
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1012 temp1 = vec_ld(line_size, pixels);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1013 temp2 = vec_ld(line_size + 16, pixels);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1014 pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels));
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1015 if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F)
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1016 {
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1017 pixelsv2 = temp2;
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1018 }
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1019 else
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1020 {
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1021 pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels));
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1022 }
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1023
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1024 pixelsv3 = vec_mergel(vczero, pixelsv1);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1025 pixelsv4 = vec_mergel(vczero, pixelsv2);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1026 pixelsv1 = vec_mergeh(vczero, pixelsv1);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1027 pixelsv2 = vec_mergeh(vczero, pixelsv2);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1028
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1029 pixelssum4 = vec_add((vector unsigned short)pixelsv3,
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1030 (vector unsigned short)pixelsv4);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1031 pixelssum2 = vec_add((vector unsigned short)pixelsv1,
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1032 (vector unsigned short)pixelsv2);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1033 temp4 = vec_add(pixelssum3, pixelssum4);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1034 temp4 = vec_sra(temp4, vctwo);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1035 temp3 = vec_add(pixelssum1, pixelssum2);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1036 temp3 = vec_sra(temp3, vctwo);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1037
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1038 pixelssum3 = vec_add(pixelssum4, vcone);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1039 pixelssum1 = vec_add(pixelssum2, vcone);
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1040
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1041 blockv = vec_packsu(temp3, temp4);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1042
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1043 vec_st(blockv, 0, block);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1044
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1045 block += line_size;
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1046 pixels += line_size;
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1047 }
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1048
1352
e8ff4783f188 1) remove TBL support in PPC performance. It's much more useful to use the
michaelni
parents: 1340
diff changeset
1049 POWERPC_PERF_STOP_COUNT(altivec_put_no_rnd_pixels16_xy2_num, 1);
1024
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1050 }
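For reference, the vector loop above produces each output byte as the average of a 2x2 pixel neighbourhood (the pixel, its right neighbour, and the two pixels one row below), with a rounding bias of 1 (`vcone`) before the shift by 2 (`vctwo`). The scalar sketch below is not part of this file. The name `put_no_rnd_pixels16_xy2_ref` is hypothetical, and it assumes the same argument meanings as the AltiVec version: 16 output bytes per row, `h` rows, and rows `line_size` bytes apart in both `block` and `pixels`.

```c
#include <stdint.h>

/* Hypothetical scalar reference for put_no_rnd_pixels16_xy2:
 * out = (a + b + c + d + 1) >> 2 over each 2x2 neighbourhood.
 * The bias of 1 (instead of 2) makes exact halves round down ("no_rnd"). */
static void put_no_rnd_pixels16_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                        int line_size, int h)
{
    for (int i = 0; i < h; i++) {
        for (int x = 0; x < 16; x++) {
            int sum = pixels[x] + pixels[x + 1]                 /* current row  */
                    + pixels[x + line_size]                     /* row below    */
                    + pixels[x + line_size + 1];
            block[x] = (uint8_t)((sum + 1) >> 2);               /* max 1021>>2 = 255 */
        }
        block  += line_size;
        pixels += line_size;
    }
}
```

With a bias of 1, a neighbourhood such as 0, 0, 1, 1 yields 0 rather than 1; the vector code gets the same effect by adding `vcone` to each horizontal pair sum and reusing the lower row's sum (plus one) for the next iteration.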
9cc1031e1864 More AltiVec MC functions patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michaelni
parents: 1015
diff changeset
1051
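The loop also relies on the usual AltiVec unaligned-load idiom: `vec_ld` ignores the low four address bits, so two aligned loads followed by a `vec_perm` driven by `vec_lvsl` reconstruct 16 unaligned bytes. The special case for `(pixels & 0x0F) == 0x0F` is needed because `vec_lvsl(1, pixels)` then yields a shift of 0, so the permute would select the first aligned quadword instead of the second one, which is where `pixels + 1` actually starts. The following is a hedged scalar model of that behaviour, not FFmpeg code: the `vec16` type and the `model_*` helpers are hypothetical, and "addresses" are indices into an array that is treated as if it began on a 16-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } vec16;   /* hypothetical stand-in for a vector */

/* Model of vec_ld: the low 4 address bits are ignored, an aligned quadword is loaded. */
static vec16 model_ld(const uint8_t *mem, size_t addr)
{
    vec16 v;
    memcpy(v.b, mem + (addr & ~(size_t)15), 16);
    return v;
}

/* Model of vec_lvsl: permute control {s, s+1, ..., s+15} with s = addr & 15. */
static vec16 model_lvsl(size_t addr)
{
    vec16 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)((addr & 15) + i);
    return v;
}

/* Model of vec_perm: result byte i is byte (ctl[i] & 31) of the 32-byte a||b. */
static vec16 model_perm(vec16 a, vec16 b, vec16 ctl)
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        int k = ctl.b[i] & 31;
        r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
    return r;
}

/* The 16 bytes starting at addr + 1, computed the way pixelsv2 is above. */
static vec16 model_load_plus1(const uint8_t *mem, size_t addr)
{
    vec16 t1 = model_ld(mem, addr);        /* temp1 = vec_ld(0, pixels)  */
    vec16 t2 = model_ld(mem, addr + 16);   /* temp2 = vec_ld(16, pixels) */
    if ((addr & 15) == 15)
        return t2;  /* addr + 1 is aligned; lvsl would give shift 0 and pick t1 */
    return model_perm(t1, t2, model_lvsl(addr + 1));
}
```

Only the `+ 1` load needs the special case: for the plain load, `vec_lvsl(0, pixels)` yields a shift between 0 and 15, and all selected bytes fall inside the 32-byte window of the two aligned loads.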
1949
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1052 int hadamard8_diff8x8_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h){
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1053 POWERPC_PERF_DECLARE(altivec_hadamard8_diff8x8_num, 1);
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1054 int sum;
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1055 register const vector unsigned char vzero =
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1056 (const vector unsigned char)vec_splat_u8(0);
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1057 register vector signed short temp0, temp1, temp2, temp3, temp4,
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1058 temp5, temp6, temp7;
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1059 POWERPC_PERF_START_COUNT(altivec_hadamard8_diff8x8_num, 1);
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1060 {
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1061 register const vector signed short vprod1 =(const vector signed short)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1062 AVV( 1,-1, 1,-1, 1,-1, 1,-1);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1063 register const vector signed short vprod2 =(const vector signed short)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1064 AVV( 1, 1,-1,-1, 1, 1,-1,-1);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1065 register const vector signed short vprod3 =(const vector signed short)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1066 AVV( 1, 1, 1, 1,-1,-1,-1,-1);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1067 register const vector unsigned char perm1 = (const vector unsigned char)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1068 AVV(0x02, 0x03, 0x00, 0x01, 0x06, 0x07, 0x04, 0x05,
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1069 0x0A, 0x0B, 0x08, 0x09, 0x0E, 0x0F, 0x0C, 0x0D);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1070 register const vector unsigned char perm2 = (const vector unsigned char)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1071 AVV(0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03,
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1072 0x0C, 0x0D, 0x0E, 0x0F, 0x08, 0x09, 0x0A, 0x0B);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1073 register const vector unsigned char perm3 = (const vector unsigned char)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1074 AVV(0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1075 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07);
1949
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1076
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1077 #define ONEITERBUTTERFLY(i, res) \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1078 { \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1079 register vector unsigned char src1, src2, srcO; \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1080 register vector unsigned char dst1, dst2, dstO; \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1081 register vector signed short srcV, dstV; \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1082 register vector signed short but0, but1, but2, op1, op2, op3; \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1083 src1 = vec_ld(stride * i, src); \
4422
c867ae28d4de Simplify and avoid a warning (should be faster on Cell and certain G4 revisions)
lu_zero
parents: 3973
diff changeset
1084 src2 = vec_ld((stride * i) + 15, src); \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1085 srcO = vec_perm(src1, src2, vec_lvsl(stride * i, src)); \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1086 dst1 = vec_ld(stride * i, dst); \
4422
c867ae28d4de Simplify and avoid a warning (should be faster on Cell and certain G4 revisions)
lu_zero
parents: 3973
diff changeset
1087 dst2 = vec_ld((stride * i) + 15, dst); \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1088 dstO = vec_perm(dst1, dst2, vec_lvsl(stride * i, dst)); \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1089 /* promote the unsigned chars to signed shorts */ \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1090 /* we're in the 8x8 function, we only care for the first 8 */ \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1091 srcV = \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1092 (vector signed short)vec_mergeh((vector signed char)vzero, \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1093 (vector signed char)srcO); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1094 dstV = \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1095 (vector signed short)vec_mergeh((vector signed char)vzero, \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1096 (vector signed char)dstO); \
5964
34f551bd0509 Fix alignment broke by my last patch
vitor
parents: 5963
diff changeset
1097 /* subtractions inside the first butterfly */ \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1098 but0 = vec_sub(srcV, dstV); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1099 op1 = vec_perm(but0, but0, perm1); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1100 but1 = vec_mladd(but0, vprod1, op1); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1101 op2 = vec_perm(but1, but1, perm2); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1102 but2 = vec_mladd(but1, vprod2, op2); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1103 op3 = vec_perm(but2, but2, perm3); \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1104 res = vec_mladd(but2, vprod3, op3); \
1949
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1105 }
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1106 ONEITERBUTTERFLY(0, temp0);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1107 ONEITERBUTTERFLY(1, temp1);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1108 ONEITERBUTTERFLY(2, temp2);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1109 ONEITERBUTTERFLY(3, temp3);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1110 ONEITERBUTTERFLY(4, temp4);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1111 ONEITERBUTTERFLY(5, temp5);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1112 ONEITERBUTTERFLY(6, temp6);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1113 ONEITERBUTTERFLY(7, temp7);
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1114 }
1949
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1115 #undef ONEITERBUTTERFLY
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1116 {
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1117 register vector signed int vsum;
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1118 register vector signed short line0 = vec_add(temp0, temp1);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1119 register vector signed short line1 = vec_sub(temp0, temp1);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1120 register vector signed short line2 = vec_add(temp2, temp3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1121 register vector signed short line3 = vec_sub(temp2, temp3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1122 register vector signed short line4 = vec_add(temp4, temp5);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1123 register vector signed short line5 = vec_sub(temp4, temp5);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1124 register vector signed short line6 = vec_add(temp6, temp7);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1125 register vector signed short line7 = vec_sub(temp6, temp7);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1126
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1127 register vector signed short line0B = vec_add(line0, line2);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1128 register vector signed short line2B = vec_sub(line0, line2);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1129 register vector signed short line1B = vec_add(line1, line3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1130 register vector signed short line3B = vec_sub(line1, line3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1131 register vector signed short line4B = vec_add(line4, line6);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1132 register vector signed short line6B = vec_sub(line4, line6);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1133 register vector signed short line5B = vec_add(line5, line7);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1134 register vector signed short line7B = vec_sub(line5, line7);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1135
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1136 register vector signed short line0C = vec_add(line0B, line4B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1137 register vector signed short line4C = vec_sub(line0B, line4B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1138 register vector signed short line1C = vec_add(line1B, line5B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1139 register vector signed short line5C = vec_sub(line1B, line5B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1140 register vector signed short line2C = vec_add(line2B, line6B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1141 register vector signed short line6C = vec_sub(line2B, line6B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1142 register vector signed short line3C = vec_add(line3B, line7B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1143 register vector signed short line7C = vec_sub(line3B, line7B);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1144
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1145 vsum = vec_sum4s(vec_abs(line0C), vec_splat_s32(0));
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1146 vsum = vec_sum4s(vec_abs(line1C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1147 vsum = vec_sum4s(vec_abs(line2C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1148 vsum = vec_sum4s(vec_abs(line3C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1149 vsum = vec_sum4s(vec_abs(line4C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1150 vsum = vec_sum4s(vec_abs(line5C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1151 vsum = vec_sum4s(vec_abs(line6C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1152 vsum = vec_sum4s(vec_abs(line7C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1153 vsum = vec_sums(vsum, (vector signed int)vzero);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1154 vsum = vec_splat(vsum, 3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1155 vec_ste(vsum, 0, &sum);
1949
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1156 }
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1157 POWERPC_PERF_STOP_COUNT(altivec_hadamard8_diff8x8_num, 1);
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1158 return sum;
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1159 }
66215baae7b9 hadamard8_diff8x8 in AltiVec, the 16bits edition by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1839
diff changeset
1160
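As a cross-check of what hadamard8_diff8x8_altivec computes, here is a scalar sketch: take the 8x8 difference src - dst, apply 8-point Walsh-Hadamard butterflies to each row (the ONEITERBUTTERFLY stages, where `vprod1/2/3` supply the signs and `perm1/2/3` swap elements at distance 1, 2 and 4), then to each column (the `line*`, `line*B`, `line*C` stages), and sum the absolute values. The names `wht8` and `hadamard8_diff8x8_ref` are hypothetical. Like the vector code, the sketch ignores `h` and reads only the first 8 bytes of each of 8 rows; it uses `int` instead of 16-bit lanes, which gives the same result here because the intermediate values stay within +/-16320.

```c
#include <stdint.h>
#include <stdlib.h>

/* 8-point Walsh-Hadamard butterflies, in the form the vec_mladd/vec_perm
 * sequence uses: at stride s, v[i] becomes sign * v[i] + v[i ^ s],
 * where sign is -1 when (i & s) is set and +1 otherwise. */
static void wht8(int v[8])
{
    for (int s = 1; s < 8; s <<= 1) {
        int t[8];
        for (int i = 0; i < 8; i++)
            t[i] = ((i & s) ? -v[i] : v[i]) + v[i ^ s];
        for (int i = 0; i < 8; i++)
            v[i] = t[i];
    }
}

/* Hypothetical scalar reference: sum of |coefficients| of the 2-D
 * Hadamard transform of the 8x8 block src - dst. */
static int hadamard8_diff8x8_ref(const uint8_t *dst, const uint8_t *src, int stride)
{
    int m[8][8], col[8], sum = 0;

    for (int y = 0; y < 8; y++) {              /* row transforms */
        for (int x = 0; x < 8; x++)
            m[y][x] = src[y * stride + x] - dst[y * stride + x];
        wht8(m[y]);
    }
    for (int x = 0; x < 8; x++) {              /* column transforms + abs sum */
        for (int y = 0; y < 8; y++)
            col[y] = m[y][x];
        wht8(col);
        for (int y = 0; y < 8; y++)
            sum += abs(col[y]);
    }
    return sum;
}
```

Because the final sum of absolute values does not depend on coefficient order, the exact output ordering of the butterflies is irrelevant. Two easy sanity checks: a constant difference d puts everything into the DC term (64 * |d|), and a single differing pixel of magnitude k also gives 64 * k, since every Hadamard basis entry is +1 or -1.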
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1161 /*
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1162 16x8 works with 16 elements; it avoids replicating
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1163 loads and gives the compiler more room for scheduling.
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1164 It's only used from inside hadamard8_diff16_altivec.
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1165
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1166 Unfortunately, it seems gcc-3.3 is a bit dumb, and
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1167 the compiled code has a LOT of spill code; it seems
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1168 gcc (unlike xlc) cannot keep everything in registers
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1169 by itself. The following code includes hand-made
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1170 register allocation. It's not clean, but on
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1171 a 7450 the resulting code is much faster (best case
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1172 falls from 700+ cycles to 550).
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1173
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1174 xlc doesn't add spill code, but it doesn't know how to
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1175 schedule for the 7450, and its code isn't much faster than
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1176 gcc-3.3 on the 7450 (but it uses 25% fewer instructions...)
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1177
5963
80103098c797 spelling
vitor
parents: 5753
diff changeset
1178 On the 970, the hand-made RA is still a win (around 690
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1179 vs. around 780), but xlc goes to around 660 on the
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1180 regular C code...
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1181 */
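The function below can be checked against a portable scalar reference. This is a sketch of what it appears to compute:

- It takes the difference src - dst over a 16x8 area and treats it as two side-by-side 8x8 blocks.
- Each block goes through an 8-point Walsh-Hadamard transform on its rows and then on its columns.
- The absolute values of all coefficients are summed.

Like the AltiVec version, the sketch always reads exactly 8 rows. The h argument of hadamard8_diff16x8_altivec is never used.

The names wht8 and hadamard8_diff16x8_ref are hypothetical and are not part of FFmpeg. The sketch uses plain int arithmetic in place of the 16-bit lanes. That substitution is exact here because the largest possible coefficient magnitude (255 * 64 = 16320) fits in a signed short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* In-place, unnormalized 8-point Walsh-Hadamard transform in natural
 * (Sylvester) order: three stages of add/subtract butterflies. */
static void wht8(int v[8])
{
    for (int len = 1; len < 8; len <<= 1)
        for (int i = 0; i < 8; i += 2 * len)
            for (int j = i; j < i + len; j++) {
                int a = v[j], b = v[j + len];
                v[j] = a + b;
                v[j + len] = a - b;
            }
}

/* Hypothetical scalar reference for the 16x8 kernel: two side-by-side
 * 8x8 difference blocks, each transformed on rows then columns, with the
 * absolute values of all coefficients summed. Always reads 8 rows. */
static int hadamard8_diff16x8_ref(const uint8_t *dst, const uint8_t *src,
                                  int stride)
{
    int sum = 0;
    for (int half = 0; half < 2; half++) {
        int m[8][8];
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                int off = y * stride + 8 * half + x;
                m[y][x] = src[off] - dst[off];   /* src - dst, as in but0 */
            }
            wht8(m[y]);                          /* horizontal transform */
        }
        for (int x = 0; x < 8; x++) {            /* vertical transform */
            int col[8];
            for (int y = 0; y < 8; y++)
                col[y] = m[y][x];
            wht8(col);
            for (int y = 0; y < 8; y++)
                sum += abs(col[y]);
        }
    }
    return sum;
}
```

Only the sum of absolute values is kept, so the order in which coefficients come out of the transform does not affect the result. For that reason the scalar loop and the lane ordering of the vector code should agree.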
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1182
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1183 static int hadamard8_diff16x8_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h) {
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1184 int sum;
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1185 register vector signed short
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1186 temp0 REG_v(v0),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1187 temp1 REG_v(v1),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1188 temp2 REG_v(v2),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1189 temp3 REG_v(v3),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1190 temp4 REG_v(v4),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1191 temp5 REG_v(v5),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1192 temp6 REG_v(v6),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1193 temp7 REG_v(v7);
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1194 register vector signed short
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1195 temp0S REG_v(v8),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1196 temp1S REG_v(v9),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1197 temp2S REG_v(v10),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1198 temp3S REG_v(v11),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1199 temp4S REG_v(v12),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1200 temp5S REG_v(v13),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1201 temp6S REG_v(v14),
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1202 temp7S REG_v(v15);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1203 register const vector unsigned char vzero REG_v(v31)=
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1204 (const vector unsigned char)vec_splat_u8(0);
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1205 {
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1206 register const vector signed short vprod1 REG_v(v16)=
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1207 (const vector signed short)AVV( 1,-1, 1,-1, 1,-1, 1,-1);
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1208 register const vector signed short vprod2 REG_v(v17)=
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1209 (const vector signed short)AVV( 1, 1,-1,-1, 1, 1,-1,-1);
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1210 register const vector signed short vprod3 REG_v(v18)=
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1211 (const vector signed short)AVV( 1, 1, 1, 1,-1,-1,-1,-1);
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1212 register const vector unsigned char perm1 REG_v(v19)=
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1213 (const vector unsigned char)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1214 AVV(0x02, 0x03, 0x00, 0x01, 0x06, 0x07, 0x04, 0x05,
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1215 0x0A, 0x0B, 0x08, 0x09, 0x0E, 0x0F, 0x0C, 0x0D);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1216 register const vector unsigned char perm2 REG_v(v20)=
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1217 (const vector unsigned char)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1218 AVV(0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03,
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1219 0x0C, 0x0D, 0x0E, 0x0F, 0x08, 0x09, 0x0A, 0x0B);
5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1220 register const vector unsigned char perm3 REG_v(v21)=
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
diff changeset
1221 (const vector unsigned char)
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1222 AVV(0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
diff changeset
1223 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07);
1980
a6972a4f90c8 use the AVV macro from gcc_fixes.h instead ifdefs
alex
parents: 1979
diff changeset
1224
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1225 #define ONEITERBUTTERFLY(i, res1, res2) \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1226 { \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1227 register vector unsigned char src1 REG_v(v22), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1228 src2 REG_v(v23), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1229 dst1 REG_v(v24), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1230 dst2 REG_v(v25), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1231 srcO REG_v(v22), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1232 dstO REG_v(v23); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1233 \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1234 register vector signed short srcV REG_v(v24), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1235 dstV REG_v(v25), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1236 srcW REG_v(v26), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1237 dstW REG_v(v27), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1238 but0 REG_v(v28), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1239 but0S REG_v(v29), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1240 op1 REG_v(v30), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1241 but1 REG_v(v22), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1242 op1S REG_v(v23), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1243 but1S REG_v(v24), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1244 op2 REG_v(v25), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1245 but2 REG_v(v26), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1246 op2S REG_v(v27), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1247 but2S REG_v(v28), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1248 op3 REG_v(v29), \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1249 op3S REG_v(v30); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1250 \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1251 src1 = vec_ld(stride * i, src); \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1252 src2 = vec_ld((stride * i) + 16, src); \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1253 srcO = vec_perm(src1, src2, vec_lvsl(stride * i, src)); \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1254 dst1 = vec_ld(stride * i, dst); \
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1255 dst2 = vec_ld((stride * i) + 16, dst); \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1256 dstO = vec_perm(dst1, dst2, vec_lvsl(stride * i, dst)); \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1257 /* promote the unsigned chars to signed shorts */ \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1258 srcV = \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1259 (vector signed short)vec_mergeh((vector signed char)vzero, \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1260 (vector signed char)srcO); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1261 dstV = \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1262 (vector signed short)vec_mergeh((vector signed char)vzero, \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1263 (vector signed char)dstO); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1264 srcW = \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1265 (vector signed short)vec_mergel((vector signed char)vzero, \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1266 (vector signed char)srcO); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1267 dstW = \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1268 (vector signed short)vec_mergel((vector signed char)vzero, \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1269 (vector signed char)dstO); \
5964
34f551bd0509 Fix alignment broke by my last patch
vitor
parents: 5963
diff changeset
1270 /* subtractions inside the first butterfly */ \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1271 but0 = vec_sub(srcV, dstV); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1272 but0S = vec_sub(srcW, dstW); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1273 op1 = vec_perm(but0, but0, perm1); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1274 but1 = vec_mladd(but0, vprod1, op1); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1275 op1S = vec_perm(but0S, but0S, perm1); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1276 but1S = vec_mladd(but0S, vprod1, op1S); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1277 op2 = vec_perm(but1, but1, perm2); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1278 but2 = vec_mladd(but1, vprod2, op2); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1279 op2S = vec_perm(but1S, but1S, perm2); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1280 but2S = vec_mladd(but1S, vprod2, op2S); \
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1281 op3 = vec_perm(but2, but2, perm3); \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1282 res1 = vec_mladd(but2, vprod3, op3); \
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1283 op3S = vec_perm(but2S, but2S, perm3); \
2979
bfabfdf9ce55 COSMETICS: tabs --> spaces, some prettyprinting
diego
parents: 2967
diff changeset
1284 res2 = vec_mladd(but2S, vprod3, op3S); \
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1285 }
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1286 ONEITERBUTTERFLY(0, temp0, temp0S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1287 ONEITERBUTTERFLY(1, temp1, temp1S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1288 ONEITERBUTTERFLY(2, temp2, temp2S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1289 ONEITERBUTTERFLY(3, temp3, temp3S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1290 ONEITERBUTTERFLY(4, temp4, temp4S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1291 ONEITERBUTTERFLY(5, temp5, temp5S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1292 ONEITERBUTTERFLY(6, temp6, temp6S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1293 ONEITERBUTTERFLY(7, temp7, temp7S);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1294 }
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1295 #undef ONEITERBUTTERFLY
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1296 {
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1297 register vector signed int vsum;
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1298 register vector signed short line0S, line1S, line2S, line3S, line4S,
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1299 line5S, line6S, line7S, line0BS,line2BS,
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1300 line1BS,line3BS,line4BS,line6BS,line5BS,
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1301 line7BS,line0CS,line4CS,line1CS,line5CS,
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1302 line2CS,line6CS,line3CS,line7CS;
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1303
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1304 register vector signed short line0 = vec_add(temp0, temp1);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1305 register vector signed short line1 = vec_sub(temp0, temp1);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1306 register vector signed short line2 = vec_add(temp2, temp3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1307 register vector signed short line3 = vec_sub(temp2, temp3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1308 register vector signed short line4 = vec_add(temp4, temp5);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1309 register vector signed short line5 = vec_sub(temp4, temp5);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1310 register vector signed short line6 = vec_add(temp6, temp7);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1311 register vector signed short line7 = vec_sub(temp6, temp7);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1312
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1313 register vector signed short line0B = vec_add(line0, line2);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1314 register vector signed short line2B = vec_sub(line0, line2);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1315 register vector signed short line1B = vec_add(line1, line3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1316 register vector signed short line3B = vec_sub(line1, line3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1317 register vector signed short line4B = vec_add(line4, line6);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1318 register vector signed short line6B = vec_sub(line4, line6);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1319 register vector signed short line5B = vec_add(line5, line7);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1320 register vector signed short line7B = vec_sub(line5, line7);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1321
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1322 register vector signed short line0C = vec_add(line0B, line4B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1323 register vector signed short line4C = vec_sub(line0B, line4B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1324 register vector signed short line1C = vec_add(line1B, line5B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1325 register vector signed short line5C = vec_sub(line1B, line5B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1326 register vector signed short line2C = vec_add(line2B, line6B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1327 register vector signed short line6C = vec_sub(line2B, line6B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1328 register vector signed short line3C = vec_add(line3B, line7B);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1329 register vector signed short line7C = vec_sub(line3B, line7B);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286
diff changeset
1330
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1331 vsum = vec_sum4s(vec_abs(line0C), vec_splat_s32(0));
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1332 vsum = vec_sum4s(vec_abs(line1C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1333 vsum = vec_sum4s(vec_abs(line2C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1334 vsum = vec_sum4s(vec_abs(line3C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1335 vsum = vec_sum4s(vec_abs(line4C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1336 vsum = vec_sum4s(vec_abs(line5C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1337 vsum = vec_sum4s(vec_abs(line6C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1338 vsum = vec_sum4s(vec_abs(line7C), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1339
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1340 line0S = vec_add(temp0S, temp1S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1341 line1S = vec_sub(temp0S, temp1S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1342 line2S = vec_add(temp2S, temp3S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1343 line3S = vec_sub(temp2S, temp3S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1344 line4S = vec_add(temp4S, temp5S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1345 line5S = vec_sub(temp4S, temp5S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1346 line6S = vec_add(temp6S, temp7S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1347 line7S = vec_sub(temp6S, temp7S);
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1348
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1349 line0BS = vec_add(line0S, line2S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1350 line2BS = vec_sub(line0S, line2S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1351 line1BS = vec_add(line1S, line3S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1352 line3BS = vec_sub(line1S, line3S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1353 line4BS = vec_add(line4S, line6S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1354 line6BS = vec_sub(line4S, line6S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1355 line5BS = vec_add(line5S, line7S);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1356 line7BS = vec_sub(line5S, line7S);
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1357
3346
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1358 line0CS = vec_add(line0BS, line4BS);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1359 line4CS = vec_sub(line0BS, line4BS);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1360 line1CS = vec_add(line1BS, line5BS);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1361 line5CS = vec_sub(line1BS, line5BS);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1362 line2CS = vec_add(line2BS, line6BS);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1363 line6CS = vec_sub(line2BS, line6BS);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1364 line3CS = vec_add(line3BS, line7BS);
052765f11f1c Cosmetics: should not hurt performance, scream if are
lu_zero
parents: 3252
diff changeset
1365 line7CS = vec_sub(line3BS, line7BS);
1951
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1366
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1367 vsum = vec_sum4s(vec_abs(line0CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1368 vsum = vec_sum4s(vec_abs(line1CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1369 vsum = vec_sum4s(vec_abs(line2CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1370 vsum = vec_sum4s(vec_abs(line3CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1371 vsum = vec_sum4s(vec_abs(line4CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1372 vsum = vec_sum4s(vec_abs(line5CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1373 vsum = vec_sum4s(vec_abs(line6CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1374 vsum = vec_sum4s(vec_abs(line7CS), vsum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1375 vsum = vec_sums(vsum, (vector signed int)vzero);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1376 vsum = vec_splat(vsum, 3);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1377 vec_ste(vsum, 0, &sum);
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1378 }
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1379 return sum;
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1380 }
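ONEITERBUTTERFLY computes the horizontal transform without separate add and subtract paths. At each stage it permutes the vector so that every lane sees its butterfly partner. A single vec_mladd then computes v[k] * vprod[k] + partner.

The scalar model below reproduces those three stages from the perm1/perm2/perm3 and vprod1/vprod2/vprod3 constants. The names are hypothetical, and lane 0 is the first element, as on big-endian PowerPC. The model can be checked against the closed form of the natural-order Hadamard transform:

out[k] = sum over j of (-1)^popcount(j & k) * in[j]

```c
#include <assert.h>
#include <stdint.h>

/* Element-level view of the byte permutations perm1, perm2, perm3:
 * lane k of vec_perm(x, x, permN) holds x[partner[N-1][k]]. */
static const int partner[3][8] = {
    {1, 0, 3, 2, 5, 4, 7, 6},   /* perm1: swap adjacent lanes */
    {2, 3, 0, 1, 6, 7, 4, 5},   /* perm2: swap adjacent pairs */
    {4, 5, 6, 7, 0, 1, 2, 3},   /* perm3: swap the two halves */
};

/* Lane multipliers vprod1, vprod2, vprod3. */
static const int sign[3][8] = {
    {1, -1, 1, -1, 1, -1, 1, -1},
    {1, 1, -1, -1, 1, 1, -1, -1},
    {1, 1, 1, 1, -1, -1, -1, -1},
};

/* Scalar model of the horizontal pass of ONEITERBUTTERFLY: three rounds of
 * op = vec_perm(v, v, perm); v = vec_mladd(v, vprod, op). The kernel's
 * inputs are differences in [-255, 255], so every intermediate fits in
 * int16_t and the modular wraparound of vec_mladd never comes into play. */
static void butterfly_model(int16_t v[8])
{
    for (int s = 0; s < 3; s++) {
        int16_t op[8];
        for (int k = 0; k < 8; k++)
            op[k] = v[partner[s][k]];                      /* vec_perm  */
        for (int k = 0; k < 8; k++)
            v[k] = (int16_t)(v[k] * sign[s][k] + op[k]);   /* vec_mladd */
    }
}
```

The multiply-by-plus-or-minus-one trick fuses each stage's add and subtract into one instruction. The cost is one vec_perm per stage to bring the partner lanes into place.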
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1381
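The tail of hadamard8_diff16x8_altivec folds 16 vectors of absolute coefficients into a single integer in three steps:

- vec_sum4s adds adjacent pairs of shorts into four int partial sums.
- vec_sums adds those four partial sums into the last word.
- vec_splat and vec_ste copy that word out to sum.

Below is a scalar model of that reduction, with hypothetical helper names. The saturation mirrors what the underlying instructions do, although in this kernel the totals stay far below the int32 limits.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the int32 range (vsum4shs and vsumsws saturate). */
static int32_t sat32(int64_t x)
{
    return x > INT32_MAX ? INT32_MAX : x < INT32_MIN ? INT32_MIN : (int32_t)x;
}

/* Model of vec_sum4s(vector signed short a, vector signed int b):
 * word i = sat(a[2i] + a[2i+1] + b[i]). */
static void sum4s_model(const int16_t a[8], const int32_t b[4], int32_t r[4])
{
    int32_t t[4];
    for (int i = 0; i < 4; i++)
        t[i] = sat32((int64_t)a[2 * i] + a[2 * i + 1] + b[i]);
    for (int i = 0; i < 4; i++)
        r[i] = t[i];                   /* via a temporary so r may alias b */
}

/* Model of the reduction: accumulate |x| of each row with vec_sum4s, then
 * vec_sums(acc, zero), which leaves the saturated total in word 3; the
 * vec_splat(vsum, 3) + vec_ste pair just reads that word back. */
static int reduce_abs_model(const int16_t rows[][8], int n)
{
    int32_t acc[4] = {0, 0, 0, 0};
    assert(n >= 0);
    for (int r = 0; r < n; r++) {
        int16_t ab[8];
        for (int k = 0; k < 8; k++)     /* vec_abs (wraps for -32768) */
            ab[k] = (int16_t)(rows[r][k] < 0 ? -rows[r][k] : rows[r][k]);
        sum4s_model(ab, acc, acc);
    }
    return sat32((int64_t)acc[0] + acc[1] + acc[2] + acc[3]);
}
```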
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1382 int hadamard8_diff16_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h){
2599b8444831 better hadamard8_diff16 in AltiVec, and more patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 1949
diff changeset
1383 POWERPC_PERF_DECLARE(altivec_hadamard8_diff16_num, 1);
3554
ce5554dd79ce Cosmetics: 2->4 spaces and some braces
lu_zero
parents: 3550
    int score;
1951
POWERPC_PERF_START_COUNT(altivec_hadamard8_diff16_num, 1);
3554
    score = hadamard8_diff16x8_altivec(s, dst, src, stride, 8);
    if (h==16) {
        dst += 8*stride;
        src += 8*stride;
        score += hadamard8_diff16x8_altivec(s, dst, src, stride, 8);
    }
1951
POWERPC_PERF_STOP_COUNT(altivec_hadamard8_diff16_num, 1);
3554
    return score;
1951
}

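hadamard8_diff16_altivec scores only the top 16x8 strip and, when h == 16, the strip eight rows below. The real work happens in hadamard8_diff16x8_altivec. The plain-C sketch below shows what such a comparison computes. It assumes the metric is the sum of absolute 8x8 Hadamard coefficients of src - dst (SATD). It is not FFmpeg's C reference, and the names satd8x8 and hadamard8_diff16_ref are hypothetical. The context pointer s is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

/* SATD of one 8x8 block: Hadamard-transform src - dst on rows, then on
 * columns (unnormalised butterflies), and sum the absolute coefficients. */
static int satd8x8(const uint8_t *dst, const uint8_t *src, int stride)
{
    int t[8][8];
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            t[y][x] = src[y * stride + x] - dst[y * stride + x];

    /* pass 0 works along rows, pass 1 along columns; spans 1, 2, 4
     * together form the 8-point Hadamard transform. */
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < 8; i++) {
            for (int span = 1; span < 8; span <<= 1) {
                for (int j = 0; j < 8; j += 2 * span) {
                    for (int k = j; k < j + span; k++) {
                        int *a = pass ? &t[k][i] : &t[i][k];
                        int *b = pass ? &t[k + span][i] : &t[i][k + span];
                        int s = *a + *b, d = *a - *b;
                        *a = s;
                        *b = d;
                    }
                }
            }
        }
    }

    int sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += abs(t[y][x]);
    return sum;
}

/* Same control flow as hadamard8_diff16_altivec: one 16x8 strip (two 8x8
 * blocks), plus the strip eight rows lower when h == 16. */
int hadamard8_diff16_ref(const uint8_t *dst, const uint8_t *src,
                         int stride, int h)
{
    int score = satd8x8(dst, src, stride) + satd8x8(dst + 8, src + 8, stride);
    if (h == 16) {
        dst += 8 * stride;
        src += 8 * stride;
        score += satd8x8(dst, src, stride) + satd8x8(dst + 8, src + 8, stride);
    }
    return score;
}
```

For a constant difference d, the transform puts everything into the DC coefficient, giving 64 * |d| per 8x8 block. This makes a quick sanity check.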
3543
6aabb2bec46c vorbis_inverse_coupling_altivec
lu_zero
parents: 3533
static void vorbis_inverse_coupling_altivec(float *mag, float *ang,
                                            int blocksize)
{
    int i;
3546
5f97ba9a4eaa Almost cosmetic changes in dsputil_init_ppc and vorbis_inverse_coupling_altivec:
lu_zero
parents: 3545
    vector float m, a;
3543
    vector bool int t0, t1;
    const vector unsigned int v_31 = //XXX
        vec_add(vec_add(vec_splat_u32(15),vec_splat_u32(15)),vec_splat_u32(1));
    for(i=0; i<blocksize; i+=4) {
        m = vec_ld(0, mag+i);
        a = vec_ld(0, ang+i);
        t0 = vec_cmple(m, (vector float)vec_splat_u32(0));
        t1 = vec_cmple(a, (vector float)vec_splat_u32(0));
3545
e2a589e55906 Minor fix
lu_zero
parents: 3543
        a = vec_xor(a, (vector float) vec_sl((vector unsigned int)t0, v_31));
3546
        t0 = (vector bool int)vec_and(a, t1);
        t1 = (vector bool int)vec_andc(a, t1);
3550
4f4c13574ad5 Yet another typo
lu_zero
parents: 3549
        a = vec_sub(m, (vector float)t1);
        m = vec_add(m, (vector float)t0);
3549
7b4e34f1ff1f Fix a stupid typo and another error, thanks to Emanuele Giaquinta <exg@gentoo.org> for pointing out the issue and the patch
lu_zero
parents: 3546
        vec_stl(a, 0, ang+i);
        vec_stl(m, 0, mag+i);
3543
    }
}

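vorbis_inverse_coupling_altivec replaces the four-way sign test of Vorbis inverse channel coupling with bit masks. The sketch below shows two plain-C versions. The first is a branchy scalar version written from the case analysis. The second translates the vector steps one lane at a time: compare masks, flip the sign by shifting the mask to bit 31, then use vec_and/vec_andc to split a into the part added to mag and the part subtracted from ang. The constant v_31 is built as 15 + 15 + 1 because vec_splat_u32 only accepts a 5-bit signed immediate (-16..15). The function names here are hypothetical. The vector code additionally assumes 16-byte aligned mag and ang and a blocksize that is a multiple of 4, since it uses vec_ld/vec_stl.

```c
#include <stdint.h>
#include <string.h>

/* Branchy scalar form of inverse coupling (hypothetical reference). */
static void coupling_ref(float *mag, float *ang, int n)
{
    for (int i = 0; i < n; i++) {
        float m = mag[i], a = ang[i];
        if (m > 0.0f) {
            if (a > 0.0f) { ang[i] = m - a; }
            else          { ang[i] = m; mag[i] = m + a; }
        } else {
            if (a > 0.0f) { ang[i] = m + a; }
            else          { ang[i] = m; mag[i] = m - a; }
        }
    }
}

/* Bit-level float <-> uint32 views without aliasing problems. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Branch-free form mirroring the AltiVec sequence on a single lane. */
static void coupling_masked(float *mag, float *ang, int n)
{
    for (int i = 0; i < n; i++) {
        float m = mag[i], a = ang[i];
        uint32_t t0 = (m <= 0.0f) ? 0xFFFFFFFFu : 0;   /* vec_cmple(m, 0) */
        uint32_t t1 = (a <= 0.0f) ? 0xFFFFFFFFu : 0;   /* vec_cmple(a, 0) */
        /* vec_sl(t0, 31) leaves only the sign bit; xor flips a if m <= 0 */
        uint32_t ab = f2u(a) ^ (uint32_t)(t0 << 31);
        float neg_part = u2f(ab & t1);    /* vec_and:  a if orig a <= 0, else +0 */
        float pos_part = u2f(ab & ~t1);   /* vec_andc: a if orig a > 0,  else +0 */
        ang[i] = m - pos_part;            /* vec_sub(m, t1) */
        mag[i] = m + neg_part;            /* vec_add(m, t0) */
    }
}
```

The masks come from the original a, before the sign flip. That is why t1 is computed before the vec_xor and reused afterwards.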
2057
4c663228e020 avg_pixels8_xy2_altivec in AltiVec, enabling avg_pixels8_altivec, hadamard fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2056
/* next one assumes that ((line_size % 8) == 0) */
void avg_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
POWERPC_PERF_DECLARE(altivec_avg_pixels8_xy2_num, 1);
3554
    register int i;
    register vector unsigned char pixelsv1, pixelsv2, pixelsavg;
    register vector unsigned char blockv, temp1, temp2, blocktemp;
    register vector unsigned short pixelssum1, pixelssum2, temp3;

5746
55ed6dc5d476 Remove const vector macro indirection that is useless and obfuscating
diego
parents: 5609
    register const vector unsigned char vczero = (const vector unsigned char)
3554
                                        vec_splat_u8(0);
5746
    register const vector unsigned short vctwo = (const vector unsigned short)
3554
                                        vec_splat_u16(2);
2967
ef2149182f1c COSMETICS: Remove all trailing whitespace.
diego
parents: 2286

3554
    temp1 = vec_ld(0, pixels);
    temp2 = vec_ld(16, pixels);
    pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels));
    if ((((unsigned long)pixels) & 0x0000000F) == 0x0000000F) {
        pixelsv2 = temp2;
    } else {
        pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(1, pixels));
    }
    pixelsv1 = vec_mergeh(vczero, pixelsv1);
    pixelsv2 = vec_mergeh(vczero, pixelsv2);
    pixelssum1 = vec_add((vector unsigned short)pixelsv1,
                         (vector unsigned short)pixelsv2);
    pixelssum1 = vec_add(pixelssum1, vctwo);
2967

POWERPC_PERF_START_COUNT(altivec_avg_pixels8_xy2_num, 1);
3554
    for (i = 0; i < h ; i++) {
        int rightside = ((unsigned long)block & 0x0000000F);
        blockv = vec_ld(0, block);
2057

3554
        temp1 = vec_ld(line_size, pixels);
        temp2 = vec_ld(line_size + 16, pixels);
        pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(line_size, pixels));
        if (((((unsigned long)pixels) + line_size) & 0x0000000F) == 0x0000000F)
        {
            pixelsv2 = temp2;
        } else {
            pixelsv2 = vec_perm(temp1, temp2, vec_lvsl(line_size + 1, pixels));
        }
2057

3554
        pixelsv1 = vec_mergeh(vczero, pixelsv1);
        pixelsv2 = vec_mergeh(vczero, pixelsv2);
        pixelssum2 = vec_add((vector unsigned short)pixelsv1,
                             (vector unsigned short)pixelsv2);
        temp3 = vec_add(pixelssum1, pixelssum2);
        temp3 = vec_sra(temp3, vctwo);
        pixelssum1 = vec_add(pixelssum2, vctwo);
        pixelsavg = vec_packsu(temp3, (vector unsigned short) vczero);
2967

3554
        if (rightside) {
            blocktemp = vec_perm(blockv, pixelsavg, vcprm(0, 1, s0, s1));
        } else {
            blocktemp = vec_perm(blockv, pixelsavg, vcprm(s0, s1, 2, 3));
        }
2967

3554
        blockv = vec_avg(blocktemp, blockv);
        vec_st(blockv, 0, block);
2967

3554
        block += line_size;
        pixels += line_size;
    }
2967

2057
POWERPC_PERF_STOP_COUNT(altivec_avg_pixels8_xy2_num, 1);
}
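avg_pixels8_xy2_altivec computes the half-pel (x + 1/2, y + 1/2) interpolation of an 8-pixel-wide block and averages it into block. It keeps the horizontal pair sums of the previous source row (pixelssum1, pre-biased by 2), so each source row is loaded and summed only once. The rightside test selects which 8-byte half of the aligned 16-byte vector the block occupies. Below is a scalar sketch of the same arithmetic. The name avg_pixels8_xy2_ref is hypothetical, and the sketch assumes h + 1 readable source rows and 9 readable columns.

```c
#include <stdint.h>

/* Each output byte is the rounded 2x2 source average (a+b+c+d+2) >> 2,
 * then averaged with the existing destination, rounding up like
 * vec_avg(): (x + y + 1) >> 1. Only 8 bytes per row are written. */
void avg_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                         int line_size, int h)
{
    unsigned sum_prev[8];   /* pair sums of the upper row, biased by 2 */
    for (int x = 0; x < 8; x++)
        sum_prev[x] = pixels[x] + pixels[x + 1] + 2;

    for (int i = 0; i < h; i++) {
        const uint8_t *next = pixels + line_size;
        for (int x = 0; x < 8; x++) {
            unsigned sum_next = next[x] + next[x + 1];
            unsigned xy2 = (sum_prev[x] + sum_next) >> 2;
            block[x] = (uint8_t)((block[x] + xy2 + 1) >> 1);
            sum_prev[x] = sum_next + 2;   /* lower row becomes the upper row */
        }
        block  += line_size;
        pixels += line_size;
    }
}
```

Carrying sum_prev from one row to the next plays the role of `pixelssum1 = vec_add(pixelssum2, vctwo)` in the loop above.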
3546

void dsputil_init_altivec(DSPContext* c, AVCodecContext *avctx)
{
    c->pix_abs[0][1] = sad16_x2_altivec;
    c->pix_abs[0][2] = sad16_y2_altivec;
    c->pix_abs[0][3] = sad16_xy2_altivec;
    c->pix_abs[0][0] = sad16_altivec;
    c->pix_abs[1][0] = sad8_altivec;
    c->sad[0]= sad16_altivec;
    c->sad[1]= sad8_altivec;
    c->pix_norm1 = pix_norm1_altivec;
    c->sse[1]= sse8_altivec;
    c->sse[0]= sse16_altivec;
    c->pix_sum = pix_sum_altivec;
    c->diff_pixels = diff_pixels_altivec;
    c->get_pixels = get_pixels_altivec;
    c->add_bytes= add_bytes_altivec;
    c->put_pixels_tab[0][0] = put_pixels16_altivec;
    /* the two functions do the same thing, so use the same code */
    c->put_no_rnd_pixels_tab[0][0] = put_pixels16_altivec;
    c->avg_pixels_tab[0][0] = avg_pixels16_altivec;
    c->avg_pixels_tab[1][0] = avg_pixels8_altivec;
    c->avg_pixels_tab[1][3] = avg_pixels8_xy2_altivec;
    c->put_pixels_tab[1][3] = put_pixels8_xy2_altivec;
    c->put_no_rnd_pixels_tab[1][3] = put_no_rnd_pixels8_xy2_altivec;
    c->put_pixels_tab[0][3] = put_pixels16_xy2_altivec;
    c->put_no_rnd_pixels_tab[0][3] = put_no_rnd_pixels16_xy2_altivec;

    c->hadamard8_diff[0] = hadamard8_diff16_altivec;
    c->hadamard8_diff[1] = hadamard8_diff8x8_altivec;
5752
af20477e9994 Replace CONFIG_VORBIS_DECODER #ifdef by if (ENABLE_VORBIS_DECODER).
diego
parents: 5750
    if (ENABLE_VORBIS_DECODER)
5753
25429b089d96 cosmetics: Fix indentation after last commit.
diego
parents: 5752
        c->vorbis_inverse_coupling = vorbis_inverse_coupling_altivec;
3546
}
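dsputil_init_altivec only fills slots of the DSPContext function-pointer table. Callers go through the pointers and never name the AltiVec functions directly. The last assignment uses if (ENABLE_VORBIS_DECODER) instead of #ifdef, so the statement is always parsed and type-checked, and the compiler is expected to discard it when the switch is 0. The cut-down sketch below shows both ideas. MiniDSP, ENABLE_MINI_CODEC, sad16_c, coupling_stub and mini_dsp_init are hypothetical and not FFmpeg API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature context: one function-pointer slot per operation. */
typedef struct {
    int  (*sad16)(const uint8_t *a, const uint8_t *b, int stride, int h);
    void (*coupling)(float *mag, float *ang, int n);
} MiniDSP;

#define ENABLE_MINI_CODEC 0   /* assumed build-time switch, always 0 or 1 */

/* Plain-C sum of absolute differences over a 16-wide, h-high block. */
static int sad16_c(const uint8_t *a, const uint8_t *b, int stride, int h)
{
    int s = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 16; x++) {
            int d = a[y * stride + x] - b[y * stride + x];
            s += d < 0 ? -d : d;
        }
    return s;
}

/* Placeholder for an optional, codec-specific routine. */
static void coupling_stub(float *mag, float *ang, int n)
{
    (void)mag; (void)ang; (void)n;
}

void mini_dsp_init(MiniDSP *c)
{
    c->sad16 = sad16_c;          /* unconditional, like c->sad[0] = ... */
    c->coupling = NULL;
    /* A plain if on a 0/1 macro instead of #ifdef: the assignment is
     * still compiled and type-checked, and becomes dead code when the
     * switch is 0. */
    if (ENABLE_MINI_CODEC)
        c->coupling = coupling_stub;
}
```

One caveat of this pattern: if the referenced function is not built at all, linking depends on the compiler actually removing the dead branch. The same table mechanism explains the put_no_rnd_pixels_tab[0][0] line. A full-pel copy performs no rounding, so the rounding and no-rounding slots can share put_pixels16_altivec.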