annotate postproc/swscale_altivec_template.c @ 18715:30d7ddf08889

Fix window position when changing videos while in fullscreen and for window managers that modify position on Map. Oked by Alexander Strasser.
author reimar
date Thu, 15 Jun 2006 08:00:37 +0000
parents a0ab6fed1d14
children
rev   line source
12017
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
1 /*
2 AltiVec-enhanced yuv2yuvX
3
4 Copyright (C) 2004 Romain Dolbeau <romain@dolbeau.org>
5 based on the equivalent C code in "postproc/swscale.c"
6
7
8 This program is free software; you can redistribute it and/or modify
9 it under the terms of the GNU General Public License as published by
10 the Free Software Foundation; either version 2 of the License, or
11 (at your option) any later version.
12
13 This program is distributed in the hope that it will be useful,
14 but WITHOUT ANY WARRANTY; without even the implied warranty of
15 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16 GNU General Public License for more details.
17
18 You should have received a copy of the GNU General Public License
19 along with this program; if not, write to the Free Software
17367
401b440a6d76 Update licensing information: The FSF changed postal address.
diego
parents: 15824
diff changeset
20 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
12017
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
21 */
22
12130
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
23 #ifdef CONFIG_DARWIN
12532
79a2af950cf7 small linux/altivec compile fix in postproc/ by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12130
diff changeset
24 #define AVV(x...) (x)
12130
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
25 #else
12532
79a2af950cf7 small linux/altivec compile fix in postproc/ by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12130
diff changeset
26 #define AVV(x...) {x}
27 #endif
28
18046
a0ab6fed1d14 Reorganize vector constants to work around gcc 4.1 bug:
pacman
parents: 17367
diff changeset
29 #define vzero vec_splat_s32(0)
12130
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
30
12017
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
31 static inline void
32 altivec_packIntArrayToCharArray(int *val, uint8_t* dest, int dstW) {
33 register int i;
18046
a0ab6fed1d14 Reorganize vector constants to work around gcc 4.1 bug:
pacman
parents: 17367
diff changeset
34 vector unsigned int altivec_vectorShiftInt19 =
35 vec_add(vec_splat_u32(10),vec_splat_u32(9));
12017
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
36 if ((unsigned long)dest % 16) {
37 /* badly aligned store, we force store alignment */
38 /* and will handle load misalignment on val w/ vec_perm */
39 for (i = 0 ; (i < dstW) &&
40 (((unsigned long)dest + i) % 16) ; i++) {
41 int t = val[i] >> 19;
42 dest[i] = (t < 0) ? 0 : ((t > 255) ? 255 : t);
43 }
44 vector unsigned char perm1 = vec_lvsl(i << 2, val);
45 vector signed int v1 = vec_ld(i << 2, val);
46 for ( ; i < (dstW - 15); i+=16) {
47 int offset = i << 2;
48 vector signed int v2 = vec_ld(offset + 16, val);
49 vector signed int v3 = vec_ld(offset + 32, val);
50 vector signed int v4 = vec_ld(offset + 48, val);
51 vector signed int v5 = vec_ld(offset + 64, val);
52 vector signed int v12 = vec_perm(v1,v2,perm1);
53 vector signed int v23 = vec_perm(v2,v3,perm1);
54 vector signed int v34 = vec_perm(v3,v4,perm1);
55 vector signed int v45 = vec_perm(v4,v5,perm1);
56
57 vector signed int vA = vec_sra(v12, altivec_vectorShiftInt19);
58 vector signed int vB = vec_sra(v23, altivec_vectorShiftInt19);
59 vector signed int vC = vec_sra(v34, altivec_vectorShiftInt19);
60 vector signed int vD = vec_sra(v45, altivec_vectorShiftInt19);
61 vector unsigned short vs1 = vec_packsu(vA, vB);
62 vector unsigned short vs2 = vec_packsu(vC, vD);
63 vector unsigned char vf = vec_packsu(vs1, vs2);
64 vec_st(vf, i, dest);
65 v1 = v5;
66 }
67 } else { // dest is properly aligned, great
68 for (i = 0; i < (dstW - 15); i+=16) {
69 int offset = i << 2;
70 vector signed int v1 = vec_ld(offset, val);
71 vector signed int v2 = vec_ld(offset + 16, val);
72 vector signed int v3 = vec_ld(offset + 32, val);
73 vector signed int v4 = vec_ld(offset + 48, val);
74 vector signed int v5 = vec_sra(v1, altivec_vectorShiftInt19);
75 vector signed int v6 = vec_sra(v2, altivec_vectorShiftInt19);
76 vector signed int v7 = vec_sra(v3, altivec_vectorShiftInt19);
77 vector signed int v8 = vec_sra(v4, altivec_vectorShiftInt19);
78 vector unsigned short vs1 = vec_packsu(v5, v6);
79 vector unsigned short vs2 = vec_packsu(v7, v8);
80 vector unsigned char vf = vec_packsu(vs1, vs2);
81 vec_st(vf, i, dest);
82 }
83 }
84 for ( ; i < dstW ; i++) {
85 int t = val[i] >> 19;
86 dest[i] = (t < 0) ? 0 : ((t > 255) ? 255 : t);
87 }
88 }
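For readers without an AltiVec toolchain, the packing routine above can be checked against a plain C model. What follows is a minimal sketch and not part of the annotated file: the name pack_int_to_u8_ref is hypothetical, and it only mirrors the scalar head and tail loops shown above (shift right by 19, then clamp to 0..255), which the vec_sra / vec_packsu path is meant to reproduce. It assumes the compiler implements >> on negative ints as an arithmetic shift, as the original scalar loops also do.

```c
#include <stdint.h>

/* Hypothetical scalar reference (not in the original file): same
   arithmetic as the head/tail loops of altivec_packIntArrayToCharArray. */
static void pack_int_to_u8_ref(const int *val, uint8_t *dest, int dstW)
{
    int i;
    for (i = 0; i < dstW; i++) {
        /* drop the 19 low-order bits; assumes arithmetic shift for
           negative values, which C leaves implementation-defined */
        int t = val[i] >> 19;
        /* saturate to the unsigned 8-bit range */
        dest[i] = (uint8_t)((t < 0) ? 0 : ((t > 255) ? 255 : t));
    }
}
```

Comparing its output with the vector routine for both aligned and unaligned dest pointers would exercise each of the two branches above.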
89
90 static inline void
91 yuv2yuvX_altivec_real(int16_t *lumFilter, int16_t **lumSrc, int lumFilterSize,
92 int16_t *chrFilter, int16_t **chrSrc, int chrFilterSize,
93 uint8_t *dest, uint8_t *uDest, uint8_t *vDest, int dstW, int chrDstW)
94 {
95 const vector signed int vini = {(1 << 18), (1 << 18), (1 << 18), (1 << 18)};
96 register int i, j;
97 {
98 int __attribute__ ((aligned (16))) val[dstW];
99
100 for (i = 0; i < (dstW -7); i+=4) {
101 vec_st(vini, i << 2, val);
102 }
103 for (; i < dstW; i++) {
104 val[i] = (1 << 18);
105 }
106
107 for (j = 0; j < lumFilterSize; j++) {
108 vector signed short vLumFilter = vec_ld(j << 1, lumFilter);
109 vector unsigned char perm0 = vec_lvsl(j << 1, lumFilter);
110 vLumFilter = vec_perm(vLumFilter, vLumFilter, perm0);
111 vLumFilter = vec_splat(vLumFilter, 0); // lumFilter[j] is loaded 8 times in vLumFilter
112
113 vector unsigned char perm = vec_lvsl(0, lumSrc[j]);
114 vector signed short l1 = vec_ld(0, lumSrc[j]);
115
116 for (i = 0; i < (dstW - 7); i+=8) {
117 int offset = i << 2;
118 vector signed short l2 = vec_ld((i << 1) + 16, lumSrc[j]);
119
120 vector signed int v1 = vec_ld(offset, val);
121 vector signed int v2 = vec_ld(offset + 16, val);
122
123 vector signed short ls = vec_perm(l1, l2, perm); // lumSrc[j][i] ... lumSrc[j][i+7]
124
125 vector signed int i1 = vec_mule(vLumFilter, ls);
126 vector signed int i2 = vec_mulo(vLumFilter, ls);
127
128 vector signed int vf1 = vec_mergeh(i1, i2);
129 vector signed int vf2 = vec_mergel(i1, i2); // lumSrc[j][i] * lumFilter[j] ... lumSrc[j][i+7] * lumFilter[j]
130
131 vector signed int vo1 = vec_add(v1, vf1);
132 vector signed int vo2 = vec_add(v2, vf2);
133
134 vec_st(vo1, offset, val);
135 vec_st(vo2, offset + 16, val);
136
137 l1 = l2;
138 }
139 for ( ; i < dstW; i++) {
140 val[i] += lumSrc[j][i] * lumFilter[j];
141 }
142 }
143 altivec_packIntArrayToCharArray(val,dest,dstW);
144 }
145 if (uDest != 0) {
146 int __attribute__ ((aligned (16))) u[chrDstW];
147 int __attribute__ ((aligned (16))) v[chrDstW];
148
149 for (i = 0; i < (chrDstW -7); i+=4) {
150 vec_st(vini, i << 2, u);
151 vec_st(vini, i << 2, v);
152 }
153 for (; i < chrDstW; i++) {
154 u[i] = (1 << 18);
155 v[i] = (1 << 18);
156 }
157
158 for (j = 0; j < chrFilterSize; j++) {
159 vector signed short vChrFilter = vec_ld(j << 1, chrFilter);
160 vector unsigned char perm0 = vec_lvsl(j << 1, chrFilter);
161 vChrFilter = vec_perm(vChrFilter, vChrFilter, perm0);
162 vChrFilter = vec_splat(vChrFilter, 0); // chrFilter[j] is loaded 8 times in vChrFilter
163
164 vector unsigned char perm = vec_lvsl(0, chrSrc[j]);
165 vector signed short l1 = vec_ld(0, chrSrc[j]);
166 vector signed short l1_V = vec_ld(2048 << 1, chrSrc[j]);
167
168 for (i = 0; i < (chrDstW - 7); i+=8) {
169 int offset = i << 2;
170 vector signed short l2 = vec_ld((i << 1) + 16, chrSrc[j]);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
171 vector signed short l2_V = vec_ld(((i + 2048) << 1) + 16, chrSrc[j]);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
172
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
173 vector signed int v1 = vec_ld(offset, u);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
174 vector signed int v2 = vec_ld(offset + 16, u);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
175 vector signed int v1_V = vec_ld(offset, v);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
176 vector signed int v2_V = vec_ld(offset + 16, v);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
177
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
178 vector signed short ls = vec_perm(l1, l2, perm); // chrSrc[j][i] ... chrSrc[j][i+7]
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
179 vector signed short ls_V = vec_perm(l1_V, l2_V, perm); // chrSrc[j][i+2048] ... chrSrc[j][i+2055]
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
180
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
181 vector signed int i1 = vec_mule(vChrFilter, ls);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
182 vector signed int i2 = vec_mulo(vChrFilter, ls);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
183 vector signed int i1_V = vec_mule(vChrFilter, ls_V);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
184 vector signed int i2_V = vec_mulo(vChrFilter, ls_V);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
185
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
186 vector signed int vf1 = vec_mergeh(i1, i2);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
187 vector signed int vf2 = vec_mergel(i1, i2); // chrSrc[j][i] * chrFilter[j] ... chrSrc[j][i+7] * chrFilter[j]
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
188 vector signed int vf1_V = vec_mergeh(i1_V, i2_V);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
189 vector signed int vf2_V = vec_mergel(i1_V, i2_V); // chrSrc[j][i] * chrFilter[j] ... chrSrc[j][i+7] * chrFilter[j]
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
190
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
191 vector signed int vo1 = vec_add(v1, vf1);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
192 vector signed int vo2 = vec_add(v2, vf2);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
193 vector signed int vo1_V = vec_add(v1_V, vf1_V);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
194 vector signed int vo2_V = vec_add(v2_V, vf2_V);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
195
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
196 vec_st(vo1, offset, u);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
197 vec_st(vo2, offset + 16, u);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
198 vec_st(vo1_V, offset, v);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
199 vec_st(vo2_V, offset + 16, v);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
200
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
201 l1 = l2;
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
202 l1_V = l2_V;
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
203 }
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
204 for ( ; i < chrDstW; i++) {
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
205 u[i] += chrSrc[j][i] * chrFilter[j];
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
206 v[i] += chrSrc[j][i + 2048] * chrFilter[j];
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
207 }
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
208 }
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
209 altivec_packIntArrayToCharArray(u,uDest,chrDstW);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
210 altivec_packIntArrayToCharArray(v,vDest,chrDstW);
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
211 }
21e5cb258a95 AltiVec support in postproc/ + altivec optimizations for yuv2yuvX patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents:
diff changeset
212 }
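
Editorial sketch (not part of the annotated file): the inner chroma loop above multiplies with vec_mule/vec_mulo, which return the even-indexed and odd-indexed products separately, and then uses vec_mergeh/vec_mergel to put them back into source order before adding them to the accumulators. The portable C below imitates that data flow with plain arrays so the reordering can be checked without AltiVec hardware. All function names here are hypothetical stand-ins, and the element order assumes the classic big-endian AltiVec numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for vec_mule: products of elements 0, 2, 4, 6. */
static void mule_s16(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int k = 0; k < 4; k++)
        out[k] = (int32_t)a[2 * k] * b[2 * k];
}

/* Scalar stand-in for vec_mulo: products of elements 1, 3, 5, 7. */
static void mulo_s16(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int k = 0; k < 4; k++)
        out[k] = (int32_t)a[2 * k + 1] * b[2 * k + 1];
}

/* Scalar stand-in for vec_mergeh on 32-bit elements: a0 b0 a1 b1. */
static void mergeh_s32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    out[0] = a[0]; out[1] = b[0]; out[2] = a[1]; out[3] = b[1];
}

/* Scalar stand-in for vec_mergel on 32-bit elements: a2 b2 a3 b3. */
static void mergel_s32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    out[0] = a[2]; out[1] = b[2]; out[2] = a[3]; out[3] = b[3];
}

/* One inner-loop step: acc[0..7] += coeff * src[0..7]. */
static void accumulate8(int32_t acc[8], const int16_t src[8], int16_t coeff) {
    int16_t splat[8];
    int32_t even[4], odd[4], first[4], second[4];

    for (int k = 0; k < 8; k++)
        splat[k] = coeff;               /* like vec_splat of chrFilter[j] */

    mule_s16(splat, src, even);
    mulo_s16(splat, src, odd);
    mergeh_s32(even, odd, first);       /* products for elements 0..3 */
    mergel_s32(even, odd, second);      /* products for elements 4..7 */

    for (int k = 0; k < 4; k++) {
        acc[k] += first[k];
        acc[k + 4] += second[k];
    }
}

/* Returns 1 if the merge-based path matches a direct scalar loop. */
static int accumulate8_matches_scalar(void) {
    const int16_t src[8] = {0, 100, -200, 300, 4000, -5000, 32767, -32768};
    const int16_t coeff = 1234;
    int32_t acc[8], ref[8];

    for (int k = 0; k < 8; k++)
        acc[k] = ref[k] = 1 << 18;      /* same start value as u[] and v[] */

    accumulate8(acc, src, coeff);
    for (int k = 0; k < 8; k++)
        ref[k] += (int32_t)src[k] * coeff;

    for (int k = 0; k < 8; k++)
        if (acc[k] != ref[k])
            return 0;
    return 1;
}
```

A caller can confirm the equivalence with assert(accumulate8_matches_scalar()); the scalar tail loop in the listing computes the same sums one element at a time.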
12130
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
213
214 static inline void hScale_altivec_real(int16_t *dst, int dstW, uint8_t *src, int srcW, int xInc, int16_t *filter, int16_t *filterPos, int filterSize) {
215   register int i;
216   int __attribute__ ((aligned (16))) tempo[4];
217
218   if (filterSize % 4) {
219     for(i=0; i<dstW; i++) {
220       register int j;
221       register int srcPos = filterPos[i];
222       register int val = 0;
223       for(j=0; j<filterSize; j++) {
224         val += ((int)src[srcPos + j])*filter[filterSize*i + j];
225       }
226       dst[i] = MIN(MAX(0, val>>7), (1<<15)-1);
227     }
228   }
229   else
230   switch (filterSize) {
231   case 4:
232     {
233       for(i=0; i<dstW; i++) {
234         register int j;
235         register int srcPos = filterPos[i];
236
237         vector unsigned char src_v0 = vec_ld(srcPos, src);
238         vector unsigned char src_v1 = src_v0; // reloaded below only when the 4 source bytes cross a 16-byte boundary
239         if ((((unsigned long)src + srcPos)% 16) > 12) {
240           src_v1 = vec_ld(srcPos + 16, src);
241         }
242         vector unsigned char src_vF = vec_perm(src_v0, src_v1, vec_lvsl(srcPos, src));
243
244         vector signed short src_v = // vec_unpackh would sign-extend, so zero-extend by merging with zero instead
245           (vector signed short)(vec_mergeh((vector unsigned char)vzero, src_vF));
246         // now put our elements in the even slots
247         src_v = vec_mergeh(src_v, (vector signed short)vzero);
248
249         vector signed short filter_v = vec_ld(i << 3, filter);
250         // the shift by 3 above is log2(filterSize) = 2 plus log2(sizeof(short)) = 1
251
252         // the neat trick: we only care about half the elements,
253         // high or low depending on (i<<3)%16 (it's 0 or 8 here),
254         // and we're going to use vec_mule, so we choose
255         // carefully how to "unpack" the elements into the even slots
256         if ((i << 3) % 16)
257           filter_v = vec_mergel(filter_v,(vector signed short)vzero);
258         else
259           filter_v = vec_mergeh(filter_v,(vector signed short)vzero);
260
261         vector signed int val_vEven = vec_mule(src_v, filter_v);
262         vector signed int val_s = vec_sums(val_vEven, vzero);
263         vec_st(val_s, 0, tempo);
264         dst[i] = MIN(MAX(0, tempo[3]>>7), (1<<15)-1);
265       }
266     }
267     break;
268
269   case 8:
270     {
271       for(i=0; i<dstW; i++) {
272         register int srcPos = filterPos[i];
273
274         vector unsigned char src_v0 = vec_ld(srcPos, src);
275         vector unsigned char src_v1 = src_v0; // reloaded below only when the 8 source bytes cross a 16-byte boundary
276         if ((((unsigned long)src + srcPos)% 16) > 8) {
277           src_v1 = vec_ld(srcPos + 16, src);
278         }
279         vector unsigned char src_vF = vec_perm(src_v0, src_v1, vec_lvsl(srcPos, src));
280
281         vector signed short src_v = // vec_unpackh would sign-extend, so zero-extend by merging with zero instead
282           (vector signed short)(vec_mergeh((vector unsigned char)vzero, src_vF));
283         vector signed short filter_v = vec_ld(i << 4, filter);
284         // the shift by 4 above is log2(filterSize) = 3 plus log2(sizeof(short)) = 1
285
286         vector signed int val_v = vec_msums(src_v, filter_v, (vector signed int)vzero);
287         vector signed int val_s = vec_sums(val_v, vzero);
288         vec_st(val_s, 0, tempo);
289         dst[i] = MIN(MAX(0, tempo[3]>>7), (1<<15)-1);
290       }
291     }
292     break;
293
294   case 16:
295     {
296       for(i=0; i<dstW; i++) {
297         register int srcPos = filterPos[i];
298
299         vector unsigned char src_v0 = vec_ld(srcPos, src);
300         vector unsigned char src_v1 = vec_ld(srcPos + 16, src);
301         vector unsigned char src_vF = vec_perm(src_v0, src_v1, vec_lvsl(srcPos, src));
302
303         vector signed short src_vA = // vec_unpackh would sign-extend, so zero-extend by merging with zero instead
304           (vector signed short)(vec_mergeh((vector unsigned char)vzero, src_vF));
305         vector signed short src_vB = // same zero-extension for the low 8 bytes
306           (vector signed short)(vec_mergel((vector unsigned char)vzero, src_vF));
307
308         vector signed short filter_v0 = vec_ld(i << 5, filter);
309         vector signed short filter_v1 = vec_ld((i << 5) + 16, filter);
310         // the shifts by 5 above are log2(filterSize) = 4 plus log2(sizeof(short)) = 1
311
312         vector signed int val_acc = vec_msums(src_vA, filter_v0, (vector signed int)vzero);
313         vector signed int val_v = vec_msums(src_vB, filter_v1, val_acc);
314
315         vector signed int val_s = vec_sums(val_v, vzero);
316
317         vec_st(val_s, 0, tempo);
318         dst[i] = MIN(MAX(0, tempo[3]>>7), (1<<15)-1);
319       }
320     }
321     break;
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
322
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
323 default:
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
324 {
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
325 for(i=0; i<dstW; i++) {
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
326 register int j;
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
327 register int srcPos = filterPos[i];
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
328
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
329 vector signed int val_v = (vector signed int)vzero;
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
330 vector signed short filter_v0R = vec_ld(i * 2 * filterSize, filter);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
331 vector unsigned char permF = vec_lvsl((i * 2 * filterSize), filter);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
332
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
333 vector unsigned char src_v0 = vec_ld(srcPos, src);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
334 vector unsigned char permS = vec_lvsl(srcPos, src);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
335
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
336 for (j = 0 ; j < filterSize - 15; j += 16) {
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
337 vector unsigned char src_v1 = vec_ld(srcPos + j + 16, src);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
338 vector unsigned char src_vF = vec_perm(src_v0, src_v1, permS);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
339
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
340 vector signed short src_vA = // vec_unpackh would sign-extend, so zero-extend via vec_mergeh with vzero
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
341 (vector signed short)(vec_mergeh((vector unsigned char)vzero, src_vF));
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
342 vector signed short src_vB = // vec_unpackl would sign-extend, so zero-extend via vec_mergel with vzero
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
343 (vector signed short)(vec_mergel((vector unsigned char)vzero, src_vF));
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
344
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
345 vector signed short filter_v1R = vec_ld((i * 2 * filterSize) + (j * 2) + 16, filter);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
346 vector signed short filter_v2R = vec_ld((i * 2 * filterSize) + (j * 2) + 32, filter);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
347 vector signed short filter_v0 = vec_perm(filter_v0R, filter_v1R, permF);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
348 vector signed short filter_v1 = vec_perm(filter_v1R, filter_v2R, permF);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
349
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
350 vector signed int val_acc = vec_msums(src_vA, filter_v0, val_v);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
351 val_v = vec_msums(src_vB, filter_v1, val_acc);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
352
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
353 filter_v0R = filter_v2R;
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
354 src_v0 = src_v1;
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
355 }
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
356
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
357 if (j < (filterSize-7)) {
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
358 // no need to reload src_v0: it was already loaded above
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
359 //vector unsigned char src_v0 = vec_ld(srcPos + j, src);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
360 vector unsigned char src_v1;
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
361 if ((((uintptr_t)src + srcPos) % 16) > 8) {
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
362 src_v1 = vec_ld(srcPos + j + 16, src);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
363 }
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
364 vector unsigned char src_vF = vec_perm(src_v0, src_v1, permS);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
365
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
366 vector signed short src_v = // vec_unpackh would sign-extend, so zero-extend via vec_mergeh with vzero
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
367 (vector signed short)(vec_mergeh((vector unsigned char)vzero, src_vF));
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
368 // no need to reload filter_v0R: it was already loaded above
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
369 //vector signed short filter_v0R = vec_ld((i * 2 * filterSize) + j, filter);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
370 vector signed short filter_v1R = vec_ld((i * 2 * filterSize) + (j * 2) + 16, filter);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
371 vector signed short filter_v = vec_perm(filter_v0R, filter_v1R, permF);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
372
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
373 val_v = vec_msums(src_v, filter_v, val_v);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
374 }
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
375
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
376 vector signed int val_s = vec_sums(val_v, vzero);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
377
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
378 vec_st(val_s, 0, tempo);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
379 dst[i] = MIN(MAX(0, tempo[3]>>7), (1<<15)-1);
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
380 }
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
381
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
382 }
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
383 }
2ef24558b732 AltiVec hScale, all size patch by (Romain Dolbeau <dolbeaur at club-internet dot fr>)
michael
parents: 12017
diff changeset
384 }
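For comparison with the AltiVec loop above, here is a minimal scalar sketch of the same per-output computation: a dot product of `filterSize` source bytes with 16-bit coefficients, shifted right by 7 and clamped to [0, 2^15 - 1]. The function name `hscale_ref`, the parameter `dstW`, and the `int16_t` type of `filterPos` are assumptions made for illustration; they are not taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the project's own C fallback.
 * filter holds filterSize coefficients per output sample, back to back,
 * which matches the byte offset i * 2 * filterSize used by the vector code. */
static void hscale_ref(int16_t *dst, int dstW, const uint8_t *src,
                       const int16_t *filter, const int16_t *filterPos,
                       int filterSize)
{
    assert(filterSize > 0);
    for (int i = 0; i < dstW; i++) {
        const int srcPos = filterPos[i];
        int val = 0;

        /* Unsigned source bytes times signed 16-bit coefficients. */
        for (int j = 0; j < filterSize; j++)
            val += (int)src[srcPos + j] * filter[filterSize * i + j];

        /* Same result as MIN(MAX(0, val >> 7), (1 << 15) - 1), but the
         * negative case is clamped first so no negative value is shifted. */
        if (val < 0)
            val = 0;
        else
            val >>= 7;
        if (val > (1 << 15) - 1)
            val = (1 << 15) - 1;

        dst[i] = (int16_t)val;
    }
}
```

A scalar version like this can serve as a reference when checking the vector path on small inputs, since it needs no alignment handling.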
12768
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
385
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
386 static inline int yv12toyuy2_unscaled_altivec(SwsContext *c, uint8_t* src[], int srcStride[], int srcSliceY,
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
387 int srcSliceH, uint8_t* dstParam[], int dstStride_a[]) {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
388 uint8_t *dst=dstParam[0] + dstStride_a[0]*srcSliceY;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
389 // yv12toyuy2( src[0],src[1],src[2],dst,c->srcW,srcSliceH,srcStride[0],srcStride[1],dstStride[0] );
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
390 uint8_t *ysrc = src[0];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
391 uint8_t *usrc = src[1];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
392 uint8_t *vsrc = src[2];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
393 const int width = c->srcW;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
394 const int height = srcSliceH;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
395 const int lumStride = srcStride[0];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
396 const int chromStride = srcStride[1];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
397 const int dstStride = dstStride_a[0];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
398 const vector unsigned char yperm = vec_lvsl(0, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
399 const int vertLumPerChroma = 2;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
400 register unsigned int y;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
401
15824
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
402 if(width&15){
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
403 yv12toyuy2( ysrc, usrc, vsrc, dst,c->srcW,srcSliceH, lumStride, chromStride, dstStride);
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
404 return srcSliceH;
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
405 }
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
406
12768
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
407 /* this code assumes:
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
408
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
409 1) dst is 16-byte aligned
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
410 2) dstStride is a multiple of 16
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
411 3) width is a multiple of 16
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
412 4) lum & chrom strides are multiples of 8
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
413 */
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
414
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
415 for(y=0; y<height; y++)
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
416 {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
417 int i;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
418 for (i = 0; i < width - 31; i+= 32) {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
419 const unsigned int j = i >> 1;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
420 vector unsigned char v_yA = vec_ld(i, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
421 vector unsigned char v_yB = vec_ld(i + 16, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
422 vector unsigned char v_yC = vec_ld(i + 32, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
423 vector unsigned char v_y1 = vec_perm(v_yA, v_yB, yperm);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
424 vector unsigned char v_y2 = vec_perm(v_yB, v_yC, yperm);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
425 vector unsigned char v_uA = vec_ld(j, usrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
426 vector unsigned char v_uB = vec_ld(j + 16, usrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
427 vector unsigned char v_u = vec_perm(v_uA, v_uB, vec_lvsl(j, usrc));
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
428 vector unsigned char v_vA = vec_ld(j, vsrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
429 vector unsigned char v_vB = vec_ld(j + 16, vsrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
430 vector unsigned char v_v = vec_perm(v_vA, v_vB, vec_lvsl(j, vsrc));
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
431 vector unsigned char v_uv_a = vec_mergeh(v_u, v_v);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
432 vector unsigned char v_uv_b = vec_mergel(v_u, v_v);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
433 vector unsigned char v_yuy2_0 = vec_mergeh(v_y1, v_uv_a);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
434 vector unsigned char v_yuy2_1 = vec_mergel(v_y1, v_uv_a);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
435 vector unsigned char v_yuy2_2 = vec_mergeh(v_y2, v_uv_b);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
436 vector unsigned char v_yuy2_3 = vec_mergel(v_y2, v_uv_b);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
437 vec_st(v_yuy2_0, (i << 1), dst);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
438 vec_st(v_yuy2_1, (i << 1) + 16, dst);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
439 vec_st(v_yuy2_2, (i << 1) + 32, dst);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
440 vec_st(v_yuy2_3, (i << 1) + 48, dst);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
441 }
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
442 if (i < width) {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
443 const unsigned int j = i >> 1;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
444 vector unsigned char v_y1 = vec_ld(i, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
445 vector unsigned char v_u = vec_ld(j, usrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
446 vector unsigned char v_v = vec_ld(j, vsrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
447 vector unsigned char v_uv_a = vec_mergeh(v_u, v_v);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
448 vector unsigned char v_yuy2_0 = vec_mergeh(v_y1, v_uv_a);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
449 vector unsigned char v_yuy2_1 = vec_mergel(v_y1, v_uv_a);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
450 vec_st(v_yuy2_0, (i << 1), dst);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
451 vec_st(v_yuy2_1, (i << 1) + 16, dst);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
452 }
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
453 if((y&(vertLumPerChroma-1))==(vertLumPerChroma-1) )
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
454 {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
455 usrc += chromStride;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
456 vsrc += chromStride;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
457 }
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
458 ysrc += lumStride;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
459 dst += dstStride;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
460 }
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
461
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
462 return srcSliceH;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
463 }
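The byte layout produced by the merge operations above (Y0 U0 Y1 V0, with each chroma row shared by two luma rows) can be sketched in scalar C as follows. This is an illustrative reference only: the name `yv12_to_yuy2_ref` and its parameter list are assumptions, and it is not the `yv12toyuy2` routine that the function falls back to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for planar YV12 -> packed YUY2.
 * Each pair of luma samples shares one U and one V sample, and each
 * chroma row is reused for two consecutive luma rows. */
static void yv12_to_yuy2_ref(const uint8_t *ysrc, const uint8_t *usrc,
                             const uint8_t *vsrc, uint8_t *dst,
                             int width, int height,
                             int lumStride, int chromStride, int dstStride)
{
    assert((width & 1) == 0); /* one chroma pair per two luma samples */

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x += 2) {
            dst[2 * x + 0] = ysrc[x];
            dst[2 * x + 1] = usrc[x >> 1];
            dst[2 * x + 2] = ysrc[x + 1];
            dst[2 * x + 3] = vsrc[x >> 1];
        }
        /* Advance chroma only after every second luma row. */
        if (y & 1) {
            usrc += chromStride;
            vsrc += chromStride;
        }
        ysrc += lumStride;
        dst  += dstStride;
    }
}
```

The UYVY layout differs only in byte order within each four-byte group (U0 Y0 V0 Y1), so the same loop applies with the chroma and luma stores swapped.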
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
464
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
465 static inline int yv12touyvy_unscaled_altivec(SwsContext *c, uint8_t* src[], int srcStride[], int srcSliceY,
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
466 int srcSliceH, uint8_t* dstParam[], int dstStride_a[]) {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
467 uint8_t *dst=dstParam[0] + dstStride_a[0]*srcSliceY;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
468 // yv12touyvy( src[0],src[1],src[2],dst,c->srcW,srcSliceH,srcStride[0],srcStride[1],dstStride[0] );
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
469 uint8_t *ysrc = src[0];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
470 uint8_t *usrc = src[1];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
471 uint8_t *vsrc = src[2];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
472 const int width = c->srcW;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
473 const int height = srcSliceH;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
474 const int lumStride = srcStride[0];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
475 const int chromStride = srcStride[1];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
476 const int dstStride = dstStride_a[0];
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
477 const int vertLumPerChroma = 2;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
478 const vector unsigned char yperm = vec_lvsl(0, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
479 register unsigned int y;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
480
15824
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
481 if(width&15){
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
482 yv12touyvy( ysrc, usrc, vsrc, dst,c->srcW,srcSliceH, lumStride, chromStride, dstStride);
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
483 return srcSliceH;
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
484 }
53231c701b29 width % 16 != 0 workaround by (Nicolas Plourde: nicolas plourde, gmail com>)
michael
parents: 12768
diff changeset
485
12768
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
486 /* this code assumes:
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
487
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
488 1) dst is 16-byte aligned
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
489 2) dstStride is a multiple of 16
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
490 3) width is a multiple of 16
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
491 4) lum & chrom strides are multiples of 8
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
492 */
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
493
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
494 for(y=0; y<height; y++)
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
495 {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
496 int i;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
497 for (i = 0; i < width - 31; i+= 32) {
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
498 const unsigned int j = i >> 1;
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
499 vector unsigned char v_yA = vec_ld(i, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
500 vector unsigned char v_yB = vec_ld(i + 16, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
501 vector unsigned char v_yC = vec_ld(i + 32, ysrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
502 vector unsigned char v_y1 = vec_perm(v_yA, v_yB, yperm);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
503 vector unsigned char v_y2 = vec_perm(v_yB, v_yC, yperm);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
504 vector unsigned char v_uA = vec_ld(j, usrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
505 vector unsigned char v_uB = vec_ld(j + 16, usrc);
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
506 vector unsigned char v_u = vec_perm(v_uA, v_uB, vec_lvsl(j, usrc));
931eee818c52 Altivec unscaled YV12 -> packed YUV patch by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 12532
diff changeset
        /* vsrc + j may be unaligned: load two vectors and realign with vec_perm */
        vector unsigned char v_vA = vec_ld(j, vsrc);
        vector unsigned char v_vB = vec_ld(j + 16, vsrc);
        vector unsigned char v_v = vec_perm(v_vA, v_vB, vec_lvsl(j, vsrc));
        /* interleave chroma: U0 V0 U1 V1 ... */
        vector unsigned char v_uv_a = vec_mergeh(v_u, v_v);
        vector unsigned char v_uv_b = vec_mergel(v_u, v_v);
        /* interleave chroma pairs with luma: U0 Y0 V0 Y1 ... */
        vector unsigned char v_uyvy_0 = vec_mergeh(v_uv_a, v_y1);
        vector unsigned char v_uyvy_1 = vec_mergel(v_uv_a, v_y1);
        vector unsigned char v_uyvy_2 = vec_mergeh(v_uv_b, v_y2);
        vector unsigned char v_uyvy_3 = vec_mergel(v_uv_b, v_y2);
        /* 32 pixels at 2 bytes each: four 16-byte stores */
        vec_st(v_uyvy_0, (i << 1), dst);
        vec_st(v_uyvy_1, (i << 1) + 16, dst);
        vec_st(v_uyvy_2, (i << 1) + 32, dst);
        vec_st(v_uyvy_3, (i << 1) + 48, dst);
      }
      /* tail: one remaining block of 16 pixels; these loads are not
         realigned with vec_perm, so they are exact only for
         16-byte-aligned addresses */
      if (i < width) {
        const unsigned int j = i >> 1;
        vector unsigned char v_y1 = vec_ld(i, ysrc);
        vector unsigned char v_u = vec_ld(j, usrc);
        vector unsigned char v_v = vec_ld(j, vsrc);
        /* only the first 8 U and 8 V bytes are needed for 16 pixels */
        vector unsigned char v_uv_a = vec_mergeh(v_u, v_v);
        vector unsigned char v_uyvy_0 = vec_mergeh(v_uv_a, v_y1);
        vector unsigned char v_uyvy_1 = vec_mergel(v_uv_a, v_y1);
        vec_st(v_uyvy_0, (i << 1), dst);
        vec_st(v_uyvy_1, (i << 1) + 16, dst);
      }
      /* advance the chroma rows once every vertLumPerChroma luma rows */
      if((y&(vertLumPerChroma-1))==(vertLumPerChroma-1) )
      {
        usrc += chromStride;
        vsrc += chromStride;
      }
      ysrc += lumStride;
      dst += dstStride;
    }
  return srcSliceH;
}
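
For comparison, the merge-and-store sequence in the function above can be written as a plain scalar loop. The following is only a sketch under assumptions: the function name `yv12_to_uyvy_scalar` and its flat parameter list are hypothetical and do not appear in this file, and the U Y V Y byte order is inferred from the `v_uyvy_*` variable names and from the order of the `vec_mergeh`/`vec_mergel` operands. It keeps the same row stepping, with one chroma row shared by `vertLumPerChroma` luma rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (not part of the original file):
 * packs planar Y, U, V into U Y V Y byte order, two pixels per
 * four output bytes. */
static void yv12_to_uyvy_scalar(const uint8_t *ysrc, const uint8_t *usrc,
                                const uint8_t *vsrc, uint8_t *dst,
                                int width, int height,
                                int lumStride, int chromStride, int dstStride)
{
    const int vertLumPerChroma = 2; /* 4:2:0, two luma rows per chroma row */
    int y, i;

    assert(width % 2 == 0); /* each U/V sample covers two pixels */

    for (y = 0; y < height; y++) {
        for (i = 0; i < width; i += 2) {
            const int j = i >> 1;          /* chroma index */
            dst[(i << 1) + 0] = usrc[j];     /* U  */
            dst[(i << 1) + 1] = ysrc[i];     /* Y0 */
            dst[(i << 1) + 2] = vsrc[j];     /* V  */
            dst[(i << 1) + 3] = ysrc[i + 1]; /* Y1 */
        }
        /* step the chroma pointers after every second luma row */
        if ((y & (vertLumPerChroma - 1)) == (vertLumPerChroma - 1)) {
            usrc += chromStride;
            vsrc += chromStride;
        }
        ysrc += lumStride;
        dst += dstStride;
    }
}
```

A loop like this has no alignment or width constraints beyond an even width, so it could serve as a reference to check the vector path against on small test frames.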