annotate i386/vp3dsp_mmx.c @ 2504:f12657081093 libavcodec

INTRA PCM macroblocks support, patch by Loic (lll+ffmpeg m4x org). This patch adds support for INTRA PCM macroblocks in CAVLC and CABAC mode; the deblocking needed a small modification, and so did the intra4x4_pred_mode prediction. With this patch, the 5 streams of the conformance suite containing INTRA PCM macroblocks now decode entirely: 4 are completely correct, and 1 is incorrect from the first B slice onward because deblocking in B slices is not yet implemented. The code is not optimized for speed, which is unnecessary because IPCM macroblocks are rare, but it could be optimized for code size; if someone wants to do this, feel free.
author michael
date Mon, 07 Feb 2005 00:10:28 +0000
parents 89422281f6f6
children 9699d325049d
rev   line source
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
/*
 * Copyright (C) 2004 the ffmpeg project
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

/**
 * @file vp3dsp_mmx.c
 * MMX-optimized functions cribbed from the original VP3 source code.
 */

#include "../dsputil.h"
#include "mmx.h"

#define IdctAdjustBeforeShift 8

/* (12 * 4) 2-byte memory locations ( = 96 bytes total)
 * idct_constants[0..15] = Mask table (M(I))
 * idct_constants[16..43] = Cosine table (C(I))
 * idct_constants[44..47] = 8
 */
static uint16_t idct_constants[(4 + 7 + 1) * 4];
static uint16_t idct_cosine_table[7] = {
    64277, 60547, 54491, 46341, 36410, 25080, 12785
};
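The layout comment above implies that each of the seven cosine constants occupies four consecutive 16-bit slots (28 entries, indices 16..43), followed by four copies of the rounding value 8. A minimal sketch of how such a table could be filled, assuming that replication scheme. The helper `fill_idct_constants` and the macro `IDCT_ADJUST_BEFORE_SHIFT` are hypothetical names, not part of this file, and the mask entries 0..15 are left untouched because their contents are not shown here:

```c
#include <stdint.h>

#define IDCT_ADJUST_BEFORE_SHIFT 8   /* hypothetical stand-in for IdctAdjustBeforeShift */

/* Same values as idct_cosine_table: round(cos(k*pi/16) * 65536) for k = 1..7. */
static const uint16_t cosine_table[7] = {
    64277, 60547, 54491, 46341, 36410, 25080, 12785
};

/* Hypothetical helper: fills entries 16..47 of a 48-entry constant table.
 * Each value is repeated four times so one 64-bit load yields four lanes. */
static void fill_idct_constants(uint16_t table[48])
{
    int i, lane;

    for (i = 0; i < 7; i++)
        for (lane = 0; lane < 4; lane++)
            table[16 + i * 4 + lane] = cosine_table[i];

    for (lane = 0; lane < 4; lane++)
        table[44 + lane] = IDCT_ADJUST_BEFORE_SHIFT;
}
```

With this layout, the constant for cosine index k (1..7) would start at `table[16 + (k - 1) * 4]`.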

#define r0 mm0
#define r1 mm1
#define r2 mm2
#define r3 mm3
#define r4 mm4
#define r5 mm5
#define r6 mm6
#define r7 mm7

/* from original comments: The Macro does IDct on 4 1-D Dcts */
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
#define BeginIDCT() { \
  movq_m2r(*I(3), r2); \
  movq_m2r(*C(3), r6); \
  movq_r2r(r2, r4); \
  movq_m2r(*J(5), r7); \
  pmulhw_r2r(r6, r4); /* r4 = c3*i3 - i3 */ \
  movq_m2r(*C(5), r1); \
  pmulhw_r2r(r7, r6); /* r6 = c3*i5 - i5 */ \
  movq_r2r(r1, r5); \
  pmulhw_r2r(r2, r1); /* r1 = c5*i3 - i3 */ \
  movq_m2r(*I(1), r3); \
  pmulhw_r2r(r7, r5); /* r5 = c5*i5 - i5 */ \
  movq_m2r(*C(1), r0); /* (all registers are in use) */ \
  paddw_r2r(r2, r4); /* r4 = c3*i3 */ \
  paddw_r2r(r7, r6); /* r6 = c3*i5 */ \
  paddw_r2r(r1, r2); /* r2 = c5*i3 */ \
  movq_m2r(*J(7), r1); \
  paddw_r2r(r5, r7); /* r7 = c5*i5 */ \
  movq_r2r(r0, r5); /* r5 = c1 */ \
  pmulhw_r2r(r3, r0); /* r0 = c1*i1 - i1 */ \
  paddsw_r2r(r7, r4); /* r4 = C = c3*i3 + c5*i5 */ \
  pmulhw_r2r(r1, r5); /* r5 = c1*i7 - i7 */ \
  movq_m2r(*C(7), r7); \
  psubsw_r2r(r2, r6); /* r6 = D = c3*i5 - c5*i3 */ \
  paddw_r2r(r3, r0); /* r0 = c1*i1 */ \
  pmulhw_r2r(r7, r3); /* r3 = c7*i1 */ \
  movq_m2r(*I(2), r2); \
  pmulhw_r2r(r1, r7); /* r7 = c7*i7 */ \
  paddw_r2r(r1, r5); /* r5 = c1*i7 */ \
  movq_r2r(r2, r1); /* r1 = i2 */ \
  pmulhw_m2r(*C(2), r2); /* r2 = c2*i2 - i2 */ \
  psubsw_r2r(r5, r3); /* r3 = B = c7*i1 - c1*i7 */ \
  movq_m2r(*J(6), r5); \
  paddsw_r2r(r7, r0); /* r0 = A = c1*i1 + c7*i7 */ \
  movq_r2r(r5, r7); /* r7 = i6 */ \
  psubsw_r2r(r4, r0); /* r0 = A - C */ \
  pmulhw_m2r(*C(2), r5); /* r5 = c2*i6 - i6 */ \
  paddw_r2r(r1, r2); /* r2 = c2*i2 */ \
  pmulhw_m2r(*C(6), r1); /* r1 = c6*i2 */ \
  paddsw_r2r(r4, r4); /* r4 = C + C */ \
  paddsw_r2r(r0, r4); /* r4 = C. = A + C */ \
  psubsw_r2r(r6, r3); /* r3 = B - D */ \
  paddw_r2r(r7, r5); /* r5 = c2*i6 */ \
  paddsw_r2r(r6, r6); /* r6 = D + D */ \
  pmulhw_m2r(*C(6), r7); /* r7 = c6*i6 */ \
  paddsw_r2r(r3, r6); /* r6 = D. = B + D */ \
  movq_r2m(r4, *I(1)); /* save C. at I(1) */ \
  psubsw_r2r(r5, r1); /* r1 = H = c6*i2 - c2*i6 */ \
  movq_m2r(*C(4), r4); \
  movq_r2r(r3, r5); /* r5 = B - D */ \
  pmulhw_r2r(r4, r3); /* r3 = (c4 - 1) * (B - D) */ \
  paddsw_r2r(r2, r7); /* r7 = G = c6*i6 + c2*i2 */ \
  movq_r2m(r6, *I(2)); /* save D. at I(2) */ \
  movq_r2r(r0, r2); /* r2 = A - C */ \
  movq_m2r(*I(0), r6); \
  pmulhw_r2r(r4, r0); /* r0 = (c4 - 1) * (A - C) */ \
  paddw_r2r(r3, r5); /* r5 = B. = c4 * (B - D) */ \
  movq_m2r(*J(4), r3); \
  psubsw_r2r(r1, r5); /* r5 = B.. = B. - H */ \
  paddw_r2r(r0, r2); /* r2 = A. = c4 * (A - C) */ \
  psubsw_r2r(r3, r6); /* r6 = i0 - i4 */ \
  movq_r2r(r6, r0); \
  pmulhw_r2r(r4, r6); /* r6 = (c4 - 1) * (i0 - i4) */ \
  paddsw_r2r(r3, r3); /* r3 = i4 + i4 */ \
  paddsw_r2r(r1, r1); /* r1 = H + H */ \
  paddsw_r2r(r0, r3); /* r3 = i0 + i4 */ \
  paddsw_r2r(r5, r1); /* r1 = H. = B. + H */ \
  pmulhw_r2r(r3, r4); /* r4 = (c4 - 1) * (i0 + i4) */ \
  paddsw_r2r(r0, r6); /* r6 = F = c4 * (i0 - i4) */ \
  psubsw_r2r(r2, r6); /* r6 = F. = F - A. */ \
  paddsw_r2r(r2, r2); /* r2 = A. + A. */ \
  movq_m2r(*I(1), r0); /* r0 = C. */ \
  paddsw_r2r(r6, r2); /* r2 = A.. = F + A. */ \
  paddw_r2r(r3, r4); /* r4 = E = c4 * (i0 + i4) */ \
  psubsw_r2r(r1, r2); /* r2 = R2 = A.. - H. */ \
}
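The recurring "c*i - i" comments in the macro above follow from how a signed 16-bit high multiply treats constants larger than 32767: a cosine stored as the unsigned value c is read as c - 65536, so the high half of the product equals floor(c*i / 65536) - i, and one extra add of i restores the intended result. Constants below 32768 (here c6 and c7) need no correction, which matches the lines commented "c6*i2" and "c7*i1". A scalar sketch of one lane, with the hypothetical helper names `mulhi_lane` and `scale_by_cosine`; saturation and SIMD packing are not modeled:

```c
#include <stdint.h>

/* One lane of a signed 16x16 -> high-16 multiply: floor(a * b / 65536).
 * Written without right-shifting negative values to stay portable. */
static int32_t mulhi_lane(int32_t a, int32_t b)
{
    int32_t p = a * b;   /* |a|, |b| <= 32768, so the product fits in int32_t */
    return p >= 0 ? (p >> 16) : -((-p + 65535) >> 16);
}

/* Hypothetical helper: multiply x by c/65536, where c is one of the
 * 16-bit cosine constants, the way the macro does it. */
static int16_t scale_by_cosine(uint16_t c, int16_t x)
{
    /* Reinterpret the unsigned constant as the signed value the CPU sees. */
    int32_t c_signed = (c >= 32768) ? (int32_t)c - 65536 : (int32_t)c;
    int32_t hi = mulhi_lane(c_signed, x);

    /* For c >= 32768 the multiply produced c*x - x, so add x back. */
    if (c >= 32768)
        hi += x;
    return (int16_t)hi;
}
```

For example, `scale_by_cosine(46341, 1000)` evaluates to 707, the floor of 1000 * cos(pi/4).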

/* RowIDCT gets ready to transpose */
#define RowIDCT() { \
  \
  BeginIDCT(); \
  \
  movq_m2r(*I(2), r3); /* r3 = D. */ \
  psubsw_r2r(r7, r4); /* r4 = E. = E - G */ \
  paddsw_r2r(r1, r1); /* r1 = H. + H. */ \
  paddsw_r2r(r7, r7); /* r7 = G + G */ \
  paddsw_r2r(r2, r1); /* r1 = R1 = A.. + H. */ \
  paddsw_r2r(r4, r7); /* r7 = G. = E + G */ \
  psubsw_r2r(r3, r4); /* r4 = R4 = E. - D. */ \
  paddsw_r2r(r3, r3); \
  psubsw_r2r(r5, r6); /* r6 = R6 = F. - B.. */ \
  paddsw_r2r(r5, r5); \
  paddsw_r2r(r4, r3); /* r3 = R3 = E. + D. */ \
  paddsw_r2r(r6, r5); /* r5 = R5 = F. + B.. */ \
  psubsw_r2r(r0, r7); /* r7 = R7 = G. - C. */ \
  paddsw_r2r(r0, r0); \
  movq_r2m(r1, *I(1)); /* save R1 */ \
  paddsw_r2r(r7, r0); /* r0 = R0 = G. + C. */ \
}
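Read together, the register comments in BeginIDCT and RowIDCT spell out a complete 1-D butterfly from inputs i0..i7 to outputs R0..R7. The following scalar sketch transcribes those comments directly; it is a readability aid under stated assumptions, not the code this file executes. The names `mul16` and `idct_1d` are hypothetical, 32-bit intermediates replace the saturating 16-bit lanes, and dotted names such as "A." and "A.." are written `Ad` and `Add`:

```c
#include <stdint.h>

/* floor(c * x / 65536): the value the pmulhw/paddw pairs compute. */
static int32_t mul16(uint16_t c, int32_t x)
{
    int64_t p = (int64_t)c * x;
    return (int32_t)(p >= 0 ? p / 65536 : -((-p + 65535) / 65536));
}

/* Hypothetical scalar reference for one 1-D pass (no saturation). */
static void idct_1d(const int32_t i[8], int32_t r[8])
{
    /* c[k] = round(cos(k*pi/16) * 65536); c[0] is unused. */
    static const uint16_t c[8] = {
        0, 64277, 60547, 54491, 46341, 36410, 25080, 12785
    };

    int32_t A = mul16(c[1], i[1]) + mul16(c[7], i[7]);
    int32_t B = mul16(c[7], i[1]) - mul16(c[1], i[7]);
    int32_t C = mul16(c[3], i[3]) + mul16(c[5], i[5]);
    int32_t D = mul16(c[3], i[5]) - mul16(c[5], i[3]);

    int32_t Ad = mul16(c[4], A - C);         /* A. */
    int32_t Bd = mul16(c[4], B - D);         /* B. */
    int32_t Cd = A + C;                      /* C. */
    int32_t Dd = B + D;                      /* D. */

    int32_t E = mul16(c[4], i[0] + i[4]);
    int32_t F = mul16(c[4], i[0] - i[4]);
    int32_t G = mul16(c[2], i[2]) + mul16(c[6], i[6]);
    int32_t H = mul16(c[6], i[2]) - mul16(c[2], i[6]);

    int32_t Ed  = E - G,  Gd = E + G;        /* E., G.  */
    int32_t Add = F + Ad, Fd = F - Ad;       /* A.., F. */
    int32_t Bdd = Bd - H, Hd = Bd + H;       /* B.., H. */

    r[0] = Gd + Cd;   r[7] = Gd - Cd;
    r[1] = Add + Hd;  r[2] = Add - Hd;
    r[3] = Ed + Dd;   r[4] = Ed - Dd;
    r[5] = Fd + Bdd;  r[6] = Fd - Bdd;
}
```

A DC-only input is a quick sanity check: with i0 = 1000 and all other inputs zero, every output equals 707.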

/* Column IDCT normalizes and stores final results */
#define ColumnIDCT() { \
  \
  BeginIDCT(); \
  \
  paddsw_m2r(*Eight, r2); /* adjust R2 (and R1) for shift */ \
  paddsw_r2r(r1, r1); /* r1 = H. + H. */ \
  paddsw_r2r(r2, r1); /* r1 = R1 = A.. + H. */ \
  psraw_i2r(4, r2); /* r2 = NR2 */ \
  psubsw_r2r(r7, r4); /* r4 = E. = E - G */ \
  psraw_i2r(4, r1); /* r1 = NR1 */ \
  movq_m2r(*I(2), r3); /* r3 = D. */ \
  paddsw_r2r(r7, r7); /* r7 = G + G */ \
  movq_r2m(r2, *I(2)); /* store NR2 at I2 */ \
  paddsw_r2r(r4, r7); /* r7 = G. = E + G */ \
  movq_r2m(r1, *I(1)); /* store NR1 at I1 */ \
  psubsw_r2r(r3, r4); /* r4 = R4 = E. - D. */ \
  paddsw_m2r(*Eight, r4); /* adjust R4 (and R3) for shift */ \
  paddsw_r2r(r3, r3); /* r3 = D. + D. */ \
  paddsw_r2r(r4, r3); /* r3 = R3 = E. + D. */ \
  psraw_i2r(4, r4); /* r4 = NR4 */ \
  psubsw_r2r(r5, r6); /* r6 = R6 = F. - B.. */ \
  psraw_i2r(4, r3); /* r3 = NR3 */ \
  paddsw_m2r(*Eight, r6); /* adjust R6 (and R5) for shift */ \
  paddsw_r2r(r5, r5); /* r5 = B.. + B.. */ \
  paddsw_r2r(r6, r5); /* r5 = R5 = F. + B.. */ \
  psraw_i2r(4, r6); /* r6 = NR6 */ \
  movq_r2m(r4, *J(4)); /* store NR4 at J4 */ \
  psraw_i2r(4, r5); /* r5 = NR5 */ \
  movq_r2m(r3, *I(3)); /* store NR3 at I3 */ \
  psubsw_r2r(r0, r7); /* r7 = R7 = G. - C. */ \
  paddsw_m2r(*Eight, r7); /* adjust R7 (and R0) for shift */ \
  paddsw_r2r(r0, r0); /* r0 = C. + C. */ \
  paddsw_r2r(r7, r0); /* r0 = R0 = G. + C. */ \
  psraw_i2r(4, r7); /* r7 = NR7 */ \
  movq_r2m(r6, *J(6)); /* store NR6 at J6 */ \
  psraw_i2r(4, r0); /* r0 = NR0 */ \
  movq_r2m(r5, *J(5)); /* store NR5 at J5 */ \
  movq_r2m(r7, *J(7)); /* store NR7 at J7 */ \
  movq_r2m(r0, *I(0)); /* store NR0 at I0 */ \
}
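The normalization in ColumnIDCT is a round-to-nearest division by 16: the constant 8 is added with saturation, then the value is shifted right arithmetically by 4. Because each pair of results is formed as a difference and a sum of the same terms (for example R2 = A.. - H. and R1 = R2 + 2*H.), adding 8 to the first member before forming the second adjusts both, which is what the "adjust R2 (and R1)" comments refer to. A scalar sketch of the per-lane arithmetic, with the hypothetical helper names `sat_add16`, `sar4` and `normalize_column`:

```c
#include <stdint.h>

/* One lane of a saturating signed 16-bit add (the behavior of paddsw). */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* Arithmetic shift right by 4, i.e. floor(v / 16), written portably. */
static int16_t sar4(int16_t v)
{
    return (int16_t)(v >= 0 ? v >> 4 : -((-v + 15) >> 4));
}

/* Hypothetical helper: NRk = (Rk + 8) >> 4 for one output value. */
static int16_t normalize_column(int16_t r)
{
    return sar4(sat_add16(r, 8));
}
```

For example, `normalize_column(24)` gives 2 while `normalize_column(23)` gives 1, so values round to the nearest multiple of 16 with halves rounded up.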

/* Following macro does two 4x4 transposes in place.

  At entry (we assume):

    r0 = a3 a2 a1 a0
    I(1) = b3 b2 b1 b0
    r2 = c3 c2 c1 c0
    r3 = d3 d2 d1 d0

    r4 = e3 e2 e1 e0
    r5 = f3 f2 f1 f0
    r6 = g3 g2 g1 g0
    r7 = h3 h2 h1 h0

  At exit, we have:

    I(0) = d0 c0 b0 a0
    I(1) = d1 c1 b1 a1
    I(2) = d2 c2 b2 a2
    I(3) = d3 c3 b3 a3

melanson
parents: 1866
diff changeset
212 J(4) = h0 g0 f0 e0
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
213 J(5) = h1 g1 f1 e1
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
214 J(6) = h2 g2 f2 e2
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
215 J(7) = h3 g3 f3 e3
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
216
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
217 I(0) I(1) I(2) I(3) is the transpose of r0 I(1) r2 r3.
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
218 J(4) J(5) J(6) J(7) is the transpose of r4 r5 r6 r7.
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
219
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
220 Since r1 is free at entry, we calculate the Js first. */
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
221
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
222 #define Transpose() { \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
223 movq_r2r(r4, r1); /* r1 = e3 e2 e1 e0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
224 punpcklwd_r2r(r5, r4); /* r4 = f1 e1 f0 e0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
225 movq_r2m(r0, *I(0)); /* save a3 a2 a1 a0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
226 punpckhwd_r2r(r5, r1); /* r1 = f3 e3 f2 e2 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
227 movq_r2r(r6, r0); /* r0 = g3 g2 g1 g0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
228 punpcklwd_r2r(r7, r6); /* r6 = h1 g1 h0 g0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
229 movq_r2r(r4, r5); /* r5 = f1 e1 f0 e0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
230 punpckldq_r2r(r6, r4); /* r4 = h0 g0 f0 e0 = R4 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
231 punpckhdq_r2r(r6, r5); /* r5 = h1 g1 f1 e1 = R5 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
232 movq_r2r(r1, r6); /* r6 = f3 e3 f2 e2 */ \
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
233 movq_r2m(r4, *J(4)); \
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
234 punpckhwd_r2r(r7, r0); /* r0 = h3 g3 h2 g2 */ \
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
235 movq_r2m(r5, *J(5)); \
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
236 punpckhdq_r2r(r0, r6); /* r6 = h3 g3 f3 e3 = R7 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
237 movq_m2r(*I(0), r4); /* r4 = a3 a2 a1 a0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
238 punpckldq_r2r(r0, r1); /* r1 = h2 g2 f2 e2 = R6 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
239 movq_m2r(*I(1), r5); /* r5 = b3 b2 b1 b0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
240 movq_r2r(r4, r0); /* r0 = a3 a2 a1 a0 */ \
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
241 movq_r2m(r6, *J(7)); \
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
242 punpcklwd_r2r(r5, r0); /* r0 = b1 a1 b0 a0 */ \
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
243 movq_r2m(r1, *J(6)); \
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
244 punpckhwd_r2r(r5, r4); /* r4 = b3 a3 b2 a2 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
245 movq_r2r(r2, r5); /* r5 = c3 c2 c1 c0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
246 punpcklwd_r2r(r3, r2); /* r2 = d1 c1 d0 c0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
247 movq_r2r(r0, r1); /* r1 = b1 a1 b0 a0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
248 punpckldq_r2r(r2, r0); /* r0 = d0 c0 b0 a0 = R0 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
249 punpckhdq_r2r(r2, r1); /* r1 = d1 c1 b1 a1 = R1 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
250 movq_r2r(r4, r2); /* r2 = b3 a3 b2 a2 */ \
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
251 movq_r2m(r0, *I(0)); \
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
252 punpckhwd_r2r(r3, r5); /* r5 = d3 c3 d2 c2 */ \
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
253 movq_r2m(r1, *I(1)); \
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
254 punpckhdq_r2r(r5, r4); /* r4 = d3 c3 b3 a3 = R3 */ \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
255 punpckldq_r2r(r5, r2); /* r2 = d2 c2 b2 a2 = R2 */ \
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
256 movq_r2m(r4, *I(3)); \
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
257 movq_r2m(r2, *I(2)); \
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
258 }
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
259
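The Transpose() macro above builds each 4x4 transpose from two rounds of unpack instructions: a word interleave (punpcklwd/punpckhwd) followed by a doubleword interleave (punpckldq/punpckhdq). The following is a minimal portable C sketch of one such 4x4 transpose. The type mmx_words and all function names here are hypothetical helpers for illustration and are not part of this file; the sketch models the data movement only and ignores the register scheduling and the I()/J() memory slots that the macro uses.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit MMX register viewed as four 16-bit words; w[0] is the lowest word.
 * (Hypothetical type, used only for this sketch.) */
typedef struct { int16_t w[4]; } mmx_words;

/* Models punpcklwd: interleave the two low words of dst and src. */
static mmx_words unpack_lo_wd(mmx_words dst, mmx_words src)
{
    mmx_words r = {{ dst.w[0], src.w[0], dst.w[1], src.w[1] }};
    return r;
}

/* Models punpckhwd: interleave the two high words of dst and src. */
static mmx_words unpack_hi_wd(mmx_words dst, mmx_words src)
{
    mmx_words r = {{ dst.w[2], src.w[2], dst.w[3], src.w[3] }};
    return r;
}

/* Models punpckldq: low doubleword of dst, then low doubleword of src. */
static mmx_words unpack_lo_dq(mmx_words dst, mmx_words src)
{
    mmx_words r = {{ dst.w[0], dst.w[1], src.w[0], src.w[1] }};
    return r;
}

/* Models punpckhdq: high doubleword of dst, then high doubleword of src. */
static mmx_words unpack_hi_dq(mmx_words dst, mmx_words src)
{
    mmx_words r = {{ dst.w[2], dst.w[3], src.w[2], src.w[3] }};
    return r;
}

/* Transpose rows a, b, c, d (row[0..3]) in place. */
static void transpose4x4_sketch(mmx_words row[4])
{
    mmx_words ab_lo = unpack_lo_wd(row[0], row[1]); /* b1 a1 b0 a0 */
    mmx_words ab_hi = unpack_hi_wd(row[0], row[1]); /* b3 a3 b2 a2 */
    mmx_words cd_lo = unpack_lo_wd(row[2], row[3]); /* d1 c1 d0 c0 */
    mmx_words cd_hi = unpack_hi_wd(row[2], row[3]); /* d3 c3 d2 c2 */

    row[0] = unpack_lo_dq(ab_lo, cd_lo);            /* d0 c0 b0 a0 */
    row[1] = unpack_hi_dq(ab_lo, cd_lo);            /* d1 c1 b1 a1 */
    row[2] = unpack_lo_dq(ab_hi, cd_hi);            /* d2 c2 b2 a2 */
    row[3] = unpack_hi_dq(ab_hi, cd_hi);            /* d3 c3 b3 a3 */
}

/* Small self-check: after the call, element (i, j) holds the old (j, i). */
static void transpose4x4_demo(void)
{
    mmx_words m[4];
    int i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            m[i].w[j] = (int16_t)(10 * i + j);

    transpose4x4_sketch(m);

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            assert(m[i].w[j] == 10 * j + i);
}
```

The comments use the same high-to-low word order as the macro, so "d0 c0 b0 a0" means that a0 sits in the lowest word. The macro reaches the same result with only eight registers by spilling r0 to I(0) and storing each finished row as soon as it is complete.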
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
260 void vp3_dsp_init_mmx(void)
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
261 {
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
262 int j = 16;
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
263 uint16_t *p;
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
264
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
265 do {
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
266 idct_constants[--j] = 0;
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
267 } while (j);
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
268
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
269 idct_constants[0] = idct_constants[5] =
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
270 idct_constants[10] = idct_constants[15] = 65535;
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
271
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
272 j = 1;
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
273 do {
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
274 p = idct_constants + ((j + 3) << 2);
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
275 p[0] = p[1] = p[2] = p[3] = idct_cosine_table[j - 1];
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
276 } while (++j <= 7);
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
277
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
278 idct_constants[44] = idct_constants[45] =
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
279 idct_constants[46] = idct_constants[47] = IdctAdjustBeforeShift;
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
280 }
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
281
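The layout that vp3_dsp_init_mmx() produces can be easier to follow when written with plain loops. The sketch below fills a 48-word table in the same way: four mask quadwords (word k of quadword k set to 0xFFFF), seven cosine constants each replicated four times starting at word 16, and four copies of the rounding value at words 44 to 47. The names build_idct_constants_sketch, sketch_cosines and SKETCH_ADJUST are hypothetical. The cosine values and the rounding value 8 are assumptions made for illustration (the value 8 is suggested only by the name of the Eight macro in this file); the real idct_cosine_table and IdctAdjustBeforeShift are defined outside this section.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for IdctAdjustBeforeShift. */
#define SKETCH_ADJUST 8

/* Assumed stand-in for idct_cosine_table[0..6]; placeholder values only. */
static const uint16_t sketch_cosines[7] = {
    64277, 60547, 54491, 46341, 36410, 25080, 12785
};

/* Fill a 48-word table using the same layout as vp3_dsp_init_mmx(). */
static void build_idct_constants_sketch(uint16_t table[48])
{
    int i, j;

    for (i = 0; i < 48; i++)
        table[i] = 0;

    /* Words 0..15: four masks; mask k keeps only word k of a quadword. */
    for (i = 0; i < 4; i++)
        table[i * 4 + i] = 0xFFFF;          /* indices 0, 5, 10, 15 */

    /* Words 16..43: cosine j (1..7) replicated across one quadword. */
    for (j = 1; j <= 7; j++) {
        uint16_t *p = table + ((j + 3) << 2);
        p[0] = p[1] = p[2] = p[3] = sketch_cosines[j - 1];
    }

    /* Words 44..47: rounding term added before the final shift. */
    for (i = 44; i < 48; i++)
        table[i] = SKETCH_ADJUST;
}
```

Replicating each constant across all four words of a quadword lets a single pmulhw or paddw apply it to four coefficients at once, which is why the table spends 48 words on 12 distinct values.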
1977
89422281f6f6 reorganize and simplify the VP3 IDCT stuff
melanson
parents: 1969
diff changeset
282 void vp3_idct_mmx(int16_t *input_data, int16_t *dequant_matrix,
89422281f6f6 reorganize and simplify the VP3 IDCT stuff
melanson
parents: 1969
diff changeset
283 int coeff_count, int16_t *output_data)
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
284 {
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
285 /* eax = quantized input
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
286 * ebx = dequantizer matrix
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
287 * ecx = IDCT constants
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
288 * M(I) = ecx + MaskOffset(0) + I * 8
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
289 * C(I) = ecx + CosineOffset(32) + (I-1) * 8
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
290 * edx = output
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
291 * r0..r7 = mm0..mm7
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
292 */
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
293
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
294 #define M(x) (idct_constants + (x) * 4)
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
295 #define C(x) (idct_constants + 16 + ((x) - 1) * 4)
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
296 #define Eight (idct_constants + 44)
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
297
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
298 unsigned char *input_bytes = (unsigned char *)input_data;
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
299 unsigned char *dequant_matrix_bytes = (unsigned char *)dequant_matrix;
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
300 unsigned char *output_data_bytes = (unsigned char *)output_data;
1866
1755f959ab7f seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff changeset
301
1969
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
302 movq_m2r(*(input_bytes), r0);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
303 pmullw_m2r(*(dequant_matrix_bytes), r0); /* r0 = 03 02 01 00 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
304 movq_m2r(*(input_bytes+16), r1);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
305 pmullw_m2r(*(dequant_matrix_bytes+16), r1); /* r1 = 13 12 11 10 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
306 movq_m2r(*M(0), r2); /* r2 = __ __ __ FF */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
307 movq_r2r(r0, r3); /* r3 = 03 02 01 00 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
308 movq_m2r(*(input_bytes+8), r4);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
309 psrlq_i2r(16, r0); /* r0 = __ 03 02 01 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
310 pmullw_m2r(*(dequant_matrix_bytes+8), r4); /* r4 = 07 06 05 04 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
311 pand_r2r(r2, r3); /* r3 = __ __ __ 00 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
312 movq_r2r(r0, r5); /* r5 = __ 03 02 01 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
313 movq_r2r(r1, r6); /* r6 = 13 12 11 10 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
314 pand_r2r(r2, r5); /* r5 = __ __ __ 01 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
315 psllq_i2r(32, r6); /* r6 = 11 10 __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
316 movq_m2r(*M(3), r7); /* r7 = FF __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
317 pxor_r2r(r5, r0); /* r0 = __ 03 02 __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
318 pand_r2r(r6, r7); /* r7 = 11 __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
319 por_r2r(r3, r0); /* r0 = __ 03 02 00 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
320 pxor_r2r(r7, r6); /* r6 = __ 10 __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
321 por_r2r(r7, r0); /* r0 = 11 03 02 00 = R0 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
322 movq_m2r(*M(3), r7); /* r7 = FF __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
323 movq_r2r(r4, r3); /* r3 = 07 06 05 04 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
324 movq_r2m(r0, *(output_data_bytes)); /* write R0 = r0 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
325 pand_r2r(r2, r3); /* r3 = __ __ __ 04 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
326 movq_m2r(*(input_bytes+32), r0);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
327 psllq_i2r(16, r3); /* r3 = __ __ 04 __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
328 pmullw_m2r(*(dequant_matrix_bytes+32), r0); /* r0 = 23 22 21 20 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
329 pand_r2r(r1, r7); /* r7 = 13 __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
330 por_r2r(r3, r5); /* r5 = __ __ 04 01 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
331 por_r2r(r6, r7); /* r7 = 13 10 __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
332 movq_m2r(*(input_bytes+24), r3);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
333 por_r2r(r5, r7); /* r7 = 13 10 04 01 = R1 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
334 pmullw_m2r(*(dequant_matrix_bytes+24), r3); /* r3 = 17 16 15 14 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
335 psrlq_i2r(16, r4); /* r4 = __ 07 06 05 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
336 movq_r2m(r7, *(output_data_bytes+16)); /* write R1 = r7 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
337 movq_r2r(r4, r5); /* r5 = __ 07 06 05 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
338 movq_r2r(r0, r7); /* r7 = 23 22 21 20 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
339 psrlq_i2r(16, r4); /* r4 = __ __ 07 06 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
340 psrlq_i2r(48, r7); /* r7 = __ __ __ 23 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
341 movq_r2r(r2, r6); /* r6 = __ __ __ FF */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
342 pand_r2r(r2, r5); /* r5 = __ __ __ 05 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
343 pand_r2r(r4, r6); /* r6 = __ __ __ 06 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
344 movq_r2m(r7, *(output_data_bytes+80)); /* partial R9 = __ __ __ 23 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
345 pxor_r2r(r6, r4); /* r4 = __ __ 07 __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
346 psrlq_i2r(32, r1); /* r1 = __ __ 13 12 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
347 por_r2r(r5, r4); /* r4 = __ __ 07 05 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
348 movq_m2r(*M(3), r7); /* r7 = FF __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
349 pand_r2r(r2, r1); /* r1 = __ __ __ 12 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
350 movq_m2r(*(input_bytes+48), r5);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
351 psllq_i2r(16, r0); /* r0 = 22 21 20 __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
352 pmullw_m2r(*(dequant_matrix_bytes+48), r5); /* r5 = 33 32 31 30 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
353 pand_r2r(r0, r7); /* r7 = 22 __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
354 movq_r2m(r1, *(output_data_bytes+64)); /* partial R8 = __ __ __ 12 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
355 por_r2r(r4, r7); /* r7 = 22 __ 07 05 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
356 movq_r2r(r3, r4); /* r4 = 17 16 15 14 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
357 pand_r2r(r2, r3); /* r3 = __ __ __ 14 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
358 movq_m2r(*M(2), r1); /* r1 = __ FF __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
359 psllq_i2r(32, r3); /* r3 = __ 14 __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
360 por_r2r(r3, r7); /* r7 = 22 14 07 05 = R2 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
361 movq_r2r(r5, r3); /* r3 = 33 32 31 30 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
362 psllq_i2r(48, r3); /* r3 = 30 __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
363 pand_r2r(r0, r1); /* r1 = __ 21 __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
364 movq_r2m(r7, *(output_data_bytes+32)); /* write R2 = r7 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
365 por_r2r(r3, r6); /* r6 = 30 __ __ 06 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
366 movq_m2r(*M(1), r7); /* r7 = __ __ FF __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
367 por_r2r(r1, r6); /* r6 = 30 21 __ 06 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
368 movq_m2r(*(input_bytes+56), r1);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
369 pand_r2r(r4, r7); /* r7 = __ __ 15 __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
370 pmullw_m2r(*(dequant_matrix_bytes+56), r1); /* r1 = 37 36 35 34 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
371 por_r2r(r6, r7); /* r7 = 30 21 15 06 = R3 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
372 pand_m2r(*M(1), r0); /* r0 = __ __ 20 __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
373 psrlq_i2r(32, r4); /* r4 = __ __ 17 16 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
374 movq_r2m(r7, *(output_data_bytes+48)); /* write R3 = r7 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
375 movq_r2r(r4, r6); /* r6 = __ __ 17 16 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
376 movq_m2r(*M(3), r7); /* r7 = FF __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
377 pand_r2r(r2, r4); /* r4 = __ __ __ 16 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
378 movq_m2r(*M(1), r3); /* r3 = __ __ FF __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
379 pand_r2r(r1, r7); /* r7 = 37 __ __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
380 pand_r2r(r5, r3); /* r3 = __ __ 31 __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
381 por_r2r(r4, r0); /* r0 = __ __ 20 16 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
382 psllq_i2r(16, r3); /* r3 = __ 31 __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
383 por_r2r(r0, r7); /* r7 = 37 __ 20 16 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
384 movq_m2r(*M(2), r4); /* r4 = __ FF __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
385 por_r2r(r3, r7); /* r7 = 37 31 20 16 = R4 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
386 movq_m2r(*(input_bytes+80), r0);
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
387 movq_r2r(r4, r3); /* r3 = __ FF __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
388 pmullw_m2r(*(dequant_matrix_bytes+80), r0); /* r0 = 53 52 51 50 */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
389 pand_r2r(r5, r4); /* r4 = __ 32 __ __ */
56cb752222cc correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents: 1866
diff changeset
390 movq_r2m(r7, *(output_data_bytes+8)); /* write R4 = r7 */
    por_r2r(r4, r6);  /* r6 = __ 32 17 16 */
    movq_r2r(r3, r4);  /* r4 = __ FF __ __ */
    psrlq_i2r(16, r6);  /* r6 = __ __ 32 17 */
    movq_r2r(r0, r7);  /* r7 = 53 52 51 50 */
    pand_r2r(r1, r4);  /* r4 = __ 36 __ __ */
    psllq_i2r(48, r7);  /* r7 = 50 __ __ __ */
    por_r2r(r4, r6);  /* r6 = __ 36 32 17 */
    movq_m2r(*(input_bytes+88), r4);
    por_r2r(r6, r7);  /* r7 = 50 36 32 17 = R5 */
    pmullw_m2r(*(dequant_matrix_bytes+88), r4);  /* r4 = 57 56 55 54 */
    psrlq_i2r(16, r3);  /* r3 = __ __ FF __ */
    movq_r2m(r7, *(output_data_bytes+24));  /* write R5 = r7 */
    pand_r2r(r1, r3);  /* r3 = __ __ 35 __ */
    psrlq_i2r(48, r5);  /* r5 = __ __ __ 33 */
    pand_r2r(r2, r1);  /* r1 = __ __ __ 34 */
    movq_m2r(*(input_bytes+104), r6);
    por_r2r(r3, r5);  /* r5 = __ __ 35 33 */
    pmullw_m2r(*(dequant_matrix_bytes+104), r6);  /* r6 = 67 66 65 64 */
    psrlq_i2r(16, r0);  /* r0 = __ 53 52 51 */
    movq_r2r(r4, r7);  /* r7 = 57 56 55 54 */
    movq_r2r(r2, r3);  /* r3 = __ __ __ FF */
    psllq_i2r(48, r7);  /* r7 = 54 __ __ __ */
    pand_r2r(r0, r3);  /* r3 = __ __ __ 51 */
    pxor_r2r(r3, r0);  /* r0 = __ 53 52 __ */
    psllq_i2r(32, r3);  /* r3 = __ 51 __ __ */
    por_r2r(r5, r7);  /* r7 = 54 __ 35 33 */
    movq_r2r(r6, r5);  /* r5 = 67 66 65 64 */
    pand_m2r(*M(1), r6);  /* r6 = __ __ 65 __ */
    por_r2r(r3, r7);  /* r7 = 54 51 35 33 = R6 */
    psllq_i2r(32, r6);  /* r6 = 65 __ __ __ */
    por_r2r(r1, r0);  /* r0 = __ 53 52 34 */
    movq_r2m(r7, *(output_data_bytes+40));  /* write R6 = r7 */
    por_r2r(r6, r0);  /* r0 = 65 53 52 34 = R7 */
    movq_m2r(*(input_bytes+120), r7);
    movq_r2r(r5, r6);  /* r6 = 67 66 65 64 */
    pmullw_m2r(*(dequant_matrix_bytes+120), r7);  /* r7 = 77 76 75 74 */
    psrlq_i2r(32, r5);  /* r5 = __ __ 67 66 */
    pand_r2r(r2, r6);  /* r6 = __ __ __ 64 */
    movq_r2r(r5, r1);  /* r1 = __ __ 67 66 */
    movq_r2m(r0, *(output_data_bytes+56));  /* write R7 = r0 */
    pand_r2r(r2, r1);  /* r1 = __ __ __ 66 */
    movq_m2r(*(input_bytes+112), r0);
    movq_r2r(r7, r3);  /* r3 = 77 76 75 74 */
    pmullw_m2r(*(dequant_matrix_bytes+112), r0);  /* r0 = 73 72 71 70 */
    psllq_i2r(16, r3);  /* r3 = 76 75 74 __ */
    pand_m2r(*M(3), r7);  /* r7 = 77 __ __ __ */
    pxor_r2r(r1, r5);  /* r5 = __ __ 67 __ */
    por_r2r(r5, r6);  /* r6 = __ __ 67 64 */
    movq_r2r(r3, r5);  /* r5 = 76 75 74 __ */
    pand_m2r(*M(3), r5);  /* r5 = 76 __ __ __ */
    por_r2r(r1, r7);  /* r7 = 77 __ __ 66 */
    movq_m2r(*(input_bytes+96), r1);
    pxor_r2r(r5, r3);  /* r3 = __ 75 74 __ */
    pmullw_m2r(*(dequant_matrix_bytes+96), r1);  /* r1 = 63 62 61 60 */
    por_r2r(r3, r7);  /* r7 = 77 75 74 66 = R15 */
    por_r2r(r5, r6);  /* r6 = 76 __ 67 64 */
    movq_r2r(r0, r5);  /* r5 = 73 72 71 70 */
    movq_r2m(r7, *(output_data_bytes+120));  /* store R15 = r7 */
    psrlq_i2r(16, r5);  /* r5 = __ 73 72 71 */
    pand_m2r(*M(2), r5);  /* r5 = __ 73 __ __ */
    movq_r2r(r0, r7);  /* r7 = 73 72 71 70 */
    por_r2r(r5, r6);  /* r6 = 76 73 67 64 = R14 */
    pand_r2r(r2, r0);  /* r0 = __ __ __ 70 */
    pxor_r2r(r0, r7);  /* r7 = 73 72 71 __ */
    psllq_i2r(32, r0);  /* r0 = __ 70 __ __ */
    movq_r2m(r6, *(output_data_bytes+104));  /* write R14 = r6 */
    psrlq_i2r(16, r4);  /* r4 = __ 57 56 55 */
    movq_m2r(*(input_bytes+72), r5);
    psllq_i2r(16, r7);  /* r7 = 72 71 __ __ */
    pmullw_m2r(*(dequant_matrix_bytes+72), r5);  /* r5 = 47 46 45 44 */
    movq_r2r(r7, r6);  /* r6 = 72 71 __ __ */
    movq_m2r(*M(2), r3);  /* r3 = __ FF __ __ */
    psllq_i2r(16, r6);  /* r6 = 71 __ __ __ */
    pand_m2r(*M(3), r7);  /* r7 = 72 __ __ __ */
    pand_r2r(r1, r3);  /* r3 = __ 62 __ __ */
    por_r2r(r0, r7);  /* r7 = 72 70 __ __ */
    movq_r2r(r1, r0);  /* r0 = 63 62 61 60 */
    pand_m2r(*M(3), r1);  /* r1 = 63 __ __ __ */
    por_r2r(r3, r6);  /* r6 = 71 62 __ __ */
    movq_r2r(r4, r3);  /* r3 = __ 57 56 55 */
    psrlq_i2r(32, r1);  /* r1 = __ __ 63 __ */
    pand_r2r(r2, r3);  /* r3 = __ __ __ 55 */
    por_r2r(r1, r7);  /* r7 = 72 70 63 __ */
    por_r2r(r3, r7);  /* r7 = 72 70 63 55 = R13 */
    movq_r2r(r4, r3);  /* r3 = __ 57 56 55 */
    pand_m2r(*M(1), r3);  /* r3 = __ __ 56 __ */
    movq_r2r(r5, r1);  /* r1 = 47 46 45 44 */
    movq_r2m(r7, *(output_data_bytes+88));  /* write R13 = r7 */
    psrlq_i2r(48, r5);  /* r5 = __ __ __ 47 */
    movq_m2r(*(input_bytes+64), r7);
    por_r2r(r3, r6);  /* r6 = 71 62 56 __ */
    pmullw_m2r(*(dequant_matrix_bytes+64), r7);  /* r7 = 43 42 41 40 */
    por_r2r(r5, r6);  /* r6 = 71 62 56 47 = R12 */
    pand_m2r(*M(2), r4);  /* r4 = __ 57 __ __ */
    psllq_i2r(32, r0);  /* r0 = 61 60 __ __ */
    movq_r2m(r6, *(output_data_bytes+72));  /* write R12 = r6 */
    movq_r2r(r0, r6);  /* r6 = 61 60 __ __ */
    pand_m2r(*M(3), r0);  /* r0 = 61 __ __ __ */
    psllq_i2r(16, r6);  /* r6 = 60 __ __ __ */
    movq_m2r(*(input_bytes+40), r5);
    movq_r2r(r1, r3);  /* r3 = 47 46 45 44 */
    pmullw_m2r(*(dequant_matrix_bytes+40), r5);  /* r5 = 27 26 25 24 */
    psrlq_i2r(16, r1);  /* r1 = __ 47 46 45 */
    pand_m2r(*M(1), r1);  /* r1 = __ __ 46 __ */
    por_r2r(r4, r0);  /* r0 = 61 57 __ __ */
    pand_r2r(r7, r2);  /* r2 = __ __ __ 40 */
    por_r2r(r1, r0);  /* r0 = 61 57 46 __ */
    por_r2r(r2, r0);  /* r0 = 61 57 46 40 = R11 */
    psllq_i2r(16, r3);  /* r3 = 46 45 44 __ */
    movq_r2r(r3, r4);  /* r4 = 46 45 44 __ */
    movq_r2r(r5, r2);  /* r2 = 27 26 25 24 */
    movq_r2m(r0, *(output_data_bytes+112));  /* write R11 = r0 */
    psrlq_i2r(48, r2);  /* r2 = __ __ __ 27 */
    pand_m2r(*M(2), r4);  /* r4 = __ 45 __ __ */
    por_r2r(r2, r6);  /* r6 = 60 __ __ 27 */
    movq_m2r(*M(1), r2);  /* r2 = __ __ FF __ */
    por_r2r(r4, r6);  /* r6 = 60 45 __ 27 */
    pand_r2r(r7, r2);  /* r2 = __ __ 41 __ */
    psllq_i2r(32, r3);  /* r3 = 44 __ __ __ */
    por_m2r(*(output_data_bytes+80), r3);  /* r3 = 44 __ __ 23 */
    por_r2r(r2, r6);  /* r6 = 60 45 41 27 = R10 */
    movq_m2r(*M(3), r2);  /* r2 = FF __ __ __ */
    psllq_i2r(16, r5);  /* r5 = 26 25 24 __ */
    movq_r2m(r6, *(output_data_bytes+96));  /* store R10 = r6 */
    pand_r2r(r5, r2);  /* r2 = 26 __ __ __ */
    movq_m2r(*M(2), r6);  /* r6 = __ FF __ __ */
    pxor_r2r(r2, r5);  /* r5 = __ 25 24 __ */
    pand_r2r(r7, r6);  /* r6 = __ 42 __ __ */
    psrlq_i2r(32, r2);  /* r2 = __ __ 26 __ */
    pand_m2r(*M(3), r7);  /* r7 = 43 __ __ __ */
    por_r2r(r2, r3);  /* r3 = 44 __ 26 23 */
    por_m2r(*(output_data_bytes+64), r7);  /* r7 = 43 __ __ 12 */
    por_r2r(r3, r6);  /* r6 = 44 42 26 23 = R9 */
    por_r2r(r5, r7);  /* r7 = 43 25 24 12 = R8 */
    movq_r2m(r6, *(output_data_bytes+80));  /* store R9 = r6 */
    movq_r2m(r7, *(output_data_bytes+64));  /* store R8 = r7 */

#undef M

    /* at this point, function has completed dequantization + dezigzag +
     * partial transposition; now do the idct itself */

#define I(K) (output_data + K * 8)
#define J(K) (output_data + ((K - 4) * 8) + 4)

    RowIDCT();
    Transpose();

#undef I
#undef J
#define I(K) (output_data + (K * 8) + 32)
#define J(K) (output_data + ((K - 4) * 8) + 36)

    RowIDCT();
    Transpose();

#undef I
#undef J
#define I(K) (output_data + K * 8)
#define J(K) (output_data + K * 8)

    ColumnIDCT();

#undef I
#undef J
#define I(K) (output_data + (K * 8) + 4)
#define J(K) (output_data + (K * 8) + 4)

    ColumnIDCT();

#undef I
#undef J

}
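
The dezigzag stage above builds each output row by masking one 16-bit lane out of a dequantized word, shifting it to its target lane, and OR-ing the pieces together. The following is a portable C sketch of that idea only, not the code used by the function: `pack4`, `move_lane` and `build_r14` are hypothetical names, a 64-bit integer stands in for an MMX register, and the lane order follows the comment notation (highest lane written first). It rebuilds the row annotated as R14 = 76 73 67 64.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 16-bit values into one 64-bit word, highest lane first,
 * matching the "r0 = 53 52 51 50" notation used in the comments. */
static uint64_t pack4(uint16_t l3, uint16_t l2, uint16_t l1, uint16_t l0)
{
    return ((uint64_t)l3 << 48) | ((uint64_t)l2 << 32) |
           ((uint64_t)l1 << 16) |  (uint64_t)l0;
}

/* Hypothetical helper: isolate lane `from` of `src` and place it in lane
 * `to`. The MMX code gets the same effect with pand (mask) followed by
 * psllq/psrlq (shift). */
static uint64_t move_lane(uint64_t src, int from, int to)
{
    assert(from >= 0 && from < 4 && to >= 0 && to < 4);
    uint64_t lane = (src >> (16 * from)) & 0xFFFFu;
    return lane << (16 * to);
}

/* Assemble R14 = 76 73 67 64 from three dequantized input words. */
static uint64_t build_r14(uint64_t c64_67, uint64_t c70_73, uint64_t c74_77)
{
    return move_lane(c74_77, 2, 3) |   /* 76 -> lane 3 */
           move_lane(c70_73, 3, 2) |   /* 73 -> lane 2 */
           move_lane(c64_67, 3, 1) |   /* 67 -> lane 1 */
           move_lane(c64_67, 0, 0);    /* 64 stays in lane 0 */
}
```

In this sketch the two-digit numbers are coefficient labels used as stand-in values, so `build_r14(pack4(67, 66, 65, 64), pack4(73, 72, 71, 70), pack4(77, 76, 75, 74))` yields the same word as `pack4(76, 73, 67, 64)`. The MMX version interleaves these steps across several rows to keep registers busy, which the sketch does not attempt to model.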