annotate i386/vp3dsp_mmx.c @ 2504:f12657081093 libavcodec
INTRA PCM macroblocks support patch by Loic (lll+ffmpeg m4x org)
This patch adds support for INTRA PCM macroblocks in CAVLC and CABAC
mode; the deblocking needed a small modification, and so did the
intra4x4_pred_mode prediction.
With this patch, the 5 streams of the conformance suite containing INTRA
PCM macroblocks now decode entirely: 4 are completely correct, and 1 is
incorrect from the first B slice onward because deblocking in B slices is
not yet implemented.
The code is not optimized for speed (that is not necessary, since IPCM
macroblocks are rare), but it could be optimized for code size; if
someone wants to do this, feel free.
author | michael |
---|---|
date | Mon, 07 Feb 2005 00:10:28 +0000 |
parents | 89422281f6f6 |
children | 9699d325049d |
rev | line source |
---|---|
1866 (changeset 1755f959ab7f, melanson: separated out the C-based VP3 DSP functions into a different file; also ...) | 1 /* |
1866 | 2 * Copyright (C) 2004 the ffmpeg project |
1866 | 3 * |
1866 | 4 * This library is free software; you can redistribute it and/or |
1866 | 5 * modify it under the terms of the GNU Lesser General Public |
1866 | 6 * License as published by the Free Software Foundation; either |
1866 | 7 * version 2 of the License, or (at your option) any later version. |
1866 | 8 * |
1866 | 9 * This library is distributed in the hope that it will be useful, |
1866 | 10 * but WITHOUT ANY WARRANTY; without even the implied warranty of |
1866 | 11 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
1866 | 12 * Lesser General Public License for more details. |
1866 | 13 * |
1866 | 14 * You should have received a copy of the GNU Lesser General Public |
1866 | 15 * License along with this library; if not, write to the Free Software |
1866 | 16 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA |
1866 | 17 */ |
1866 | 18 |
1866 | 19 /** |
1866 | 20 * @file vp3dsp_mmx.c |
1866 | 21 * MMX-optimized functions cribbed from the original VP3 source code. |
1866 | 22 */ |
1866 | 23 |
1866 | 24 #include "../dsputil.h" |
1866 | 25 #include "mmx.h" |
1866 | 26 |
1866 | 27 #define IdctAdjustBeforeShift 8 |
1866 | 28 |
1866 | 29 /* (12 * 4) 2-byte memory locations ( = 96 bytes total) |
1866 | 30 * idct_constants[0..15] = Mask table (M(I)) |
1866 | 31 * idct_constants[16..43] = Cosine table (C(I)) |
1866 | 32 * idct_constants[44..47] = 8 |
1866 | 33 */ |
1866 | 34 static uint16_t idct_constants[(4 + 7 + 1) * 4]; |
1866 | 35 static uint16_t idct_cosine_table[7] = { |
1866 | 36 64277, 60547, 54491, 46341, 36410, 25080, 12785 |
1866 | 37 }; |
1866 | 38 |
1866 | 39 #define r0 mm0 |
1866 | 40 #define r1 mm1 |
1866 | 41 #define r2 mm2 |
1866 | 42 #define r3 mm3 |
1866 | 43 #define r4 mm4 |
1866 | 44 #define r5 mm5 |
1866 | 45 #define r6 mm6 |
1866 | 46 #define r7 mm7 |
1866 | 47 |
1866 | 48 /* from original comments: The Macro does IDct on 4 1-D Dcts */ |
1969 (changeset 56cb752222cc, melanson, parent 1866: correct MMX-optimized variant of VP3 IDCT, with comments (thank you ...) | 49 #define BeginIDCT() { \ |
1866 | 50 movq_m2r(*I(3), r2); \ |
1866 | 51 movq_m2r(*C(3), r6); \ |
1866 | 52 movq_r2r(r2, r4); \ |
1866 | 53 movq_m2r(*J(5), r7); \ |
1969 | 54 pmulhw_r2r(r6, r4); /* r4 = c3*i3 - i3 */ \ |
1866 | 55 movq_m2r(*C(5), r1); \ |
1969 | 56 pmulhw_r2r(r7, r6); /* r6 = c3*i5 - i5 */ \ |
1866 | 57 movq_r2r(r1, r5); \ |
1969 | 58 pmulhw_r2r(r2, r1); /* r1 = c5*i3 - i3 */ \ |
1866 | 59 movq_m2r(*I(1), r3); \ |
1969 | 60 pmulhw_r2r(r7, r5); /* r5 = c5*i5 - i5 */ \ |
1969 | 61 movq_m2r(*C(1), r0); /* (all registers are in use) */ \ |
1969 | 62 paddw_r2r(r2, r4); /* r4 = c3*i3 */ \ |
1969 | 63 paddw_r2r(r7, r6); /* r6 = c3*i5 */ \ |
1969 | 64 paddw_r2r(r1, r2); /* r2 = c5*i3 */ \ |
1866 | 65 movq_m2r(*J(7), r1); \ |
1969 | 66 paddw_r2r(r5, r7); /* r7 = c5*i5 */ \ |
1969 | 67 movq_r2r(r0, r5); /* r5 = c1 */ \ |
1969 | 68 pmulhw_r2r(r3, r0); /* r0 = c1*i1 - i1 */ \ |
1969 | 69 paddsw_r2r(r7, r4); /* r4 = C = c3*i3 + c5*i5 */ \ |
1969 | 70 pmulhw_r2r(r1, r5); /* r5 = c1*i7 - i7 */ \ |
1866 | 71 movq_m2r(*C(7), r7); \ |
1969 | 72 psubsw_r2r(r2, r6); /* r6 = D = c3*i5 - c5*i3 */ \ |
1969 | 73 paddw_r2r(r3, r0); /* r0 = c1*i1 */ \ |
1969 | 74 pmulhw_r2r(r7, r3); /* r3 = c7*i1 */ \ |
1866 | 75 movq_m2r(*I(2), r2); \ |
1969 | 76 pmulhw_r2r(r1, r7); /* r7 = c7*i7 */ \ |
1969 | 77 paddw_r2r(r1, r5); /* r5 = c1*i7 */ \ |
1969 | 78 movq_r2r(r2, r1); /* r1 = i2 */ \ |
1969 | 79 pmulhw_m2r(*C(2), r2); /* r2 = c2*i2 - i2 */ \ |
1969 | 80 psubsw_r2r(r5, r3); /* r3 = B = c7*i1 - c1*i7 */ \ |
1866 | 81 movq_m2r(*J(6), r5); \ |
1969 | 82 paddsw_r2r(r7, r0); /* r0 = A = c1*i1 + c7*i7 */ \ |
1969 | 83 movq_r2r(r5, r7); /* r7 = i6 */ \ |
1969 | 84 psubsw_r2r(r4, r0); /* r0 = A - C */ \ |
1969 | 85 pmulhw_m2r(*C(2), r5); /* r5 = c2*i6 - i6 */ \ |
1969 | 86 paddw_r2r(r1, r2); /* r2 = c2*i2 */ \ |
1969 | 87 pmulhw_m2r(*C(6), r1); /* r1 = c6*i2 */ \ |
1969 | 88 paddsw_r2r(r4, r4); /* r4 = C + C */ \ |
1969 | 89 paddsw_r2r(r0, r4); /* r4 = C. = A + C */ \ |
1969 | 90 psubsw_r2r(r6, r3); /* r3 = B - D */ \ |
1969 | 91 paddw_r2r(r7, r5); /* r5 = c2*i6 */ \ |
1969 | 92 paddsw_r2r(r6, r6); /* r6 = D + D */ \ |
1969 | 93 pmulhw_m2r(*C(6), r7); /* r7 = c6*i6 */ \ |
1969 | 94 paddsw_r2r(r3, r6); /* r6 = D. = B + D */ \ |
1969 | 95 movq_r2m(r4, *I(1)); /* save C. at I(1) */ \ |
1969 | 96 psubsw_r2r(r5, r1); /* r1 = H = c6*i2 - c2*i6 */ \ |
1866 | 97 movq_m2r(*C(4), r4); \ |
1969 | 98 movq_r2r(r3, r5); /* r5 = B - D */ \ |
1969 | 99 pmulhw_r2r(r4, r3); /* r3 = (c4 - 1) * (B - D) */ \ |
1969 | 100 paddsw_r2r(r2, r7); /* r7 = G = c6*i6 + c2*i2 */ \ |
1969 | 101 movq_r2m(r6, *I(2)); /* save D. at I(2) */ \ |
1969 | 102 movq_r2r(r0, r2); /* r2 = A - C */ \ |
1866 | 103 movq_m2r(*I(0), r6); \ |
1969 | 104 pmulhw_r2r(r4, r0); /* r0 = (c4 - 1) * (A - C) */ \ |
1969 | 105 paddw_r2r(r3, r5); /* r5 = B. = c4 * (B - D) */ \ |
1866 | 106 movq_m2r(*J(4), r3); \ |
1969 | 107 psubsw_r2r(r1, r5); /* r5 = B.. = B. - H */ \ |
1969 | 108 paddw_r2r(r0, r2); /* r2 = A. = c4 * (A - C) */ \ |
1969 | 109 psubsw_r2r(r3, r6); /* r6 = i0 - i4 */ \ |
1866 | 110 movq_r2r(r6, r0); \ |
1969 | 111 pmulhw_r2r(r4, r6); /* r6 = (c4 - 1) * (i0 - i4) */ \ |
1969 | 112 paddsw_r2r(r3, r3); /* r3 = i4 + i4 */ \ |
1969 | 113 paddsw_r2r(r1, r1); /* r1 = H + H */ \ |
1969 | 114 paddsw_r2r(r0, r3); /* r3 = i0 + i4 */ \ |
1969 | 115 paddsw_r2r(r5, r1); /* r1 = H. = B. + H */ \ |
1969 | 116 pmulhw_r2r(r3, r4); /* r4 = (c4 - 1) * (i0 + i4) */ \ |
1969 | 117 paddsw_r2r(r0, r6); /* r6 = F = c4 * (i0 - i4) */ \ |
1969 | 118 psubsw_r2r(r2, r6); /* r6 = F. = F - A. */ \ |
1969 | 119 paddsw_r2r(r2, r2); /* r2 = A. + A. */ \ |
1969 | 120 movq_m2r(*I(1), r0); /* r0 = C. */ \ |
1969 | 121 paddsw_r2r(r6, r2); /* r2 = A.. = F + A. */ \ |
1969 | 122 paddw_r2r(r3, r4); /* r4 = E = c4 * (i0 + i4) */ \ |
1969 | 123 psubsw_r2r(r1, r2); /* r2 = R2 = A.. - H. */ \ |
1969 | 124 } |
1866 | 125 |
1866 | 126 /* RowIDCT gets ready to transpose */ |
1969 | 127 #define RowIDCT() { \ |
1866 | 128 \ |
1969 | 129 BeginIDCT(); \ |
1866 | 130 \ |
1969 | 131 movq_m2r(*I(2), r3); /* r3 = D. */ \ |
1969 | 132 psubsw_r2r(r7, r4); /* r4 = E. = E - G */ \ |
1969 | 133 paddsw_r2r(r1, r1); /* r1 = H. + H. */ \ |
1969 | 134 paddsw_r2r(r7, r7); /* r7 = G + G */ \ |
1969 | 135 paddsw_r2r(r2, r1); /* r1 = R1 = A.. + H. */ \ |
1969 | 136 paddsw_r2r(r4, r7); /* r7 = G. = E + G */ \ |
1969 | 137 psubsw_r2r(r3, r4); /* r4 = R4 = E. - D. */ \ |
1969 | 138 paddsw_r2r(r3, r3); \ |
1969 | 139 psubsw_r2r(r5, r6); /* r6 = R6 = F. - B.. */ \ |
1866 | 140 paddsw_r2r(r5, r5); \ |
1969 | 141 paddsw_r2r(r4, r3); /* r3 = R3 = E. + D. */ \ |
1969 | 142 paddsw_r2r(r6, r5); /* r5 = R5 = F. + B.. */ \ |
1969 | 143 psubsw_r2r(r0, r7); /* r7 = R7 = G. - C. */ \ |
1866 | 144 paddsw_r2r(r0, r0); \ |
1969 | 145 movq_r2m(r1, *I(1)); /* save R1 */ \ |
1969 | 146 paddsw_r2r(r7, r0); /* r0 = R0 = G. + C. */ \ |
1969 | 147 } |
1866 | 148 |
1866 | 149 /* Column IDCT normalizes and stores final results */ |
1969 | 150 #define ColumnIDCT() { \ |
1866 | 151 \ |
1969 | 152 BeginIDCT(); \ |
1866 | 153 \ |
1969 | 154 paddsw_m2r(*Eight, r2); /* adjust R2 (and R1) for shift */ \ |
1969 | 155 paddsw_r2r(r1, r1); /* r1 = H. + H. */ \ |
1969 | 156 paddsw_r2r(r2, r1); /* r1 = R1 = A.. + H. */ \ |
1969 | 157 psraw_i2r(4, r2); /* r2 = NR2 */ \ |
1969 | 158 psubsw_r2r(r7, r4); /* r4 = E. = E - G */ \ |
1969 | 159 psraw_i2r(4, r1); /* r1 = NR1 */ \ |
1969 | 160 movq_m2r(*I(2), r3); /* r3 = D. */ \ |
1969 | 161 paddsw_r2r(r7, r7); /* r7 = G + G */ \ |
1969 | 162 movq_r2m(r2, *I(2)); /* store NR2 at I2 */ \ |
1969 | 163 paddsw_r2r(r4, r7); /* r7 = G. = E + G */ \ |
1969 | 164 movq_r2m(r1, *I(1)); /* store NR1 at I1 */ \ |
1969 | 165 psubsw_r2r(r3, r4); /* r4 = R4 = E. - D. */ \ |
1969 | 166 paddsw_m2r(*Eight, r4); /* adjust R4 (and R3) for shift */ \ |
1969 | 167 paddsw_r2r(r3, r3); /* r3 = D. + D. */ \ |
1969 | 168 paddsw_r2r(r4, r3); /* r3 = R3 = E. + D. */ \ |
1969 | 169 psraw_i2r(4, r4); /* r4 = NR4 */ \ |
1969 | 170 psubsw_r2r(r5, r6); /* r6 = R6 = F. - B.. */ \ |
1969 | 171 psraw_i2r(4, r3); /* r3 = NR3 */ \ |
1969 | 172 paddsw_m2r(*Eight, r6); /* adjust R6 (and R5) for shift */ \ |
1969 | 173 paddsw_r2r(r5, r5); /* r5 = B.. + B.. */ \ |
1969 | 174 paddsw_r2r(r6, r5); /* r5 = R5 = F. + B.. */ \ |
1969 | 175 psraw_i2r(4, r6); /* r6 = NR6 */ \ |
1969 | 176 movq_r2m(r4, *J(4)); /* store NR4 at J4 */ \ |
1969 | 177 psraw_i2r(4, r5); /* r5 = NR5 */ \ |
1969 | 178 movq_r2m(r3, *I(3)); /* store NR3 at I3 */ \ |
1969 | 179 psubsw_r2r(r0, r7); /* r7 = R7 = G. - C. */ \ |
1969 | 180 paddsw_m2r(*Eight, r7); /* adjust R7 (and R0) for shift */ \ |
1969 | 181 paddsw_r2r(r0, r0); /* r0 = C. + C. */ \ |
1969 | 182 paddsw_r2r(r7, r0); /* r0 = R0 = G. + C. */ \ |
1969 | 183 psraw_i2r(4, r7); /* r7 = NR7 */ \ |
1969 | 184 movq_r2m(r6, *J(6)); /* store NR6 at J6 */ \ |
1969 | 185 psraw_i2r(4, r0); /* r0 = NR0 */ \ |
1969 | 186 movq_r2m(r5, *J(5)); /* store NR5 at J5 */ \ |
1969 | 187 movq_r2m(r7, *J(7)); /* store NR7 at J7 */ \ |
1969 | 188 movq_r2m(r0, *I(0)); /* store NR0 at I0 */ \ |
1969 | 189 } |
1866 | 190 |
1866 | 191 /* Following macro does two 4x4 transposes in place. |
1866 | 192 |
1866 | 193 At entry (we assume): |
1866 | 194 |
1969 | 195 r0 = a3 a2 a1 a0 |
1969 | 196 I(1) = b3 b2 b1 b0 |
1969 | 197 r2 = c3 c2 c1 c0 |
1969 | 198 r3 = d3 d2 d1 d0 |
1866 | 199 |
1969 | 200 r4 = e3 e2 e1 e0 |
1969 | 201 r5 = f3 f2 f1 f0 |
1969 | 202 r6 = g3 g2 g1 g0 |
1969 | 203 r7 = h3 h2 h1 h0 |
1866 | 204 |
1969 | 205 At exit, we have: |
1866
1755f959ab7f
seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff
changeset
|
206 |
1969
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
207 I(0) = d0 c0 b0 a0 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
208 I(1) = d1 c1 b1 a1 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
209 I(2) = d2 c2 b2 a2 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
210 I(3) = d3 c3 b3 a3 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
211 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
212 J(4) = h0 g0 f0 e0 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
213 J(5) = h1 g1 f1 e1 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
214 J(6) = h2 g2 f2 e2 |
56cb752222cc
correct MMX-optimized variant of VP3 IDCT, with comments (thank you
melanson
parents:
1866
diff
changeset
|
215 J(7) = h3 g3 f3 e3 |
1866
1755f959ab7f
seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff
changeset
|
216 |
1755f959ab7f
seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff
changeset
|
217 I(0) I(1) I(2) I(3) is the transpose of r0 I(1) r2 r3. |
1755f959ab7f
seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff
changeset
|
218 J(4) J(5) J(6) J(7) is the transpose of r4 r5 r6 r7. |
1755f959ab7f
seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff
changeset
|
219 |
1755f959ab7f
seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff
changeset
|
220 Since r1 is free at entry, we calculate the Js first. */ |
1755f959ab7f
seperated out the C-based VP3 DSP functions into a different file; also
melanson
parents:
diff
changeset
|
221 |
#define Transpose() { \
    movq_r2r(r4, r1);         /* r1 = e3 e2 e1 e0 */ \
    punpcklwd_r2r(r5, r4);    /* r4 = f1 e1 f0 e0 */ \
    movq_r2m(r0, *I(0));      /* save a3 a2 a1 a0 */ \
    punpckhwd_r2r(r5, r1);    /* r1 = f3 e3 f2 e2 */ \
    movq_r2r(r6, r0);         /* r0 = g3 g2 g1 g0 */ \
    punpcklwd_r2r(r7, r6);    /* r6 = h1 g1 h0 g0 */ \
    movq_r2r(r4, r5);         /* r5 = f1 e1 f0 e0 */ \
    punpckldq_r2r(r6, r4);    /* r4 = h0 g0 f0 e0 = R4 */ \
    punpckhdq_r2r(r6, r5);    /* r5 = h1 g1 f1 e1 = R5 */ \
    movq_r2r(r1, r6);         /* r6 = f3 e3 f2 e2 */ \
    movq_r2m(r4, *J(4)); \
    punpckhwd_r2r(r7, r0);    /* r0 = h3 g3 h2 g2 */ \
    movq_r2m(r5, *J(5)); \
    punpckhdq_r2r(r0, r6);    /* r6 = h3 g3 f3 e3 = R7 */ \
    movq_m2r(*I(0), r4);      /* r4 = a3 a2 a1 a0 */ \
    punpckldq_r2r(r0, r1);    /* r1 = h2 g2 f2 e2 = R6 */ \
    movq_m2r(*I(1), r5);      /* r5 = b3 b2 b1 b0 */ \
    movq_r2r(r4, r0);         /* r0 = a3 a2 a1 a0 */ \
    movq_r2m(r6, *J(7)); \
    punpcklwd_r2r(r5, r0);    /* r0 = b1 a1 b0 a0 */ \
    movq_r2m(r1, *J(6)); \
    punpckhwd_r2r(r5, r4);    /* r4 = b3 a3 b2 a2 */ \
    movq_r2r(r2, r5);         /* r5 = c3 c2 c1 c0 */ \
    punpcklwd_r2r(r3, r2);    /* r2 = d1 c1 d0 c0 */ \
    movq_r2r(r0, r1);         /* r1 = b1 a1 b0 a0 */ \
    punpckldq_r2r(r2, r0);    /* r0 = d0 c0 b0 a0 = R0 */ \
    punpckhdq_r2r(r2, r1);    /* r1 = d1 c1 b1 a1 = R1 */ \
    movq_r2r(r4, r2);         /* r2 = b3 a3 b2 a2 */ \
    movq_r2m(r0, *I(0)); \
    punpckhwd_r2r(r3, r5);    /* r5 = d3 c3 d2 c2 */ \
    movq_r2m(r1, *I(1)); \
    punpckhdq_r2r(r5, r4);    /* r4 = d3 c3 b3 a3 = R3 */ \
    punpckldq_r2r(r5, r2);    /* r2 = d2 c2 b2 a2 = R2 */ \
    movq_r2m(r4, *I(3)); \
    movq_r2m(r2, *I(2)); \
}
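The Transpose() macro above builds each 4x4 transpose out of two rounds of unpack instructions: first interleaving 16-bit words of row pairs, then interleaving the resulting 32-bit halves. The following is a scalar sketch of the same idea for one 4x4 block, offered as an illustration rather than a drop-in replacement. The type `mm_t` and all function names here are hypothetical; each helper emulates the named MMX instruction on four 16-bit lanes, with `w[0]` as the lowest lane.

```c
#include <stdint.h>

/* One 64-bit MMX register viewed as four 16-bit lanes; w[0] is the lowest. */
typedef struct { uint16_t w[4]; } mm_t;

/* punpcklwd: interleave the two low words of dst and src. */
static mm_t unpack_lo_wd(mm_t dst, mm_t src)
{
    mm_t r = {{ dst.w[0], src.w[0], dst.w[1], src.w[1] }};
    return r;
}

/* punpckhwd: interleave the two high words of dst and src. */
static mm_t unpack_hi_wd(mm_t dst, mm_t src)
{
    mm_t r = {{ dst.w[2], src.w[2], dst.w[3], src.w[3] }};
    return r;
}

/* punpckldq: low doubleword of dst followed by low doubleword of src. */
static mm_t unpack_lo_dq(mm_t dst, mm_t src)
{
    mm_t r = {{ dst.w[0], dst.w[1], src.w[0], src.w[1] }};
    return r;
}

/* punpckhdq: high doubleword of dst followed by high doubleword of src. */
static mm_t unpack_hi_dq(mm_t dst, mm_t src)
{
    mm_t r = {{ dst.w[2], dst.w[3], src.w[2], src.w[3] }};
    return r;
}

/* Transpose rows a, b, c, d (row[0]..row[3]) in place. */
static void transpose4x4(mm_t row[4])
{
    mm_t ab_lo = unpack_lo_wd(row[0], row[1]);  /* b1 a1 b0 a0 */
    mm_t ab_hi = unpack_hi_wd(row[0], row[1]);  /* b3 a3 b2 a2 */
    mm_t cd_lo = unpack_lo_wd(row[2], row[3]);  /* d1 c1 d0 c0 */
    mm_t cd_hi = unpack_hi_wd(row[2], row[3]);  /* d3 c3 d2 c2 */

    row[0] = unpack_lo_dq(ab_lo, cd_lo);        /* d0 c0 b0 a0 */
    row[1] = unpack_hi_dq(ab_lo, cd_lo);        /* d1 c1 b1 a1 */
    row[2] = unpack_lo_dq(ab_hi, cd_hi);        /* d2 c2 b2 a2 */
    row[3] = unpack_hi_dq(ab_hi, cd_hi);        /* d3 c3 b3 a3 */
}
```

The macro interleaves loads and stores with the unpacks because only eight registers are available; the sketch ignores that scheduling and keeps only the data movement.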

void vp3_dsp_init_mmx(void)
{
    int j = 16;
    uint16_t *p;

    do {
        idct_constants[--j] = 0;
    } while (j);

    idct_constants[0] = idct_constants[5] =
    idct_constants[10] = idct_constants[15] = 65535;

    j = 1;
    do {
        p = idct_constants + ((j + 3) << 2);
        p[0] = p[1] = p[2] = p[3] = idct_cosine_table[j - 1];
    } while (++j <= 7);

    idct_constants[44] = idct_constants[45] =
    idct_constants[46] = idct_constants[47] = IdctAdjustBeforeShift;
}
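The initializer above lays out one table of 16-bit words in three regions: four quadword lane masks (words 0..15), seven cosine constants each broadcast across a quadword (words 16..43), and one broadcast rounding constant (words 44..47). The sketch below rebuilds that layout with plain loops so the indexing is easier to follow. The function name, the enum names, and the caller-supplied `cosines` and `round_bias` parameters are hypothetical; the actual contents of `idct_cosine_table` and the value of `IdctAdjustBeforeShift` are not shown in this part of the file.

```c
#include <stdint.h>

enum {
    MASK_WORDS   = 16,  /* 4 quadwords, one 0xFFFF lane each */
    COSINE_WORDS = 28,  /* 7 quadwords, one cosine broadcast in each */
    ROUND_WORDS  = 4,   /* 1 quadword holding the rounding bias */
    TABLE_WORDS  = MASK_WORDS + COSINE_WORDS + ROUND_WORDS
};

/* Fill a 48-word table using the same layout as vp3_dsp_init_mmx(). */
static void build_idct_constants(uint16_t table[TABLE_WORDS],
                                 const uint16_t cosines[7],
                                 uint16_t round_bias)
{
    int i, lane;

    for (i = 0; i < TABLE_WORDS; i++)
        table[i] = 0;

    /* Mask quadword k selects lane k: words 0, 5, 10 and 15. */
    for (i = 0; i < 4; i++)
        table[i * 4 + i] = 0xFFFF;

    /* Cosine j (1..7) lives at word 16 + (j - 1) * 4, i.e. (j + 3) << 2. */
    for (i = 1; i <= 7; i++)
        for (lane = 0; lane < 4; lane++)
            table[MASK_WORDS + (i - 1) * 4 + lane] = cosines[i - 1];

    /* Rounding bias occupies words 44..47. */
    for (lane = 0; lane < 4; lane++)
        table[MASK_WORDS + COSINE_WORDS + lane] = round_bias;
}
```

Broadcasting each constant across all four lanes lets a single 64-bit load feed an instruction that operates on four coefficients at once.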

void vp3_idct_mmx(int16_t *input_data, int16_t *dequant_matrix,
    int coeff_count, int16_t *output_data)
{
    /* eax = quantized input
     * ebx = dequantizer matrix
     * ecx = IDCT constants
     *  M(I) = ecx + MaskOffset(0) + I * 8
     *  C(I) = ecx + CosineOffset(32) + (I-1) * 8
     * edx = output
     * r0..r7 = mm0..mm7
     */

#define M(x) (idct_constants + x * 4)
#define C(x) (idct_constants + 16 + (x - 1) * 4)
#define Eight (idct_constants + 44)

    unsigned char *input_bytes = (unsigned char *)input_data;
    unsigned char *dequant_matrix_bytes = (unsigned char *)dequant_matrix;
    unsigned char *output_data_bytes = (unsigned char *)output_data;

    movq_m2r(*(input_bytes), r0);
    pmullw_m2r(*(dequant_matrix_bytes), r0);  /* r0 = 03 02 01 00 */
    movq_m2r(*(input_bytes+16), r1);
    pmullw_m2r(*(dequant_matrix_bytes+16), r1);  /* r1 = 13 12 11 10 */
    movq_m2r(*M(0), r2);  /* r2 = __ __ __ FF */
    movq_r2r(r0, r3);  /* r3 = 03 02 01 00 */
    movq_m2r(*(input_bytes+8), r4);
    psrlq_i2r(16, r0);  /* r0 = __ 03 02 01 */
    pmullw_m2r(*(dequant_matrix_bytes+8), r4);  /* r4 = 07 06 05 04 */
    pand_r2r(r2, r3);  /* r3 = __ __ __ 00 */
    movq_r2r(r0, r5);  /* r5 = __ 03 02 01 */
    movq_r2r(r1, r6);  /* r6 = 13 12 11 10 */
    pand_r2r(r2, r5);  /* r5 = __ __ __ 01 */
    psllq_i2r(32, r6);  /* r6 = 11 10 __ __ */
    movq_m2r(*M(3), r7);  /* r7 = FF __ __ __ */
    pxor_r2r(r5, r0);  /* r0 = __ 03 02 __ */
    pand_r2r(r6, r7);  /* r7 = 11 __ __ __ */
    por_r2r(r3, r0);  /* r0 = __ 03 02 00 */
    pxor_r2r(r7, r6);  /* r6 = __ 10 __ __ */
    por_r2r(r7, r0);  /* r0 = 11 03 02 00 = R0 */
    movq_m2r(*M(3), r7);  /* r7 = FF __ __ __ */
    movq_r2r(r4, r3);  /* r3 = 07 06 05 04 */
    movq_r2m(r0, *(output_data_bytes));  /* write R0 = r0 */
    pand_r2r(r2, r3);  /* r3 = __ __ __ 04 */
    movq_m2r(*(input_bytes+32), r0);
    psllq_i2r(16, r3);  /* r3 = __ __ 04 __ */
    pmullw_m2r(*(dequant_matrix_bytes+32), r0);  /* r0 = 23 22 21 20 */
    pand_r2r(r1, r7);  /* r7 = 13 __ __ __ */
    por_r2r(r3, r5);  /* r5 = __ __ 04 01 */
    por_r2r(r6, r7);  /* r7 = 13 10 __ __ */
    movq_m2r(*(input_bytes+24), r3);
    por_r2r(r5, r7);  /* r7 = 13 10 04 01 = R1 */
    pmullw_m2r(*(dequant_matrix_bytes+24), r3);  /* r3 = 17 16 15 14 */
    psrlq_i2r(16, r4);  /* r4 = __ 07 06 05 */
    movq_r2m(r7, *(output_data_bytes+16));  /* write R1 = r7 */
    movq_r2r(r4, r5);  /* r5 = __ 07 06 05 */
    movq_r2r(r0, r7);  /* r7 = 23 22 21 20 */
    psrlq_i2r(16, r4);  /* r4 = __ __ 07 06 */
    psrlq_i2r(48, r7);  /* r7 = __ __ __ 23 */
    movq_r2r(r2, r6);  /* r6 = __ __ __ FF */
    pand_r2r(r2, r5);  /* r5 = __ __ __ 05 */
    pand_r2r(r4, r6);  /* r6 = __ __ __ 06 */
    movq_r2m(r7, *(output_data_bytes+80));  /* partial R9 = __ __ __ 23 */
    pxor_r2r(r6, r4);  /* r4 = __ __ 07 __ */
    psrlq_i2r(32, r1);  /* r1 = __ __ 13 12 */
    por_r2r(r5, r4);  /* r4 = __ __ 07 05 */
    movq_m2r(*M(3), r7);  /* r7 = FF __ __ __ */
    pand_r2r(r2, r1);  /* r1 = __ __ __ 12 */
    movq_m2r(*(input_bytes+48), r5);
    psllq_i2r(16, r0);  /* r0 = 22 21 20 __ */
    pmullw_m2r(*(dequant_matrix_bytes+48), r5);  /* r5 = 33 32 31 30 */
    pand_r2r(r0, r7);  /* r7 = 22 __ __ __ */
    movq_r2m(r1, *(output_data_bytes+64));  /* partial R8 = __ __ __ 12 */
    por_r2r(r4, r7);  /* r7 = 22 __ 07 05 */
    movq_r2r(r3, r4);  /* r4 = 17 16 15 14 */
    pand_r2r(r2, r3);  /* r3 = __ __ __ 14 */
    movq_m2r(*M(2), r1);  /* r1 = __ FF __ __ */
    psllq_i2r(32, r3);  /* r3 = __ 14 __ __ */
    por_r2r(r3, r7);  /* r7 = 22 14 07 05 = R2 */
    movq_r2r(r5, r3);  /* r3 = 33 32 31 30 */
    psllq_i2r(48, r3);  /* r3 = 30 __ __ __ */
    pand_r2r(r0, r1);  /* r1 = __ 21 __ __ */
    movq_r2m(r7, *(output_data_bytes+32));  /* write R2 = r7 */
    por_r2r(r3, r6);  /* r6 = 30 __ __ 06 */
    movq_m2r(*M(1), r7);  /* r7 = __ __ FF __ */
    por_r2r(r1, r6);  /* r6 = 30 21 __ 06 */
    movq_m2r(*(input_bytes+56), r1);
    pand_r2r(r4, r7);  /* r7 = __ __ 15 __ */
    pmullw_m2r(*(dequant_matrix_bytes+56), r1);  /* r1 = 37 36 35 34 */
    por_r2r(r6, r7);  /* r7 = 30 21 15 06 = R3 */
    pand_m2r(*M(1), r0);  /* r0 = __ __ 20 __ */
    psrlq_i2r(32, r4);  /* r4 = __ __ 17 16 */
    movq_r2m(r7, *(output_data_bytes+48));  /* write R3 = r7 */
    movq_r2r(r4, r6);  /* r6 = __ __ 17 16 */
    movq_m2r(*M(3), r7);  /* r7 = FF __ __ __ */
    pand_r2r(r2, r4);  /* r4 = __ __ __ 16 */
    movq_m2r(*M(1), r3);  /* r3 = __ __ FF __ */
    pand_r2r(r1, r7);  /* r7 = 37 __ __ __ */
    pand_r2r(r5, r3);  /* r3 = __ __ 31 __ */
    por_r2r(r4, r0);  /* r0 = __ __ 20 16 */
    psllq_i2r(16, r3);  /* r3 = __ 31 __ __ */
    por_r2r(r0, r7);  /* r7 = 37 __ 20 16 */
    movq_m2r(*M(2), r4);  /* r4 = __ FF __ __ */
    por_r2r(r3, r7);  /* r7 = 37 31 20 16 = R4 */
    movq_m2r(*(input_bytes+80), r0);
    movq_r2r(r4, r3);  /* r3 = __ FF __ __ */
    pmullw_m2r(*(dequant_matrix_bytes+80), r0);  /* r0 = 53 52 51 50 */
    pand_r2r(r5, r4);  /* r4 = __ 32 __ __ */
    movq_r2m(r7, *(output_data_bytes+8));  /* write R4 = r7 */
    por_r2r(r4, r6);           /* r6 = __ 32 17 16 */
    movq_r2r(r3, r4);          /* r4 = __ FF __ __ */
    psrlq_i2r(16, r6);         /* r6 = __ __ 32 17 */
    movq_r2r(r0, r7);          /* r7 = 53 52 51 50 */
    pand_r2r(r1, r4);          /* r4 = __ 36 __ __ */
    psllq_i2r(48, r7);         /* r7 = 50 __ __ __ */
    por_r2r(r4, r6);           /* r6 = __ 36 32 17 */
    movq_m2r(*(input_bytes+88), r4);
    por_r2r(r6, r7);           /* r7 = 50 36 32 17 = R5 */
    pmullw_m2r(*(dequant_matrix_bytes+88), r4); /* r4 = 57 56 55 54 */
    psrlq_i2r(16, r3);         /* r3 = __ __ FF __ */
    movq_r2m(r7, *(output_data_bytes+24)); /* write R5 = r7 */
    pand_r2r(r1, r3);          /* r3 = __ __ 35 __ */
    psrlq_i2r(48, r5);         /* r5 = __ __ __ 33 */
    pand_r2r(r2, r1);          /* r1 = __ __ __ 34 */
    movq_m2r(*(input_bytes+104), r6);
    por_r2r(r3, r5);           /* r5 = __ __ 35 33 */
    pmullw_m2r(*(dequant_matrix_bytes+104), r6); /* r6 = 67 66 65 64 */
    psrlq_i2r(16, r0);         /* r0 = __ 53 52 51 */
    movq_r2r(r4, r7);          /* r7 = 57 56 55 54 */
    movq_r2r(r2, r3);          /* r3 = __ __ __ FF */
    psllq_i2r(48, r7);         /* r7 = 54 __ __ __ */
    pand_r2r(r0, r3);          /* r3 = __ __ __ 51 */
    pxor_r2r(r3, r0);          /* r0 = __ 53 52 __ */
    psllq_i2r(32, r3);         /* r3 = __ 51 __ __ */
    por_r2r(r5, r7);           /* r7 = 54 __ 35 33 */
    movq_r2r(r6, r5);          /* r5 = 67 66 65 64 */
    pand_m2r(*M(1), r6);       /* r6 = __ __ 65 __ */
    por_r2r(r3, r7);           /* r7 = 54 51 35 33 = R6 */
    psllq_i2r(32, r6);         /* r6 = 65 __ __ __ */
    por_r2r(r1, r0);           /* r0 = __ 53 52 34 */
    movq_r2m(r7, *(output_data_bytes+40)); /* write R6 = r7 */
    por_r2r(r6, r0);           /* r0 = 65 53 52 34 = R7 */
    movq_m2r(*(input_bytes+120), r7);
    movq_r2r(r5, r6);          /* r6 = 67 66 65 64 */
    pmullw_m2r(*(dequant_matrix_bytes+120), r7); /* r7 = 77 76 75 74 */
    psrlq_i2r(32, r5);         /* r5 = __ __ 67 66 */
    pand_r2r(r2, r6);          /* r6 = __ __ __ 64 */
    movq_r2r(r5, r1);          /* r1 = __ __ 67 66 */
    movq_r2m(r0, *(output_data_bytes+56)); /* write R7 = r0 */
    pand_r2r(r2, r1);          /* r1 = __ __ __ 66 */
    movq_m2r(*(input_bytes+112), r0);
    movq_r2r(r7, r3);          /* r3 = 77 76 75 74 */
    pmullw_m2r(*(dequant_matrix_bytes+112), r0); /* r0 = 73 72 71 70 */
    psllq_i2r(16, r3);         /* r3 = 76 75 74 __ */
    pand_m2r(*M(3), r7);       /* r7 = 77 __ __ __ */
    pxor_r2r(r1, r5);          /* r5 = __ __ 67 __ */
    por_r2r(r5, r6);           /* r6 = __ __ 67 64 */
    movq_r2r(r3, r5);          /* r5 = 76 75 74 __ */
    pand_m2r(*M(3), r5);       /* r5 = 76 __ __ __ */
    por_r2r(r1, r7);           /* r7 = 77 __ __ 66 */
    movq_m2r(*(input_bytes+96), r1);
    pxor_r2r(r5, r3);          /* r3 = __ 75 74 __ */
    pmullw_m2r(*(dequant_matrix_bytes+96), r1); /* r1 = 63 62 61 60 */
    por_r2r(r3, r7);           /* r7 = 77 75 74 66 = R15 */
    por_r2r(r5, r6);           /* r6 = 76 __ 67 64 */
    movq_r2r(r0, r5);          /* r5 = 73 72 71 70 */
    movq_r2m(r7, *(output_data_bytes+120)); /* store R15 = r7 */
    psrlq_i2r(16, r5);         /* r5 = __ 73 72 71 */
    pand_m2r(*M(2), r5);       /* r5 = __ 73 __ __ */
    movq_r2r(r0, r7);          /* r7 = 73 72 71 70 */
    por_r2r(r5, r6);           /* r6 = 76 73 67 64 = R14 */
    pand_r2r(r2, r0);          /* r0 = __ __ __ 70 */
    pxor_r2r(r0, r7);          /* r7 = 73 72 71 __ */
    psllq_i2r(32, r0);         /* r0 = __ 70 __ __ */
    movq_r2m(r6, *(output_data_bytes+104)); /* write R14 = r6 */
    psrlq_i2r(16, r4);         /* r4 = __ 57 56 55 */
    movq_m2r(*(input_bytes+72), r5);
    psllq_i2r(16, r7);         /* r7 = 72 71 __ __ */
    pmullw_m2r(*(dequant_matrix_bytes+72), r5); /* r5 = 47 46 45 44 */
    movq_r2r(r7, r6);          /* r6 = 72 71 __ __ */
    movq_m2r(*M(2), r3);       /* r3 = __ FF __ __ */
    psllq_i2r(16, r6);         /* r6 = 71 __ __ __ */
    pand_m2r(*M(3), r7);       /* r7 = 72 __ __ __ */
    pand_r2r(r1, r3);          /* r3 = __ 62 __ __ */
    por_r2r(r0, r7);           /* r7 = 72 70 __ __ */
    movq_r2r(r1, r0);          /* r0 = 63 62 61 60 */
    pand_m2r(*M(3), r1);       /* r1 = 63 __ __ __ */
    por_r2r(r3, r6);           /* r6 = 71 62 __ __ */
    movq_r2r(r4, r3);          /* r3 = __ 57 56 55 */
    psrlq_i2r(32, r1);         /* r1 = __ __ 63 __ */
    pand_r2r(r2, r3);          /* r3 = __ __ __ 55 */
    por_r2r(r1, r7);           /* r7 = 72 70 63 __ */
    por_r2r(r3, r7);           /* r7 = 72 70 63 55 = R13 */
    movq_r2r(r4, r3);          /* r3 = __ 57 56 55 */
    pand_m2r(*M(1), r3);       /* r3 = __ __ 56 __ */
    movq_r2r(r5, r1);          /* r1 = 47 46 45 44 */
    movq_r2m(r7, *(output_data_bytes+88)); /* write R13 = r7 */
    psrlq_i2r(48, r5);         /* r5 = __ __ __ 47 */
    movq_m2r(*(input_bytes+64), r7);
    por_r2r(r3, r6);           /* r6 = 71 62 56 __ */
    pmullw_m2r(*(dequant_matrix_bytes+64), r7); /* r7 = 43 42 41 40 */
    por_r2r(r5, r6);           /* r6 = 71 62 56 47 = R12 */
    pand_m2r(*M(2), r4);       /* r4 = __ 57 __ __ */
    psllq_i2r(32, r0);         /* r0 = 61 60 __ __ */
    movq_r2m(r6, *(output_data_bytes+72)); /* write R12 = r6 */
    movq_r2r(r0, r6);          /* r6 = 61 60 __ __ */
    pand_m2r(*M(3), r0);       /* r0 = 61 __ __ __ */
    psllq_i2r(16, r6);         /* r6 = 60 __ __ __ */
    movq_m2r(*(input_bytes+40), r5);
    movq_r2r(r1, r3);          /* r3 = 47 46 45 44 */
    pmullw_m2r(*(dequant_matrix_bytes+40), r5); /* r5 = 27 26 25 24 */
    psrlq_i2r(16, r1);         /* r1 = __ 47 46 45 */
    pand_m2r(*M(1), r1);       /* r1 = __ __ 46 __ */
    por_r2r(r4, r0);           /* r0 = 61 57 __ __ */
    pand_r2r(r7, r2);          /* r2 = __ __ __ 40 */
    por_r2r(r1, r0);           /* r0 = 61 57 46 __ */
    por_r2r(r2, r0);           /* r0 = 61 57 46 40 = R11 */
    psllq_i2r(16, r3);         /* r3 = 46 45 44 __ */
    movq_r2r(r3, r4);          /* r4 = 46 45 44 __ */
    movq_r2r(r5, r2);          /* r2 = 27 26 25 24 */
    movq_r2m(r0, *(output_data_bytes+112)); /* write R11 = r0 */
    psrlq_i2r(48, r2);         /* r2 = __ __ __ 27 */
    pand_m2r(*M(2), r4);       /* r4 = __ 45 __ __ */
    por_r2r(r2, r6);           /* r6 = 60 __ __ 27 */
    movq_m2r(*M(1), r2);       /* r2 = __ __ FF __ */
    por_r2r(r4, r6);           /* r6 = 60 45 __ 27 */
    pand_r2r(r7, r2);          /* r2 = __ __ 41 __ */
    psllq_i2r(32, r3);         /* r3 = 44 __ __ __ */
    por_m2r(*(output_data_bytes+80), r3); /* r3 = 44 __ __ 23 */
    por_r2r(r2, r6);           /* r6 = 60 45 41 27 = R10 */
    movq_m2r(*M(3), r2);       /* r2 = FF __ __ __ */
    psllq_i2r(16, r5);         /* r5 = 26 25 24 __ */
    movq_r2m(r6, *(output_data_bytes+96)); /* store R10 = r6 */
    pand_r2r(r5, r2);          /* r2 = 26 __ __ __ */
    movq_m2r(*M(2), r6);       /* r6 = __ FF __ __ */
    pxor_r2r(r2, r5);          /* r5 = __ 25 24 __ */
    pand_r2r(r7, r6);          /* r6 = __ 42 __ __ */
    psrlq_i2r(32, r2);         /* r2 = __ __ 26 __ */
    pand_m2r(*M(3), r7);       /* r7 = 43 __ __ __ */
    por_r2r(r2, r3);           /* r3 = 44 __ 26 23 */
    por_m2r(*(output_data_bytes+64), r7); /* r7 = 43 __ __ 12 */
    por_r2r(r3, r6);           /* r6 = 44 42 26 23 = R9 */
    por_r2r(r5, r7);           /* r7 = 43 25 24 12 = R8 */
    movq_r2m(r6, *(output_data_bytes+80)); /* store R9 = r6 */
    movq_r2m(r7, *(output_data_bytes+64)); /* store R8 = r7 */

#undef M

    /* at this point, function has completed dequantization + dezigzag +
     * partial transposition; now do the idct itself */

#define I(K) (output_data + K * 8)
#define J(K) (output_data + ((K - 4) * 8) + 4)

    RowIDCT();
    Transpose();

#undef I
#undef J
#define I(K) (output_data + (K * 8) + 32)
#define J(K) (output_data + ((K - 4) * 8) + 36)

    RowIDCT();
    Transpose();

#undef I
#undef J
#define I(K) (output_data + K * 8)
#define J(K) (output_data + K * 8)

    ColumnIDCT();

#undef I
#undef J
#define I(K) (output_data + (K * 8) + 4)
#define J(K) (output_data + (K * 8) + 4)

    ColumnIDCT();

#undef I
#undef J

}
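
The dezigzag code in this function repeats one idiom: isolate a single 16-bit word of a 64-bit register with a mask (`pand`), move it to its destination lane with a 64-bit shift (`psllq`/`psrlq`), and merge it into the result (`por`). The block below is a minimal scalar sketch of that idiom in portable C, not the MMX code itself. The helper names `lane_mask`, `pick_lane`, `pack4` and `gather_example` are hypothetical, the lane values are only tags that mirror the register comments, and lanes are assumed to be numbered 0 (least significant) to 3, as the `M(n)` mask comments suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with 0xFFFF in 16-bit lane n (0 = least significant), in the spirit
 * of the M(n) masks used by the MMX code (hypothetical helper). */
static uint64_t lane_mask(int n)
{
    return (uint64_t)0xFFFF << (16 * n);
}

/* Isolate lane `from` of src (pand) and reposition it in lane `to`
 * (psrlq/psllq). All other lanes of the result are zero. */
static uint64_t pick_lane(uint64_t src, int from, int to)
{
    assert(from >= 0 && from < 4 && to >= 0 && to < 4);
    uint64_t w = (src & lane_mask(from)) >> (16 * from);
    return w << (16 * to);
}

/* Pack four 16-bit words into one 64-bit value, w3 in the top lane,
 * matching the left-to-right order of the register comments. */
static uint64_t pack4(uint16_t w3, uint16_t w2, uint16_t w1, uint16_t w0)
{
    return ((uint64_t)w3 << 48) | ((uint64_t)w2 << 32) |
           ((uint64_t)w1 << 16) | (uint64_t)w0;
}

/* Assemble a register laid out like "37 31 20 16":
 * lane 3 of a stays in place, lane 1 of b moves up to lane 2,
 * and the two low lanes come from c (por merges the pieces). */
static uint64_t gather_example(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t r = pick_lane(a, 3, 3);            /* 37 __ __ __ */
    r |= pick_lane(b, 1, 2);                    /* 37 31 __ __ */
    r |= c & (lane_mask(1) | lane_mask(0));     /* 37 31 20 16 */
    return r;
}
```

Calling `gather_example` with `a = pack4(0x37, 0x36, 0x35, 0x34)`, `b = pack4(0x33, 0x32, 0x31, 0x30)` and `c = pack4(0, 0, 0x20, 0x16)` yields the same layout as the `R4` comment. The MMX version interleaves many such gathers to keep all eight registers busy, which is why the loads, masks and stores for different output rows overlap in the listing.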