Mercurial > mplayer.hg
annotate mp3lib/dct64_sse.c @ 18932:69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
Currently only used on CPUs that _only_ support SSE (otherwise try 3DNow* before)
Patch by The Mighty Zuxy Meng %zuxy * meng $ gmail * com%
Original thread:
Date: Jun 21, 2006 10:20 AM
Subject: [MPlayer-dev-eng] [PATCH] SSE version of DCT64 for mp3lib
author | gpoirier |
---|---|
date | Fri, 07 Jul 2006 14:04:07 +0000 |
parents | |
children | c7d3523c74ee |
rev | line source |
---|---|
18932
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
1 /* |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
2 * Discrete Cosine Tansform (DCT) for SSE |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
3 * Copyright (c) 2006 Zuxy MENG <zuxy.meng@gmail.com> |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
4 * based upon code from mp3lib/dct64.c, mp3lib/dct64_altivec.c |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
5 * and mp3lib/dct64_MMX.c |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
6 */ |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
7 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
8 /* NOTE: The following code is suboptimal! It can be improved (at least) by |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
9 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
10 1. Replace all movups by movaps. (Can Parameter c be always aligned on |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
11 a 16-byte boundary?) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
12 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
13 2. Rewritten using intrinsics. (GCC generally optimizes intrinsics |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
14 better. However, when __m128 locals are involved, GCC may |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
15 produce bad code that uses movaps to access a stack not aligned |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
16 on a 16-byte boundary, which leads to run-time crashes.) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
17 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
18 */ |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
19 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
20 typedef float real; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
21 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
22 extern float __attribute__((aligned(16))) costab_mmx[]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
23 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
24 static const int ppnn[4] __attribute__((aligned(16))) = |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
25 { 0, 0, 1 << 31, 1 << 31 }; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
26 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
27 static const int pnpn[4] __attribute__((aligned(16))) = |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
28 { 0, 1 << 31, 0, 1 << 31 }; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
29 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
30 static const int nnnn[4] __attribute__((aligned(16))) = |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
31 { 1 << 31, 1 << 31, 1 << 31, 1 << 31 }; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
32 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
33 void dct64_sse(real *a,real *b,real *c) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
34 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
35 static real __attribute__ ((aligned(16))) b1[0x20]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
36 static real __attribute__ ((aligned(16))) b2[0x20]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
37 static real const one = 1.f; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
38 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
39 short *out0 = (short*)a; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
40 short *out1 = (short*)b; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
41 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
42 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
43 real *costab = costab_mmx; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
44 int i; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
45 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
46 for (i = 0; i < 0x20 / 2; i += 4) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
47 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
48 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
49 "movaps %2, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
50 "shufps $27, %%xmm3, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
51 "movups %3, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
52 "movaps %%xmm1, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
53 "movups %4, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
54 "shufps $27, %%xmm4, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
55 "movaps %%xmm2, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
56 "shufps $27, %%xmm0, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
57 "addps %%xmm0, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
58 "movaps %%xmm1, %0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
59 "subps %%xmm2, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
60 "mulps %%xmm3, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
61 "movaps %%xmm4, %1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
62 :"=m"(*(b1 + i)), "=m"(*(b1 + 0x1c - i)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
63 :"m"(*(costab + i)), "m"(*(c + i)), "m"(*(c + 0x1c - i)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
64 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
65 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
66 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
67 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
68 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
69 int i; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
70 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
71 for (i = 0; i < 0x20; i += 0x10) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
72 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
73 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
74 "movaps %4, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
75 "movaps %5, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
76 "movaps %6, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
77 "movaps %7, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
78 "movaps %%xmm1, %%xmm7\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
79 "shufps $27, %%xmm7, %%xmm7\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
80 "movaps %%xmm3, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
81 "shufps $27, %%xmm5, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
82 "movaps %%xmm4, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
83 "shufps $27, %%xmm2, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
84 "movaps %%xmm6, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
85 "shufps $27, %%xmm0, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
86 "addps %%xmm0, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
87 "movaps %%xmm1, %0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
88 "addps %%xmm2, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
89 "movaps %%xmm3, %1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
90 "subps %%xmm4, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
91 "movaps %%xmm5, %2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
92 "subps %%xmm6, %%xmm7\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
93 "movaps %%xmm7, %3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
94 :"=m"(*(b2 + i)), "=m"(*(b2 + i + 4)), "=m"(*(b2 + i + 8)), "=m"(*(b2 + i + 12)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
95 :"m"(*(b1 + i)), "m"(*(b1 + i + 4)), "m"(*(b1 + i + 8)), "m"(*(b1 + i + 12)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
96 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
97 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
98 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
99 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
100 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
101 real *costab = costab_mmx + 16; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
102 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
103 "movaps %4, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
104 "movaps %5, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
105 "movaps %8, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
106 "xorps %%xmm6, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
107 "shufps $27, %%xmm4, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
108 "mulps %%xmm4, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
109 "movaps %9, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
110 "xorps %%xmm7, %%xmm7\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
111 "shufps $27, %%xmm2, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
112 "mulps %%xmm2, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
113 "movaps %%xmm0, %0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
114 "movaps %%xmm1, %1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
115 "movaps %6, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
116 "mulps %%xmm2, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
117 "subps %%xmm3, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
118 "movaps %%xmm6, %2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
119 "movaps %7, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
120 "mulps %%xmm4, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
121 "subps %%xmm5, %%xmm7\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
122 "movaps %%xmm7, %3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
123 :"=m"(*(b2 + 8)), "=m"(*(b2 + 0xc)), "=m"(*(b2 + 0x18)), "=m"(*(b2 + 0x1c)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
124 :"m"(*(b2 + 8)), "m"(*(b2 + 0xc)), "m"(*(b2 + 0x18)), "m"(*(b2 + 0x1c)), "m"(*costab), "m"(*(costab + 4)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
125 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
126 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
127 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
128 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
129 real *costab = costab_mmx + 24; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
130 int i; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
131 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
132 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
133 "movaps %0, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
134 "shufps $27, %%xmm0, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
135 "movaps %1, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
136 "movaps %%xmm5, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
137 : |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
138 :"m"(*costab), "m"(*nnnn) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
139 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
140 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
141 for (i = 0; i < 0x20; i += 8) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
142 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
143 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
144 "movaps %2, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
145 "movaps %3, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
146 "movaps %%xmm2, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
147 "xorps %%xmm5, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
148 "shufps $27, %%xmm4, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
149 "movaps %%xmm3, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
150 "shufps $27, %%xmm1, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
151 "addps %%xmm1, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
152 "movaps %%xmm2, %0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
153 "subps %%xmm3, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
154 "xorps %%xmm6, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
155 "mulps %%xmm0, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
156 "movaps %%xmm4, %1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
157 :"=m"(*(b1 + i)), "=m"(*(b1 + i + 4)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
158 :"m"(*(b2 + i)), "m"(*(b2 + i + 4)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
159 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
160 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
161 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
162 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
163 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
164 int i; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
165 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
166 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
167 "movss %0, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
168 "movss %1, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
169 "movaps %%xmm1, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
170 "unpcklps %%xmm0, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
171 "movss %2, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
172 "movaps %%xmm1, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
173 "unpcklps %%xmm2, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
174 "unpcklps %%xmm3, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
175 "movaps %3, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
176 : |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
177 :"m"(one), "m"(costab_mmx[28]), "m"(costab_mmx[29]), "m"(*ppnn) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
178 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
179 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
180 for (i = 0; i < 0x20; i += 8) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
181 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
182 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
183 "movaps %2, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
184 "movaps %%xmm3, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
185 "shufps $20, %%xmm4, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
186 "shufps $235, %%xmm3, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
187 "xorps %%xmm2, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
188 "addps %%xmm3, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
189 "mulps %%xmm0, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
190 "movaps %%xmm4, %0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
191 "movaps %3, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
192 "movaps %%xmm6, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
193 "shufps $27, %%xmm5, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
194 "xorps %%xmm2, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
195 "addps %%xmm5, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
196 "mulps %%xmm0, %%xmm6\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
197 "movaps %%xmm6, %1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
198 :"=m"(*(b2 + i)), "=m"(*(b2 + i + 4)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
199 :"m"(*(b1 + i)), "m"(*(b1 + i + 4)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
200 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
201 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
202 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
203 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
204 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
205 int i; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
206 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
207 "movss %0, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
208 "movaps %%xmm1, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
209 "movaps %%xmm0, %%xmm7\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
210 "unpcklps %%xmm1, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
211 "unpcklps %%xmm0, %%xmm7\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
212 "movaps %1, %%xmm0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
213 "unpcklps %%xmm7, %%xmm2\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
214 : |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
215 :"m"(costab_mmx[30]), "m"(*pnpn) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
216 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
217 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
218 for (i = 0x8; i < 0x20; i += 8) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
219 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
220 asm volatile ( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
221 "movaps %2, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
222 "movaps %%xmm1, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
223 "shufps $224, %%xmm3, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
224 "shufps $181, %%xmm1, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
225 "xorps %%xmm0, %%xmm1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
226 "addps %%xmm1, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
227 "mulps %%xmm2, %%xmm3\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
228 "movaps %%xmm3, %0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
229 "movaps %3, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
230 "movaps %%xmm4, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
231 "shufps $224, %%xmm5, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
232 "shufps $181, %%xmm4, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
233 "xorps %%xmm0, %%xmm4\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
234 "addps %%xmm4, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
235 "mulps %%xmm2, %%xmm5\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
236 "movaps %%xmm5, %1\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
237 :"=m"(*(b1 + i)), "=m"(*(b1 + i + 4)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
238 :"m"(*(b2 + i)), "m"(*(b2 + i + 4)) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
239 :"memory" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
240 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
241 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
242 for (i = 0x8; i < 0x20; i += 8) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
243 { |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
244 b1[i + 2] += b1[i + 3]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
245 b1[i + 6] += b1[i + 7]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
246 b1[i + 4] += b1[i + 6]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
247 b1[i + 6] += b1[i + 5]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
248 b1[i + 5] += b1[i + 7]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
249 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
250 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
251 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
252 #if 0 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
253 /* Reference C code */ |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
254 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
255 /* |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
256 Should run faster than x87 asm, given that the compiler is sane. |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
257 However, the C code dosen't round with saturation (0x7fff for too |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
258 large positive float, 0x8000 for too small negative float). You |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
259 can hear the difference if you listen carefully. |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
260 */ |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
261 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
262 out0[256] = (short)(b2[0] + b2[1]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
263 out0[0] = (short)((b2[0] - b2[1]) * costab_mmx[30]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
264 out1[128] = (short)((b2[3] - b2[2]) * costab_mmx[30]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
265 out0[128] = (short)((b2[3] - b2[2]) * costab_mmx[30] + b2[3] + b2[2]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
266 out1[192] = (short)((b2[7] - b2[6]) * costab_mmx[30]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
267 out0[192] = (short)((b2[7] - b2[6]) * costab_mmx[30] + b2[6] + b2[7] + b2[4] + b2[5]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
268 out0[64] = (short)((b2[7] - b2[6]) * costab_mmx[30] + b2[6] + b2[7] + (b2[4] - b2[5]) * costab_mmx[30]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
269 out1[64] = (short)((b2[7] - b2[6]) * costab_mmx[30] + (b2[4] - b2[5]) * costab_mmx[30]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
270 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
271 out0[224] = (short)(b1[8] + b1[12]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
272 out0[160] = (short)(b1[12] + b1[10]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
273 out0[96] = (short)(b1[10] + b1[14]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
274 out0[32] = (short)(b1[14] + b1[9]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
275 out1[32] = (short)(b1[9] + b1[13]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
276 out1[96] = (short)(b1[13] + b1[11]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
277 out1[222] = (short)b1[15]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
278 out1[160] = (short)(b1[15] + b1[11]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
279 out0[240] = (short)(b1[24] + b1[28] + b1[16]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
280 out0[208] = (short)(b1[24] + b1[28] + b1[20]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
281 out0[176] = (short)(b1[28] + b1[26] + b1[20]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
282 out0[144] = (short)(b1[28] + b1[26] + b1[18]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
283 out0[112] = (short)(b1[26] + b1[30] + b1[18]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
284 out0[80] = (short)(b1[26] + b1[30] + b1[22]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
285 out0[48] = (short)(b1[30] + b1[25] + b1[22]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
286 out0[16] = (short)(b1[30] + b1[25] + b1[17]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
287 out1[16] = (short)(b1[25] + b1[29] + b1[17]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
288 out1[48] = (short)(b1[25] + b1[29] + b1[21]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
289 out1[80] = (short)(b1[29] + b1[27] + b1[21]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
290 out1[112] = (short)(b1[29] + b1[27] + b1[19]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
291 out1[144] = (short)(b1[27] + b1[31] + b1[19]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
292 out1[176] = (short)(b1[27] + b1[31] + b1[23]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
293 out1[240] = (short)(b1[31]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
294 out1[208] = (short)(b1[31] + b1[23]); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
295 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
296 #else |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
297 /* |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
298 To do saturation efficiently in x86 we can use fist(t)(p), |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
299 pf2iw, or packssdw. We use fist(p) here. |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
300 */ |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
301 asm( |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
302 "flds %0\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
303 "flds (%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
304 "fadds 4(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
305 "fistp 512(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
306 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
307 "flds (%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
308 "fsubs 4(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
309 "fmul %%st(1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
310 "fistp (%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
311 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
312 "flds 12(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
313 "fsubs 8(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
314 "fmul %%st(1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
315 "fist 256(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
316 "fadds 12(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
317 "fadds 8(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
318 "fistp 256(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
319 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
320 "flds 16(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
321 "fsubs 20(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
322 "fmul %%st(1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
323 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
324 "flds 28(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
325 "fsubs 24(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
326 "fmul %%st(2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
327 "fist 384(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
328 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
329 "fadds 24(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
330 "fadds 28(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
331 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
332 "fadds 16(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
333 "fadds 20(%2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
334 "fistp 384(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
335 "fadd %%st(2)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
336 "fistp 128(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
337 "faddp %%st(1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
338 "fistp 128(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
339 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
340 "flds 32(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
341 "fadds 48(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
342 "fistp 448(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
343 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
344 "flds 48(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
345 "fadds 40(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
346 "fistp 320(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
347 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
348 "flds 40(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
349 "fadds 56(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
350 "fistp 192(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
351 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
352 "flds 56(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
353 "fadds 36(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
354 "fistp 64(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
355 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
356 "flds 36(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
357 "fadds 52(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
358 "fistp 64(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
359 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
360 "flds 52(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
361 "fadds 44(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
362 "fistp 192(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
363 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
364 "flds 60(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
365 "fist 448(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
366 "fadds 44(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
367 "fistp 320(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
368 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
369 "flds 96(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
370 "fadds 112(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
371 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
372 "fadds 64(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
373 "fistp 480(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
374 "fadds 80(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
375 "fistp 416(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
376 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
377 "flds 112(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
378 "fadds 104(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
379 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
380 "fadds 80(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
381 "fistp 352(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
382 "fadds 72(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
383 "fistp 288(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
384 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
385 "flds 104(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
386 "fadds 120(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
387 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
388 "fadds 72(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
389 "fistp 224(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
390 "fadds 88(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
391 "fistp 160(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
392 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
393 "flds 120(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
394 "fadds 100(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
395 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
396 "fadds 88(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
397 "fistp 96(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
398 "fadds 68(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
399 "fistp 32(%3)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
400 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
401 "flds 100(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
402 "fadds 116(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
403 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
404 "fadds 68(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
405 "fistp 32(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
406 "fadds 84(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
407 "fistp 96(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
408 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
409 "flds 116(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
410 "fadds 108(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
411 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
412 "fadds 84(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
413 "fistp 160(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
414 "fadds 76(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
415 "fistp 224(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
416 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
417 "flds 108(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
418 "fadds 124(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
419 "fld %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
420 "fadds 76(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
421 "fistp 288(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
422 "fadds 92(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
423 "fistp 352(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
424 |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
425 "flds 124(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
426 "fist 480(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
427 "fadds 92(%1)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
428 "fistp 416(%4)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
429 "ffreep %%st(0)\n\t" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
430 : |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
431 :"m"(costab_mmx[30]), "r"(b1), "r"(b2), "r"(a), "r"(b) |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
432 :"memory" |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
433 ); |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
434 #endif |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
435 out1[0] = out0[0]; |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
436 } |
69c665e91946
Add dct64_sse, a replacement for dct64_MMX. About 60% faster on its author's Pentium III
gpoirier
parents:
diff
changeset
|
437 |