Mercurial > libavcodec.hg
annotate armv4l/simple_idct_arm.S @ 5707:c46509aca422 libavcodec
Remove check for input buffer size as it does not guarantee that
decoder will not run out of output buffer bounds (and all suspected
decoders have their own checks now).
author | kostya |
---|---|
date | Mon, 24 Sep 2007 16:50:32 +0000 |
parents | d2a7fc14345c |
children | 15ed47af1838 |
/*
 * simple_idct_arm.S
 * Copyright (C) 2002 Frederic 'dilb' Boulay.
 *
 * Author: Frederic Boulay <dilb@handhelds.org>
 *
 * The function defined in this file is derived from the simple_idct function
 * from the libavcodec library part of the FFmpeg project.
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

/* useful constants for the algorithm, they are saved in __constant_ptr__ at */
/* the end of the source code. */
#define W1 22725
#define W2 21407
#define W3 19266
#define W4 16383
#define W5 12873
#define W6 8867
#define W7 4520
#define MASK_MSHW 0xFFFF0000
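The seven weights appear to be the usual fixed-point IDCT cosines, W_k = round(2^14 * sqrt(2) * cos(k*pi/16)). The C sketch below recomputes them as a sanity check; it is illustrative only and not part of the original file, and `cos_series` and `idct_weights` are hypothetical helpers. Every computed value matches the `#define`s above except W4, where the formula gives 16384 while the file uses 16383.

```c
#include <stdint.h>

/* Cosine via its Taylor series; 20 terms are far more than enough for
 * |x| <= pi/2. Used instead of cos() from <math.h> only so that this
 * sketch links without -lm. */
static double cos_series(double x)
{
    double term = 1.0, sum = 1.0;
    for (int n = 1; n < 20; n++) {
        term *= -x * x / ((2.0 * n - 1.0) * (2.0 * n));
        sum += term;
    }
    return sum;
}

/* Fill w[1..7] with round(2^14 * sqrt(2) * cos(k*pi/16)); w[0] is unused. */
static void idct_weights(int32_t w[8])
{
    const double pi = 3.14159265358979323846;
    const double sqrt2 = 1.41421356237309504880;

    w[0] = 0;
    for (int k = 1; k < 8; k++)
        w[k] = (int32_t)(16384.0 * sqrt2 * cos_series(k * pi / 16.0) + 0.5);
}
```

Calling `idct_weights(w)` fills `w[1]` through `w[7]`; all values are positive, so adding 0.5 and truncating rounds to nearest.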

/* offsets of the constants in the vector */
#define offW1 0
#define offW2 4
#define offW3 8
#define offW4 12
#define offW5 16
#define offW6 20
#define offW7 24
#define offMASK_MSHW 28
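The `offW*` values are byte offsets into the constant vector at `__constant_ptr__`, so each constant occupies one 32-bit word and `ldr rX, [r12, #offWk]` fetches Wk. A minimal C sketch of that layout, assuming the vector stores W1..W7 followed by MASK_MSHW in that order (the vector itself lies outside this excerpt); `struct idct_constants` and `idct_pool` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the constant vector at __constant_ptr__:
 * eight consecutive 32-bit words, W1..W7 followed by MASK_MSHW. */
struct idct_constants {
    int32_t  w1, w2, w3, w4, w5, w6, w7;
    uint32_t mask_mshw;
};

/* Each offXXX define is the byte offset of one word in that vector. */
_Static_assert(offsetof(struct idct_constants, w1) == 0,  "offW1");
_Static_assert(offsetof(struct idct_constants, w2) == 4,  "offW2");
_Static_assert(offsetof(struct idct_constants, w3) == 8,  "offW3");
_Static_assert(offsetof(struct idct_constants, w4) == 12, "offW4");
_Static_assert(offsetof(struct idct_constants, w5) == 16, "offW5");
_Static_assert(offsetof(struct idct_constants, w6) == 20, "offW6");
_Static_assert(offsetof(struct idct_constants, w7) == 24, "offW7");
_Static_assert(offsetof(struct idct_constants, mask_mshw) == 28, "offMASK_MSHW");

static const struct idct_constants idct_pool = {
    22725, 21407, 19266, 16383, 12873, 8867, 4520, 0xFFFF0000u
};
```

Because every member is a 32-bit word, the struct has no padding and the static assertions pin the layout at compile time.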

#define ROW_SHIFT 11
#define ROW_SHIFT2MSHW (16-11)
#define COL_SHIFT 20
#define ROW_SHIFTED_1 1024 /* 1<< (ROW_SHIFT-1) */
#define COL_SHIFTED_1 524288 /* 1<< (COL_SHIFT-1) */
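`ROW_SHIFTED_1` and `COL_SHIFTED_1` are the half-way values `1 << (shift - 1)`, which turn a plain arithmetic right shift into round-to-nearest. `ROW_SHIFT2MSHW` and `MASK_MSHW` look intended for packing two row results into one 32-bit word: shifting left by 16 - 11 = 5 and keeping the top halfword yields the same bits as shifting right by 11 and moving the result into bits 16-31. The sketch below assumes that use (the packing code is not part of this excerpt) and that the rounding term has already been added before packing; `round_shift` and `pack_row_pair` are hypothetical helpers.

```c
#include <stdint.h>

#define ROW_SHIFT      11
#define ROW_SHIFT2MSHW (16 - 11)
#define COL_SHIFT      20
#define MASK_MSHW      0xFFFF0000u

/* Round-to-nearest right shift: add half of 2^shift, then shift.
 * Assumes >> on a negative int32_t is arithmetic (true for GCC/Clang). */
static int32_t round_shift(int32_t x, int shift)
{
    return (x + (1 << (shift - 1))) >> shift;
}

/* Pack two row outputs into one word: low halfword = lo >> ROW_SHIFT,
 * high halfword = hi >> ROW_SHIFT, the latter obtained as
 * (hi << ROW_SHIFT2MSHW) & MASK_MSHW without a separate right shift. */
static uint32_t pack_row_pair(int32_t lo, int32_t hi)
{
    uint32_t hi_part = ((uint32_t)hi << ROW_SHIFT2MSHW) & MASK_MSHW;
    uint32_t lo_part = (uint32_t)(lo >> ROW_SHIFT) & ~MASK_MSHW;
    return hi_part | lo_part;
}
```

For example, `pack_row_pair(2048, 4096)` places 1 in the low halfword and 2 in the high halfword.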


        .text
        .align
        .global simple_idct_ARM

simple_idct_ARM:
        @@ void simple_idct_ARM(int16_t *block)
        @@ save the registers that are needed on the stack (take all of them),
        @@ R0-R3 are scratch regs, so no need to save them, but R0 contains the pointer to block
        @@ so it must not be overwritten if it is not saved!!
        @@ R12 is another scratch register, so it does not need to be saved either
        @@ save all registers
        stmfd sp!, {r4-r11, r14}  @ R14 is also called LR
        @@ at this point, R0=block, other registers are free.
        add r14, r0, #112  @ R14=&block[8*7], better start from the last row, and decrease the value until row=0, i.e. R14=block.
        add r12, pc, #(__constant_ptr__-.-8)  @ R12=__constant_ptr__, the vector containing the constants, probably not necessary to reserve a register for it
        @@ add 2 temporary variables in the stack: R0 and R14
        sub sp, sp, #8  @ allow 2 local variables
        str r0, [sp, #0]  @ save block in sp[0]
        @@ stack status
        @@ sp+4 free
        @@ sp+0 R0 (block)


        @@ at this point, R0=block, R14=&block[56], R12=__const_ptr_, R1-R11 free


__row_loop:
        @@ read the row and check if it is null, almost null, or not. According to the StrongARM specs, it is not necessary to optimise the ldr accesses (i.e. to split each 32-bit word into two 16-bit halfwords); at least it leaves more registers usable :)
        ldr r1, [r14, #0]  @ R1=(int32)(R14)[0]=ROWr32[0] (relative row cast to a 32b pointer)
        ldr r2, [r14, #4]  @ R2=(int32)(R14)[1]=ROWr32[1]
        ldr r3, [r14, #8]  @ R3=ROWr32[2]
        ldr r4, [r14, #12]  @ R4=ROWr32[3]
        @@ check if the words are null: if all of them are null, proceed with the next row (branch __end_row_loop),
        @@ if ROWr16[0] is the only one not null, proceed with this special case (branch __almost_empty_row),
        @@ else follow the complete algorithm.
        @@ at this point, R0=block, R14=&block[n], R12=__const_ptr_, R1=ROWr32[0], R2=ROWr32[1],
        @@ R3=ROWr32[2], R4=ROWr32[3], R5-R11 free
        orr r5, r4, r3  @ R5=R4 | R3
        orr r5, r5, r2  @ R5=R4 | R3 | R2
        orrs r6, r5, r1  @ Test R5 | R1 (the aim is to check if everything is null)
        beq __end_row_loop
        mov r7, r1, asr #16  @ R7=R1>>16=ROWr16[1] (evaluate it now, as it could be useful later)
        ldrsh r6, [r14, #0]  @ R6=ROWr16[0]
        orrs r5, r5, r7  @ R5=R4 | R3 | R2 | R7
        beq __almost_empty_row
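The zero test ORs the row together as four 32-bit words, so one flag-setting `orrs` decides whether the whole row is null, and a second test (after folding in ROWr16[1]) decides whether only the DC coefficient ROWr16[0] is set. A C sketch of the same three-way decision, with hypothetical names (`enum row_kind`, `classify_row`); it reads `row[1]` directly instead of `R1 >> 16`, which avoids the little-endian assumption the assembly makes.

```c
#include <stdint.h>
#include <string.h>

enum row_kind { ROW_ZERO, ROW_DC_ONLY, ROW_FULL };

/* Classify one 8-coefficient row the way __row_loop does. */
static enum row_kind classify_row(const int16_t row[8])
{
    uint32_t w[4];
    memcpy(w, row, sizeof w);             /* same 4 x 32-bit view as ldr r1-r4 */

    uint32_t rest = w[3] | w[2] | w[1];   /* orr r5, r4, r3 / orr r5, r5, r2 */
    if ((rest | w[0]) == 0)               /* orrs r6, r5, r1 / beq __end_row_loop */
        return ROW_ZERO;
    if ((rest | (uint16_t)row[1]) == 0)   /* orrs r5, r5, r7 / beq __almost_empty_row */
        return ROW_DC_ONLY;
    return ROW_FULL;                      /* fall through to __b_evaluation */
}
```

Only `ROW_FULL` rows pay for the multiplies that follow; null rows are skipped outright.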

__b_evaluation:
        @@ at this point, R0=block (temp), R1(free), R2=ROWr32[1], R3=ROWr32[2], R4=ROWr32[3],
        @@ R5=(temp), R6=ROWr16[0], R7=ROWr16[1], R8-R11 free,
        @@ R12=__const_ptr_, R14=&block[n]
        @@ to save some registers/calls, proceed with b0-b3 first, followed by a0-a3

        @@ MUL16(b0, W1, row[1]);
        @@ MUL16(b1, W3, row[1]);
        @@ MUL16(b2, W5, row[1]);
        @@ MUL16(b3, W7, row[1]);
        @@ MAC16(b0, W3, row[3]);
        @@ MAC16(b1, -W7, row[3]);
        @@ MAC16(b2, -W1, row[3]);
        @@ MAC16(b3, -W5, row[3]);
        ldr r8, [r12, #offW1]  @ R8=W1
        mov r2, r2, asr #16  @ R2=ROWr16[3]
        mul r0, r8, r7  @ R0=W1*ROWr16[1]=b0 (ROWr16[1] must be the second arg, to have the possibility to save 1 cycle)
        ldr r9, [r12, #offW3]  @ R9=W3
        ldr r10, [r12, #offW5]  @ R10=W5
        mul r1, r9, r7  @ R1=W3*ROWr16[1]=b1 (ROWr16[1] must be the second arg, to have the possibility to save 1 cycle)
        ldr r11, [r12, #offW7]  @ R11=W7
        mul r5, r10, r7  @ R5=W5*ROWr16[1]=b2 (ROWr16[1] must be the second arg, to have the possibility to save 1 cycle)
        mul r7, r11, r7  @ R7=W7*ROWr16[1]=b3 (ROWr16[1] must be the second arg, to have the possibility to save 1 cycle)
        teq r2, #0  @ if null avoid muls
        mlane r0, r9, r2, r0  @ R0+=W3*ROWr16[3]=b0 (ROWr16[3] must be the second arg, to have the possibility to save 1 cycle)
        rsbne r2, r2, #0  @ R2=-ROWr16[3]
        mlane r1, r11, r2, r1  @ R1-=W7*ROWr16[3]=b1 (ROWr16[3] must be the second arg, to have the possibility to save 1 cycle)
        mlane r5, r8, r2, r5  @ R5-=W1*ROWr16[3]=b2 (ROWr16[3] must be the second arg, to have the possibility to save 1 cycle)
        mlane r7, r10, r2, r7  @ R7-=W5*ROWr16[3]=b3 (ROWr16[3] must be the second arg, to have the possibility to save 1 cycle)
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
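The row[1]/row[3] step above can be modelled in C roughly as follows. This is a sketch for reading the assembly, not the code it was derived from: the coefficient values are assumed to match the C `simple_idct` (check them against the `#define`s at the top of this file), `odd_part_13` and `check_odd_part_13` are hypothetical names, and `b0 = W1*row[1]` is assumed to be computed just before this excerpt, since R0 already holds it here. It shows the two tricks used above: the row[3] multiplies are skipped when row[3] is zero (`teq` plus the conditional `mlane`), and the subtractions are done by negating row[3] once (`rsbne`) and then accumulating, because ARMv4 has no multiply-subtract instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed coefficient values (from the C simple_idct); check them against the
 * #defines at the top of simple_idct_arm.S. */
#define W1 22725
#define W3 19266
#define W5 12873
#define W7 4520

/* Hypothetical helper: odd-part sums b[0..3] from row[1] and row[3]. */
static void odd_part_13(const int16_t row[8], int32_t b[4])
{
    int32_t r1 = row[1];
    int32_t r3 = row[3];

    b[0] = W1 * r1;               /* assumed to be done before this excerpt (R0) */
    b[1] = W3 * r1;               /* mul r1, r9, r7  */
    b[2] = W5 * r1;               /* mul r5, r10, r7 */
    b[3] = W7 * r1;               /* mul r7, r11, r7 */

    if (r3 != 0) {                /* teq r2, #0: skip the multiplies for a zero */
        b[0] += W3 * r3;          /* mlane r0, r9, r2, r0 */
        r3 = -r3;                 /* rsbne r2, r2, #0: negate once ... */
        b[1] += W7 * r3;          /* ... so accumulating subtracts: b1 -= W7*row[3] */
        b[2] += W1 * r3;          /* b2 -= W1*row[3] */
        b[3] += W5 * r3;          /* b3 -= W5*row[3] */
    }
}

/* Usage example: compare with the direct formulas of the C reference. */
static void check_odd_part_13(void)
{
    const int16_t row[8] = { 0, 100, 0, -37, 0, 0, 0, 0 };
    int32_t b[4];

    odd_part_13(row, b);
    assert(b[0] == W1 * 100 + W3 * -37);
    assert(b[1] == W3 * 100 - W7 * -37);
    assert(b[2] == W5 * 100 - W1 * -37);
    assert(b[3] == W7 * 100 - W5 * -37);
}
```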

        @@ at this point, R0=b0, R1=b1, R2 (free), R3=ROWr32[2], R4=ROWr32[3],
        @@ R5=b2, R6=ROWr16[0], R7=b3, R8=W1, R9=W3, R10=W5, R11=W7,
        @@ R12=__const_ptr_, R14=&block[n]
        @@ temp = ((uint32_t*)row)[2] | ((uint32_t*)row)[3];
        @@ if (temp != 0) {}
        orrs r2, r3, r4                 @ R2=ROWr32[2] | ROWr32[3]
        beq __end_b_evaluation
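The `orrs`/`beq` pair tests all of row[4..7] with one OR of the two 32-bit words that overlay them. Here is a sketch of the same test in C. It uses `memcpy` instead of the `(uint32_t*)` cast from the comment so that it stays clear of strict-aliasing problems, and `high_half_is_zero` is a hypothetical name. The zero test works on either endianness. Only the later extraction of row[5] and row[7] from the upper halfwords (`asr #16`) assumes a little-endian layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when row[4..7] are all zero.  The two words
 * are ROWr32[2] (row[4], row[5]) and ROWr32[3] (row[6], row[7]). */
static int high_half_is_zero(const int16_t row[8])
{
    uint32_t w2, w3;

    memcpy(&w2, &row[4], sizeof w2);    /* ROWr32[2] */
    memcpy(&w3, &row[6], sizeof w3);    /* ROWr32[3] */
    return (w2 | w3) == 0;              /* orrs r2, r3, r4 */
}

/* Usage example. */
static void check_high_half_is_zero(void)
{
    int16_t row[8] = { 7, -3, 2, 1, 0, 0, 0, 0 };

    assert(high_half_is_zero(row));     /* the beq would be taken */
    row[7] = -1;
    assert(!high_half_is_zero(row));    /* any non-zero row[4..7] clears it */
}
```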

        @@ at this point, R0=b0, R1=b1, R2 (free), R3=ROWr32[2], R4=ROWr32[3],
        @@ R5=b2, R6=ROWr16[0], R7=b3, R8=W1, R9=W3, R10=W5, R11=W7,
        @@ R12=__const_ptr_, R14=&block[n]
        @@ MAC16(b0, W5, row[5]);
        @@ MAC16(b2, W7, row[5]);
        @@ MAC16(b3, W3, row[5]);
        @@ MAC16(b1, -W1, row[5]);
        @@ MAC16(b0, W7, row[7]);
        @@ MAC16(b2, W3, row[7]);
        @@ MAC16(b3, -W1, row[7]);
        @@ MAC16(b1, -W5, row[7]);
        mov r3, r3, asr #16             @ R3=ROWr16[5]
        teq r3, #0                      @ if zero, skip the muls
        mlane r0, r10, r3, r0           @ R0+=W5*ROWr16[5]=b0
        mov r4, r4, asr #16             @ R4=ROWr16[7]
        mlane r5, r11, r3, r5           @ R5+=W7*ROWr16[5]=b2
        mlane r7, r9, r3, r7            @ R7+=W3*ROWr16[5]=b3
        rsbne r3, r3, #0                @ R3=-ROWr16[5]
        mlane r1, r8, r3, r1            @ R1-=W1*ROWr16[5]=b1
        @@ R3 is free now
        teq r4, #0                      @ if zero, skip the muls
        mlane r0, r11, r4, r0           @ R0+=W7*ROWr16[7]=b0
        mlane r5, r9, r4, r5            @ R5+=W3*ROWr16[7]=b2
        rsbne r4, r4, #0                @ R4=-ROWr16[7]
        mlane r7, r8, r4, r7            @ R7-=W1*ROWr16[7]=b3
        mlane r1, r10, r4, r1           @ R1-=W5*ROWr16[7]=b1
        @@ R4 is free now
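The row[5]/row[7] contributions are evaluated only when row[4..7] is not entirely zero, and they follow the `MAC16` comments directly. In this sketch, `odd_part_57` is a hypothetical name and the coefficient values are the same assumed ones as before. The subtractions are written out directly instead of through a negated operand, which gives the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed coefficient values (from the C simple_idct). */
#define W1 22725
#define W3 19266
#define W5 12873
#define W7 4520

/* Hypothetical helper: adds the row[5] and row[7] terms to b[0..3], skipping
 * each coefficient when it is zero, as the teq/mlane groups do. */
static void odd_part_57(const int16_t row[8], int32_t b[4])
{
    int32_t r5 = row[5];
    int32_t r7 = row[7];

    if (r5 != 0) {
        b[0] += W5 * r5;          /* MAC16(b0, W5, row[5])  */
        b[2] += W7 * r5;          /* MAC16(b2, W7, row[5])  */
        b[3] += W3 * r5;          /* MAC16(b3, W3, row[5])  */
        b[1] -= W1 * r5;          /* MAC16(b1, -W1, row[5]) */
    }
    if (r7 != 0) {
        b[0] += W7 * r7;          /* MAC16(b0, W7, row[7])  */
        b[2] += W3 * r7;          /* MAC16(b2, W3, row[7])  */
        b[3] -= W1 * r7;          /* MAC16(b3, -W1, row[7]) */
        b[1] -= W5 * r7;          /* MAC16(b1, -W5, row[7]) */
    }
}

/* Usage example: start from arbitrary partial sums and compare. */
static void check_odd_part_57(void)
{
    const int16_t row[8] = { 0, 0, 0, 0, 0, 3, 0, -2 };
    int32_t b[4] = { 10, 20, 30, 40 };

    odd_part_57(row, b);
    assert(b[0] == 10 + W5 * 3 + W7 * -2);
    assert(b[1] == 20 - W1 * 3 - W5 * -2);
    assert(b[2] == 30 + W7 * 3 + W3 * -2);
    assert(b[3] == 40 + W3 * 3 - W1 * -2);
}
```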
__end_b_evaluation:
        @@ at this point, R0=b0, R1=b1, R2=ROWr32[2] | ROWr32[3] (tmp), R3 (free), R4 (free),
        @@ R5=b2, R6=ROWr16[0], R7=b3, R8 (free), R9 (free), R10 (free), R11 (free),
        @@ R12=__const_ptr_, R14=&block[n]

__a_evaluation:
        @@ a0 = (W4 * row[0]) + (1 << (ROW_SHIFT - 1));
        @@ a1 = a0 + W6 * row[2];
        @@ a2 = a0 - W6 * row[2];
        @@ a3 = a0 - W2 * row[2];
        @@ a0 = a0 + W2 * row[2];
        ldr r9, [r12, #offW4]           @ R9=W4
        mul r6, r9, r6                  @ R6=W4*ROWr16[0]
        ldr r10, [r12, #offW6]          @ R10=W6
        ldrsh r4, [r14, #4]             @ R4=ROWr16[2] (a3 not defined yet)
        add r6, r6, #ROW_SHIFTED_1      @ R6=W4*ROWr16[0] + (1<<(ROW_SHIFT-1)) (a0)

        mul r11, r10, r4                @ R11=W6*ROWr16[2]
        ldr r8, [r12, #offW2]           @ R8=W2
        sub r3, r6, r11                 @ R3=a0-W6*ROWr16[2] (a2)
        @@ temp = ((uint32_t*)row)[2] | ((uint32_t*)row)[3];
        @@ if (temp != 0) {}
        teq r2, #0
        beq __end_bef_a_evaluation

        add r2, r6, r11                 @ R2=a0+W6*ROWr16[2] (a1)
        mul r11, r8, r4                 @ R11=W2*ROWr16[2]
        sub r4, r6, r11                 @ R4=a0-W2*ROWr16[2] (a3)
        add r6, r6, r11                 @ R6=a0+W2*ROWr16[2] (a0)
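The even part starts from row[0] and row[2]. The C sketch below uses a hypothetical `even_part_02` and takes W2, W4, W6 and ROW_SHIFT values assumed from the C `simple_idct`. It shows that the rounding constant `1 << (ROW_SHIFT - 1)` (`ROW_SHIFTED_1` in the assembly) is added to a0 only once and carries into a1..a3, so every later `>> ROW_SHIFT` rounds instead of truncating. The assembly splits this work around the `beq __end_bef_a_evaluation` branch, whereas the sketch computes all four sums in one place.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (from the C simple_idct); check the #defines at the top of
 * simple_idct_arm.S. */
#define W2 21407
#define W4 16383
#define W6 8867
#define ROW_SHIFT 11

/* Hypothetical helper: even-part sums a[0..3] from row[0] and row[2]. */
static void even_part_02(const int16_t row[8], int32_t a[4])
{
    /* mul r6, r9, r6, then add r6, r6, #ROW_SHIFTED_1 */
    int32_t a0 = W4 * row[0] + (1 << (ROW_SHIFT - 1));
    int32_t t6 = W6 * row[2];     /* mul r11, r10, r4 */
    int32_t t2 = W2 * row[2];     /* mul r11, r8, r4  */

    a[1] = a0 + t6;
    a[2] = a0 - t6;
    a[3] = a0 - t2;
    a[0] = a0 + t2;
}

/* Usage example: the rounding term is present in all four sums. */
static void check_even_part_02(void)
{
    const int16_t row[8] = { 5, 0, -7, 0, 0, 0, 0, 0 };
    const int32_t base = W4 * 5 + 1024;
    int32_t a[4];

    even_part_02(row, a);
    assert(a[0] == base + W2 * -7);
    assert(a[1] == base + W6 * -7);
    assert(a[2] == base - W6 * -7);
    assert(a[3] == base - W2 * -7);
}
```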


        @@ at this point, R0=b0, R1=b1, R2=a1, R3=a2, R4=a3,
        @@ R5=b2, R6=a0, R7=b3, R8=W2, R9=W4, R10=W6, R11 (free),
        @@ R12=__const_ptr_, R14=&block[n]


        @@ a0 += W4*row[4]
        @@ a1 -= W4*row[4]
        @@ a2 -= W4*row[4]
        @@ a3 += W4*row[4]
        ldrsh r11, [r14, #8]            @ R11=ROWr16[4]
        teq r11, #0                     @ if zero, skip the muls
        mulne r11, r9, r11              @ R11=W4*ROWr16[4]
        @@ R9 is free now
        ldrsh r9, [r14, #12]            @ R9=ROWr16[6]
        addne r6, r6, r11               @ R6+=W4*ROWr16[4] (a0)
        subne r2, r2, r11               @ R2-=W4*ROWr16[4] (a1)
        subne r3, r3, r11               @ R3-=W4*ROWr16[4] (a2)
        addne r4, r4, r11               @ R4+=W4*ROWr16[4] (a3)
        @@ W6 alone is no longer needed; store W2*ROWr16[6] in its register (R10) instead
        teq r9, #0                      @ if zero, skip the muls
        mulne r11, r10, r9              @ R11=W6*ROWr16[6]
        addne r6, r6, r11               @ R6+=W6*ROWr16[6] (a0)
        mulne r10, r8, r9               @ R10=W2*ROWr16[6]
        @@ a0 += W6*row[6];
        @@ a3 -= W6*row[6];
        @@ a1 -= W2*row[6];
        @@ a2 += W2*row[6];
        subne r4, r4, r11               @ R4-=W6*ROWr16[6] (a3)
        subne r2, r2, r10               @ R2-=W2*ROWr16[6] (a1)
        addne r3, r3, r10               @ R3+=W2*ROWr16[6] (a2)
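The row[4]/row[6] step adds the remaining even-part terms. In this C sketch, `even_part_46` is a hypothetical name and the constants are the same assumed values as before. As in the assembly, each coefficient's multiplies are skipped when the coefficient is zero, and W6*row[6] and W2*row[6] are each computed only once and then reused for two of the four sums.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (from the C simple_idct). */
#define W2 21407
#define W4 16383
#define W6 8867

/* Hypothetical helper: adds the row[4] and row[6] terms to a[0..3]. */
static void even_part_46(const int16_t row[8], int32_t a[4])
{
    int32_t r4 = row[4];
    int32_t r6 = row[6];

    if (r4 != 0) {                /* teq r11, #0 */
        int32_t t4 = W4 * r4;     /* mulne r11, r9, r11 */
        a[0] += t4;
        a[1] -= t4;
        a[2] -= t4;
        a[3] += t4;
    }
    if (r6 != 0) {                /* teq r9, #0 */
        int32_t t6 = W6 * r6;     /* mulne r11, r10, r9 */
        int32_t t2 = W2 * r6;     /* mulne r10, r8, r9  */
        a[0] += t6;
        a[3] -= t6;
        a[1] -= t2;
        a[2] += t2;
    }
}

/* Usage example. */
static void check_even_part_46(void)
{
    const int16_t row[8] = { 0, 0, 0, 0, 2, 0, -3, 0 };
    int32_t a[4] = { 1, 2, 3, 4 };

    even_part_46(row, a);
    assert(a[0] == 1 + W4 * 2 + W6 * -3);
    assert(a[1] == 2 - W4 * 2 - W2 * -3);
    assert(a[2] == 3 - W4 * 2 + W2 * -3);
    assert(a[3] == 4 + W4 * 2 - W6 * -3);
}
```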

__end_a_evaluation:
        @@ at this point, R0=b0, R1=b1, R2=a1, R3=a2, R4=a3,
        @@ R5=b2, R6=a0, R7=b3, R8 (free), R9 (free), R10 (free), R11 (free),
        @@ R12=__const_ptr_, R14=&block[n]
        @@ row[0] = (a0 + b0) >> ROW_SHIFT;
        @@ row[1] = (a1 + b1) >> ROW_SHIFT;
        @@ row[2] = (a2 + b2) >> ROW_SHIFT;
        @@ row[3] = (a3 + b3) >> ROW_SHIFT;
        @@ row[4] = (a3 - b3) >> ROW_SHIFT;
        @@ row[5] = (a2 - b2) >> ROW_SHIFT;
        @@ row[6] = (a1 - b1) >> ROW_SHIFT;
        @@ row[7] = (a0 - b0) >> ROW_SHIFT;
        add r8, r6, r0                  @ R8=a0+b0
        add r9, r2, r1                  @ R9=a1+b1
        @@ pack two 16-bit halfwords into one 32-bit word:
        @@ ROWr32[0]=ROWr16[0] | (ROWr16[1]<<16) (valid on little-endian targets only!)
        ldr r10, [r12, #offMASK_MSHW]   @ R10=0xFFFF0000
        and r9, r10, r9, lsl #ROW_SHIFT2MSHW    @ R9=0xFFFF0000 & ((a1+b1)<<5)
        mvn r11, r10                    @ R11=NOT R10=0x0000FFFF
        and r8, r11, r8, asr #ROW_SHIFT @ R8=0x0000FFFF & ((a0+b0)>>11)
        orr r8, r8, r9
        str r8, [r14, #0]
250 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
251 add r8, r3, r5 @ R8=a2+b2 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
252 add r9, r4, r7 @ R9=a3+b3 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
253 and r9, r10, r9, lsl #ROW_SHIFT2MSHW @ R9=0xFFFF0000 & ((a3+b3)<<5) |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
254 and r8, r11, r8, asr #ROW_SHIFT @ R8=0x0000FFFF & ((a2+b2)>>11) |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
255 orr r8, r8, r9 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
256 str r8, [r14, #4] |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
257 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
258 sub r8, r4, r7 @ R8=a3-b3 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
259 sub r9, r3, r5 @ R9=a2-b2 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
260 and r9, r10, r9, lsl #ROW_SHIFT2MSHW @ R9=0xFFFF0000 & ((a2-b2)<<5) |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
261 and r8, r11, r8, asr #ROW_SHIFT @ R8=0x0000FFFF & ((a3-b3)>>11) |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
262 orr r8, r8, r9 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
263 str r8, [r14, #8] |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
264 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
265 sub r8, r2, r1 @ R8=a1-b1 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
266 sub r9, r6, r0 @ R9=a0-b0 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
267 and r9, r10, r9, lsl #ROW_SHIFT2MSHW @ R9=0xFFFF0000 & ((a0-b0)<<5) |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
268 and r8, r11, r8, asr #ROW_SHIFT @ R8=0x0000FFFF & ((a1-b1)>>11) |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
269 orr r8, r8, r9 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
270 str r8, [r14, #12] |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
271 |
cca26199ab17
Optimized simple idct for arm by Frederic 'dilb' Boulay <dilb@handhelds.org>. Currently licensed under the GPLv2, but the author allowed to license it under the LGPL, feel free to change
al3x
parents:
diff
changeset
|
272 bal __end_row_loop |
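The packing above can be modeled in C. This is a sketch under stated assumptions, not code from this file: ROW_SHIFT = 11 and ROW_SHIFT2MSHW = 5 are inferred from the ">>11" and "<<5" comments, and pack_row_pair / store_row_pair are hypothetical names. The trick is that (x << 5) & 0xFFFF0000 holds the same 16 bits as (x >> 11) & 0xFFFF, already sitting in the upper halfword, so the high result needs no separate shift before the orr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, consistent with the ">>11" and "<<5" comments above. */
#define ROW_SHIFT      11
#define ROW_SHIFT2MSHW (16 - ROW_SHIFT)
#define MASK_MSHW      0xFFFF0000u

/* Mirrors:  and r9, r10, r9, lsl #ROW_SHIFT2MSHW
             and r8, r11, r8, asr #ROW_SHIFT
             orr r8, r8, r9
   lo and hi are 32-bit sums such as a0+b0 and a1+b1. */
static uint32_t pack_row_pair(int32_t lo, int32_t hi)
{
    /* (hi << 5) & 0xFFFF0000 keeps bits 11..26 of hi, i.e. (hi >> 11),
       already placed in the upper halfword. */
    uint32_t high = ((uint32_t)hi << ROW_SHIFT2MSHW) & MASK_MSHW;
    /* Arithmetic shift right (implementation-defined in C for negative
       values, but arithmetic on mainstream compilers), keep 16 bits. */
    uint32_t low = (uint32_t)(lo >> ROW_SHIFT) & ~MASK_MSHW;
    return low | high;
}

/* Mirrors "str r8, [r14, #off]": one 32-bit store fills row[i] and row[i+1]. */
static void store_row_pair(int16_t *row, int i, int32_t lo, int32_t hi)
{
    const uint16_t probe = 1;
    /* The packing order is only correct on little-endian targets. */
    assert(*(const unsigned char *)&probe == 1);
    uint32_t w = pack_row_pair(lo, hi);
    memcpy(&row[i], &w, sizeof w);
}
```

For example, with a0+b0 = -2048 and a1+b1 = 4096 the stored word is 0x0002FFFF, so on a little-endian target the first halfword reads back as -1 and the second as 2.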

__almost_empty_row:
        @@ the row is empty except for ROWr16[0]: handle this special case here
        @@ at this point, R0=block, R14=&block[n], R12=__const_ptr_, R1=ROWr32[0], R2=ROWr32[1],
        @@ R3=ROWr32[2], R4=ROWr32[3], R5=(temp), R6=ROWr16[0], R7=ROWr16[1],
        @@ R8=0xFFFF (temp), R9-R11 free
        mov r8, #0x10000                        @ R8=0x10000: 0xFFFF is not a valid ARM immediate, so it takes 2 steps, which still saves an ldr (and its load delay)
        sub r8, r8, #1                          @ R8=0xFFFF, now ready
        and r5, r8, r6, lsl #3                  @ R5=R8 & (R6<<3) = (ROWr16[0]<<3) & 0xFFFF
        orr r5, r5, r5, lsl #16                 @ R5=R5 | (R5<<16)
        str r5, [r14, #0]                       @ R14[0]=ROWr32[0]=R5
        str r5, [r14, #4]                       @ R14[4]=ROWr32[1]=R5
        str r5, [r14, #8]                       @ R14[8]=ROWr32[2]=R5
        str r5, [r14, #12]                      @ R14[12]=ROWr32[3]=R5
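A C sketch of the DC-only shortcut; dc_only_row is a hypothetical name. The <<3 is presumably a cheap stand-in for the full row transform: assuming W4 = 16383 (close to 2^14) and ROW_SHIFT = 11, as in FFmpeg's C simple_idct, W4*x >> 11 is roughly 8*x, so every output of a DC-only row is about row[0] << 3. Because both halfwords of the stored word are equal, byte order does not matter on this path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a row whose only non-zero coefficient is row[0], as
   __almost_empty_row does: every output becomes (row[0] << 3),
   truncated to 16 bits. */
static void dc_only_row(int16_t row[8])
{
    for (int i = 1; i < 8; i++)
        assert(row[i] == 0);                 /* precondition of this path */

    /* and r5, r8, r6, lsl #3 : (ROWr16[0] << 3) & 0xFFFF */
    uint32_t v = ((uint32_t)(uint16_t)row[0] << 3) & 0xFFFFu;
    /* orr r5, r5, r5, lsl #16 : same value in both halfwords */
    uint32_t w = v | (v << 16);

    /* four 32-bit stores at byte offsets 0, 4, 8 and 12 */
    for (int k = 0; k < 4; k++)
        memcpy(&row[2 * k], &w, sizeof w);
}
```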

__end_row_loop:
        @@ at this point, R0-R11 (free)
        @@ R12=__const_ptr_, R14=&block[n]
        ldr r0, [sp, #0]                        @ R0=block
        teq r0, r14                             @ compare current &block[8*n] to block; once block is reached, the loop is finished
        sub r14, r14, #16                       @ back one row (8 halfwords); no S suffix, so the flags from teq are kept
        bne __row_loop
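The loop control tests before it steps: teq sets the flags, sub without the S suffix leaves them untouched, and bne then decides, so the row at block itself is handled last and the loop falls through afterwards. The C model below is a sketch with a hypothetical name; it assumes the row loop starts at &block[56], which is set up outside this excerpt. Because C does not allow forming a pointer before the start of an array, the model checks before stepping instead of stepping unconditionally like the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Walk a pointer the way R14 walks the block: start at start_index,
   compare with the base before stepping back by step elements, stop
   once the base itself has been handled. Records the visited indices
   in order[] and returns how many there were. */
static int visit_order(const int16_t block[64], int start_index, int step,
                       int order[8])
{
    assert(step > 0 && start_index >= 0 && start_index < 64);
    assert(start_index % step == 0);      /* must land exactly on block */
    const int16_t *p = block + start_index;
    int n = 0;
    for (;;) {
        assert(n < 8);
        order[n++] = (int)(p - block);    /* the loop body would run here */
        if (p == block)                   /* teq r0, r14 ; bne not taken */
            break;
        p -= step;                        /* sub r14, r14, #16 for rows */
    }
    return n;
}
```

Called with a start index of 7 and a step of 1, the same model describes the column loop further down, where R14 starts at &block[7] and moves back 2 bytes (one halfword) per iteration.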



        @@ at this point, R0=block, R1-R11 (free)
        @@ R12=__const_ptr_, R14=&block[n]
        add r14, r0, #14                        @ R14=&block[7]: start from the last column and step back until col=0, i.e. R14=block
__col_loop:

__b_evaluation2:
        @@ at this point, R0=block (temp), R1-R11 (free)
        @@ R12=__const_ptr_, R14=&block[n]
        @@ proceed with b0-b3 first, followed by a0-a3
        @@ MUL16(b0, W1, col[1x8]);
        @@ MUL16(b1, W3, col[1x8]);
        @@ MUL16(b2, W5, col[1x8]);
        @@ MUL16(b3, W7, col[1x8]);
        @@ MAC16(b0, W3, col[3x8]);
        @@ MAC16(b1, -W7, col[3x8]);
        @@ MAC16(b2, -W1, col[3x8]);
        @@ MAC16(b3, -W5, col[3x8]);
        ldr r8, [r12, #offW1]                   @ R8=W1
        ldrsh r7, [r14, #16]                    @ R7=COLr16[1x8]
        mul r0, r8, r7                          @ R0=W1*COLr16[1x8]=b0 (COLr16[1x8] must be the second arg, to possibly save 1 cycle)
        ldr r9, [r12, #offW3]                   @ R9=W3
        ldr r10, [r12, #offW5]                  @ R10=W5
        mul r1, r9, r7                          @ R1=W3*COLr16[1x8]=b1 (COLr16[1x8] must be the second arg, to possibly save 1 cycle)
        ldr r11, [r12, #offW7]                  @ R11=W7
        mul r5, r10, r7                         @ R5=W5*COLr16[1x8]=b2 (COLr16[1x8] must be the second arg, to possibly save 1 cycle)
        ldrsh r2, [r14, #48]                    @ R2=COLr16[3x8]
        mul r7, r11, r7                         @ R7=W7*COLr16[1x8]=b3 (COLr16[1x8] must be the second arg, to possibly save 1 cycle)
        teq r2, #0                              @ if 0, then avoid muls
        mlane r0, r9, r2, r0                    @ R0+=W3*COLr16[3x8]=b0 (COLr16[3x8] must be the second arg, to possibly save 1 cycle)
        rsbne r2, r2, #0                        @ R2=-COLr16[3x8]
        mlane r1, r11, r2, r1                   @ R1-=W7*COLr16[3x8]=b1 (COLr16[3x8] must be the second arg, to possibly save 1 cycle)
        mlane r5, r8, r2, r5                    @ R5-=W1*COLr16[3x8]=b2 (COLr16[3x8] must be the second arg, to possibly save 1 cycle)
        mlane r7, r10, r2, r7                   @ R7-=W5*COLr16[3x8]=b3 (COLr16[3x8] must be the second arg, to possibly save 1 cycle)

        @@ at this point, R0=b0, R1=b1, R2 (free), R3 (free), R4 (free),
        @@ R5=b2, R6 (free), R7=b3, R8=W1, R9=W3, R10=W5, R11=W7,
        @@ R12=__const_ptr_, R14=&block[n]
        @@ MAC16(b0, W5, col[5x8]);
        @@ MAC16(b2, W7, col[5x8]);
        @@ MAC16(b3, W3, col[5x8]);
        @@ MAC16(b1, -W1, col[5x8]);
        @@ MAC16(b0, W7, col[7x8]);
        @@ MAC16(b2, W3, col[7x8]);
        @@ MAC16(b3, -W1, col[7x8]);
        @@ MAC16(b1, -W5, col[7x8]);
        ldrsh r3, [r14, #80]                    @ R3=COLr16[5x8]
        teq r3, #0                              @ if 0 then avoid muls
        mlane r0, r10, r3, r0                   @ R0+=W5*COLr16[5x8]=b0
        mlane r5, r11, r3, r5                   @ R5+=W7*COLr16[5x8]=b2
        mlane r7, r9, r3, r7                    @ R7+=W3*COLr16[5x8]=b3
        rsbne r3, r3, #0                        @ R3=-COLr16[5x8]
        ldrsh r4, [r14, #112]                   @ R4=COLr16[7x8]
        mlane r1, r8, r3, r1                    @ R1-=W1*COLr16[5x8]=b1
        @@ R3 is free now
        teq r4, #0                              @ if 0 then avoid muls
        mlane r0, r11, r4, r0                   @ R0+=W7*COLr16[7x8]=b0
        mlane r5, r9, r4, r5                    @ R5+=W3*COLr16[7x8]=b2
        rsbne r4, r4, #0                        @ R4=-COLr16[7x8]
        mlane r7, r8, r4, r7                    @ R7-=W1*COLr16[7x8]=b3
        mlane r1, r10, r4, r1                   @ R1-=W5*COLr16[7x8]=b1
        @@ R4 is free now
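The odd part of the column pass, restated in C as a sketch. The W constants are assumed to be the values used by FFmpeg's C simple_idct, and col_odd_part is a hypothetical name. The if tests only mirror the teq/mlane skips: multiplying by a zero coefficient adds nothing, so skipping saves time without changing the result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant values (those of FFmpeg's C simple_idct). */
#define W1 22725
#define W3 19266
#define W5 12873
#define W7 4520

/* Odd part of one column. col points at block[n]; the coefficients of a
   column are 8 int16 apart (16 bytes), as in ldrsh rX, [r14, #16*k]. */
static void col_odd_part(const int16_t *col, int32_t b[4])
{
    assert(col && b);
    int32_t c1 = col[8 * 1];
    b[0] = W1 * c1;
    b[1] = W3 * c1;
    b[2] = W5 * c1;
    b[3] = W7 * c1;

    int32_t c3 = col[8 * 3];
    if (c3 != 0) {              /* teq r2, #0 ; mlane ... */
        b[0] += W3 * c3;
        b[1] -= W7 * c3;
        b[2] -= W1 * c3;
        b[3] -= W5 * c3;
    }
    int32_t c5 = col[8 * 5];
    if (c5 != 0) {              /* teq r3, #0 ; mlane ... */
        b[0] += W5 * c5;
        b[1] -= W1 * c5;
        b[2] += W7 * c5;
        b[3] += W3 * c5;
    }
    int32_t c7 = col[8 * 7];
    if (c7 != 0) {              /* teq r4, #0 ; mlane ... */
        b[0] += W7 * c7;
        b[1] -= W5 * c7;
        b[2] += W3 * c7;
        b[3] -= W1 * c7;
    }
}
```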
__end_b_evaluation2:
        @@ at this point, R0=b0, R1=b1, R2 (free), R3 (free), R4 (free),
        @@ R5=b2, R6 (free), R7=b3, R8 (free), R9 (free), R10 (free), R11 (free),
        @@ R12=__const_ptr_, R14=&block[n]

__a_evaluation2:
        @@ a0 = (W4 * col[0]) + (1 << (COL_SHIFT - 1));
        @@ a1 = a0 + W6 * col[2x8];
        @@ a2 = a0 - W6 * col[2x8];
        @@ a3 = a0 - W2 * col[2x8];
        @@ a0 = a0 + W2 * col[2x8];
        ldrsh r6, [r14, #0]                     @ R6=COLr16[0]
        ldr r9, [r12, #offW4]                   @ R9=W4
        mul r6, r9, r6                          @ R6=W4*COLr16[0]
        ldr r10, [r12, #offW6]                  @ R10=W6
        ldrsh r4, [r14, #32]                    @ R4=COLr16[2x8] (a3 not defined yet)
        add r6, r6, #COL_SHIFTED_1              @ R6=W4*COLr16[0] + (1<<(COL_SHIFT-1)) (a0)
        mul r11, r10, r4                        @ R11=W6*COLr16[2x8]
        ldr r8, [r12, #offW2]                   @ R8=W2
        add r2, r6, r11                         @ R2=a0+W6*COLr16[2x8] (a1)
        sub r3, r6, r11                         @ R3=a0-W6*COLr16[2x8] (a2)
        mul r11, r8, r4                         @ R11=W2*COLr16[2x8]
        sub r4, r6, r11                         @ R4=a0-W2*COLr16[2x8] (a3)
        add r6, r6, r11                         @ R6=a0+W2*COLr16[2x8] (a0)

        @@ at this point, R0=b0, R1=b1, R2=a1, R3=a2, R4=a3,
        @@ R5=b2, R6=a0, R7=b3, R8=W2, R9=W4, R10=W6, R11 (free),
        @@ R12=__const_ptr_, R14=&block[n]
        @@ a0 += W4*col[4x8]
        @@ a1 -= W4*col[4x8]
        @@ a2 -= W4*col[4x8]
        @@ a3 += W4*col[4x8]
        ldrsh r11, [r14, #64]                   @ R11=COLr16[4x8]
        teq r11, #0                             @ if zero, avoid muls
        mulne r11, r9, r11                      @ R11=W4*COLr16[4x8]
        @@ R9 is free now
        addne r6, r6, r11                       @ R6+=W4*COLr16[4x8] (a0)
        subne r2, r2, r11                       @ R2-=W4*COLr16[4x8] (a1)
        subne r3, r3, r11                       @ R3-=W4*COLr16[4x8] (a2)
        ldrsh r9, [r14, #96]                    @ R9=COLr16[6x8]
        addne r4, r4, r11                       @ R4+=W4*COLr16[4x8] (a3)
        @@ W6 alone is no longer needed: reuse R10 to hold W2*COLr16[6x8] instead
        teq r9, #0                              @ if zero, avoid muls
        mulne r11, r10, r9                      @ R11=W6*COLr16[6x8]
        addne r6, r6, r11                       @ R6+=W6*COLr16[6x8] (a0)
        mulne r10, r8, r9                       @ R10=W2*COLr16[6x8]
        @@ a0 += W6*col[6x8];
        @@ a3 -= W6*col[6x8];
        @@ a1 -= W2*col[6x8];
        @@ a2 += W2*col[6x8];
        subne r4, r4, r11                       @ R4-=W6*COLr16[6x8] (a3)
        subne r2, r2, r10                       @ R2-=W2*COLr16[6x8] (a1)
        addne r3, r3, r10                       @ R3+=W2*COLr16[6x8] (a2)
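The even part in the same style. W2, W4, W6 and COL_SHIFT = 20 are assumed values (those of FFmpeg's C simple_idct), and col_even_part is a hypothetical name. Adding 1 << (COL_SHIFT - 1) to a0 up front makes the final >> COL_SHIFT round rather than truncate, and since a1..a3 are derived from a0 they all inherit that rounding term.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant values (those of FFmpeg's C simple_idct). */
#define W2 21407
#define W4 16383
#define W6 8867
#define COL_SHIFT 20
#define COL_SHIFTED_1 (1 << (COL_SHIFT - 1))

/* Even part of one column; col points at block[n], stride 8 int16. */
static void col_even_part(const int16_t *col, int32_t a[4])
{
    assert(col && a);
    int32_t a0 = W4 * col[0] + COL_SHIFTED_1;   /* rounding term folded in */
    int32_t c2 = col[8 * 2];
    a[1] = a0 + W6 * c2;
    a[2] = a0 - W6 * c2;
    a[3] = a0 - W2 * c2;
    a[0] = a0 + W2 * c2;

    int32_t c4 = col[8 * 4];
    if (c4 != 0) {              /* teq r11, #0 ; mulne/addne/subne */
        a[0] += W4 * c4;
        a[1] -= W4 * c4;
        a[2] -= W4 * c4;
        a[3] += W4 * c4;
    }
    int32_t c6 = col[8 * 6];
    if (c6 != 0) {              /* teq r9, #0 ; mulne/addne/subne */
        a[0] += W6 * c6;
        a[3] -= W6 * c6;
        a[1] -= W2 * c6;
        a[2] += W2 * c6;
    }
}
```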
__end_a_evaluation2:
        @@ at this point, R0=b0, R1=b1, R2=a1, R3=a2, R4=a3,
        @@ R5=b2, R6=a0, R7=b3, R8 (free), R9 (free), R10 (free), R11 (free),
        @@ R12=__const_ptr_, R14=&block[n]
        @@ col[0 ] = ((a0 + b0) >> COL_SHIFT);
        @@ col[8 ] = ((a1 + b1) >> COL_SHIFT);
        @@ col[16] = ((a2 + b2) >> COL_SHIFT);
        @@ col[24] = ((a3 + b3) >> COL_SHIFT);
        @@ col[32] = ((a3 - b3) >> COL_SHIFT);
        @@ col[40] = ((a2 - b2) >> COL_SHIFT);
        @@ col[48] = ((a1 - b1) >> COL_SHIFT);
        @@ col[56] = ((a0 - b0) >> COL_SHIFT);
        @@@@@ no optimisation here @@@@@
        add r8, r6, r0                          @ R8=a0+b0
        add r9, r2, r1                          @ R9=a1+b1
        mov r8, r8, asr #COL_SHIFT
        mov r9, r9, asr #COL_SHIFT
        strh r8, [r14, #0]
        strh r9, [r14, #16]
        add r8, r3, r5                          @ R8=a2+b2
        add r9, r4, r7                          @ R9=a3+b3
        mov r8, r8, asr #COL_SHIFT
        mov r9, r9, asr #COL_SHIFT
        strh r8, [r14, #32]
        strh r9, [r14, #48]
        sub r8, r4, r7                          @ R8=a3-b3
        sub r9, r3, r5                          @ R9=a2-b2
        mov r8, r8, asr #COL_SHIFT
        mov r9, r9, asr #COL_SHIFT
        strh r8, [r14, #64]
        strh r9, [r14, #80]
        sub r8, r2, r1                          @ R8=a1-b1
        sub r9, r6, r0                          @ R9=a0-b0
        mov r8, r8, asr #COL_SHIFT
        mov r9, r9, asr #COL_SHIFT
        strh r8, [r14, #96]
        strh r9, [r14, #112]
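Unlike the row pass, which packs two results into each 32-bit store, the column pass writes one halfword per output, 16 bytes apart, because consecutive outputs lie in the same column. A C sketch of this step, assuming COL_SHIFT = 20 (FFmpeg's C simple_idct value); col_store is a hypothetical name. As with strh, only the low 16 bits are kept and nothing is clamped.

```c
#include <assert.h>
#include <stdint.h>

#define COL_SHIFT 20   /* assumed, as in FFmpeg's C simple_idct */

/* Write the eight outputs of one column, 8 int16 apart, in the order of
   the strh offsets #0, #16, ..., #112. The int16_t conversion keeps the
   low 16 bits on common compilers, like strh; there is no saturation. */
static void col_store(int16_t *col, const int32_t a[4], const int32_t b[4])
{
    assert(col && a && b);
    col[8 * 0] = (int16_t)((a[0] + b[0]) >> COL_SHIFT);
    col[8 * 1] = (int16_t)((a[1] + b[1]) >> COL_SHIFT);
    col[8 * 2] = (int16_t)((a[2] + b[2]) >> COL_SHIFT);
    col[8 * 3] = (int16_t)((a[3] + b[3]) >> COL_SHIFT);
    col[8 * 4] = (int16_t)((a[3] - b[3]) >> COL_SHIFT);
    col[8 * 5] = (int16_t)((a[2] - b[2]) >> COL_SHIFT);
    col[8 * 6] = (int16_t)((a[1] - b[1]) >> COL_SHIFT);
    col[8 * 7] = (int16_t)((a[0] - b[0]) >> COL_SHIFT);
}
```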

__end_col_loop:
        @@ at this point, R0-R11 (free)
        @@ R12=__const_ptr_, R14=&block[n]
        ldr r0, [sp, #0]                        @ R0=block
        teq r0, r14                             @ compare current &block[n] to block; once block is reached, the loop is finished
        sub r14, r14, #2                        @ back one column (1 halfword); no S suffix, so the flags from teq are kept
        bne __col_loop




__end_simple_idct_ARM:
        @@ restore registers to their previous state!
        add sp, sp, #8                          @@ release the local variables!
        ldmfd sp!, {r4-r11, r15}                @@ the saved LR value is loaded into PC, which returns



        @@ kind of sub-function, placed here so as not to burden the common case.
__end_bef_a_evaluation:
        add r2, r6, r11                         @ R2=a0+W6*ROWr16[2] (a1)
        mul r11, r8, r4                         @ R11=W2*ROWr16[2]
        sub r4, r6, r11                         @ R4=a0-W2*ROWr16[2] (a3)
        add r6, r6, r11                         @ R6=a0+W2*ROWr16[2] (a0)
        bal __end_a_evaluation


__constant_ptr__:  @@ see #defines at the beginning of the source code for values.
        .align
        .word W1
        .word W2
        .word W3
        .word W4
        .word W5
        .word W6
        .word W7
        .word MASK_MSHW
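These values live in a literal pool because an ARM data-processing instruction can only encode an immediate that is an 8-bit value rotated right by an even amount; 22725 or 0xFFFF0000 does not fit, which is also why the DC-only path builds 0xFFFF as 0x10000 - 1. The C view below is a sketch: the W values are assumed to match FFmpeg's C simple_idct, the offW*/offMASK_MSHW byte offsets are assumed to follow the order of the .word entries (4 bytes each), and constant_pool / load_const are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (FFmpeg's C simple_idct); the real ones come from the
   #defines at the top of this file. */
enum { W1 = 22725, W2 = 21407, W3 = 19266, W4 = 16383,
       W5 = 12873, W6 = 8867,  W7 = 4520 };

/* C view of __constant_ptr__: eight 32-bit words in .word order. */
static const uint32_t constant_pool[8] = {
    W1, W2, W3, W4, W5, W6, W7,
    0xFFFF0000u                                /* MASK_MSHW */
};

/* Assumed byte offsets used with "ldr rX, [r12, #offXX]": entry k at 4*k. */
enum { offW1 = 0,  offW2 = 4,  offW3 = 8,  offW4 = 12,
       offW5 = 16, offW6 = 20, offW7 = 24, offMASK_MSHW = 28 };

/* Model of "ldr rX, [r12, #byte_offset]" with r12 = __constant_ptr__. */
static uint32_t load_const(int byte_offset)
{
    assert(byte_offset >= 0 && byte_offset < 32 && byte_offset % 4 == 0);
    return constant_pool[byte_offset / 4];
}
```

The row pass then derives 0x0000FFFF from the loaded mask with a single mvn instead of a second load.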