annotate arm/h264dsp_neon.S @ 12530:63edd10ad4bc libavcodec tip

Try to fix crashes introduced by r25218. r25218 made assumptions about the existence of past reference frames that weren't necessarily true.
author darkshikari
date Tue, 28 Sep 2010 09:06:22 +0000
parents 69bbfd8f2ba5
children
changeset 8336:c8401acb05d1, author mru: ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
 8336: /*
 8336:  * Copyright (c) 2008 Mans Rullgard <mans@mansr.com>
 8336:  *
 8336:  * This file is part of FFmpeg.
 8336:  *
 8336:  * FFmpeg is free software; you can redistribute it and/or
 8336:  * modify it under the terms of the GNU Lesser General Public
 8336:  * License as published by the Free Software Foundation; either
 8336:  * version 2.1 of the License, or (at your option) any later version.
 8336:  *
 8336:  * FFmpeg is distributed in the hope that it will be useful,
 8336:  * but WITHOUT ANY WARRANTY; without even the implied warranty of
 8336:  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 8336:  * Lesser General Public License for more details.
 8336:  *
 8336:  * You should have received a copy of the GNU Lesser General Public
 8336:  * License along with FFmpeg; if not, write to the Free Software
 8336:  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 8336:  */
 8336:
 8336: #include "asm.S"
 8336:
changeset 8338:b294a0d5bc50, author mru, parents 8337: ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
 8338:         .macro transpose_8x8 r0 r1 r2 r3 r4 r5 r6 r7
 8338:         vtrn.32         \r0, \r4
 8338:         vtrn.32         \r1, \r5
 8338:         vtrn.32         \r2, \r6
 8338:         vtrn.32         \r3, \r7
 8338:         vtrn.16         \r0, \r2
 8338:         vtrn.16         \r1, \r3
 8338:         vtrn.16         \r4, \r6
 8338:         vtrn.16         \r5, \r7
 8338:         vtrn.8          \r0, \r1
 8338:         vtrn.8          \r2, \r3
 8338:         vtrn.8          \r4, \r5
 8338:         vtrn.8          \r6, \r7
 8338:         .endm
 8338:
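A rough C model of the macro above may help when reading it. It assumes the macro is invoked with 64-bit D registers, so that each argument holds one row of eight bytes. The helper names `vtrn` and `transpose_8x8_model` are made up for this sketch and are not part of FFmpeg. Each VTRN stage exchanges one bit of the row index with the same bit of the column index (bit 2 for `.32`, bit 1 for `.16`, bit 0 for `.8`), so the three stages together transpose the 8x8 block.

```c
#include <assert.h>
#include <stdint.h>

/* Model of VTRN.<size> Dd, Dm on two 8-byte registers: odd-numbered
 * element 2k+1 of a is exchanged with even-numbered element 2k of b.
 * esize is the element size in bytes (1, 2 or 4). */
static void vtrn(uint8_t a[8], uint8_t b[8], int esize)
{
    assert(esize == 1 || esize == 2 || esize == 4);
    for (int i = 0; i < 8; i += 2 * esize) {
        for (int k = 0; k < esize; k++) {
            uint8_t t        = a[i + esize + k];
            a[i + esize + k] = b[i + k];
            b[i + k]         = t;
        }
    }
}

/* Same instruction order as the transpose_8x8 macro; m[i] stands for
 * macro argument r<i>, one row of eight bytes (an assumption). */
static void transpose_8x8_model(uint8_t m[8][8])
{
    for (int r = 0; r < 4; r++)         /* vtrn.32 r0,r4 r1,r5 r2,r6 r3,r7 */
        vtrn(m[r], m[r + 4], 4);
    for (int r = 0; r < 8; r++)         /* vtrn.16 r0,r2 r1,r3 r4,r6 r5,r7 */
        if ((r & 2) == 0)
            vtrn(m[r], m[r + 2], 2);
    for (int r = 0; r < 8; r += 2)      /* vtrn.8  r0,r1 r2,r3 r4,r5 r6,r7 */
        vtrn(m[r], m[r + 1], 1);
}
```

Filling `m[r][c]` with `r * 8 + c` and calling `transpose_8x8_model(m)` should leave `c * 8 + r` at `m[r][c]`.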
changeset 9864:f5ffd813dc7f, author mru, parents 9072: ARM: slightly faster NEON H264 horizontal loop filter
 9864:         .macro transpose_4x4 r0 r1 r2 r3
 9864:         vtrn.16         \r0, \r2
 9864:         vtrn.16         \r1, \r3
 9864:         vtrn.8          \r0, \r1
 9864:         vtrn.8          \r2, \r3
 9864:         .endm
 9864:
 8338:         .macro swap4 r0 r1 r2 r3 r4 r5 r6 r7
 8338:         vswp            \r0, \r4
 8338:         vswp            \r1, \r5
 8338:         vswp            \r2, \r6
 8338:         vswp            \r3, \r7
 8338:         .endm
 8338:
 8338:         .macro transpose16_4x4 r0 r1 r2 r3 r4 r5 r6 r7
 8338:         vtrn.32         \r0, \r2
 8338:         vtrn.32         \r1, \r3
 8338:         vtrn.32         \r4, \r6
 8338:         vtrn.32         \r5, \r7
 8338:         vtrn.16         \r0, \r1
 8338:         vtrn.16         \r2, \r3
 8338:         vtrn.16         \r4, \r5
 8338:         vtrn.16         \r6, \r7
 8338:         .endm
 8338:
 8336: /* chroma_mc8(uint8_t *dst, uint8_t *src, int stride, int h, int x, int y) */
changeset 8626:8d425ee85ddb, author mru, parents 8359: ARM: simplify ff_put/avg_h264_chroma_mc4/8_neon definitions, no code change
 8626:         .macro h264_chroma_mc8 type
 8626: function ff_\type\()_h264_chroma_mc8_neon, export=1
 8336:         push            {r4-r7, lr}
 8336:         ldrd            r4, [sp, #20]
 8626: .ifc \type,avg
 8336:         mov             lr, r0
 8336: .endif
 8336:         pld             [r1]
 8336:         pld             [r1, r2]
 8336:
 8336:         muls            r7, r4, r5
 8336:         rsb             r6, r7, r5, lsl #3
 8336:         rsb             ip, r7, r4, lsl #3
 8336:         sub             r4, r7, r4, lsl #3
 8336:         sub             r4, r4, r5, lsl #3
 8336:         add             r4, r4, #64
 8336:
 8336:         beq             2f
 8336:
 8336:         add             r5, r1, r2
 8336:
 8336:         vdup.8          d0, r4
 8336:         lsl             r4, r2, #1
 8336:         vdup.8          d1, ip
 8336:         vld1.64         {d4, d5}, [r1], r4
 8336:         vdup.8          d2, r6
 8336:         vld1.64         {d6, d7}, [r5], r4
 8336:         vdup.8          d3, r7
 8336:
 8336:         vext.8          d5, d4, d5, #1
 8336:         vext.8          d7, d6, d7, #1
 8336:
 8336: 1:      pld             [r5]
 8336:         vmull.u8        q8, d4, d0
 8336:         vmlal.u8        q8, d5, d1
 8336:         vld1.64         {d4, d5}, [r1], r4
 8336:         vmlal.u8        q8, d6, d2
 8336:         vext.8          d5, d4, d5, #1
 8336:         vmlal.u8        q8, d7, d3
 8336:         vmull.u8        q9, d6, d0
 8336:         subs            r3, r3, #2
 8336:         vmlal.u8        q9, d7, d1
 8336:         vmlal.u8        q9, d4, d2
 8336:         vmlal.u8        q9, d5, d3
 8336:         vrshrn.u16      d16, q8, #6
 8336:         vld1.64         {d6, d7}, [r5], r4
 8336:         pld             [r1]
 8336:         vrshrn.u16      d17, q9, #6
 8626: .ifc \type,avg
 8336:         vld1.64         {d20}, [lr,:64], r2
 8336:         vld1.64         {d21}, [lr,:64], r2
 8336:         vrhadd.u8       q8, q8, q10
 8336: .endif
 8336:         vext.8          d7, d6, d7, #1
 8336:         vst1.64         {d16}, [r0,:64], r2
 8336:         vst1.64         {d17}, [r0,:64], r2
 8336:         bgt             1b
 8336:
 8336:         pop             {r4-r7, pc}
 8336:
 8336: 2:      tst             r6, r6
 8336:         add             ip, ip, r6
 8336:         vdup.8          d0, r4
 8336:         vdup.8          d1, ip
 8336:
 8336:         beq             4f
 8336:
 8336:         add             r5, r1, r2
 8336:         lsl             r4, r2, #1
 8336:         vld1.64         {d4}, [r1], r4
 8336:         vld1.64         {d6}, [r5], r4
 8336:
 8336: 3:      pld             [r5]
 8336:         vmull.u8        q8, d4, d0
 8336:         vmlal.u8        q8, d6, d1
 8336:         vld1.64         {d4}, [r1], r4
 8336:         vmull.u8        q9, d6, d0
 8336:         vmlal.u8        q9, d4, d1
 8336:         vld1.64         {d6}, [r5], r4
 8336:         vrshrn.u16      d16, q8, #6
 8336:         vrshrn.u16      d17, q9, #6
 8626: .ifc \type,avg
 8336:         vld1.64         {d20}, [lr,:64], r2
 8336:         vld1.64         {d21}, [lr,:64], r2
 8336:         vrhadd.u8       q8, q8, q10
 8336: .endif
 8336:         subs            r3, r3, #2
 8336:         pld             [r1]
 8336:         vst1.64         {d16}, [r0,:64], r2
 8336:         vst1.64         {d17}, [r0,:64], r2
 8336:         bgt             3b
 8336:
 8336:         pop             {r4-r7, pc}
 8336:
 8336: 4:      vld1.64         {d4, d5}, [r1], r2
 8336:         vld1.64         {d6, d7}, [r1], r2
 8336:         vext.8          d5, d4, d5, #1
 8336:         vext.8          d7, d6, d7, #1
 8336:
 8336: 5:      pld             [r1]
 8336:         subs            r3, r3, #2
 8336:         vmull.u8        q8, d4, d0
 8336:         vmlal.u8        q8, d5, d1
 8336:         vld1.64         {d4, d5}, [r1], r2
 8336:         vmull.u8        q9, d6, d0
 8336:         vmlal.u8        q9, d7, d1
 8336:         pld             [r1]
 8336:         vext.8          d5, d4, d5, #1
 8336:         vrshrn.u16      d16, q8, #6
 8336:         vrshrn.u16      d17, q9, #6
 8626: .ifc \type,avg
 8336:         vld1.64         {d20}, [lr,:64], r2
 8336:         vld1.64         {d21}, [lr,:64], r2
 8336:         vrhadd.u8       q8, q8, q10
 8336: .endif
 8336:         vld1.64         {d6, d7}, [r1], r2
 8336:         vext.8          d7, d6, d7, #1
 8336:         vst1.64         {d16}, [r0,:64], r2
 8336:         vst1.64         {d17}, [r0,:64], r2
 8336:         bgt             5b
 8336:
 8336:         pop             {r4-r7, pc}
changeset 11443:361a5fcb4393, author mru, parents 10617: ARM: set size of asm functions in object files
11443: endfunc
 8336:         .endm
 8336:
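The weight arithmetic at the top of the macro above (`muls`, `rsb`, `sub`, `add`) works out to the bilinear weights A = (8-x)(8-y), B = x(8-y), C = (8-x)y and D = xy, which sum to 64. A scalar sketch of the same computation follows, assuming 0 <= x, y < 8 and a shared stride for `src` and `dst`. The function `chroma_mc_ref` and its `w` and `avg` parameters are invented for illustration; this is not FFmpeg's C implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for the NEON macro. w is the block width
 * (8 for mc8); avg != 0 selects the rounding average with the old dst. */
static void chroma_mc_ref(uint8_t *dst, const uint8_t *src, int stride,
                          int w, int h, int x, int y, int avg)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);

    const int D = x * y;                  /* r7 = x*y             */
    const int C = 8 * y - D;              /* r6 = (8 - x) * y     */
    const int B = 8 * x - D;              /* ip = x * (8 - y)     */
    const int A = D - 8 * x - 8 * y + 64; /* r4 = (8 - x)(8 - y)  */

    for (int j = 0; j < h; j++) {
        for (int i = 0; i < w; i++) {
            int v = A * src[i];
            /* Zero-weight taps contribute nothing, so they are skipped;
             * the macro branches to simpler loops in those cases. */
            if (B) v += B * src[i + 1];
            if (C) v += C * src[i + stride];
            if (D) v += D * src[i + stride + 1];
            v = (v + 32) >> 6;                /* vrshrn.u16 ..., #6 */
            if (avg)
                v = (dst[i] + v + 1) >> 1;    /* vrhadd.u8 */
            dst[i] = (uint8_t)v;
        }
        dst += stride;
        src += stride;
    }
}
```

With x = 0 and y = 0 only A is non-zero (64), so the put variant reduces to a plain copy. With x = y = 4 all four weights are 16 and each output is the rounded mean of a 2x2 neighbourhood.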
 8336: /* chroma_mc4(uint8_t *dst, uint8_t *src, int stride, int h, int x, int y) */
 8626:         .macro h264_chroma_mc4 type
 8626: function ff_\type\()_h264_chroma_mc4_neon, export=1
 8336:         push            {r4-r7, lr}
 8336:         ldrd            r4, [sp, #20]
 8626: .ifc \type,avg
 8336:         mov             lr, r0
 8336: .endif
 8336:         pld             [r1]
 8336:         pld             [r1, r2]
 8336:
 8336:         muls            r7, r4, r5
 8336:         rsb             r6, r7, r5, lsl #3
 8336:         rsb             ip, r7, r4, lsl #3
 8336:         sub             r4, r7, r4, lsl #3
 8336:         sub             r4, r4, r5, lsl #3
 8336:         add             r4, r4, #64
 8336:
 8336:         beq             2f
 8336:
 8336:         add             r5, r1, r2
 8336:
 8336:         vdup.8          d0, r4
 8336:         lsl             r4, r2, #1
 8336:         vdup.8          d1, ip
 8336:         vld1.64         {d4}, [r1], r4
 8336:         vdup.8          d2, r6
 8336:         vld1.64         {d6}, [r5], r4
 8336:         vdup.8          d3, r7
 8336:
 8336:         vext.8          d5, d4, d5, #1
 8336:         vext.8          d7, d6, d7, #1
 8336:         vtrn.32         d4, d5
 8336:         vtrn.32         d6, d7
 8336:
 8336:         vtrn.32         d0, d1
 8336:         vtrn.32         d2, d3
 8336:
 8336: 1:      pld             [r5]
 8336:         vmull.u8        q8, d4, d0
 8336:         vmlal.u8        q8, d6, d2
 8336:         vld1.64         {d4}, [r1], r4
 8336:         vext.8          d5, d4, d5, #1
 8336:         vtrn.32         d4, d5
 8336:         vmull.u8        q9, d6, d0
 8336:         vmlal.u8        q9, d4, d2
 8336:         vld1.64         {d6}, [r5], r4
 8336:         vadd.i16        d16, d16, d17
 8336:         vadd.i16        d17, d18, d19
 8336:         vrshrn.u16      d16, q8, #6
 8336:         subs            r3, r3, #2
 8336:         pld             [r1]
 8626: .ifc \type,avg
 8336:         vld1.32         {d20[0]}, [lr,:32], r2
 8336:         vld1.32         {d20[1]}, [lr,:32], r2
 8336:         vrhadd.u8       d16, d16, d20
 8336: .endif
 8336:         vext.8          d7, d6, d7, #1
 8336:         vtrn.32         d6, d7
 8336:         vst1.32         {d16[0]}, [r0,:32], r2
 8336:         vst1.32         {d16[1]}, [r0,:32], r2
 8336:         bgt             1b
 8336:
 8336:         pop             {r4-r7, pc}
 8336:
 8336: 2:      tst             r6, r6
 8336:         add             ip, ip, r6
 8336:         vdup.8          d0, r4
 8336:         vdup.8          d1, ip
 8336:         vtrn.32         d0, d1
 8336:
 8336:         beq             4f
 8336:
 8336:         vext.32         d1, d0, d1, #1
 8336:         add             r5, r1, r2
 8336:         lsl             r4, r2, #1
 8336:         vld1.32         {d4[0]}, [r1], r4
 8336:         vld1.32         {d4[1]}, [r5], r4
 8336:
 8336: 3:      pld             [r5]
 8336:         vmull.u8        q8, d4, d0
 8336:         vld1.32         {d4[0]}, [r1], r4
 8336:         vmull.u8        q9, d4, d1
 8336:         vld1.32         {d4[1]}, [r5], r4
 8336:         vadd.i16        d16, d16, d17
 8336:         vadd.i16        d17, d18, d19
 8336:         vrshrn.u16      d16, q8, #6
 8626: .ifc \type,avg
 8336:         vld1.32         {d20[0]}, [lr,:32], r2
 8336:         vld1.32         {d20[1]}, [lr,:32], r2
 8336:         vrhadd.u8       d16, d16, d20
 8336: .endif
 8336:         subs            r3, r3, #2
 8336:         pld             [r1]
 8336:         vst1.32         {d16[0]}, [r0,:32], r2
 8336:         vst1.32         {d16[1]}, [r0,:32], r2
 8336:         bgt             3b
 8336:
 8336:         pop             {r4-r7, pc}
 8336:
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
289 4: vld1.64 {d4}, [r1], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
290 vld1.64 {d6}, [r1], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
291 vext.8 d5, d4, d5, #1
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
292 vext.8 d7, d6, d7, #1
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
293 vtrn.32 d4, d5
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
294 vtrn.32 d6, d7
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
295
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
296 5: vmull.u8 q8, d4, d0
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
297 vmull.u8 q9, d6, d0
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
298 subs r3, r3, #2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
299 vld1.64 {d4}, [r1], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
300 vext.8 d5, d4, d5, #1
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
301 vtrn.32 d4, d5
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
302 vadd.i16 d16, d16, d17
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
303 vadd.i16 d17, d18, d19
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
304 pld [r1]
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
305 vrshrn.u16 d16, q8, #6
8626
8d425ee85ddb ARM: simplify ff_put/avg_h264_chroma_mc4/8_neon definitions, no code change
mru
parents: 8359
diff changeset
306 .ifc \type,avg
8336
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
307 vld1.32 {d20[0]}, [lr,:32], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
308 vld1.32 {d20[1]}, [lr,:32], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
309 vrhadd.u8 d16, d16, d20
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
310 .endif
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
311 vld1.64 {d6}, [r1], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
312 vext.8 d7, d6, d7, #1
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
313 vtrn.32 d6, d7
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
314 pld [r1]
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
315 vst1.32 {d16[0]}, [r0,:32], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
316 vst1.32 {d16[1]}, [r0,:32], r2
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
317 bgt 5b
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
318
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
319 pop {r4-r7, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
320 endfunc
8336
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
321 .endm
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
322
10617
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
323 .macro h264_chroma_mc2 type
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
324 function ff_\type\()_h264_chroma_mc2_neon, export=1
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
325 push {r4-r6, lr}
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
326 ldr r4, [sp, #16]
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
327 ldr lr, [sp, #20]
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
328 pld [r1]
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
329 pld [r1, r2]
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
330 orrs r5, r4, lr
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
331 beq 2f
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
332
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
333 mul r5, r4, lr
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
334 rsb r6, r5, lr, lsl #3
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
335 rsb r12, r5, r4, lsl #3
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
336 sub r4, r5, r4, lsl #3
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
337 sub r4, r4, lr, lsl #3
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
338 add r4, r4, #64
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
339 vdup.8 d0, r4
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
340 vdup.8 d2, r12
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
341 vdup.8 d1, r6
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
342 vdup.8 d3, r5
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
343 vtrn.16 q0, q1
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
344 1:
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
345 vld1.32 {d4[0]}, [r1], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
346 vld1.32 {d4[1]}, [r1], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
347 vrev64.32 d5, d4
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
348 vld1.32 {d5[1]}, [r1]
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
349 vext.8 q3, q2, q2, #1
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
350 vtrn.16 q2, q3
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
351 vmull.u8 q8, d4, d0
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
352 vmlal.u8 q8, d5, d1
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
353 .ifc \type,avg
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
354 vld1.16 {d18[0]}, [r0,:16], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
355 vld1.16 {d18[1]}, [r0,:16]
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
356 sub r0, r0, r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
357 .endif
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
358 vtrn.32 d16, d17
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
359 vadd.i16 d16, d16, d17
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
360 vrshrn.u16 d16, q8, #6
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
361 .ifc \type,avg
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
362 vrhadd.u8 d16, d16, d18
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
363 .endif
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
364 vst1.16 {d16[0]}, [r0,:16], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
365 vst1.16 {d16[1]}, [r0,:16], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
366 subs r3, r3, #2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
367 bgt 1b
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
368 pop {r4-r6, pc}
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
369 2:
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
370 .ifc \type,put
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
371 ldrh r5, [r1], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
372 strh r5, [r0], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
373 ldrh r6, [r1], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
374 strh r6, [r0], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
375 .else
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
376 vld1.16 {d16[0]}, [r1], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
377 vld1.16 {d16[1]}, [r1], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
378 vld1.16 {d18[0]}, [r0,:16], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
379 vld1.16 {d18[1]}, [r0,:16]
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
380 sub r0, r0, r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
381 vrhadd.u8 d16, d16, d18
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
382 vst1.16 {d16[0]}, [r0,:16], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
383 vst1.16 {d16[1]}, [r0,:16], r2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
384 .endif
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
385 subs r3, r3, #2
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
386 bgt 2b
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
387 pop {r4-r6, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
388 endfunc
10617
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
389 .endm
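The weight setup at the top of h264_chroma_mc2 (source lines 333-338) can be read as the usual H.264 bilinear chroma weights. What follows is a minimal scalar sketch in C, not part of the annotated file: the names chroma_weights, chroma_mc_weights and chroma_mc_pixel are hypothetical, and the code assumes x and y are the fractional offsets in 0..7 passed on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four bilinear weights (not in the source file). */
typedef struct { int a, b, c, d; } chroma_weights;

/* Mirrors the integer arithmetic done in r4, r5, r6 and r12 before the vdup.8 calls. */
static chroma_weights chroma_mc_weights(int x, int y)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    int xy = x * y;                 /* mul r5, r4, lr            */
    chroma_weights w;
    w.d = xy;                       /* x * y                     */
    w.c = 8 * y - xy;               /* rsb r6:  (8 - x) * y      */
    w.b = 8 * x - xy;               /* rsb r12: x * (8 - y)      */
    w.a = xy - 8 * x - 8 * y + 64;  /* r4:      (8 - x) * (8 - y) */
    return w;
}

/* One output pixel: weighted sum of a 2x2 neighbourhood, rounded and
 * shifted right by 6, which is what vrshrn.u16 ..., #6 does per lane. */
static uint8_t chroma_mc_pixel(const uint8_t *src, int stride, int x, int y)
{
    chroma_weights w = chroma_mc_weights(x, y);
    int sum = w.a * src[0] + w.b * src[1]
            + w.c * src[stride] + w.d * src[stride + 1];
    return (uint8_t)((sum + 32) >> 6);
}
```

The four weights always sum to 64, so the rounded shift by 6 keeps the result in the 0..255 range. The "avg" variants additionally take a rounded average with the existing destination pixel (vrhadd.u8), which this sketch leaves out.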
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
390
8336
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
391 .text
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
392 .align
c8401acb05d1 ARM: NEON optimised {put,avg}_h264_chroma_mc[48]
mru
parents:
diff changeset
393
8626
8d425ee85ddb ARM: simplify ff_put/avg_h264_chroma_mc4/8_neon definitions, no code change
mru
parents: 8359
diff changeset
394 h264_chroma_mc8 put
8d425ee85ddb ARM: simplify ff_put/avg_h264_chroma_mc4/8_neon definitions, no code change
mru
parents: 8359
diff changeset
395 h264_chroma_mc8 avg
8d425ee85ddb ARM: simplify ff_put/avg_h264_chroma_mc4/8_neon definitions, no code change
mru
parents: 8359
diff changeset
396 h264_chroma_mc4 put
8d425ee85ddb ARM: simplify ff_put/avg_h264_chroma_mc4/8_neon definitions, no code change
mru
parents: 8359
diff changeset
397 h264_chroma_mc4 avg
10617
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
398 h264_chroma_mc2 put
5506cbb012b4 ARM: NEON 2xN chroma MC
mru
parents: 10616
diff changeset
399 h264_chroma_mc2 avg
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
400
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
401 /* H.264 loop filter */
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
402
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
403 .macro h264_loop_filter_start
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
404 ldr ip, [sp]
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
405 tst r2, r2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
406 ldr ip, [ip]
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
407 tstne r3, r3
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
408 vmov.32 d24[0], ip
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
409 and ip, ip, ip, lsl #16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
410 bxeq lr
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
411 ands ip, ip, ip, lsl #8
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
412 bxlt lr
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
413 .endm
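The early-exit test in h264_loop_filter_start appears to return when alpha or beta is zero, or when all four tc0 bytes are negative. Below is a hedged scalar sketch in C of that reading. The function name loop_filter_should_skip is hypothetical, and the bxlt is modelled as a plain sign-bit test, which assumes the overflow flag is clear at that point.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the filter would return without touching any pixels. */
static int loop_filter_should_skip(int alpha, int beta, const int8_t *tc0)
{
    assert(tc0 != 0);

    /* tst r2, r2 / tstne r3, r3 / bxeq lr */
    if (alpha == 0 || beta == 0)
        return 1;

    /* ldr ip, [ip]: pack the four tc0 bytes into one 32-bit word.
     * The result below is the AND of all four bytes, so byte order is irrelevant. */
    uint32_t t = (uint32_t)(uint8_t)tc0[0]
               | (uint32_t)(uint8_t)tc0[1] << 8
               | (uint32_t)(uint8_t)tc0[2] << 16
               | (uint32_t)(uint8_t)tc0[3] << 24;

    t &= t << 16;   /* and  ip, ip, ip, lsl #16 */
    t &= t << 8;    /* ands ip, ip, ip, lsl #8  */

    /* The top byte is now tc0[0] & tc0[1] & tc0[2] & tc0[3]; its sign bit
     * is set only if every tc0 value is negative. */
    return (t & 0x80000000u) != 0;
}
```

Folding the word onto itself twice lets a single flag test cover all four edge segments, instead of four separate byte compares.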
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
414
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
415 .macro align_push_regs
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
416 and ip, sp, #15
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
417 add ip, ip, #32
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
418 sub sp, sp, ip
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
419 vst1.64 {d12-d15}, [sp,:128]
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
420 sub sp, sp, #32
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
421 vst1.64 {d8-d11}, [sp,:128]
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
422 .endm
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
423
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
424 .macro align_pop_regs
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
425 vld1.64 {d8-d11}, [sp,:128]!
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
426 vld1.64 {d12-d15}, [sp,:128], ip
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
427 .endm
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
428
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
429 .macro h264_loop_filter_luma
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
430 vdup.8 q11, r2 @ alpha
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
431 vmovl.u8 q12, d24
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
432 vabd.u8 q6, q8, q0 @ abs(p0 - q0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
433 vmovl.u16 q12, d24
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
434 vabd.u8 q14, q9, q8 @ abs(p1 - p0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
435 vsli.16 q12, q12, #8
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
436 vabd.u8 q15, q1, q0 @ abs(q1 - q0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
437 vsli.32 q12, q12, #16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
438 vclt.u8 q6, q6, q11 @ < alpha
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
439 vdup.8 q11, r3 @ beta
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
440 vclt.s8 q7, q12, #0
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
441 vclt.u8 q14, q14, q11 @ < beta
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
442 vclt.u8 q15, q15, q11 @ < beta
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
443 vbic q6, q6, q7
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
444 vabd.u8 q4, q10, q8 @ abs(p2 - p0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
445 vand q6, q6, q14
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
446 vabd.u8 q5, q2, q0 @ abs(q2 - q0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
447 vclt.u8 q4, q4, q11 @ < beta
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
448 vand q6, q6, q15
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
449 vclt.u8 q5, q5, q11 @ < beta
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
450 vand q4, q4, q6
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
451 vand q5, q5, q6
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
452 vand q12, q12, q6
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
453 vrhadd.u8 q14, q8, q0
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
454 vsub.i8 q6, q12, q4
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
455 vqadd.u8 q7, q9, q12
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
456 vhadd.u8 q10, q10, q14
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
457 vsub.i8 q6, q6, q5
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
458 vhadd.u8 q14, q2, q14
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
459 vmin.u8 q7, q7, q10
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
460 vqsub.u8 q11, q9, q12
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
461 vqadd.u8 q2, q1, q12
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
462 vmax.u8 q7, q7, q11
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
463 vqsub.u8 q11, q1, q12
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
464 vmin.u8 q14, q2, q14
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
465 vmovl.u8 q2, d0
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
466 vmax.u8 q14, q14, q11
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
467 vmovl.u8 q10, d1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
468 vsubw.u8 q2, q2, d16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
469 vsubw.u8 q10, q10, d17
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
470 vshl.i16 q2, q2, #2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
471 vshl.i16 q10, q10, #2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
472 vaddw.u8 q2, q2, d18
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
473 vaddw.u8 q10, q10, d19
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
474 vsubw.u8 q2, q2, d2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
475 vsubw.u8 q10, q10, d3
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
476 vrshrn.i16 d4, q2, #3
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
477 vrshrn.i16 d5, q10, #3
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
478 vbsl q4, q7, q9
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
479 vbsl q5, q14, q1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
480 vneg.s8 q7, q6
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
481 vmovl.u8 q14, d16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
482 vmin.s8 q2, q2, q6
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
483 vmovl.u8 q6, d17
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
484 vmax.s8 q2, q2, q7
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
485 vmovl.u8 q11, d0
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
486 vmovl.u8 q12, d1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
487 vaddw.s8 q14, q14, d4
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
488 vaddw.s8 q6, q6, d5
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
489 vsubw.s8 q11, q11, d4
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
490 vsubw.s8 q12, q12, d5
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
491 vqmovun.s16 d16, q14
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
492 vqmovun.s16 d17, q6
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
493 vqmovun.s16 d0, q11
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
494 vqmovun.s16 d1, q12
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
495 .endm
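For readers following the h264_loop_filter_luma macro, here is a scalar sketch in C of the normal-strength luma edge filter that the register comments (alpha, beta, abs(p0 - q0) and so on) point to. It is an illustration under assumptions, not the author's code: luma_filter_one and clip3 are hypothetical names, and the sketch handles one pixel position across the edge where the macro handles sixteen at once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* pix points at q0; step is the distance between samples across the edge.
 * Right shifts of negative values are assumed to be arithmetic. */
static void luma_filter_one(uint8_t *pix, int step, int alpha, int beta, int tc0)
{
    assert(step != 0);
    if (tc0 < 0)                        /* vclt.s8 q7, q12, #0 ; vbic */
        return;

    int p0 = pix[-1 * step], p1 = pix[-2 * step], p2 = pix[-3 * step];
    int q0 = pix[0],         q1 = pix[1 * step],  q2 = pix[2 * step];

    if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
        return;

    int tc  = tc0;
    int avg = (p0 + q0 + 1) >> 1;       /* vrhadd.u8 q14, q8, q0 */

    if (abs(p2 - p0) < beta) {          /* optional p1 update */
        pix[-2 * step] = (uint8_t)(p1 + clip3(-tc0, tc0, ((p2 + avg) >> 1) - p1));
        tc++;
    }
    if (abs(q2 - q0) < beta) {          /* optional q1 update */
        pix[1 * step] = (uint8_t)(q1 + clip3(-tc0, tc0, ((q2 + avg) >> 1) - q1));
        tc++;
    }

    /* vshl #2, vaddw p1, vsubw q1, vrshrn #3, then clamp to +/- tc */
    int delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3);
    pix[-1 * step] = (uint8_t)clip3(0, 255, p0 + delta);   /* vqmovun.s16 */
    pix[0]         = (uint8_t)clip3(0, 255, q0 - delta);
}
```

In the vertical-edge function the same arithmetic is applied after transposing the loaded rows, so step would be 1 there, while in the horizontal-edge case it would be the line stride.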
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
496
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
497 function ff_h264_v_loop_filter_luma_neon, export=1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
498 h264_loop_filter_start
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
499
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
500 vld1.64 {d0, d1}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
501 vld1.64 {d2, d3}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
502 vld1.64 {d4, d5}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
503 sub r0, r0, r1, lsl #2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
504 sub r0, r0, r1, lsl #1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
505 vld1.64 {d20,d21}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
506 vld1.64 {d18,d19}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
507 vld1.64 {d16,d17}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
508
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
509 align_push_regs
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
510
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
511 h264_loop_filter_luma
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
512
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
513 sub r0, r0, r1, lsl #1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
514 vst1.64 {d8, d9}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
515 vst1.64 {d16,d17}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
516 vst1.64 {d0, d1}, [r0,:128], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
517 vst1.64 {d10,d11}, [r0,:128]
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
518
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
519 align_pop_regs
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
520 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
521 endfunc
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
522
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
523 function ff_h264_h_loop_filter_luma_neon, export=1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
524 h264_loop_filter_start
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
525
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
526 sub r0, r0, #4
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
527 vld1.64 {d6}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
528 vld1.64 {d20}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
529 vld1.64 {d18}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
530 vld1.64 {d16}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
531 vld1.64 {d0}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
532 vld1.64 {d2}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
533 vld1.64 {d4}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
534 vld1.64 {d26}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
535 vld1.64 {d7}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
536 vld1.64 {d21}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
537 vld1.64 {d19}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
538 vld1.64 {d17}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
539 vld1.64 {d1}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
540 vld1.64 {d3}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
541 vld1.64 {d5}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
542 vld1.64 {d27}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
543
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
544 transpose_8x8 q3, q10, q9, q8, q0, q1, q2, q13
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
545
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
546 align_push_regs
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
547
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
548 h264_loop_filter_luma
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
549
9864
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
550 transpose_4x4 q4, q8, q0, q5
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
551
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
552 sub r0, r0, r1, lsl #4
9864
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
553 add r0, r0, #2
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
554 vst1.32 {d8[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
555 vst1.32 {d16[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
556 vst1.32 {d0[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
557 vst1.32 {d10[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
558 vst1.32 {d8[1]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
559 vst1.32 {d16[1]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
560 vst1.32 {d0[1]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
561 vst1.32 {d10[1]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
562 vst1.32 {d9[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
563 vst1.32 {d17[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
564 vst1.32 {d1[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
565 vst1.32 {d11[0]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
566 vst1.32 {d9[1]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
567 vst1.32 {d17[1]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
568 vst1.32 {d1[1]}, [r0], r1
f5ffd813dc7f ARM: slightly faster NEON H264 horizontal loop filter
mru
parents: 9072
diff changeset
569 vst1.32 {d11[1]}, [r0], r1
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
570
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
571 align_pop_regs
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
572 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
573 endfunc
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
574
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
575 .macro h264_loop_filter_chroma
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
576 vdup.8 d22, r2 @ alpha
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
577 vmovl.u8 q12, d24
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
578 vabd.u8 d26, d16, d0 @ abs(p0 - q0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
579 vmovl.u8 q2, d0
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
580 vabd.u8 d28, d18, d16 @ abs(p1 - p0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
581 vsubw.u8 q2, q2, d16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
582 vsli.16 d24, d24, #8
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
583 vshl.i16 q2, q2, #2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
584 vabd.u8 d30, d2, d0 @ abs(q1 - q0)
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
585 vaddw.u8 q2, q2, d18
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
586 vclt.u8 d26, d26, d22 @ < alpha
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
587 vsubw.u8 q2, q2, d2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
588 vdup.8 d22, r3 @ beta
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
589 vrshrn.i16 d4, q2, #3
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
590 vclt.u8 d28, d28, d22 @ < beta
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
591 vclt.u8 d30, d30, d22 @ < beta
12167
69bbfd8f2ba5 ARM: NEON H264 chroma loop filter 3 cycles faster
mru
parents: 12166
diff changeset
592 vmin.s8 d4, d4, d24
69bbfd8f2ba5 ARM: NEON H264 chroma loop filter 3 cycles faster
mru
parents: 12166
diff changeset
593 vneg.s8 d25, d24
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
594 vand d26, d26, d28
12167
69bbfd8f2ba5 ARM: NEON H264 chroma loop filter 3 cycles faster
mru
parents: 12166
diff changeset
595 vmax.s8 d4, d4, d25
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
596 vand d26, d26, d30
12167
69bbfd8f2ba5 ARM: NEON H264 chroma loop filter 3 cycles faster
mru
parents: 12166
diff changeset
597 vmovl.u8 q11, d0
69bbfd8f2ba5 ARM: NEON H264 chroma loop filter 3 cycles faster
mru
parents: 12166
diff changeset
598 vand d4, d4, d26
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
599 vmovl.u8 q14, d16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
600 vaddw.s8 q14, q14, d4
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
601 vsubw.s8 q11, q11, d4
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
602 vqmovun.s16 d16, q14
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
603 vqmovun.s16 d0, q11
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
604 .endm
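For readers following the macro above, the per-sample arithmetic can be written out in scalar C. This is a minimal sketch, not the project's reference code: the function name `chroma_filter_px` and the helper `clip` are hypothetical, `tc` is taken as the clipping bound already held in d24 (it is prepared by `h264_loop_filter_start`, which lies outside this section), and `tc >= 0` is assumed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Clamp v to the range [lo, hi]. */
static int clip(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hypothetical scalar model of one lane of h264_loop_filter_chroma.
 * p1, p0 | q0, q1 are the samples on either side of the edge.
 * Assumes tc >= 0 and that >> on a negative int is an arithmetic shift.
 */
static void chroma_filter_px(uint8_t p1, uint8_t *p0, uint8_t *q0, uint8_t q1,
                             int alpha, int beta, int tc)
{
    int P0 = *p0, Q0 = *q0;

    /* vabd / vclt / vand: filter only where all three differences are small */
    if (abs(P0 - Q0) >= alpha || abs(p1 - P0) >= beta || abs(q1 - Q0) >= beta)
        return;

    /* vsubw, vshl #2, vaddw, vsubw, vrshrn #3, then vmin / vmax against tc */
    int delta = clip(((Q0 - P0) * 4 + (p1 - q1) + 4) >> 3, -tc, tc);

    /* vaddw.s8 / vsubw.s8 followed by vqmovun: saturate to 0..255 */
    *p0 = (uint8_t)clip(P0 + delta, 0, 255);
    *q0 = (uint8_t)clip(Q0 - delta, 0, 255);
}
```

With p1 = p0 = 100, q0 = q1 = 110, alpha = 20, beta = 5 and tc = 2, the unclipped delta is 4, so the sketch moves p0 to 102 and q0 to 108. The NEON macro computes the same values for eight samples at once and applies the mask with `vand` instead of branching.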
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
605
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
606 function ff_h264_v_loop_filter_chroma_neon, export=1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
607 h264_loop_filter_start
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
608
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
609 sub r0, r0, r1, lsl #1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
610 vld1.64 {d18}, [r0,:64], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
611 vld1.64 {d16}, [r0,:64], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
612 vld1.64 {d0}, [r0,:64], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
613 vld1.64 {d2}, [r0,:64]
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
614
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
615 h264_loop_filter_chroma
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
616
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
617 sub r0, r0, r1, lsl #1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
618 vst1.64 {d16}, [r0,:64], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
619 vst1.64 {d0}, [r0,:64], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
620
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
621 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
622 endfunc
8337
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
623
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
624 function ff_h264_h_loop_filter_chroma_neon, export=1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
625 h264_loop_filter_start
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
626
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
627 sub r0, r0, #2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
628 vld1.32 {d18[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
629 vld1.32 {d16[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
630 vld1.32 {d0[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
631 vld1.32 {d2[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
632 vld1.32 {d18[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
633 vld1.32 {d16[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
634 vld1.32 {d0[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
635 vld1.32 {d2[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
636
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
637 vtrn.16 d18, d0
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
638 vtrn.16 d16, d2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
639 vtrn.8 d18, d16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
640 vtrn.8 d0, d2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
641
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
642 h264_loop_filter_chroma
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
643
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
644 vtrn.16 d18, d0
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
645 vtrn.16 d16, d2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
646 vtrn.8 d18, d16
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
647 vtrn.8 d0, d2
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
648
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
649 sub r0, r0, r1, lsl #3
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
650 vst1.32 {d18[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
651 vst1.32 {d16[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
652 vst1.32 {d0[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
653 vst1.32 {d2[0]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
654 vst1.32 {d18[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
655 vst1.32 {d16[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
656 vst1.32 {d0[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
657 vst1.32 {d2[1]}, [r0], r1
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
658
d43b7f4c5c1c ARM: NEON optimised H.264 loop filter
mru
parents: 8336
diff changeset
659 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
660 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
661
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
662 /* H.264 qpel MC */
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
663
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
664 .macro lowpass_const r
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
665 movw \r, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
666 movt \r, #20
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
667 vmov.32 d6[0], \r
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
668 .endm
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
669
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
670 .macro lowpass_8 r0, r1, r2, r3, d0, d1, narrow=1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
671 .if \narrow
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
672 t0 .req q0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
673 t1 .req q8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
674 .else
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
675 t0 .req \d0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
676 t1 .req \d1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
677 .endif
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
678 vext.8 d2, \r0, \r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
679 vext.8 d3, \r0, \r1, #3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
680 vaddl.u8 q1, d2, d3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
681 vext.8 d4, \r0, \r1, #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
682 vext.8 d5, \r0, \r1, #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
683 vaddl.u8 q2, d4, d5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
684 vext.8 d30, \r0, \r1, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
685 vaddl.u8 t0, \r0, d30
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
686 vext.8 d18, \r2, \r3, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
687 vmla.i16 t0, q1, d6[1]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
688 vext.8 d19, \r2, \r3, #3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
689 vaddl.u8 q9, d18, d19
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
690 vext.8 d20, \r2, \r3, #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
691 vmls.i16 t0, q2, d6[0]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
692 vext.8 d21, \r2, \r3, #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
693 vaddl.u8 q10, d20, d21
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
694 vext.8 d31, \r2, \r3, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
695 vaddl.u8 t1, \r2, d31
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
696 vmla.i16 t1, q9, d6[1]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
697 vmls.i16 t1, q10, d6[0]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
698 .if \narrow
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
699 vqrshrun.s16 \d0, t0, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
700 vqrshrun.s16 \d1, t1, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
701 .endif
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
702 .unreq t0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
703 .unreq t1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
704 .endm
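The `lowpass_8` macro above evaluates the six-tap filter (1, -5, 20, 20, -5, 1) on eight pixels per row, with the constants 5 and 20 taken from d6[0] and d6[1] as loaded by `lowpass_const`. A scalar sketch of one output pixel in the `narrow=1` case may help when reading it. The names `lowpass6` and `clip_u8` are hypothetical, and `src` is assumed to point at the first of the six taps.

```c
#include <stdint.h>

/* Saturate an int to the 0..255 range of a pixel. */
static uint8_t clip_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/*
 * Hypothetical scalar model of one output pixel of lowpass_8 (narrow=1).
 * Assumes >> on a negative int is an arithmetic shift.
 */
static uint8_t lowpass6(const uint8_t *src)
{
    int sum = src[0] + src[5]             /* vaddl.u8 t0, r0, (r0:r1 ext #5) */
            + 20 * (src[2] + src[3])      /* vmla.i16 t0, q1, d6[1]          */
            - 5 * (src[1] + src[4]);      /* vmls.i16 t0, q2, d6[0]          */

    return clip_u8((sum + 16) >> 5);      /* vqrshrun.s16 ..., #5            */
}
```

A flat run of six pixels with value 100 gives a sum of 3200 and an output of 100, which is a quick check that the taps sum to 32. With `narrow=0` the macro skips the final rounding shift and leaves the 16-bit sum in the destination registers.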
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
705
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
706 .macro lowpass_8_1 r0, r1, d0, narrow=1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
707 .if \narrow
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
708 t0 .req q0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
709 .else
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
710 t0 .req \d0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
711 .endif
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
712 vext.8 d2, \r0, \r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
713 vext.8 d3, \r0, \r1, #3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
714 vaddl.u8 q1, d2, d3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
715 vext.8 d4, \r0, \r1, #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
716 vext.8 d5, \r0, \r1, #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
717 vaddl.u8 q2, d4, d5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
718 vext.8 d30, \r0, \r1, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
719 vaddl.u8 t0, \r0, d30
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
720 vmla.i16 t0, q1, d6[1]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
721 vmls.i16 t0, q2, d6[0]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
722 .if \narrow
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
723 vqrshrun.s16 \d0, t0, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
724 .endif
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
725 .unreq t0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
726 .endm
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
727
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
728 .macro lowpass_8.16 r0, r1, l0, h0, l1, h1, d
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
729 vext.16 q1, \r0, \r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
730 vext.16 q0, \r0, \r1, #3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
731 vaddl.s16 q9, d2, d0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
732 vext.16 q2, \r0, \r1, #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
733 vaddl.s16 q1, d3, d1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
734 vext.16 q3, \r0, \r1, #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
735 vaddl.s16 q10, d4, d6
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
736 vext.16 \r1, \r0, \r1, #5
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
737 vaddl.s16 q2, d5, d7
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
738 vaddl.s16 q0, \h0, \h1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
739 vaddl.s16 q8, \l0, \l1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
740
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
741 vshl.i32 q3, q9, #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
742 vshl.i32 q9, q9, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
743 vshl.i32 q15, q10, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
744 vadd.i32 q9, q9, q3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
745 vadd.i32 q10, q10, q15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
746
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
747 vshl.i32 q3, q1, #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
748 vshl.i32 q1, q1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
749 vshl.i32 q15, q2, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
750 vadd.i32 q1, q1, q3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
751 vadd.i32 q2, q2, q15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
752
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
753 vadd.i32 q9, q9, q8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
754 vsub.i32 q9, q9, q10
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
755
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
756 vadd.i32 q1, q1, q0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
757 vsub.i32 q1, q1, q2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
758
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
759 vrshrn.s32 d18, q9, #10
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
760 vrshrn.s32 d19, q1, #10
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
761
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
762 vqmovun.s16 \d, q9
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
763 .endm
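The `lowpass_8.16` macro above applies the same six taps to 16-bit intermediates, widening to 32 bits and building the multiplications by 20 and 5 from shifts and adds. The following scalar sketch models one output under the assumption that the inputs are unrounded first-pass sums (as left by the `narrow=0` path). The names `lowpass6_16` and `clip_u8` are hypothetical.

```c
#include <stdint.h>

/* Saturate an int to the 0..255 range of a pixel. */
static uint8_t clip_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/*
 * Hypothetical scalar model of one output of lowpass_8.16.
 * t points at six consecutive 16-bit intermediate sums.
 * Assumes >> on a negative int is an arithmetic shift.
 */
static uint8_t lowpass6_16(const int16_t *t)
{
    int32_t outer = t[0] + t[5];          /* vaddl.s16 of the two end taps */
    int32_t inner = t[2] + t[3];          /* vaddl.s16 of ext #2 and #3    */
    int32_t mid   = t[1] + t[4];          /* vaddl.s16 of ext #1 and #4    */

    int32_t x20 = 16 * inner + 4 * inner; /* vshl #4, vshl #2, vadd        */
    int32_t x5  = mid + 4 * mid;          /* vshl #2, vadd                 */

    int32_t sum = x20 + outer - x5;       /* vadd.i32, vsub.i32            */

    /* vrshrn.s32 #10 rounds and narrows, vqmovun.s16 saturates to 8 bits */
    return clip_u8((sum + 512) >> 10);
}
```

Because each pass has a gain of 32, the two passes together scale by 1024, which is why the final rounding shift is 10 bits. Six intermediates of 3200 (a flat area of value 100 after the first pass) come back out as 100.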
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
764
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
765 function put_h264_qpel16_h_lowpass_neon_packed
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
766 mov r4, lr
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
767 mov ip, #16
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
768 mov r3, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
769 bl put_h264_qpel8_h_lowpass_neon
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
770 sub r1, r1, r2, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
771 add r1, r1, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
772 mov ip, #16
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
773 mov lr, r4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
774 b put_h264_qpel8_h_lowpass_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
775 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
776
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
777 .macro h264_qpel_h_lowpass type
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
778 function \type\()_h264_qpel16_h_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
779 push {lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
780 mov ip, #16
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
781 bl \type\()_h264_qpel8_h_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
782 sub r0, r0, r3, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
783 sub r1, r1, r2, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
784 add r0, r0, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
785 add r1, r1, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
786 mov ip, #16
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
787 pop {lr}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
788 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
789
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
790 function \type\()_h264_qpel8_h_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
791 1: vld1.64 {d0, d1}, [r1], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
792 vld1.64 {d16,d17}, [r1], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
793 subs ip, ip, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
794 lowpass_8 d0, d1, d16, d17, d0, d16
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
795 .ifc \type,avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
796 vld1.8 {d2}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
797 vrhadd.u8 d0, d0, d2
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
798 vld1.8 {d3}, [r0,:64]
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
799 vrhadd.u8 d16, d16, d3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
800 sub r0, r0, r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
801 .endif
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
802 vst1.64 {d0}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
803 vst1.64 {d16}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
804 bne 1b
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
805 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
806 endfunc
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
807 .endm
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
808
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
809 h264_qpel_h_lowpass put
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
810 h264_qpel_h_lowpass avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
811
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
812 .macro h264_qpel_h_lowpass_l2 type
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
813 function \type\()_h264_qpel16_h_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
814 push {lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
815 mov ip, #16
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
816 bl \type\()_h264_qpel8_h_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
817 sub r0, r0, r2, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
818 sub r1, r1, r2, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
819 sub r3, r3, r2, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
820 add r0, r0, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
821 add r1, r1, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
822 add r3, r3, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
823 mov ip, #16
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
824 pop {lr}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
825 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
826
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
827 function \type\()_h264_qpel8_h_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
828 1: vld1.64 {d0, d1}, [r1], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
829 vld1.64 {d16,d17}, [r1], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
830 vld1.64 {d28}, [r3], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
831 vld1.64 {d29}, [r3], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
832 subs ip, ip, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
833 lowpass_8 d0, d1, d16, d17, d0, d1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
834 vrhadd.u8 q0, q0, q14
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
835 .ifc \type,avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
836 vld1.8 {d2}, [r0,:64], r2
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
837 vrhadd.u8 d0, d0, d2
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
838 vld1.8 {d3}, [r0,:64]
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
839 vrhadd.u8 d1, d1, d3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
840 sub r0, r0, r2
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
841 .endif
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
842 vst1.64 {d0}, [r0,:64], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
843 vst1.64 {d1}, [r0,:64], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
844 bne 1b
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
845 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
846 endfunc
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
847 .endm
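The `_l2` functions above combine the filtered row with a second source read through r3, and the `avg` variants additionally blend with what is already in the destination. Both steps use `vrhadd.u8`, a rounding halving add. A minimal scalar sketch, with the hypothetical names `rnd_avg_u8`, `l2_put` and `l2_avg`:

```c
#include <stdint.h>

/* Scalar equivalent of one lane of vrhadd.u8: (a + b + 1) >> 1. */
static uint8_t rnd_avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* put variant: average the low-pass result with the second source. */
static uint8_t l2_put(uint8_t filtered, uint8_t src2)
{
    return rnd_avg_u8(filtered, src2);
}

/* avg variant: as above, then average again with the existing destination. */
static uint8_t l2_avg(uint8_t filtered, uint8_t src2, uint8_t dst)
{
    return rnd_avg_u8(rnd_avg_u8(filtered, src2), dst);
}
```

Note that the two averages in the `avg` case are rounded separately, so the result is not the same as a single three-way mean.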
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
848
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
849 h264_qpel_h_lowpass_l2 put
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
850 h264_qpel_h_lowpass_l2 avg
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
851
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
852 function put_h264_qpel16_v_lowpass_neon_packed
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
853 mov r4, lr
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
854 mov r2, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
855 bl put_h264_qpel8_v_lowpass_neon
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
856 sub r1, r1, r3, lsl #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
857 bl put_h264_qpel8_v_lowpass_neon
858 sub r1, r1, r3, lsl #4
859 sub r1, r1, r3, lsl #2
860 add r1, r1, #8
861 bl put_h264_qpel8_v_lowpass_neon
862 sub r1, r1, r3, lsl #2
863 mov lr, r4
864 b put_h264_qpel8_v_lowpass_neon
11443
865 endfunc
8338
866
10616
867 .macro h264_qpel_v_lowpass type
868 function \type\()_h264_qpel16_v_lowpass_neon
8338
869 mov r4, lr
10616
870 bl \type\()_h264_qpel8_v_lowpass_neon
8338
871 sub r1, r1, r3, lsl #2
10616
872 bl \type\()_h264_qpel8_v_lowpass_neon
8338
873 sub r0, r0, r2, lsl #4
874 add r0, r0, #8
875 sub r1, r1, r3, lsl #4
876 sub r1, r1, r3, lsl #2
877 add r1, r1, #8
10616
878 bl \type\()_h264_qpel8_v_lowpass_neon
8338
879 sub r1, r1, r3, lsl #2
880 mov lr, r4
11443
881 endfunc
8338
882
10616
883 function \type\()_h264_qpel8_v_lowpass_neon
8338
884 vld1.64 {d8}, [r1], r3
885 vld1.64 {d10}, [r1], r3
886 vld1.64 {d12}, [r1], r3
887 vld1.64 {d14}, [r1], r3
888 vld1.64 {d22}, [r1], r3
889 vld1.64 {d24}, [r1], r3
890 vld1.64 {d26}, [r1], r3
891 vld1.64 {d28}, [r1], r3
892 vld1.64 {d9}, [r1], r3
893 vld1.64 {d11}, [r1], r3
894 vld1.64 {d13}, [r1], r3
895 vld1.64 {d15}, [r1], r3
896 vld1.64 {d23}, [r1]
897
898 transpose_8x8 q4, q5, q6, q7, q11, q12, q13, q14
899 lowpass_8 d8, d9, d10, d11, d8, d10
900 lowpass_8 d12, d13, d14, d15, d12, d14
901 lowpass_8 d22, d23, d24, d25, d22, d24
902 lowpass_8 d26, d27, d28, d29, d26, d28
903 transpose_8x8 d8, d10, d12, d14, d22, d24, d26, d28
904
10616
905 .ifc \type,avg
906 vld1.8 {d9}, [r0,:64], r2
907 vrhadd.u8 d8, d8, d9
908 vld1.8 {d11}, [r0,:64], r2
909 vrhadd.u8 d10, d10, d11
910 vld1.8 {d13}, [r0,:64], r2
911 vrhadd.u8 d12, d12, d13
912 vld1.8 {d15}, [r0,:64], r2
913 vrhadd.u8 d14, d14, d15
914 vld1.8 {d23}, [r0,:64], r2
915 vrhadd.u8 d22, d22, d23
916 vld1.8 {d25}, [r0,:64], r2
917 vrhadd.u8 d24, d24, d25
918 vld1.8 {d27}, [r0,:64], r2
919 vrhadd.u8 d26, d26, d27
920 vld1.8 {d29}, [r0,:64], r2
921 vrhadd.u8 d28, d28, d29
922 sub r0, r0, r2, lsl #3
923 .endif
924
8338
925 vst1.64 {d8}, [r0,:64], r2
926 vst1.64 {d10}, [r0,:64], r2
927 vst1.64 {d12}, [r0,:64], r2
928 vst1.64 {d14}, [r0,:64], r2
929 vst1.64 {d22}, [r0,:64], r2
930 vst1.64 {d24}, [r0,:64], r2
931 vst1.64 {d26}, [r0,:64], r2
932 vst1.64 {d28}, [r0,:64], r2
933
934 bx lr
11443
935 endfunc
10616
936 .endm
8338
937
10616
938 h264_qpel_v_lowpass put
939 h264_qpel_v_lowpass avg
940
941 .macro h264_qpel_v_lowpass_l2 type
942 function \type\()_h264_qpel16_v_lowpass_l2_neon
8338
943 mov r4, lr
10616
944 bl \type\()_h264_qpel8_v_lowpass_l2_neon
8338
945 sub r1, r1, r3, lsl #2
10616
946 bl \type\()_h264_qpel8_v_lowpass_l2_neon
8338
947 sub r0, r0, r3, lsl #4
948 sub ip, ip, r2, lsl #4
949 add r0, r0, #8
950 add ip, ip, #8
951 sub r1, r1, r3, lsl #4
952 sub r1, r1, r3, lsl #2
953 add r1, r1, #8
10616
954 bl \type\()_h264_qpel8_v_lowpass_l2_neon
8338
955 sub r1, r1, r3, lsl #2
956 mov lr, r4
11443
957 endfunc
8338
958
10616
959 function \type\()_h264_qpel8_v_lowpass_l2_neon
8338
960 vld1.64 {d8}, [r1], r3
961 vld1.64 {d10}, [r1], r3
962 vld1.64 {d12}, [r1], r3
963 vld1.64 {d14}, [r1], r3
964 vld1.64 {d22}, [r1], r3
965 vld1.64 {d24}, [r1], r3
966 vld1.64 {d26}, [r1], r3
967 vld1.64 {d28}, [r1], r3
968 vld1.64 {d9}, [r1], r3
969 vld1.64 {d11}, [r1], r3
970 vld1.64 {d13}, [r1], r3
971 vld1.64 {d15}, [r1], r3
972 vld1.64 {d23}, [r1]
973
974 transpose_8x8 q4, q5, q6, q7, q11, q12, q13, q14
975 lowpass_8 d8, d9, d10, d11, d8, d9
976 lowpass_8 d12, d13, d14, d15, d12, d13
977 lowpass_8 d22, d23, d24, d25, d22, d23
978 lowpass_8 d26, d27, d28, d29, d26, d27
979 transpose_8x8 d8, d9, d12, d13, d22, d23, d26, d27
980
981 vld1.64 {d0}, [ip], r2
982 vld1.64 {d1}, [ip], r2
983 vld1.64 {d2}, [ip], r2
984 vld1.64 {d3}, [ip], r2
985 vld1.64 {d4}, [ip], r2
986 vrhadd.u8 q0, q0, q4
987 vld1.64 {d5}, [ip], r2
988 vrhadd.u8 q1, q1, q6
989 vld1.64 {d10}, [ip], r2
990 vrhadd.u8 q2, q2, q11
991 vld1.64 {d11}, [ip], r2
10616
992 vrhadd.u8 q5, q5, q13
993
994 .ifc \type,avg
995 vld1.8 {d16}, [r0,:64], r3
996 vrhadd.u8 d0, d0, d16
997 vld1.8 {d17}, [r0,:64], r3
998 vrhadd.u8 d1, d1, d17
999 vld1.8 {d16}, [r0,:64], r3
1000 vrhadd.u8 d2, d2, d16
1001 vld1.8 {d17}, [r0,:64], r3
1002 vrhadd.u8 d3, d3, d17
1003 vld1.8 {d16}, [r0,:64], r3
1004 vrhadd.u8 d4, d4, d16
1005 vld1.8 {d17}, [r0,:64], r3
1006 vrhadd.u8 d5, d5, d17
1007 vld1.8 {d16}, [r0,:64], r3
1008 vrhadd.u8 d10, d10, d16
1009 vld1.8 {d17}, [r0,:64], r3
1010 vrhadd.u8 d11, d11, d17
1011 sub r0, r0, r3, lsl #3
1012 .endif
8338
1013
1014 vst1.64 {d0}, [r0,:64], r3
1015 vst1.64 {d1}, [r0,:64], r3
1016 vst1.64 {d2}, [r0,:64], r3
1017 vst1.64 {d3}, [r0,:64], r3
1018 vst1.64 {d4}, [r0,:64], r3
1019 vst1.64 {d5}, [r0,:64], r3
1020 vst1.64 {d10}, [r0,:64], r3
1021 vst1.64 {d11}, [r0,:64], r3
1022
1023 bx lr
11443
1024 endfunc
10616
1025 .endm
1026
1027 h264_qpel_v_lowpass_l2 put
1028 h264_qpel_v_lowpass_l2 avg
8338
1029
1030 function put_h264_qpel8_hv_lowpass_neon_top
1031 lowpass_const ip
1032 mov ip, #12
1033 1: vld1.64 {d0, d1}, [r1], r3
1034 vld1.64 {d16,d17}, [r1], r3
1035 subs ip, ip, #2
1036 lowpass_8 d0, d1, d16, d17, q11, q12, narrow=0
1037 vst1.64 {d22-d25}, [r4,:128]!
1038 bne 1b
1039
1040 vld1.64 {d0, d1}, [r1]
1041 lowpass_8_1 d0, d1, q12, narrow=0
1042
1043 mov ip, #-16
1044 add r4, r4, ip
1045 vld1.64 {d30,d31}, [r4,:128], ip
1046 vld1.64 {d20,d21}, [r4,:128], ip
1047 vld1.64 {d18,d19}, [r4,:128], ip
1048 vld1.64 {d16,d17}, [r4,:128], ip
1049 vld1.64 {d14,d15}, [r4,:128], ip
1050 vld1.64 {d12,d13}, [r4,:128], ip
1051 vld1.64 {d10,d11}, [r4,:128], ip
1052 vld1.64 {d8, d9}, [r4,:128], ip
1053 vld1.64 {d6, d7}, [r4,:128], ip
1054 vld1.64 {d4, d5}, [r4,:128], ip
1055 vld1.64 {d2, d3}, [r4,:128], ip
1056 vld1.64 {d0, d1}, [r4,:128]
1057
1058 swap4 d1, d3, d5, d7, d8, d10, d12, d14
1059 transpose16_4x4 q0, q1, q2, q3, q4, q5, q6, q7
1060
1061 swap4 d17, d19, d21, d31, d24, d26, d28, d22
1062 transpose16_4x4 q8, q9, q10, q15, q12, q13, q14, q11
1063
1064 vst1.64 {d30,d31}, [r4,:128]!
1065 vst1.64 {d6, d7}, [r4,:128]!
1066 vst1.64 {d20,d21}, [r4,:128]!
1067 vst1.64 {d4, d5}, [r4,:128]!
1068 vst1.64 {d18,d19}, [r4,:128]!
1069 vst1.64 {d2, d3}, [r4,:128]!
1070 vst1.64 {d16,d17}, [r4,:128]!
1071 vst1.64 {d0, d1}, [r4,:128]
1072
1073 lowpass_8.16 q4, q12, d8, d9, d24, d25, d8
1074 lowpass_8.16 q5, q13, d10, d11, d26, d27, d9
1075 lowpass_8.16 q6, q14, d12, d13, d28, d29, d10
1076 lowpass_8.16 q7, q11, d14, d15, d22, d23, d11
1077
1078 vld1.64 {d16,d17}, [r4,:128], ip
1079 vld1.64 {d30,d31}, [r4,:128], ip
1080 lowpass_8.16 q8, q15, d16, d17, d30, d31, d12
1081 vld1.64 {d16,d17}, [r4,:128], ip
1082 vld1.64 {d30,d31}, [r4,:128], ip
1083 lowpass_8.16 q8, q15, d16, d17, d30, d31, d13
1084 vld1.64 {d16,d17}, [r4,:128], ip
1085 vld1.64 {d30,d31}, [r4,:128], ip
1086 lowpass_8.16 q8, q15, d16, d17, d30, d31, d14
1087 vld1.64 {d16,d17}, [r4,:128], ip
1088 vld1.64 {d30,d31}, [r4,:128]
1089 lowpass_8.16 q8, q15, d16, d17, d30, d31, d15
1090
1091 transpose_8x8 d12, d13, d14, d15, d8, d9, d10, d11
1092
1093 bx lr
11443
1094 endfunc
8338
1095
10616
1096 .macro h264_qpel8_hv_lowpass type
1097 function \type\()_h264_qpel8_hv_lowpass_neon
8338
1098 mov r10, lr
1099 bl put_h264_qpel8_hv_lowpass_neon_top
10616
1100 .ifc \type,avg
1101 vld1.8 {d0}, [r0,:64], r2
1102 vrhadd.u8 d12, d12, d0
1103 vld1.8 {d1}, [r0,:64], r2
1104 vrhadd.u8 d13, d13, d1
1105 vld1.8 {d2}, [r0,:64], r2
1106 vrhadd.u8 d14, d14, d2
1107 vld1.8 {d3}, [r0,:64], r2
1108 vrhadd.u8 d15, d15, d3
1109 vld1.8 {d4}, [r0,:64], r2
1110 vrhadd.u8 d8, d8, d4
1111 vld1.8 {d5}, [r0,:64], r2
1112 vrhadd.u8 d9, d9, d5
1113 vld1.8 {d6}, [r0,:64], r2
1114 vrhadd.u8 d10, d10, d6
1115 vld1.8 {d7}, [r0,:64], r2
1116 vrhadd.u8 d11, d11, d7
1117 sub r0, r0, r2, lsl #3
1118 .endif
8338
1119 vst1.64 {d12}, [r0,:64], r2
1120 vst1.64 {d13}, [r0,:64], r2
1121 vst1.64 {d14}, [r0,:64], r2
1122 vst1.64 {d15}, [r0,:64], r2
1123 vst1.64 {d8}, [r0,:64], r2
1124 vst1.64 {d9}, [r0,:64], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1125 vst1.64 {d10}, [r0,:64], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1126 vst1.64 {d11}, [r0,:64], r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1127
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1128 mov lr, r10
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1129 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1130 endfunc
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1131 .endm
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1132
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1133 h264_qpel8_hv_lowpass put
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1134 h264_qpel8_hv_lowpass avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1135
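The `.ifc \type,avg` branch above differs from the `put` path only in that it loads the existing destination rows and combines them with the filtered rows using `vrhadd.u8` before storing. A minimal scalar sketch of that behaviour in C follows. It is an illustration, not code from FFmpeg: the names `rhadd_u8` and `store_8x8` are hypothetical, and the filtered block is assumed to be already available in a plain byte array.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar equivalent of one lane of NEON vrhadd.u8:
 * rounding halving add, (a + b + 1) >> 1, computed without overflow. */
static inline uint8_t rhadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical helper: write an 8x8 block of already-filtered pixels.
 * avg != 0 models the "avg" macro variant (average with what is in dst),
 * avg == 0 models the "put" variant (plain store). */
static void store_8x8(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *res, ptrdiff_t res_stride, int avg)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            uint8_t v = res[y * res_stride + x];
            uint8_t *d = &dst[y * dst_stride + x];
            *d = avg ? rhadd_u8(v, *d) : v;
        }
    }
}
```

In the assembly the destination pointer is advanced while loading the eight rows and then rewound by eight strides (`sub r0, r0, r2, lsl #3`) before the stores; the sketch sidesteps that by indexing from a fixed base.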
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1136 .macro h264_qpel8_hv_lowpass_l2 type
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1137 function \type\()_h264_qpel8_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1138 mov r10, lr
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1139 bl put_h264_qpel8_hv_lowpass_neon_top
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1140
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1141 vld1.64 {d0, d1}, [r2,:128]!
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1142 vld1.64 {d2, d3}, [r2,:128]!
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1143 vrhadd.u8 q0, q0, q6
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1144 vld1.64 {d4, d5}, [r2,:128]!
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1145 vrhadd.u8 q1, q1, q7
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1146 vld1.64 {d6, d7}, [r2,:128]!
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1147 vrhadd.u8 q2, q2, q4
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1148 vrhadd.u8 q3, q3, q5
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1149 .ifc \type,avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1150 vld1.8 {d16}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1151 vrhadd.u8 d0, d0, d16
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1152 vld1.8 {d17}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1153 vrhadd.u8 d1, d1, d17
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1154 vld1.8 {d18}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1155 vrhadd.u8 d2, d2, d18
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1156 vld1.8 {d19}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1157 vrhadd.u8 d3, d3, d19
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1158 vld1.8 {d20}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1159 vrhadd.u8 d4, d4, d20
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1160 vld1.8 {d21}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1161 vrhadd.u8 d5, d5, d21
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1162 vld1.8 {d22}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1163 vrhadd.u8 d6, d6, d22
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1164 vld1.8 {d23}, [r0,:64], r3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1165 vrhadd.u8 d7, d7, d23
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1166 sub r0, r0, r3, lsl #3
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1167 .endif
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1168 vst1.64 {d0}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1169 vst1.64 {d1}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1170 vst1.64 {d2}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1171 vst1.64 {d3}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1172 vst1.64 {d4}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1173 vst1.64 {d5}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1174 vst1.64 {d6}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1175 vst1.64 {d7}, [r0,:64], r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1176
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1177 mov lr, r10
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1178 bx lr
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1179 endfunc
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1180 .endm
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1181
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1182 h264_qpel8_hv_lowpass_l2 put
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1183 h264_qpel8_hv_lowpass_l2 avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1184
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1185 .macro h264_qpel16_hv type
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1186 function \type\()_h264_qpel16_hv_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1187 mov r9, lr
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1188 bl \type\()_h264_qpel8_hv_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1189 sub r1, r1, r3, lsl #2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1190 bl \type\()_h264_qpel8_hv_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1191 sub r1, r1, r3, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1192 sub r1, r1, r3, lsl #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1193 add r1, r1, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1194 sub r0, r0, r2, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1195 add r0, r0, #8
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1196 bl \type\()_h264_qpel8_hv_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1197 sub r1, r1, r3, lsl #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1198 mov lr, r9
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1199 b \type\()_h264_qpel8_hv_lowpass_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1200 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1201
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1202 function \type\()_h264_qpel16_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1203 mov r9, lr
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1204 sub r2, r4, #256
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1205 bl \type\()_h264_qpel8_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1206 sub r1, r1, r3, lsl #2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1207 bl \type\()_h264_qpel8_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1208 sub r1, r1, r3, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1209 sub r1, r1, r3, lsl #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1210 add r1, r1, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1211 sub r0, r0, r3, lsl #4
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1212 add r0, r0, #8
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1213 bl \type\()_h264_qpel8_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1214 sub r1, r1, r3, lsl #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1215 mov lr, r9
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1216 b \type\()_h264_qpel8_hv_lowpass_l2_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1217 endfunc
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1218 .endm
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1219
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1220 h264_qpel16_hv put
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1221 h264_qpel16_hv avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1222
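The two 16x16 functions generated by `h264_qpel16_hv` do no filtering themselves: each calls the matching 8x8 routine four times, adjusting the source and destination pointers between calls, and the last call is a tail branch. Reading the pointer arithmetic, the quadrants appear to be visited in the order top-left, bottom-left, top-right, bottom-right. The C sketch below shows that tiling under stated assumptions: `block8_fn`, `apply_16x16` and `copy8x8` are hypothetical names, and the plain copy kernel is only a stand-in for the real 8x8 lowpass so that the tiling can be exercised.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical signature for an 8x8 kernel (stand-in for the 8x8 hv lowpass). */
typedef void (*block8_fn)(uint8_t *dst, const uint8_t *src,
                          ptrdiff_t dst_stride, ptrdiff_t src_stride);

/* Cover a 16x16 block with four 8x8 calls, in the order the assembly
 * seems to use: top-left, bottom-left, top-right, bottom-right. */
static void apply_16x16(block8_fn k, uint8_t *dst, const uint8_t *src,
                        ptrdiff_t dst_stride, ptrdiff_t src_stride)
{
    k(dst,                      src,                      dst_stride, src_stride);
    k(dst + 8 * dst_stride,     src + 8 * src_stride,     dst_stride, src_stride);
    k(dst + 8,                  src + 8,                  dst_stride, src_stride);
    k(dst + 8 * dst_stride + 8, src + 8 * src_stride + 8, dst_stride, src_stride);
}

/* Trivial kernel used only to exercise the tiling: a plain 8x8 copy. */
static void copy8x8(uint8_t *dst, const uint8_t *src,
                    ptrdiff_t dst_stride, ptrdiff_t src_stride)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * dst_stride + x] = src[y * src_stride + x];
}
```

Unlike this sketch, the assembly does not recompute each quadrant address from a base pointer. It relies on where the previous 8x8 call left `r0` and `r1`, which is why the adjustments are relative (`sub r1, r1, r3, lsl #2`, `add r0, r0, #8`, and so on).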
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1223 .macro h264_qpel8 type
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1224 function ff_\type\()_h264_qpel8_mc10_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1225 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1226 mov r3, r1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1227 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1228 mov ip, #8
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1229 b \type\()_h264_qpel8_h_lowpass_l2_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1230 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1231
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1232 function ff_\type\()_h264_qpel8_mc20_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1233 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1234 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1235 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1236 mov ip, #8
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1237 b \type\()_h264_qpel8_h_lowpass_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1238 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1239
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1240 function ff_\type\()_h264_qpel8_mc30_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1241 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1242 add r3, r1, #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1243 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1244 mov ip, #8
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1245 b \type\()_h264_qpel8_h_lowpass_l2_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1246 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1247
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1248 function ff_\type\()_h264_qpel8_mc01_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1249 push {lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1250 mov ip, r1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1251 \type\()_h264_qpel8_mc01:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1252 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1253 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1254 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1255 vpush {d8-d15}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1256 bl \type\()_h264_qpel8_v_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1257 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1258 pop {pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1259 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1260
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1261 function ff_\type\()_h264_qpel8_mc11_neon, export=1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1262 push {r0, r1, r11, lr}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1263 \type\()_h264_qpel8_mc11:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1264 lowpass_const r3
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1265 mov r11, sp
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1266 bic sp, sp, #15
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1267 sub sp, sp, #64
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1268 mov r0, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1269 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1270 mov r3, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1271 mov ip, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1272 vpush {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1273 bl put_h264_qpel8_h_lowpass_neon
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1274 ldrd r0, [r11]
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1275 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1276 add ip, sp, #64
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1277 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1278 mov r2, #8
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1279 bl \type\()_h264_qpel8_v_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1280 vpop {d8-d15}
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1281 add sp, r11, #8
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1282 pop {r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1283 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1284
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1285 function ff_\type\()_h264_qpel8_mc21_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1286 push {r0, r1, r4, r10, r11, lr}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1287 \type\()_h264_qpel8_mc21:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1288 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1289 mov r11, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1290 bic sp, sp, #15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1291 sub sp, sp, #(8*8+16*12)
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1292 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1293 mov r3, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1294 mov r0, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1295 mov ip, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1296 vpush {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1297 bl put_h264_qpel8_h_lowpass_neon
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1298 mov r4, r0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1299 ldrd r0, [r11]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1300 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1301 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1302 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1303 sub r2, r4, #64
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1304 bl \type\()_h264_qpel8_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1305 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1306 add sp, r11, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1307 pop {r4, r10, r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1308 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1309
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1310 function ff_\type\()_h264_qpel8_mc31_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1311 add r1, r1, #1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1312 push {r0, r1, r11, lr}
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1313 sub r1, r1, #1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1314 b \type\()_h264_qpel8_mc11
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1315 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1316
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1317 function ff_\type\()_h264_qpel8_mc02_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1318 push {lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1319 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1320 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1321 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1322 vpush {d8-d15}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1323 bl \type\()_h264_qpel8_v_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1324 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1325 pop {pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1326 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1327
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1328 function ff_\type\()_h264_qpel8_mc12_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1329 push {r0, r1, r4, r10, r11, lr}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1330 \type\()_h264_qpel8_mc12:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1331 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1332 mov r11, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1333 bic sp, sp, #15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1334 sub sp, sp, #(8*8+16*12)
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1335 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1336 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1337 mov r2, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1338 mov r0, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1339 vpush {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1340 bl put_h264_qpel8_v_lowpass_neon
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1341 mov r4, r0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1342 ldrd r0, [r11]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1343 sub r1, r1, r3, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1344 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1345 sub r2, r4, #64
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1346 bl \type\()_h264_qpel8_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1347 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1348 add sp, r11, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1349 pop {r4, r10, r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1350 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1351
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1352 function ff_\type\()_h264_qpel8_mc22_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1353 push {r4, r10, r11, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1354 mov r11, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1355 bic sp, sp, #15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1356 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1357 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1358 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1359 sub sp, sp, #(16*12)
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1360 mov r4, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1361 vpush {d8-d15}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1362 bl \type\()_h264_qpel8_hv_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1363 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1364 mov sp, r11
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1365 pop {r4, r10, r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1366 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1367
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1368 function ff_\type\()_h264_qpel8_mc32_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1369 push {r0, r1, r4, r10, r11, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1370 add r1, r1, #1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1371 b \type\()_h264_qpel8_mc12
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1372 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1373
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1374 function ff_\type\()_h264_qpel8_mc03_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1375 push {lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1376 add ip, r1, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1377 b \type\()_h264_qpel8_mc01
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1378 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1379
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1380 function ff_\type\()_h264_qpel8_mc13_neon, export=1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1381 push {r0, r1, r11, lr}
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1382 add r1, r1, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1383 b \type\()_h264_qpel8_mc11
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1384 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1385
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1386 function ff_\type\()_h264_qpel8_mc23_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1387 push {r0, r1, r4, r10, r11, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1388 add r1, r1, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1389 b \type\()_h264_qpel8_mc21
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1390 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1391
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1392 function ff_\type\()_h264_qpel8_mc33_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1393 add r1, r1, #1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1394 push {r0, r1, r11, lr}
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1395 add r1, r1, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1396 sub r1, r1, #1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1397 b \type\()_h264_qpel8_mc11
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1398 endfunc
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1399 .endm
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1400
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1401 h264_qpel8 put
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1402 h264_qpel8 avg
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1403
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1404 .macro h264_qpel16 type
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1405 function ff_\type\()_h264_qpel16_mc10_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1406 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1407 mov r3, r1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1408 sub r1, r1, #2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1409 b \type\()_h264_qpel16_h_lowpass_l2_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1410 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1411
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1412 function ff_\type\()_h264_qpel16_mc20_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1413 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1414 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1415 mov r3, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1416 b \type\()_h264_qpel16_h_lowpass_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1417 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1418
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1419 function ff_\type\()_h264_qpel16_mc30_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1420 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1421 add r3, r1, #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1422 sub r1, r1, #2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1423 b \type\()_h264_qpel16_h_lowpass_l2_neon
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1424 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1425
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1426 function ff_\type\()_h264_qpel16_mc01_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1427 push {r4, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1428 mov ip, r1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1429 \type\()_h264_qpel16_mc01:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1430 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1431 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1432 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1433 vpush {d8-d15}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1434 bl \type\()_h264_qpel16_v_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1435 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1436 pop {r4, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1437 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1438
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1439 function ff_\type\()_h264_qpel16_mc11_neon, export=1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1440 push {r0, r1, r4, r11, lr}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1441 \type\()_h264_qpel16_mc11:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1442 lowpass_const r3
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1443 mov r11, sp
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1444 bic sp, sp, #15
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1445 sub sp, sp, #256
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1446 mov r0, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1447 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1448 mov r3, #16
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1449 vpush {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1450 bl put_h264_qpel16_h_lowpass_neon
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1451 ldrd r0, [r11]
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1452 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1453 add ip, sp, #64
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1454 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1455 mov r2, #16
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1456 bl \type\()_h264_qpel16_v_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1457 vpop {d8-d15}
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1458 add sp, r11, #8
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1459 pop {r4, r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1460 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1461
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1462 function ff_\type\()_h264_qpel16_mc21_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1463 push {r0, r1, r4-r5, r9-r11, lr}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1464 \type\()_h264_qpel16_mc21:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1465 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1466 mov r11, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1467 bic sp, sp, #15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1468 sub sp, sp, #(16*16+16*12)
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1469 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1470 mov r0, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1471 vpush {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1472 bl put_h264_qpel16_h_lowpass_neon_packed
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1473 mov r4, r0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1474 ldrd r0, [r11]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1475 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1476 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1477 mov r3, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1478 bl \type\()_h264_qpel16_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1479 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1480 add sp, r11, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1481 pop {r4-r5, r9-r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1482 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1483
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1484 function ff_\type\()_h264_qpel16_mc31_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1485 add r1, r1, #1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1486 push {r0, r1, r4, r11, lr}
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1487 sub r1, r1, #1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1488 b \type\()_h264_qpel16_mc11
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1489 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1490
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1491 function ff_\type\()_h264_qpel16_mc02_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1492 push {r4, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1493 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1494 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1495 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1496 vpush {d8-d15}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1497 bl \type\()_h264_qpel16_v_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1498 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1499 pop {r4, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1500 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1501
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1502 function ff_\type\()_h264_qpel16_mc12_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1503 push {r0, r1, r4-r5, r9-r11, lr}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1504 \type\()_h264_qpel16_mc12:
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1505 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1506 mov r11, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1507 bic sp, sp, #15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1508 sub sp, sp, #(16*16+16*12)
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1509 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1510 mov r0, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1511 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1512 vpush {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1513 bl put_h264_qpel16_v_lowpass_neon_packed
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1514 mov r4, r0
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1515 ldrd r0, [r11]
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1516 sub r1, r1, r3, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1517 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1518 mov r2, r3
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1519 bl \type\()_h264_qpel16_hv_lowpass_l2_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1520 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1521 add sp, r11, #8
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1522 pop {r4-r5, r9-r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1523 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1524
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1525 function ff_\type\()_h264_qpel16_mc22_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1526 push {r4, r9-r11, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1527 lowpass_const r3
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1528 mov r11, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1529 bic sp, sp, #15
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1530 sub r1, r1, r2, lsl #1
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1531 sub r1, r1, #2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1532 mov r3, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1533 sub sp, sp, #(16*12)
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1534 mov r4, sp
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1535 vpush {d8-d15}
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1536 bl \type\()_h264_qpel16_hv_lowpass_neon
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1537 vpop {d8-d15}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1538 mov sp, r11
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1539 pop {r4, r9-r11, pc}
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1540 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1541
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1542 function ff_\type\()_h264_qpel16_mc32_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1543 push {r0, r1, r4-r5, r9-r11, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1544 add r1, r1, #1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1545 b \type\()_h264_qpel16_mc12
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1546 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1547
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1548 function ff_\type\()_h264_qpel16_mc03_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1549 push {r4, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1550 add ip, r1, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1551 b \type\()_h264_qpel16_mc01
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1552 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1553
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1554 function ff_\type\()_h264_qpel16_mc13_neon, export=1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1555 push {r0, r1, r4, r11, lr}
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1556 add r1, r1, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1557 b \type\()_h264_qpel16_mc11
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1558 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1559
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1560 function ff_\type\()_h264_qpel16_mc23_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1561 push {r0, r1, r4-r5, r9-r11, lr}
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1562 add r1, r1, r2
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1563 b \type\()_h264_qpel16_mc21
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1564 endfunc
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1565
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1566 function ff_\type\()_h264_qpel16_mc33_neon, export=1
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1567 add r1, r1, #1
10385
bc98e5724513 ARM: align stack in NEON h264 mc functions
mru
parents: 10349
diff changeset
1568 push {r0, r1, r4, r11, lr}
8338
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1569 add r1, r1, r2
b294a0d5bc50 ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
mru
parents: 8337
diff changeset
1570 sub r1, r1, #1
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1571 b \type\()_h264_qpel16_mc11
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
diff changeset
1572 endfunc
10616
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1573 .endm
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1574
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1575 h264_qpel16 put
d3b98479ef62 ARM: NEON 16x16 and 8x8 avg qpel MC
mru
parents: 10385
diff changeset
1576 h264_qpel16 avg
8663
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1577
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1578 @ Biweighted prediction
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1579
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1580 .macro biweight_16 macs, macd
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1581 vdup.8 d0, r4
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1582 vdup.8 d1, r5
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1583 vmov q2, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1584 vmov q3, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1585 1: subs ip, ip, #2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1586 vld1.8 {d20-d21},[r0,:128], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1587 \macd q2, d0, d20
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1588 pld [r0]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1589 \macd q3, d0, d21
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1590 vld1.8 {d22-d23},[r1,:128], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1591 \macs q2, d1, d22
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1592 pld [r1]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1593 \macs q3, d1, d23
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1594 vmov q12, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1595 vld1.8 {d28-d29},[r0,:128], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1596 vmov q13, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1597 \macd q12, d0, d28
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1598 pld [r0]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1599 \macd q13, d0, d29
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1600 vld1.8 {d30-d31},[r1,:128], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1601 \macs q12, d1, d30
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1602 pld [r1]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1603 \macs q13, d1, d31
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1604 vshl.s16 q2, q2, q9
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1605 vshl.s16 q3, q3, q9
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1606 vqmovun.s16 d4, q2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1607 vqmovun.s16 d5, q3
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1608 vshl.s16 q12, q12, q9
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1609 vshl.s16 q13, q13, q9
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1610 vqmovun.s16 d24, q12
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1611 vqmovun.s16 d25, q13
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1612 vmov q3, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1613 vst1.8 {d4- d5}, [r6,:128], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1614 vmov q2, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1615 vst1.8 {d24-d25},[r6,:128], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1616 bne 1b
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1617 pop {r4-r6, pc}
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1618 .endm
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1619
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1620 .macro biweight_8 macs, macd
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1621 vdup.8 d0, r4
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1622 vdup.8 d1, r5
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1623 vmov q1, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1624 vmov q10, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1625 1: subs ip, ip, #2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1626 vld1.8 {d4},[r0,:64], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1627 \macd q1, d0, d4
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1628 pld [r0]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1629 vld1.8 {d5},[r1,:64], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1630 \macs q1, d1, d5
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1631 pld [r1]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1632 vld1.8 {d6},[r0,:64], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1633 \macd q10, d0, d6
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1634 pld [r0]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1635 vld1.8 {d7},[r1,:64], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1636 \macs q10, d1, d7
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1637 pld [r1]
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1638 vshl.s16 q1, q1, q9
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1639 vqmovun.s16 d2, q1
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1640 vshl.s16 q10, q10, q9
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1641 vqmovun.s16 d4, q10
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1642 vmov q10, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1643 vst1.8 {d2},[r6,:64], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1644 vmov q1, q8
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1645 vst1.8 {d4},[r6,:64], r2
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1646 bne 1b
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1647 pop {r4-r6, pc}
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1648 .endm
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
diff changeset
1649
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
1650 .macro biweight_4 macs, macd
1651 vdup.8 d0, r4
1652 vdup.8 d1, r5
1653 vmov q1, q8
1654 vmov q10, q8
1655 1: subs ip, ip, #4
1656 vld1.32 {d4[0]},[r0,:32], r2
1657 vld1.32 {d4[1]},[r0,:32], r2
1658 \macd q1, d0, d4
1659 pld [r0]
1660 vld1.32 {d5[0]},[r1,:32], r2
1661 vld1.32 {d5[1]},[r1,:32], r2
1662 \macs q1, d1, d5
1663 pld [r1]
1664 blt 2f
1665 vld1.32 {d6[0]},[r0,:32], r2
1666 vld1.32 {d6[1]},[r0,:32], r2
1667 \macd q10, d0, d6
1668 pld [r0]
1669 vld1.32 {d7[0]},[r1,:32], r2
1670 vld1.32 {d7[1]},[r1,:32], r2
1671 \macs q10, d1, d7
1672 pld [r1]
1673 vshl.s16 q1, q1, q9
1674 vqmovun.s16 d2, q1
1675 vshl.s16 q10, q10, q9
1676 vqmovun.s16 d4, q10
1677 vmov q10, q8
1678 vst1.32 {d2[0]},[r6,:32], r2
1679 vst1.32 {d2[1]},[r6,:32], r2
1680 vmov q1, q8
1681 vst1.32 {d4[0]},[r6,:32], r2
1682 vst1.32 {d4[1]},[r6,:32], r2
1683 bne 1b
1684 pop {r4-r6, pc}
1685 2: vshl.s16 q1, q1, q9
1686 vqmovun.s16 d2, q1
1687 vst1.32 {d2[0]},[r6,:32], r2
1688 vst1.32 {d2[1]},[r6,:32], r2
1689 pop {r4-r6, pc}
1690 .endm
1691
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
1692 .macro biweight_func w
1693 function biweight_h264_pixels_\w\()_neon
1694 push {r4-r6, lr}
1695 add r4, sp, #16
1696 ldm r4, {r4-r6}
1697 lsr lr, r4, #31
1698 add r6, r6, #1
1699 eors lr, lr, r5, lsr #30
1700 orr r6, r6, #1
1701 vdup.16 q9, r3
1702 lsl r6, r6, r3
1703 vmvn q9, q9
1704 vdup.16 q8, r6
1705 mov r6, r0
1706 beq 10f
1707 subs lr, lr, #1
1708 beq 20f
1709 subs lr, lr, #1
1710 beq 30f
1711 b 40f
1712 10: biweight_\w vmlal.u8, vmlal.u8
1713 20: rsb r4, r4, #0
1714 biweight_\w vmlal.u8, vmlsl.u8
1715 30: rsb r4, r4, #0
1716 rsb r5, r5, #0
1717 biweight_\w vmlsl.u8, vmlsl.u8
1718 40: rsb r5, r5, #0
1719 biweight_\w vmlsl.u8, vmlal.u8
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
1720 endfunc
8663
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
1721 .endm
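The branch selection at the top of `biweight_func` (the `lsr`/`eors` pair followed by the `beq`/`subs` chain) can be read as computing a two-bit case index from the signs of the two weights. The C sketch below is an illustration only, not part of the annotated file: the name `biweight_case` is hypothetical, and it assumes both weights are small enough that bits 31 and 30 of `weights` agree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring "lsr lr, r4, #31" followed by
 * "eors lr, lr, r5, lsr #30". Returns a case index 0..3:
 *   0: weightd >= 0, weights >= 0  (label 10, vmlal/vmlal)
 *   1: weightd <  0, weights >= 0  (label 20, weightd negated)
 *   2: weightd <  0, weights <  0  (label 30, both negated)
 *   3: weightd >= 0, weights <  0  (label 40, weights negated)
 * Assumption: |weights| < 2^30, so its top two bits are sign copies. */
static unsigned biweight_case(int32_t weightd, int32_t weights)
{
    uint32_t d = (uint32_t)weightd >> 31;  /* 1 if weightd is negative */
    uint32_t s = (uint32_t)weights >> 30;  /* 3 if weights is negative */

    assert(s == 0 || s == 3);
    return d ^ s;
}
```

Negating the negative weight before entering the inner macro appears to be what lets the loop use unsigned multiplies (`vmlal.u8` / `vmlsl.u8`) in every case.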
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
1722
1723 .macro biweight_entry w, h, b=1
1724 function ff_biweight_h264_pixels_\w\()x\h\()_neon, export=1
1725 mov ip, #\h
1726 .if \b
1727 b biweight_h264_pixels_\w\()_neon
1728 .endif
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
1729 endfunc
8663
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
1730 .endm
23f7711e777e ARM: NEON optimised H.264 biweighted prediction
mru
parents: 8626
1731
1732 biweight_entry 16, 8
1733 biweight_entry 16, 16, b=0
1734 biweight_func 16
1735
1736 biweight_entry 8, 16
1737 biweight_entry 8, 4
1738 biweight_entry 8, 8, b=0
1739 biweight_func 8
1740
1741 biweight_entry 4, 8
1742 biweight_entry 4, 2
1743 biweight_entry 4, 4, b=0
1744 biweight_func 4
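For readers who want a scalar picture of the biweighted prediction routines above, here is a minimal C sketch of what the NEON code appears to compute per pixel. It is a reading of the assembly, not code from the annotated file: the names `clip_uint8` and `biweight_pixels_ref` and the explicit `w`/`h` parameters are hypothetical. It uses plain `int` arithmetic, so it does not model the 16-bit lanes of the NEON version, and it assumes `>>` on a negative `int` is an arithmetic shift (as on GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to 0..255, the effect vqmovun.s16 has when narrowing. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference: blend src into dst over a w x h block. */
static void biweight_pixels_ref(uint8_t *dst, const uint8_t *src, int stride,
                                int w, int h, int log2_denom,
                                int weightd, int weights, int offset)
{
    assert(log2_denom >= 0 && log2_denom <= 7);

    /* add r6, r6, #1 ; orr r6, r6, #1 ; lsl r6, r6, r3 */
    offset = ((offset + 1) | 1) * (1 << log2_denom);

    for (int y = 0; y < h; y++, dst += stride, src += stride) {
        for (int x = 0; x < w; x++) {
            int sum = src[x] * weights + dst[x] * weightd + offset;
            /* q9 holds ~log2_denom, i.e. a right shift by log2_denom + 1 */
            dst[x] = clip_uint8(sum >> (log2_denom + 1));
        }
    }
}
```

With `log2_denom = 0`, both weights equal to 1 and a zero offset, this reduces to a rounded average of `dst` and `src`.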
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1745
1746 @ Weighted prediction
1747
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1748 .macro weight_16 add
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1749 vdup.8 d0, r3
1750 1: subs ip, ip, #2
1751 vld1.8 {d20-d21},[r0,:128], r1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1752 vmull.u8 q2, d0, d20
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1753 pld [r0]
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1754 vmull.u8 q3, d0, d21
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1755 vld1.8 {d28-d29},[r0,:128], r1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1756 vmull.u8 q12, d0, d28
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1757 pld [r0]
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1758 vmull.u8 q13, d0, d29
1759 \add q2, q8, q2
1760 vrshl.s16 q2, q2, q9
1761 \add q3, q8, q3
1762 vrshl.s16 q3, q3, q9
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1763 vqmovun.s16 d4, q2
1764 vqmovun.s16 d5, q3
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1765 \add q12, q8, q12
1766 vrshl.s16 q12, q12, q9
1767 \add q13, q8, q13
1768 vrshl.s16 q13, q13, q9
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1769 vqmovun.s16 d24, q12
1770 vqmovun.s16 d25, q13
1771 vst1.8 {d4- d5}, [r4,:128], r1
1772 vst1.8 {d24-d25},[r4,:128], r1
1773 bne 1b
1774 pop {r4, pc}
1775 .endm
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1776
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1777 .macro weight_8 add
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1778 vdup.8 d0, r3
1779 1: subs ip, ip, #2
1780 vld1.8 {d4},[r0,:64], r1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1781 vmull.u8 q1, d0, d4
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1782 pld [r0]
1783 vld1.8 {d6},[r0,:64], r1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1784 vmull.u8 q10, d0, d6
1785 \add q1, q8, q1
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1786 pld [r0]
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1787 vrshl.s16 q1, q1, q9
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1788 vqmovun.s16 d2, q1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1789 \add q10, q8, q10
1790 vrshl.s16 q10, q10, q9
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1791 vqmovun.s16 d4, q10
1792 vst1.8 {d2},[r4,:64], r1
1793 vst1.8 {d4},[r4,:64], r1
1794 bne 1b
1795 pop {r4, pc}
1796 .endm
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1797
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1798 .macro weight_4 add
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1799 vdup.8 d0, r3
1800 vmov q1, q8
1801 vmov q10, q8
1802 1: subs ip, ip, #4
1803 vld1.32 {d4[0]},[r0,:32], r1
1804 vld1.32 {d4[1]},[r0,:32], r1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1805 vmull.u8 q1, d0, d4
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1806 pld [r0]
1807 blt 2f
1808 vld1.32 {d6[0]},[r0,:32], r1
1809 vld1.32 {d6[1]},[r0,:32], r1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1810 vmull.u8 q10, d0, d6
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1811 pld [r0]
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1812 \add q1, q8, q1
1813 vrshl.s16 q1, q1, q9
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1814 vqmovun.s16 d2, q1
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1815 \add q10, q8, q10
1816 vrshl.s16 q10, q10, q9
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1817 vqmovun.s16 d4, q10
1818 vmov q10, q8
1819 vst1.32 {d2[0]},[r4,:32], r1
1820 vst1.32 {d2[1]},[r4,:32], r1
1821 vmov q1, q8
1822 vst1.32 {d4[0]},[r4,:32], r1
1823 vst1.32 {d4[1]},[r4,:32], r1
1824 bne 1b
1825 pop {r4, pc}
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1826 2: \add q1, q8, q1
1827 vrshl.s16 q1, q1, q9
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1828 vqmovun.s16 d2, q1
1829 vst1.32 {d2[0]},[r4,:32], r1
1830 vst1.32 {d2[1]},[r4,:32], r1
1831 pop {r4, pc}
1832 .endm
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1833
1834 .macro weight_func w
1835 function weight_h264_pixels_\w\()_neon
1836 push {r4, lr}
1837 ldr r4, [sp, #8]
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1838 cmp r2, #1
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1839 lsl r4, r4, r2
1840 vdup.16 q8, r4
1841 mov r4, r0
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1842 ble 20f
1843 rsb lr, r2, #1
1844 vdup.16 q9, lr
1845 cmp r3, #0
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1846 blt 10f
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1847 weight_\w vhadd.s16
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1848 10: rsb r3, r3, #0
9072
d56b711c6c5d ARM: fix corner-case overflow in H.264 weighted prediction
mru
parents: 8664
1849 weight_\w vhsub.s16
1850 20: rsb lr, r2, #0
1851 vdup.16 q9, lr
1852 cmp r3, #0
1853 blt 10f
1854 weight_\w vadd.s16
1855 10: rsb r3, r3, #0
1856 weight_\w vsub.s16
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
1857 endfunc
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1858 .endm
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1859
1860 .macro weight_entry w, h, b=1
1861 function ff_weight_h264_pixels_\w\()x\h\()_neon, export=1
1862 mov ip, #\h
1863 .if \b
1864 b weight_h264_pixels_\w\()_neon
1865 .endif
11443
361a5fcb4393 ARM: set size of asm functions in object files
mru
parents: 10617
1866 endfunc
8664
882c351e69c2 ARM: NEON optimised H.264 weighted prediction
mru
parents: 8663
1867 .endm
1868
1869 weight_entry 16, 8
1870 weight_entry 16, 16, b=0
1871 weight_func 16
1872
1873 weight_entry 8, 16
1874 weight_entry 8, 4
1875 weight_entry 8, 8, b=0
1876 weight_func 8
1877
1878 weight_entry 4, 8
1879 weight_entry 4, 2
1880 weight_entry 4, 4, b=0
1881 weight_func 4
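As a scalar companion to the weighted prediction routines above, the following C sketch shows what the NEON code appears to compute per pixel. It is a reading of the assembly, not code from the annotated file: the names `clip_uint8` and `weight_pixels_ref` and the explicit `w`/`h` parameters are hypothetical. The assembly splits the work differently: for `log2_denom > 1` it uses a halving add or subtract (`vhadd.s16` / `vhsub.s16`) followed by a rounding shift by `log2_denom - 1`, which keeps the intermediate inside 16 bits, and otherwise a plain add or subtract followed by a rounding shift by `log2_denom`. The sketch folds both paths into one `int` expression, so it does not model the 16-bit lanes, and it assumes `>>` on a negative `int` is an arithmetic shift (as on GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to 0..255, the effect vqmovun.s16 has when narrowing. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference: weight a w x h block in place. */
static void weight_pixels_ref(uint8_t *block, int stride, int w, int h,
                              int log2_denom, int weight, int offset)
{
    assert(log2_denom >= 0 && log2_denom <= 7);

    /* lsl r4, r4, r2: scale the offset up to the weight's precision */
    offset *= 1 << log2_denom;
    /* rounding term supplied by vrshl.s16 in the assembly */
    if (log2_denom)
        offset += 1 << (log2_denom - 1);

    for (int y = 0; y < h; y++, block += stride)
        for (int x = 0; x < w; x++)
            block[x] = clip_uint8((block[x] * weight + offset) >> log2_denom);
}
```

A negative `weight` is handled in the assembly by negating it and switching from add to subtract, so the multiply can stay unsigned; the sketch simply multiplies by the signed value.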