Mercurial > mplayer.hg
comparison libswscale/internal_bfin.S @ 27156:23f1738030fc
whitespace cosmetics
author | diego |
---|---|
date | Fri, 04 Jul 2008 13:14:29 +0000 |
parents | a8ff60976ccb |
children | 65b8334df960 |
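
The header comment shown in the comparison below describes the per-pixel arithmetic only in prose and formulas. A minimal C sketch of those three equations follows, for orientation while reading the diff. It is not the Blackfin routine itself. The `yuv_coeffs` struct, the `FIX_SHIFT` Q8 fixed-point scale, the reading of `clipz` as a clamp to 0..255, and the BT.601-style coefficient values in the usage note are all assumptions made for illustration. The real `coeffs` array has the replicated layout described in the comment, and the assembly pre-scales its operands differently.

```c
#include <stdint.h>

/* Assumption: coefficients are Q8 fixed point (value * 256). */
#define FIX_SHIFT 8

/* Hypothetical, simplified coefficient set. The real table is laid out as
   oy,oc,zero,cy,crv,rmask,cbu,bmask,cgu,cgv with replicated halves. */
struct yuv_coeffs {
    int32_t oy;   /* luma offset */
    int32_t cy;   /* luma gain */
    int32_t crv;  /* V contribution to R */
    int32_t cgu;  /* U contribution to G */
    int32_t cgv;  /* V contribution to G */
    int32_t cbu;  /* U contribution to B */
};

/* Assumed meaning of clipz: negative values become 0, then the scaled
   value is reduced to 8 bits and saturated at 255. */
static int32_t clipz(int32_t x)
{
    if (x < 0)
        return 0;
    x >>= FIX_SHIFT;
    return x > 255 ? 255 : x;
}

/* Convert one pixel using the equations from the comment and pack it
   as RGB565 (R: 5 bits, G: 6 bits, B: 5 bits). */
static uint16_t yuv_to_rgb565(uint8_t y, uint8_t u, uint8_t v,
                              const struct yuv_coeffs *c)
{
    int32_t luma = (y - c->oy) * c->cy;   /* shared term (y - oy) * cy */
    int32_t r = clipz(luma + c->crv * (v - 128));
    int32_t g = clipz(luma + c->cgv * (v - 128) + c->cgu * (u - 128));
    int32_t b = clipz(luma + c->cbu * (u - 128));

    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

With assumed BT.601-style Q8 values such as `{ 16, 298, 409, -100, -208, 516 }`, black (`y = 16`, `u = v = 128`) maps to `0x0000`. The shared `luma` term is computed once, which mirrors step 3 of the comment, where `cy*yx0` and `cy*yx1` are placed in the accumulators before the chroma terms are applied.
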
27155:d5d7ecbbf0d8 | 27156:23f1738030fc |
---|---|
22 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | 22 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
23 */ | 23 */ |
24 | 24 |
25 | 25 |
26 /* | 26 /* |
27 YUV420 to RGB565 conversion. This routine takes a YUV 420 planar macroblock | 27 YUV420 to RGB565 conversion. This routine takes a YUV 420 planar macroblock |
28 and converts it to RGB565. R:5 bits, G:6 bits, B:5 bits.. packed into shorts | 28 and converts it to RGB565. R:5 bits, G:6 bits, B:5 bits.. packed into shorts |
29 | 29 |
30 | 30 |
31 The following calculation is used for the conversion: | 31 The following calculation is used for the conversion: |
32 | 32 |
33 r = clipz((y-oy)*cy + crv*(v-128)) | 33 r = clipz((y-oy)*cy + crv*(v-128)) |
34 g = clipz((y-oy)*cy + cgv*(v-128) + cgu*(u-128)) | 34 g = clipz((y-oy)*cy + cgv*(v-128) + cgu*(u-128)) |
35 b = clipz((y-oy)*cy + cbu*(u-128)) | 35 b = clipz((y-oy)*cy + cbu*(u-128)) |
36 | 36 |
37 y,u,v are pre scaled by a factor of 4 i.e. left shifted to gain precision. | 37 y,u,v are pre scaled by a factor of 4 i.e. left shifted to gain precision. |
38 | 38 |
39 | 39 |
40 New factorization to eliminate the truncation error which was | 40 New factorization to eliminate the truncation error which was |
41 occuring due to the byteop3p. | 41 occuring due to the byteop3p. |
42 | 42 |
43 | 43 |
44 1) use the bytop16m to subtract quad bytes we use this in U8 this | 44 1) use the bytop16m to subtract quad bytes we use this in U8 this |
45 then so the offsets need to be renormalized to 8bits. | 45 then so the offsets need to be renormalized to 8bits. |
46 | 46 |
47 2) scale operands up by a factor of 4 not 8 because Blackfin | 47 2) scale operands up by a factor of 4 not 8 because Blackfin |
48 multiplies include a shift. | 48 multiplies include a shift. |
49 | 49 |
50 3) compute into the accumulators cy*yx0, cy*yx1 | 50 3) compute into the accumulators cy*yx0, cy*yx1 |
51 | 51 |
52 4) compute each of the linear equations | 52 4) compute each of the linear equations |
53 r = clipz((y-oy)*cy + crv*(v-128)) | 53 r = clipz((y - oy) * cy + crv * (v - 128)) |
54 | 54 |
55 g = clipz((y-oy)*cy + cgv*(v-128) + cgu*(u-128)) | 55 g = clipz((y - oy) * cy + cgv * (v - 128) + cgu * (u - 128)) |
56 | 56 |
57 b = clipz((y-oy)*cy + cbu*(u-128)) | 57 b = clipz((y - oy) * cy + cbu * (u - 128)) |
58 | 58 |
59 reuse of the accumulators requires that we actually multiply | 59 reuse of the accumulators requires that we actually multiply |
60 twice once with addition and the second time with a subtaction. | 60 twice once with addition and the second time with a subtaction. |
61 | 61 |
62 because of this we need to compute the equations in the order R B | 62 because of this we need to compute the equations in the order R B |
63 then G saving the writes for B in the case of 24/32 bit color | 63 then G saving the writes for B in the case of 24/32 bit color |
64 formats. | 64 formats. |
65 | 65 |
66 api: yuv2rgb_kind (uint8_t *Y, uint8_t *U, uint8_t *V, int *out, | 66 api: yuv2rgb_kind (uint8_t *Y, uint8_t *U, uint8_t *V, int *out, |
67 int dW, uint32_t *coeffs); | 67 int dW, uint32_t *coeffs); |
68 | 68 |
69 A B | 69 A B |
70 --- --- | 70 --- --- |
71 i2 = cb i3 = cr | 71 i2 = cb i3 = cr |
72 i1 = coeff i0 = y | 72 i1 = coeff i0 = y |
73 | 73 |
74 Where coeffs have the following layout in memory. | 74 Where coeffs have the following layout in memory. |
75 | 75 |
76 uint32_t oy,oc,zero,cy,crv,rmask,cbu,bmask,cgu,cgv; | 76 uint32_t oy,oc,zero,cy,crv,rmask,cbu,bmask,cgu,cgv; |
77 | 77 |
78 coeffs is a pointer to oy. | 78 coeffs is a pointer to oy. |
79 | 79 |
80 the {rgb} masks are only utilized by the 565 packing algorithm. Note the data | 80 the {rgb} masks are only utilized by the 565 packing algorithm. Note the data |
81 replication is used to simplify the internal algorithms for the dual mac architecture | 81 replication is used to simplify the internal algorithms for the dual mac architecture |
82 of BlackFin. | 82 of BlackFin. |
83 | 83 |
84 All routines are exported with _ff_bfin_ as a symbol prefix | 84 All routines are exported with _ff_bfin_ as a symbol prefix |
85 | 85 |
86 rough performance gain compared against -O3: | 86 rough performance gain compared against -O3: |
87 | 87 |
88 2779809/1484290 187.28% | 88 2779809/1484290 187.28% |
89 | 89 |
90 which translates to ~33c/pel to ~57c/pel for the reference vs 17.5 | 90 which translates to ~33c/pel to ~57c/pel for the reference vs 17.5 |
91 c/pel for the optimized implementations. Not sure why there is such a | 91 c/pel for the optimized implementations. Not sure why there is such a |
92 huge variation on the reference codes on Blackfin I guess it must have | 92 huge variation on the reference codes on Blackfin I guess it must have |
93 to do with the memory system. | 93 to do with the memory system. |
94 | |
95 */ | 94 */ |
96 | 95 |
97 #define mL3 .text | 96 #define mL3 .text |
98 #ifdef __FDPIC__ | 97 #ifdef __FDPIC__ |
99 #define mL1 .l1.text | 98 #define mL1 .l1.text |