comparison libswscale/yuv2rgb_altivec.c @ 27156:23f1738030fc
whitespace cosmetics
author | diego |
---|---|
date | Fri, 04 Jul 2008 13:14:29 +0000 |
parents | d5d7ecbbf0d8 |
children | 65b8334df960 |
comparison
27155:d5d7ecbbf0d8 | 27156:23f1738030fc |
---|---|
19 * along with FFmpeg; if not, write to the Free Software | 19 * along with FFmpeg; if not, write to the Free Software |
20 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | 20 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
21 */ | 21 */ |
22 | 22 |
23 /* | 23 /* |
24 convert I420 YV12 to RGB in various formats, | 24 convert I420 YV12 to RGB in various formats, |
25 it rejects images that are not in 420 formats | 25 it rejects images that are not in 420 formats |
26 it rejects images that don't have widths of multiples of 16 | 26 it rejects images that don't have widths of multiples of 16 |
27 it rejects images that don't have heights of multiples of 2 | 27 it rejects images that don't have heights of multiples of 2 |
28 reject defers to C simulation codes. | 28 reject defers to C simulation codes. |
29 | 29 |
30 lots of optimizations to be done here | 30 lots of optimizations to be done here |
31 | 31 |
32 1. need to fix saturation code, I just couldn't get it to fly with packs and adds. | 32 1. need to fix saturation code, I just couldn't get it to fly with packs and adds. |
33 so we currently use max min to clip | 33 so we currently use max min to clip |
34 | 34 |
35 2. the inefficient use of chroma loading needs a bit of brushing up | 35 2. the inefficient use of chroma loading needs a bit of brushing up |
36 | 36 |
37 3. analysis of pipeline stalls needs to be done, use shark to identify pipeline stalls | 37 3. analysis of pipeline stalls needs to be done, use shark to identify pipeline stalls |
38 | 38 |
39 | 39 |
40 MODIFIED to calculate coeffs from currently selected color space. | 40 MODIFIED to calculate coeffs from currently selected color space. |
41 MODIFIED core to be a macro which you spec the output format. | 41 MODIFIED core to be a macro which you spec the output format. |
42 ADDED UYVY conversion which is never called due to some thing in SWSCALE. | 42 ADDED UYVY conversion which is never called due to some thing in SWSCALE. |
43 CORRECTED algorithim selection to be strict on input formats. | 43 CORRECTED algorithim selection to be strict on input formats. |
44 ADDED runtime detection of altivec. | 44 ADDED runtime detection of altivec. |
45 | 45 |
46 ADDED altivec_yuv2packedX vertical scl + RGB converter | 46 ADDED altivec_yuv2packedX vertical scl + RGB converter |
47 | 47 |
48 March 27,2004 | 48 March 27,2004 |
49 PERFORMANCE ANALYSIS | 49 PERFORMANCE ANALYSIS |
50 | 50 |
51 The C version use 25% of the processor or ~250Mips for D1 video rawvideo used as test | 51 The C version use 25% of the processor or ~250Mips for D1 video rawvideo used as test |
52 The ALTIVEC version uses 10% of the processor or ~100Mips for D1 video same sequence | 52 The ALTIVEC version uses 10% of the processor or ~100Mips for D1 video same sequence |
53 | 53 |
54 720*480*30 ~10MPS | 54 720*480*30 ~10MPS |
55 | 55 |
56 so we have roughly 10clocks per pixel this is too high something has to be wrong. | 56 so we have roughly 10clocks per pixel this is too high something has to be wrong. |
57 | 57 |
58 OPTIMIZED clip codes to utilize vec_max and vec_packs removing the need for vec_min. | 58 OPTIMIZED clip codes to utilize vec_max and vec_packs removing the need for vec_min. |
59 | 59 |
60 OPTIMIZED DST OUTPUT cache/dma controls. we are pretty much | 60 OPTIMIZED DST OUTPUT cache/dma controls. we are pretty much |
61 guaranteed to have the input video frame it was just decompressed so | 61 guaranteed to have the input video frame it was just decompressed so |
62 it probably resides in L1 caches. However we are creating the | 62 it probably resides in L1 caches. However we are creating the |
63 output video stream this needs to use the DSTST instruction to | 63 output video stream this needs to use the DSTST instruction to |
64 optimize for the cache. We couple this with the fact that we are | 64 optimize for the cache. We couple this with the fact that we are |
65 not going to be visiting the input buffer again so we mark it Least | 65 not going to be visiting the input buffer again so we mark it Least |
66 Recently Used. This shaves 25% of the processor cycles off. | 66 Recently Used. This shaves 25% of the processor cycles off. |
67 | 67 |
68 Now MEMCPY is the largest mips consumer in the system, probably due | 68 Now MEMCPY is the largest mips consumer in the system, probably due |
69 to the inefficient X11 stuff. | 69 to the inefficient X11 stuff. |
70 | 70 |
71 GL libraries seem to be very slow on this machine 1.33Ghz PB running | 71 GL libraries seem to be very slow on this machine 1.33Ghz PB running |
72 Jaguar, this is not the case for my 1Ghz PB. I thought it might be | 72 Jaguar, this is not the case for my 1Ghz PB. I thought it might be |
73 a versioning issues, however I have libGL.1.2.dylib for both | 73 a versioning issues, however I have libGL.1.2.dylib for both |
74 machines. ((We need to figure this out now)) | 74 machines. ((We need to figure this out now)) |
75 | 75 |
76 GL2 libraries work now with patch for RGB32 | 76 GL2 libraries work now with patch for RGB32 |
77 | 77 |
78 NOTE quartz vo driver ARGB32_to_RGB24 consumes 30% of the processor | 78 NOTE quartz vo driver ARGB32_to_RGB24 consumes 30% of the processor |
79 | 79 |
80 Integrated luma prescaling adjustment for saturation/contrast/brightness adjustment. | 80 Integrated luma prescaling adjustment for saturation/contrast/brightness adjustment. |
81 */ | 81 */ |
82 | 82 |
83 #include <stdio.h> | 83 #include <stdio.h> |
84 #include <stdlib.h> | 84 #include <stdlib.h> |
85 #include <string.h> | 85 #include <string.h> |