Mercurial > mplayer.hg
comparison libswscale/yuv2rgb_altivec.c @ 27155:d5d7ecbbf0d8
Place license header at the top of the file for consistency.
author | diego |
---|---|
date | Fri, 04 Jul 2008 13:12:47 +0000 |
parents | e69f41b2c8c0 |
children | 23f1738030fc |
27154:01526c8e2d75 | 27155:d5d7ecbbf0d8 |
---|---|
1 /* | 1 /* |
2 marc.hoffman@analog.com March 8, 2004 | 2 * AltiVec acceleration for colorspace conversion |
3 | 3 * |
4 AltiVec acceleration for colorspace conversion revision 0.2 | 4 * copyright (C) 2004 Marc Hoffman <marc.hoffman@analog.com> |
5 | 5 * |
6 convert I420 YV12 to RGB in various formats, | |
7 it rejects images that are not in 420 formats | |
8 it rejects images that don't have widths of multiples of 16 | |
9 it rejects images that don't have heights of multiples of 2 | |
10 reject defers to C simulation codes. | |
11 | |
12 lots of optimizations to be done here | |
13 | |
14 1. need to fix saturation code, I just couldn't get it to fly with packs and adds. | |
15 so we currently use max min to clip | |
16 | |
17 2. the inefficient use of chroma loading needs a bit of brushing up | |
18 | |
19 3. analysis of pipeline stalls needs to be done, use shark to identify pipeline stalls | |
20 | |
21 | |
22 MODIFIED to calculate coeffs from currently selected color space. | |
23 MODIFIED core to be a macro which you spec the output format. | |
24 ADDED UYVY conversion which is never called due to some thing in SWSCALE. | |
25 CORRECTED algorithim selection to be strict on input formats. | |
26 ADDED runtime detection of altivec. | |
27 | |
28 ADDED altivec_yuv2packedX vertical scl + RGB converter | |
29 | |
30 March 27,2004 | |
31 PERFORMANCE ANALYSIS | |
32 | |
33 The C version use 25% of the processor or ~250Mips for D1 video rawvideo used as test | |
34 The ALTIVEC version uses 10% of the processor or ~100Mips for D1 video same sequence | |
35 | |
36 720*480*30 ~10MPS | |
37 | |
38 so we have roughly 10clocks per pixel this is too high something has to be wrong. | |
39 | |
40 OPTIMIZED clip codes to utilize vec_max and vec_packs removing the need for vec_min. | |
41 | |
42 OPTIMIZED DST OUTPUT cache/dma controls. we are pretty much | |
43 guaranteed to have the input video frame it was just decompressed so | |
44 it probably resides in L1 caches. However we are creating the | |
45 output video stream this needs to use the DSTST instruction to | |
46 optimize for the cache. We couple this with the fact that we are | |
47 not going to be visiting the input buffer again so we mark it Least | |
48 Recently Used. This shaves 25% of the processor cycles off. | |
49 | |
50 Now MEMCPY is the largest mips consumer in the system, probably due | |
51 to the inefficient X11 stuff. | |
52 | |
53 GL libraries seem to be very slow on this machine 1.33Ghz PB running | |
54 Jaguar, this is not the case for my 1Ghz PB. I thought it might be | |
55 a versioning issues, however I have libGL.1.2.dylib for both | |
56 machines. ((We need to figure this out now)) | |
57 | |
58 GL2 libraries work now with patch for RGB32 | |
59 | |
60 NOTE quartz vo driver ARGB32_to_RGB24 consumes 30% of the processor | |
61 | |
62 Integrated luma prescaling adjustment for saturation/contrast/brightness adjustment. | |
63 */ | |
64 | |
65 /* | |
66 * This file is part of FFmpeg. | 6 * This file is part of FFmpeg. |
67 * | 7 * |
68 * FFmpeg is free software; you can redistribute it and/or modify | 8 * FFmpeg is free software; you can redistribute it and/or modify |
69 * it under the terms of the GNU General Public License as published by | 9 * it under the terms of the GNU General Public License as published by |
70 * the Free Software Foundation; either version 2 of the License, or | 10 * the Free Software Foundation; either version 2 of the License, or |
77 * | 17 * |
78 * You should have received a copy of the GNU General Public License | 18 * You should have received a copy of the GNU General Public License |
79 * along with FFmpeg; if not, write to the Free Software | 19 * along with FFmpeg; if not, write to the Free Software |
80 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | 20 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
81 */ | 21 */ |
22 | |
23 /* | |
24 convert I420 YV12 to RGB in various formats, | |
25 it rejects images that are not in 420 formats | |
26 it rejects images that don't have widths of multiples of 16 | |
27 it rejects images that don't have heights of multiples of 2 | |
28 reject defers to C simulation codes. | |
29 | |
30 lots of optimizations to be done here | |
31 | |
32 1. need to fix saturation code, I just couldn't get it to fly with packs and adds. | |
33 so we currently use max min to clip | |
34 | |
35 2. the inefficient use of chroma loading needs a bit of brushing up | |
36 | |
37 3. analysis of pipeline stalls needs to be done, use shark to identify pipeline stalls | |
38 | |
39 | |
40 MODIFIED to calculate coeffs from currently selected color space. | |
41 MODIFIED core to be a macro which you spec the output format. | |
42 ADDED UYVY conversion which is never called due to some thing in SWSCALE. | |
43 CORRECTED algorithim selection to be strict on input formats. | |
44 ADDED runtime detection of altivec. | |
45 | |
46 ADDED altivec_yuv2packedX vertical scl + RGB converter | |
47 | |
48 March 27,2004 | |
49 PERFORMANCE ANALYSIS | |
50 | |
51 The C version use 25% of the processor or ~250Mips for D1 video rawvideo used as test | |
52 The ALTIVEC version uses 10% of the processor or ~100Mips for D1 video same sequence | |
53 | |
54 720*480*30 ~10MPS | |
55 | |
56 so we have roughly 10clocks per pixel this is too high something has to be wrong. | |
57 | |
58 OPTIMIZED clip codes to utilize vec_max and vec_packs removing the need for vec_min. | |
59 | |
60 OPTIMIZED DST OUTPUT cache/dma controls. we are pretty much | |
61 guaranteed to have the input video frame it was just decompressed so | |
62 it probably resides in L1 caches. However we are creating the | |
63 output video stream this needs to use the DSTST instruction to | |
64 optimize for the cache. We couple this with the fact that we are | |
65 not going to be visiting the input buffer again so we mark it Least | |
66 Recently Used. This shaves 25% of the processor cycles off. | |
67 | |
68 Now MEMCPY is the largest mips consumer in the system, probably due | |
69 to the inefficient X11 stuff. | |
70 | |
71 GL libraries seem to be very slow on this machine 1.33Ghz PB running | |
72 Jaguar, this is not the case for my 1Ghz PB. I thought it might be | |
73 a versioning issues, however I have libGL.1.2.dylib for both | |
74 machines. ((We need to figure this out now)) | |
75 | |
76 GL2 libraries work now with patch for RGB32 | |
77 | |
78 NOTE quartz vo driver ARGB32_to_RGB24 consumes 30% of the processor | |
79 | |
80 Integrated luma prescaling adjustment for saturation/contrast/brightness adjustment. | |
81 */ | |
82 | 82 |
83 #include <stdio.h> | 83 #include <stdio.h> |
84 #include <stdlib.h> | 84 #include <stdlib.h> |
85 #include <string.h> | 85 #include <string.h> |
86 #include <inttypes.h> | 86 #include <inttypes.h> |