mplayer.hg: libswscale/yuv2rgb_altivec.c comparison

comparison libswscale/yuv2rgb_altivec.c @ 27155:d5d7ecbbf0d8

Place license header at the top of the file for consistency.

author	diego
date	Fri, 04 Jul 2008 13:12:47 +0000
parents	e69f41b2c8c0
children	23f1738030fc

-:01526c8e2d75
+:d5d7ecbbf0d8
 /*
-marc.hoffman@analog.com    March 8, 2004
+* AltiVec acceleration for colorspace conversion
+*
-AltiVec acceleration for colorspace conversion revision 0.2
+* copyright (C) 2004 Marc Hoffman <marc.hoffman@analog.com>
+*
-convert I420 YV12 to RGB in various formats,
-it rejects images that are not in 420 formats
-it rejects images that don't have widths of multiples of 16
-it rejects images that don't have heights of multiples of 2
-reject defers to C simulation codes.
-lots of optimizations to be done here
-1. need to fix saturation code, I just couldn't get it to fly with packs and adds.
-so we currently use max min to clip
-2. the inefficient use of chroma loading needs a bit of brushing up
-3. analysis of pipeline stalls needs to be done, use shark to identify pipeline stalls
-MODIFIED to calculate coeffs from currently selected color space.
-MODIFIED core to be a macro which you spec the output format.
-ADDED UYVY conversion which is never called due to some thing in SWSCALE.
-CORRECTED algorithim selection to be strict on input formats.
-ADDED runtime detection of altivec.
-ADDED altivec_yuv2packedX vertical scl + RGB converter
-March 27,2004
-PERFORMANCE ANALYSIS
-The C version use 25% of the processor or ~250Mips for D1 video rawvideo used as test
-The ALTIVEC version uses 10% of the processor or ~100Mips for D1 video same sequence
-720*480*30  ~10MPS
-so we have roughly 10clocks per pixel this is too high something has to be wrong.
-OPTIMIZED clip codes to utilize vec_max and vec_packs removing the need for vec_min.
-OPTIMIZED DST OUTPUT cache/dma controls. we are pretty much
-guaranteed to have the input video frame it was just decompressed so
-it probably resides in L1 caches.  However we are creating the
-output video stream this needs to use the DSTST instruction to
-optimize for the cache.  We couple this with the fact that we are
-not going to be visiting the input buffer again so we mark it Least
-Recently Used.  This shaves 25% of the processor cycles off.
-Now MEMCPY is the largest mips consumer in the system, probably due
-to the inefficient X11 stuff.
-GL libraries seem to be very slow on this machine 1.33Ghz PB running
-Jaguar, this is not the case for my 1Ghz PB.  I thought it might be
-a versioning issues, however I have libGL.1.2.dylib for both
-machines. ((We need to figure this out now))
-GL2 libraries work now with patch for RGB32
-NOTE quartz vo driver ARGB32_to_RGB24 consumes 30% of the processor
-Integrated luma prescaling adjustment for saturation/contrast/brightness adjustment.
-*/
-/*
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */
+/*
+convert I420 YV12 to RGB in various formats,
+it rejects images that are not in 420 formats
+it rejects images that don't have widths of multiples of 16
+it rejects images that don't have heights of multiples of 2
+reject defers to C simulation codes.
+lots of optimizations to be done here
+1. need to fix saturation code, I just couldn't get it to fly with packs and adds.
+so we currently use max min to clip
+2. the inefficient use of chroma loading needs a bit of brushing up
+3. analysis of pipeline stalls needs to be done, use shark to identify pipeline stalls
+MODIFIED to calculate coeffs from currently selected color space.
+MODIFIED core to be a macro which you spec the output format.
+ADDED UYVY conversion which is never called due to some thing in SWSCALE.
+CORRECTED algorithim selection to be strict on input formats.
+ADDED runtime detection of altivec.
+ADDED altivec_yuv2packedX vertical scl + RGB converter
+March 27,2004
+PERFORMANCE ANALYSIS
+The C version use 25% of the processor or ~250Mips for D1 video rawvideo used as test
+The ALTIVEC version uses 10% of the processor or ~100Mips for D1 video same sequence
+720*480*30  ~10MPS
+so we have roughly 10clocks per pixel this is too high something has to be wrong.
+OPTIMIZED clip codes to utilize vec_max and vec_packs removing the need for vec_min.
+OPTIMIZED DST OUTPUT cache/dma controls. we are pretty much
+guaranteed to have the input video frame it was just decompressed so
+it probably resides in L1 caches.  However we are creating the
+output video stream this needs to use the DSTST instruction to
+optimize for the cache.  We couple this with the fact that we are
+not going to be visiting the input buffer again so we mark it Least
+Recently Used.  This shaves 25% of the processor cycles off.
+Now MEMCPY is the largest mips consumer in the system, probably due
+to the inefficient X11 stuff.
+GL libraries seem to be very slow on this machine 1.33Ghz PB running
+Jaguar, this is not the case for my 1Ghz PB.  I thought it might be
+a versioning issues, however I have libGL.1.2.dylib for both
+machines. ((We need to figure this out now))
+GL2 libraries work now with patch for RGB32
+NOTE quartz vo driver ARGB32_to_RGB24 consumes 30% of the processor
+Integrated luma prescaling adjustment for saturation/contrast/brightness adjustment.
+*/
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <inttypes.h>
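
The relocated comment lists the conditions under which the AltiVec path is taken (4:2:0 input, width a multiple of 16, height a multiple of 2) and gives a rough per-pixel cost estimate. The sketch below only restates those two points in C. Both helper names are hypothetical and are not part of libswscale; the real selection logic in yuv2rgb_altivec.c may differ. The comment speaks of "clocks" per pixel while quoting Mips, so the second helper simply divides instructions per second by pixels per second.

```c
#include <stdbool.h>

/* Hypothetical helper, not the libswscale API: restates the three
 * conditions from the comment. Anything that fails them is left to
 * the plain C conversion code. */
static bool altivec_path_eligible(bool is_420, int width, int height)
{
    return is_420 && width % 16 == 0 && height % 2 == 0;
}

/* Hypothetical helper: instructions spent per pixel, given a load in
 * Mips and the video geometry. The comment's "10 clocks per pixel"
 * figure is this ratio for D1 video at 100 Mips. */
static double instructions_per_pixel(double mips, int width, int height, int fps)
{
    double pixels_per_second = (double)width * height * fps;
    return mips * 1e6 / pixels_per_second;
}
```

For the D1 figures quoted in the comment, instructions_per_pixel(100.0, 720, 480, 30) comes to a little under 10, which matches the "roughly 10" estimate.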
