mplayer.hg: DOCS/tech/dr-methods.txt annotate

annotate DOCS/tech/dr-methods.txt @ 29285:1230bcd21ac6

synced with r2769

author	ptt
date	Thu, 28 May 2009 15:39:02 +0000
parents	0f1b5b68af32
DIRECT RENDERING METHODS -- by A'rpi
======================== (based on a mail to -dev-eng)

Ok. It seems none of you really knows what direct rendering means...
I'll try to explain now! :)

First of all, there are 2 different methods, both called direct rendering.
The main point is the same, but they work differently.

method 1: decoding directly to externally provided buffers.
so, the codec decodes macroblocks directly to the buffer provided by the
caller. as this buffer will be read later (for MC, i.e. motion compensation,
of the next frame) it's not a good idea to place such buffers in slow video
ram. but there are many video out drivers using buffers in system ram, and
using some kind of memcpy or DMA to blit them to video ram at display time.
for example, Xv and X11 (normal and Shm too) work this way.
XImage will be a buffer in system ram (!) and X*PutImage will copy it to
video ram. Only nvidia and ati rage128 Xv drivers use DMA, others just
memcpy it. Also some opengl drivers (including Matrox) use DMA to copy from
texsubimage to video ram.
The current mplayer way means: the codec allocates some buffer, and decodes
the image to that buffer. then this buffer is copied to X11's buffer. then
the Xserver copies this buffer to video ram. So one more memcpy than required...
direct rendering can remove this extra memcpy, and use the Xserver's memory
buffers as the decoding buffer. Note again: it helps only if the external
buffer is in fast system ram.

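To make the saved memcpy in method 1 concrete, here is a minimal sketch in C.
It is not MPlayer code: decode_frame(), render_with_copy() and render_direct()
are hypothetical names, and a memset stands in for real decoding. It only
shows where the extra copy goes away when the caller supplies the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decoder: fills `dst` with one frame. */
static void decode_frame(unsigned char *dst, size_t size, unsigned char value)
{
    memset(dst, value, size);
}

/* Old way: decode into the codec's own buffer, then copy to the vo buffer.
 * Returns the number of bytes moved by the extra memcpy. */
static size_t render_with_copy(unsigned char *codec_buf, unsigned char *vo_buf,
                               size_t size, unsigned char value)
{
    decode_frame(codec_buf, size, value);
    memcpy(vo_buf, codec_buf, size);   /* the copy that method 1 removes */
    return size;
}

/* Method 1: decode straight into the buffer provided by the caller. */
static size_t render_direct(unsigned char *vo_buf, size_t size,
                            unsigned char value)
{
    decode_frame(vo_buf, size, value);
    return 0;                          /* no extra bytes copied */
}
```

Both paths leave the same pixels in vo_buf; only the second one skips the
intermediate copy. The copy to video ram itself is still done by the vo.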
method 2: decoding to internal buffers, but blit after each macroblock,
including optional colorspace conversion.
advantages: it can blit into video ram, as it keeps the copy in its internal
buffers for the next frame's MC. skipped macroblocks won't be copied again to
video ram (except if the video buffer address changes between frames -> hw
double/triple buffering).
Just avoiding the blitting of skipped MBs means about 100% speedup (2 times
faster) for low bitrate (<700kbit) divxes. It even makes it possible to watch
VCD resolution divx on a p200mmx with DGA.
how does it work? the codec works as normal, decoding macroblocks into its
internal buffer. but after each decoded macroblock, it immediately copies
this macroblock to video ram. it's in the L1 cache, so it will be fast.
skipped macroblocks can be skipped easily -> fewer vram writes -> more speedup.
but, as it copies directly to video ram, it must do colorspace conversion if
needed (for example divx -> rgb DGA), and cannot be used with scaling.
another interesting question with such direct rendering is planar formats.
Eugene K. of Divx4 told me that he experienced worse performance blitting
yv12 blocks (copying 3 blocks to 3 different (Y,U,V) buffers) than doing a
(really unneeded) yv12->yuy2 conversion on-the-fly.
so, the divx4 codec (with the -vc divx4 api) converts from its internal yv12
buffer to the external yuy2.

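The per-macroblock blit with on-the-fly yv12->yuy2 packing could look roughly
like the sketch below. Everything here is an assumption made for illustration:
the function names, the one-byte-per-macroblock `skipped` array, and the
requirement that width and height are multiples of 16. A real decoder would
call the blit from inside its macroblock loop, right after reconstructing
each MB; the loop in blit_frame() only stands in for that.

```c
#include <stdint.h>

#define MB 16  /* macroblock size in luma pixels */

/* Pack one 16x16 macroblock from planar YV12 into packed YUY2 (Y0 U Y1 V).
 * y/u/v point at the macroblock's top-left sample in each plane. */
static void blit_mb_yv12_to_yuy2(uint8_t *dst, int dst_stride,
                                 const uint8_t *y, int y_stride,
                                 const uint8_t *u, const uint8_t *v,
                                 int uv_stride)
{
    for (int row = 0; row < MB; row++) {
        const uint8_t *yr = y + row * y_stride;
        const uint8_t *ur = u + (row / 2) * uv_stride; /* chroma is 2x2 */
        const uint8_t *vr = v + (row / 2) * uv_stride; /* subsampled    */
        uint8_t *d = dst + row * dst_stride;
        for (int x = 0; x < MB; x += 2) {
            d[2 * x + 0] = yr[x];
            d[2 * x + 1] = ur[x / 2];
            d[2 * x + 2] = yr[x + 1];
            d[2 * x + 3] = vr[x / 2];
        }
    }
}

/* Blit every macroblock that is not marked as skipped.
 * Returns how many macroblocks were actually written to dst. */
static int blit_frame(uint8_t *dst, int dst_stride,
                      const uint8_t *y, const uint8_t *u, const uint8_t *v,
                      int width, int height, const uint8_t *skipped)
{
    int mb_w = width / MB, mb_h = height / MB;
    int uv_stride = width / 2, blitted = 0;

    for (int my = 0; my < mb_h; my++) {
        for (int mx = 0; mx < mb_w; mx++) {
            if (skipped[my * mb_w + mx])
                continue;  /* unchanged since the last frame: no vram write */
            blit_mb_yv12_to_yuy2(
                dst + my * MB * dst_stride + mx * MB * 2, dst_stride,
                y + my * MB * width + mx * MB, width,
                u + my * (MB / 2) * uv_stride + mx * (MB / 2),
                v + my * (MB / 2) * uv_stride + mx * (MB / 2),
                uv_stride);
            blitted++;
        }
    }
    return blitted;
}
```

Note that the packed output needs one write stream instead of three, which is
the effect Eugene K. described.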
method 2a:
libmpeg2 already uses a simplified variation of this: when it finishes
decoding a slice (a horizontal line of MBs) it copies it to the external
(video ram) buffer (using a callback to libvo), so at least it copies from
L2 cache instead of slow ram. for non-predictive (B) frames it can re-use this
cached memory for the next slice - so it uses less memory and has better cache
utilization: it gave me 23% -> 20% VOB decoding speedup on p3. libavcodec
supports per-slice callbacks too, but no slice-memory reuse for B frames yet.

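A rough sketch of the slice idea, assuming a made-up callback type and a toy
vo whose "video ram" is a plain array. None of these names come from libmpeg2
or libvo, and a memset stands in for the real slice decoding. The point is
that a frame nothing will reference later needs only one slice-sized buffer.

```c
#include <stddef.h>
#include <string.h>

#define SLICE_H 16  /* one row of macroblocks */

/* Hypothetical callback type: the vo copies `h` rows starting at row `y`. */
typedef void (*draw_slice_fn)(void *vo, const unsigned char *src,
                              int stride, int w, int h, int y);

/* Toy vo that keeps its "video ram" in a plain array. */
struct toy_vo {
    unsigned char *vram;
    int stride;
    int slices_drawn;
};

static void toy_draw_slice(void *ctx, const unsigned char *src,
                           int stride, int w, int h, int y)
{
    struct toy_vo *vo = ctx;
    for (int row = 0; row < h; row++)
        memcpy(vo->vram + (size_t)(y + row) * vo->stride,
               src + (size_t)row * stride, (size_t)w);
    vo->slices_drawn++;
}

/* Decode a frame that no later frame reads from (the B frame case):
 * the same slice-sized buffer is reused for every slice. */
static void decode_b_frame(unsigned char *slice_buf, int w, int h,
                           draw_slice_fn draw, void *vo)
{
    for (int y = 0; y < h; y += SLICE_H) {
        int rows = h - y < SLICE_H ? h - y : SLICE_H;
        memset(slice_buf, y / SLICE_H + 1, (size_t)w * rows); /* fake decode */
        draw(vo, slice_buf, w, w, rows, y);  /* copy while still in cache */
    }
}
```

For I and P frames the decoder would still keep the full frame around for
later MC and only add the per-slice callback on top.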
method 2b:
some codecs (indeo vfw 3/4 using IF09, and libavcodec) can export the 'bitmap'
of skipped macroblocks - so the libvo driver can do selective blitting: copy
only the changed macroblocks to slow vram.

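Selective blitting on the vo side might be sketched as follows. The layout of
the skip map (one byte per macroblock, nonzero meaning "skipped") and the
single 8-bit plane are assumptions chosen to keep the example short; the real
codecs export this information in their own formats.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB 16  /* macroblock size in pixels */

/* Copy only the macroblocks that changed from `frame` (system ram) to
 * `vram`. Both use the same stride. Returns the number of bytes written. */
static size_t blit_changed_mbs(uint8_t *vram, const uint8_t *frame,
                               int stride, int mb_w, int mb_h,
                               const uint8_t *skip_map)
{
    size_t written = 0;

    for (int my = 0; my < mb_h; my++) {
        for (int mx = 0; mx < mb_w; mx++) {
            if (skip_map[my * mb_w + mx])
                continue;                     /* vram already holds this MB */
            size_t off = (size_t)my * MB * stride + (size_t)mx * MB;
            for (int row = 0; row < MB; row++)
                memcpy(vram + off + (size_t)row * stride,
                       frame + off + (size_t)row * stride, MB);
            written += MB * MB;
        }
    }
    return written;
}
```

This only works while the vram buffer address stays the same between frames;
with hw double/triple buffering the skipped macroblocks have to be copied too.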
so, again: the main difference between method 1 and 2:
method 1 stores decoded data only once: in the external read/write buffer.
method 2 stores decoded data twice: in its internal read/write buffer (for
later reading) and in the write-only slow video ram.

both methods can give a big speedup, depending on codec behaviour and the
libvo driver. for example, IPB mpegs could combine these, using method 2 for
I/P frames and method 1 for B frames. mpeg2dec already does this.
for I-only type video (like mjpeg) method 1 is better. for I/P type video
with MC (like divx, h263 etc) method 2 is the best choice.
I/P type videos without MC (like FLI, CVID) could use method 1 with a
static buffer or method 2 with double/triple buffering.

i hope it is clear now.
and i hope even nick understands what we are talking about...

ah, and at the end, the abilities of codecs:
libmpeg2,libavcodec: can do method 1 and 2 (but slice level copy, not MB level)
vfw, dshow: can do method 2, with static or variable address external buffer
odivx, and most native codecs like fli, cvid, rle: can do method 1
divx4: can do method 2 (with the old odivx api it does method 1)
xanim: they currently can't do DR, but they export their
internal buffers. but it's very easy to implement method 1 support,
and a bit harder but possible without any rewrite to do method 2.

so, dshow and divx4 already implement all requirements of method 2.
libmpeg2 and libavcodec implement method 1 and 2a (lavc 2b too)

anyway, in the ideal world, we need all codecs to support both methods.
anyway 2: in the ideal world, there are no libvo drivers having buffers in
system ram and memcpying to video ram...
anyway 3: in our really ideal world, all libvo drivers have their buffers in
fast system ram and do blitting with DMA... :)

============================================================================
MPlayer NOW! -- The libmpcodecs way.

libmpcodecs replaced old draw callbacks with the mpi (mplayer image) struct.
steps of decoding with libmpcodecs:
1. codec requests an mpi from libmpcodecs core (vd.c)
2. vd creates an mpi struct filled by codec's requirements (size, stride,
   colorspace, flags, type)
3. vd asks libvo (control(VOCTRL_GET_IMAGE)), if it can provide such a buffer:
   - if it can -> do direct rendering
   - if it can not -> allocate system ram area with memalign()/malloc()
   Note: codec may request EXPORT buffer, it means buffer allocation is
   done inside the codec, so we cannot do DR :(
4. codec decodes one frame to the mpi struct (system ram or direct rendering)
5. if it isn't DR, we call libvo's draw functions to blit image to video ram

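The steps above can be sketched as a small C function. This is a simplified
model, not the code in vd.c: struct image, get_image() and the vo hooks are
hypothetical names, and the image has a single plane. It only shows the
order of the decisions (EXPORT first, then ask the vo, then fall back to
malloc()).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for the mpi struct. */
struct image {
    int w, h, stride;
    unsigned char *data;
    int direct;  /* 1 if `data` belongs to the vo (direct rendering) */
};

/* Hypothetical vo hook: returns 1 and sets img->data if it has a buffer. */
typedef int (*vo_get_image_fn)(struct image *img);

static unsigned char toy_vo_buffer[64 * 64];

/* Toy vo that can provide a buffer for images up to 64x64. */
static int toy_vo_can(struct image *img)
{
    if ((size_t)img->stride * img->h > sizeof toy_vo_buffer)
        return 0;
    img->data = toy_vo_buffer;
    return 1;
}

/* Toy vo that never provides a buffer. */
static int toy_vo_cannot(struct image *img)
{
    (void)img;
    return 0;
}

/* Steps 2 and 3: fill in the image, then find memory for it.
 * Returns 1 on success, 0 if the fallback allocation failed. */
static int get_image(struct image *img, int w, int h, int export_requested,
                     vo_get_image_fn vo_get_image)
{
    img->w = w;
    img->h = h;
    img->stride = w;
    img->data = NULL;
    img->direct = 0;

    if (export_requested)
        return 1;  /* codec sets data to its own buffer later: no DR */
    if (vo_get_image && vo_get_image(img)) {
        img->direct = 1;  /* decode straight into the vo's buffer */
        return 1;
    }
    img->data = malloc((size_t)img->stride * h);  /* system ram fallback */
    return img->data != NULL;
}

/* Step 5: only non-DR images still need a blit through the vo. */
static int needs_blit(const struct image *img)
{
    return !img->direct;
}
```

The caller owns the fallback buffer and has to free() it; a buffer handed out
by the vo must not be freed.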
current possible buffer setups:
- EXPORT - codec handles buffer allocation and it exports its buffer pointers
  used for opendivx, xanim and libavcodec
- STATIC - codec requires a single static buffer with constant preserved content
  used by codecs which do partial updating of the image, but don't require
  reading of the previous frame. most rle-based codecs, like cvid, rle8,
  qtrle, qtsmc etc.
- TEMP - codec requires a buffer, but it doesn't depend on previous frame at all
  used for I-only codecs (like mjpeg) and for codecs supporting method-2 direct
  rendering with variable buffer address (vfw, dshow, divx4).
- IP - codec requires 2 (or more) read/write buffers. it's for codecs supporting
  method-1 direct rendering but using motion compensation (ie. reading from
  previous frame buffer). could be used for libavcodec (divx3/4,h263).
  An IP buffer consists of 2 (or more) STATIC buffers.
- IPB - similar to IP, but also has one (or more) TEMP buffers for B frames.
  it will be used for libmpeg2 and libavcodec (mpeg1/2/4).
  An IPB buffer consists of 2 (or more) STATIC buffers and 1 (or more) TEMP
  buffer.
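As a way to read the list above, here is a small sketch of the minimum buffer
counts per setup and of one possible buffer rotation for the IPB case. The
enum, the struct names and the alternation scheme are assumptions for
illustration only; they are not taken from the libmpcodecs sources.

```c
/* Hypothetical names for the five setups described above. */
enum buf_setup { SETUP_EXPORT, SETUP_STATIC, SETUP_TEMP, SETUP_IP, SETUP_IPB };

struct buf_pool {
    int n_static;  /* buffers whose content must survive between frames */
    int n_temp;    /* buffers that may be overwritten freely */
};

/* Minimum number of buffers each setup asks for. */
static struct buf_pool min_buffers(enum buf_setup s)
{
    struct buf_pool p = { 0, 0 };
    switch (s) {
    case SETUP_EXPORT: break;                          /* codec owns them */
    case SETUP_STATIC: p.n_static = 1; break;
    case SETUP_TEMP:   p.n_temp = 1; break;
    case SETUP_IP:     p.n_static = 2; break;
    case SETUP_IPB:    p.n_static = 2; p.n_temp = 1; break;
    }
    return p;
}

/* IPB pool with STATIC buffers 0 and 1 and TEMP buffer 2. */
struct ipb_state {
    int last_ref;  /* index of the most recently written STATIC buffer */
};

/* Pick the buffer for the next frame, given in decoding order.
 * I and P frames alternate between the two STATIC buffers, so the
 * previous reference stays readable for MC; B frames use the TEMP buffer. */
static int pick_buffer(struct ipb_state *st, char frame_type)
{
    if (frame_type == 'B')
        return 2;
    st->last_ref ^= 1;
    return st->last_ref;
}
```

Starting with last_ref = 1, the decoding order I P B B P maps to buffers
0 1 2 2 0: the second P frame overwrites the old I frame, which no later
frame needs.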
