DIRECT RENDERING METHODS -- by A'rpi
======================== (based on a mail to -dev-eng)

Ok. It seems none of you really knows what direct rendering means...
I'll try to explain now! :)

First of all, there are two different ways, both called direct rendering.
The main point is the same, but they work differently.

method 1: decoding directly to externally provided buffers.
so, the codec decodes macroblocks directly into the buffer provided by the
caller. as this buffer will be read later (for MC of the next frame), it's
not a good idea to place such buffers in slow video ram. but
there are many video out drivers that use buffers in system ram and some
kind of memcpy or DMA to blit them to video ram at display time.
for example, Xv and X11 (normal and Shm too) are such drivers:
the XImage is a buffer in system ram (!) and X*PutImage copies it to
video ram. Only the nvidia and ati rage128 Xv drivers use DMA; the others
just memcpy it. Also some opengl drivers (including Matrox) use DMA to copy
from texsubimage to video ram.
The current mplayer way means: the codec allocates some buffer and decodes
the image into that buffer. then this buffer is copied to X11's buffer. then
the Xserver copies that buffer to video ram. So one more memcpy than required...
direct rendering can remove this extra memcpy and use the Xserver's memory
buffers as decoding buffers. Note again: it helps only if the external
buffer is in fast system ram.

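A minimal C sketch of the extra memcpy, assuming a frame is a flat 8-bit byte array. decode_frame(), copy_frame() and the buffer names are hypothetical stand-ins for the codec, X*PutImage and the XImage buffer, not real mplayer or X11 calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_SIZE (64 * 48)    /* hypothetical 64x48 frame, 8 bits per pixel */

static int frame_copies;        /* full-frame copies done so far */

static void copy_frame(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, FRAME_SIZE);
    frame_copies++;
}

/* Stand-in for the codec: writes one frame's pixels into dst. */
static void decode_frame(unsigned char *dst, int frame_no)
{
    assert(dst != NULL);
    for (size_t i = 0; i < FRAME_SIZE; i++)
        dst[i] = (unsigned char)(frame_no + i);
}

/* Current way: codec buffer -> XImage (system ram) -> video ram. */
static void show_frame_copying(unsigned char *codec_buf, unsigned char *ximage,
                               unsigned char *vram, int frame_no)
{
    decode_frame(codec_buf, frame_no);
    copy_frame(ximage, codec_buf);   /* the extra memcpy */
    copy_frame(vram, ximage);        /* what X*PutImage does */
}

/* Method 1: decode straight into the XImage buffer provided by libvo. */
static void show_frame_direct(unsigned char *ximage, unsigned char *vram,
                              int frame_no)
{
    decode_frame(ximage, frame_no);
    copy_frame(vram, ximage);
}
```

show_frame_copying() does 2 full-frame copies per frame, show_frame_direct() only 1. With method 1 the codec also reads that buffer back for MC, which is why it only pays off when the buffer is in fast system ram.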
method 2: decoding to internal buffers, but blitting after each macroblock,
including optional colorspace conversion.
advantages: it can blit into video ram, as it keeps a copy in its internal
buffers for the next frame's MC. skipped macroblocks won't be copied again to
video ram (except if the video buffer address changes between frames -> hw
double/triple buffering).
Just avoiding the blitting of skipped MBs means about a 100% speedup (2 times
faster) for low bitrate (<700kbit) divxes. It even makes it possible to watch
VCD resolution divx on a p200mmx with DGA.
how does it work? the codec works as normal and decodes macroblocks into its
internal buffer. but after each decoded macroblock, it immediately copies
this macroblock to video ram. it's in the L1 cache, so this will be fast.
skipped macroblocks can be skipped easily -> fewer vram writes -> more speedup.
but, as it copies directly to video ram, it must do colorspace conversion if
needed (for example divx -> rgb DGA), and it cannot be used with scaling.
another interesting question of such direct rendering is the planar formats.
Eugene K. of Divx4 told me that he experienced worse performance blitting
yv12 blocks (copying 3 blocks to 3 different (Y,U,V) buffers) than doing a
(really unneeded) yv12->yuy2 conversion on-the-fly.
so, the divx4 codec (with the -vc divx4 api) converts from its internal yv12
buffer to the external yuy2.

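A minimal sketch of method 2 in C, assuming a single 8-bit plane, 16x16 macroblocks and no colorspace conversion or scaling. The frame size, skipped[] and value[] (a stand-in for real decoding) are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MB      16
#define W       64                 /* hypothetical frame size, 8-bit luma only */
#define H       32
#define MB_COLS (W / MB)
#define MB_ROWS (H / MB)

static unsigned char internal_buf[W * H];  /* codec's own buffer, kept for MC */
static size_t vram_bytes;                  /* bytes written to "video ram" */

/* Copy one 16x16 macroblock from the internal buffer to video ram. */
static void blit_mb(unsigned char *vram, int vram_stride, int mbx, int mby)
{
    for (int y = 0; y < MB; y++) {
        size_t row = (size_t)mby * MB + y;
        memcpy(vram + row * vram_stride + mbx * MB,
               internal_buf + row * W + mbx * MB, MB);
        vram_bytes += MB;
    }
}

/* Decode one frame, blitting each coded MB right after it is decoded.
 * skipped[i] marks MBs unchanged since the previous frame; value[i] stands
 * in for the decoded pixels. Skipping the blit is only valid if vram is the
 * same buffer as last frame (no hw double/triple buffering). */
static void decode_frame_method2(unsigned char *vram, int vram_stride,
                                 const int skipped[MB_ROWS * MB_COLS],
                                 const unsigned char value[MB_ROWS * MB_COLS])
{
    assert(vram_stride >= W);
    for (int mby = 0; mby < MB_ROWS; mby++)
        for (int mbx = 0; mbx < MB_COLS; mbx++) {
            int i = mby * MB_COLS + mbx;
            if (skipped[i])
                continue;   /* internal_buf and vram already hold this MB */
            for (int y = 0; y < MB; y++)   /* "decode" into internal_buf */
                memset(internal_buf + ((size_t)mby * MB + y) * W + mbx * MB,
                       value[i], MB);
            blit_mb(vram, vram_stride, mbx, mby);  /* MB is still in cache */
        }
}
```

With every MB skipped except one, only 16*16 bytes reach vram, while the internal buffer still holds the full frame for the next frame's MC.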
method 2a:
libmpeg2 already uses a simplified variation of this: when it finishes decoding
a slice (a horizontal line of MBs), it copies it to the external (video ram)
buffer (using a callback to libvo), so at least it copies from L2 cache instead
of slow ram. for non-predictive (B) frames it can re-use this cached memory
for the next slice, so it uses less memory and has better cache utilization:
it gave me a VOB decoding speedup on a p3 (23% -> 20%). libavcodec supports
per-slice callbacks too, but no slice-memory reusing for B frames yet.

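Method 2a sketched in C, assuming a hypothetical draw_slice_fn callback type (libmpeg2 and libavcodec have their own slice callbacks with different signatures). Reference frames keep the whole picture; B frames reuse one slice-sized buffer:

```c
#include <assert.h>
#include <string.h>

#define MB     16
#define W      64                  /* hypothetical frame size, 8-bit luma only */
#define H      48
#define SLICES (H / MB)            /* one slice = one row of macroblocks */

/* Hypothetical callback: libvo copies rows y..y+h-1 to its (vram) buffer. */
typedef void (*draw_slice_fn)(void *ctx, const unsigned char *src,
                              int stride, int y, int h);

/* I/P frames are decoded into the whole-frame buffer `frame`, because later
 * frames read them for MC. B frames are never referenced, so every slice
 * reuses the same slice-sized (cache-resident) scratch buffer. */
static void decode_frame_slices(int is_b_frame, unsigned char *frame,
                                unsigned char *slice_buf,
                                draw_slice_fn draw_slice, void *ctx)
{
    for (int s = 0; s < SLICES; s++) {
        unsigned char *dst = is_b_frame ? slice_buf : frame + s * MB * W;
        memset(dst, s + 1, MB * W);           /* stand-in for decoding */
        draw_slice(ctx, dst, W, s * MB, MB);  /* copy while still cached */
    }
}

/* Example libvo side (hypothetical). */
struct fake_vo {
    unsigned char screen[W * H];
    int slices_drawn;
};

static void fake_vo_draw_slice(void *ctx, const unsigned char *src,
                               int stride, int y, int h)
{
    struct fake_vo *vo = ctx;
    for (int row = 0; row < h; row++)
        memcpy(vo->screen + (y + row) * W, src + row * stride, W);
    vo->slices_drawn++;
}
```

A B frame leaves `frame` untouched and only ever needs MB * W bytes of scratch memory, which is where the cache benefit comes from.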
method 2b:
some codecs (indeo vfw 3/4 using IF09, and libavcodec) can export a 'bitmap'
of skipped macroblocks, so the libvo driver can do selective blitting: copy
only the changed macroblocks to slow vram.

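A sketch of the libvo side of method 2b, assuming a hypothetical bitmap layout (one bit per macroblock, row-major, least significant bit first); the real formats exported by indeo and libavcodec differ:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MB 16

/* Copies only the macroblocks whose bit is set in `changed` (hypothetical
 * layout: one bit per MB, row-major, LSB first). Returns the number of MBs
 * copied. */
static size_t blit_changed_mbs(unsigned char *dst, int dst_stride,
                               const unsigned char *src, int src_stride,
                               int mb_cols, int mb_rows,
                               const unsigned char *changed)
{
    size_t copied = 0;
    assert(dst_stride >= mb_cols * MB && src_stride >= mb_cols * MB);
    for (int mby = 0; mby < mb_rows; mby++)
        for (int mbx = 0; mbx < mb_cols; mbx++) {
            int i = mby * mb_cols + mbx;
            if (!(changed[i >> 3] & (1u << (i & 7))))
                continue;                 /* skipped MB: vram is up to date */
            for (int y = 0; y < MB; y++) {
                size_t row = (size_t)mby * MB + y;
                memcpy(dst + row * dst_stride + mbx * MB,
                       src + row * src_stride + mbx * MB, MB);
            }
            copied++;
        }
    return copied;
}
```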
so, again, the main difference between methods 1 and 2:
method1 stores decoded data only once: in the external read/write buffer.
method2 stores decoded data twice: in its internal read/write buffer (for
later reading) and in the write-only slow video ram.

both methods can give a big speedup, depending on codec behaviour and libvo
driver. for example, IPB mpegs could combine them: use method 2 for I/P
frames and method 1 for B frames. mpeg2dec already does this.
for I-only video (like mjpeg) method 1 is better. for I/P video
with MC (like divx, h263 etc) method 2 is the best choice.
I/P videos without MC (like FLI, CVID) could use method 1 with a
static buffer or method 2 with double/triple buffering.

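The rules of thumb above written as a small C helper. The enum and function names are hypothetical and only illustrate the decision, they are not from mplayer:

```c
#include <assert.h>

enum frame_type  { FRAME_I, FRAME_P, FRAME_B };
enum stream_kind { I_ONLY, IP_WITH_MC, IP_WITHOUT_MC, IPB_WITH_MC };
enum dr_method   { DR_METHOD_1 = 1, DR_METHOD_2 = 2 };

/* Hypothetical helper encoding the rules of thumb above. */
static enum dr_method pick_dr_method(enum stream_kind kind, enum frame_type t)
{
    switch (kind) {
    case IPB_WITH_MC:
        /* I/P frames are read back for MC: keep them in internal buffers
         * and blit per MB/slice. B frames are never referenced, so they
         * can be decoded straight into the external buffer. */
        return t == FRAME_B ? DR_METHOD_1 : DR_METHOD_2;
    case IP_WITH_MC:
        return DR_METHOD_2;
    case I_ONLY:
        return DR_METHOD_1;   /* nothing is read back */
    case IP_WITHOUT_MC:
    default:
        /* method 1 with a static buffer; method 2 with double/triple
         * buffering would work too */
        return DR_METHOD_1;
    }
}
```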
i hope it is clear now.
and i hope even nick understands what we are talking about...

ah, and finally, the abilities of codecs:
libmpeg2,libavcodec: can do method 1 and 2 (but slice level copy, not MB level)
vfw, dshow: can do method 2, with static or variable address external buffer
odivx, and most native codecs like fli, cvid, rle: can do method 1
divx4: can do method 2 (with the old odivx api it does method 1)
xanim: they currently can't do DR, but they export their
internal buffers. but it's very easy to implement method 1 support,
and a bit harder but possible without any rewrite to do method 2.

so, dshow and divx4 already implement all requirements of method 2.
libmpeg2 and libavcodec implement method 1 and 2a (lavc 2b too)

anyway, in an ideal world, we need all codecs to support both methods.
anyway 2: in an ideal world, there are no libvo drivers having buffers in
system ram and memcpy'ing to video ram...
anyway 3: in our really ideal world, all libvo drivers have their buffers in
fast system ram and do blitting with DMA... :)

============================================================================
MPlayer NOW! -- The libmpcodecs way.

libmpcodecs replaced the old draw callbacks with the mpi (mplayer image) struct.
steps of decoding with libmpcodecs:
1. codec requests an mpi from the libmpcodecs core (vd.c)
2. vd creates an mpi struct filled in according to the codec's requirements
   (size, stride, colorspace, flags, type)
3. vd asks libvo (control(VOCTRL_GET_IMAGE)) whether it can provide such a
   buffer:
   - if it can -> do direct rendering
   - if it cannot -> allocate a system ram area with memalign()/malloc()
   Note: the codec may request an EXPORT buffer, which means buffer
   allocation is done inside the codec, so we cannot do DR :(
4. codec decodes one frame to the mpi struct (system ram or direct rendering)
5. if it isn't DR, we call libvo's draw functions to blit the image to video ram

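The steps above as a heavily simplified C sketch. struct image, struct vo, get_image() and show_image() are hypothetical stand-ins for the mpi struct, libvo and control(VOCTRL_GET_IMAGE); a real mpi has more planes and fields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins: one 8-bit plane only. */
enum buf_type { BUF_EXPORT, BUF_STATIC, BUF_TEMP, BUF_IP, BUF_IPB };

struct image {
    int w, h, stride;
    enum buf_type type;
    unsigned char *data;
    int direct;                 /* 1: data points into libvo's buffer */
};

struct vo {
    unsigned char *screen;      /* the driver's (video ram) buffer */
    int stride;
    int can_dr;                 /* answer to the GET_IMAGE query */
    int draw_calls;             /* non-DR blits done (step 5) */
};

/* Steps 1-3: the codec asks for an image; try libvo first, else system ram.
 * An EXPORT request never gets DR; malloc here stands in for the codec's
 * own allocation. */
static int get_image(struct vo *vo, struct image *mpi, enum buf_type type,
                     int w, int h)
{
    mpi->w = w;
    mpi->h = h;
    mpi->type = type;
    mpi->direct = 0;
    if (type != BUF_EXPORT && vo->can_dr) {
        mpi->data = vo->screen;
        mpi->stride = vo->stride;
        mpi->direct = 1;
        return 0;
    }
    mpi->stride = w;
    mpi->data = malloc((size_t)w * h);
    return mpi->data ? 0 : -1;
}

/* Step 4 stand-in: "decode" a frame by filling the image. */
static void decode_into(struct image *mpi, unsigned char value)
{
    for (int y = 0; y < mpi->h; y++)
        memset(mpi->data + (size_t)y * mpi->stride, value, (size_t)mpi->w);
}

/* Step 5: without DR, blit the system-ram image to libvo's buffer. The
 * buffer is freed to keep the sketch short; a STATIC/IP buffer would be
 * kept for the next frame. */
static void show_image(struct vo *vo, struct image *mpi)
{
    if (mpi->direct)
        return;                 /* already decoded in place */
    for (int y = 0; y < mpi->h; y++)
        memcpy(vo->screen + (size_t)y * vo->stride,
               mpi->data + (size_t)y * mpi->stride, (size_t)mpi->w);
    vo->draw_calls++;
    free(mpi->data);
    mpi->data = NULL;
}
```

When the driver says yes, show_image() has nothing left to do; when it says no (or the codec asked for EXPORT), the frame costs one extra blit.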
current possible buffer setups:
- EXPORT - codec handles buffer allocation and exports its buffer pointers.
  used for opendivx, xanim and libavcodec
- STATIC - codec requires a single static buffer with constant, preserved
  content. used by codecs which do partial updating of the image but don't
  require reading of the previous frame: most rle-based codecs, like cvid,
  rle8, qtrle, qtsmc etc.
- TEMP - codec requires a buffer, but it doesn't depend on the previous frame
  at all. used for I-only codecs (like mjpeg) and for codecs supporting
  method-2 direct rendering with a variable buffer address (vfw, dshow, divx4).
- IP - codec requires 2 (or more) read/write buffers. it's for codecs
  supporting method-1 direct rendering but using motion compensation (ie.
  reading from the previous frame buffer). could be used for libavcodec
  (divx3/4,h263). an IP buffer consists of 2 (or more) STATIC buffers.
- IPB - similar to IP, but also has one (or more) TEMP buffers for B frames.
  it will be used for libmpeg2 and libavcodec (mpeg1/2/4).
  an IPB buffer consists of 2 (or more) STATIC buffers and 1 (or more) TEMP
  buffers.
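A hypothetical C sketch of how the IP and IPB setups can rotate their buffers: I/P frames alternate between the two STATIC buffers so the newest reference survives for MC, while B frames always go to the TEMP buffer. The names and minimum buffer counts are illustrative assumptions based on the list above:

```c
#include <assert.h>

enum buf_setup  { SETUP_EXPORT, SETUP_STATIC, SETUP_TEMP, SETUP_IP, SETUP_IPB };
enum frame_type { FRAME_I, FRAME_P, FRAME_B };

/* Minimum number of buffers each setup needs from the allocator
 * (EXPORT needs none: the codec owns its buffers). */
static int buffers_needed(enum buf_setup s)
{
    switch (s) {
    case SETUP_EXPORT: return 0;
    case SETUP_STATIC:
    case SETUP_TEMP:   return 1;
    case SETUP_IP:     return 2;   /* 2 STATIC */
    case SETUP_IPB:    return 3;   /* 2 STATIC + 1 TEMP */
    }
    return -1;
}

/* Hypothetical IPB pool: buffers 0 and 1 are the STATIC reference buffers,
 * buffer 2 is the TEMP buffer for B frames. The IP setup is the same pool
 * without B frames. */
struct ipb_pool {
    int last_ref;   /* buffer (0 or 1) holding the newest I/P frame */
};

/* Returns the buffer the next frame should be decoded into. */
static int ipb_next_buffer(struct ipb_pool *pool, enum frame_type t)
{
    if (t == FRAME_B)
        return 2;               /* never read back: TEMP is enough */
    pool->last_ref ^= 1;        /* overwrite the older reference, keep the newer */
    return pool->last_ref;
}
```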