annotate DOCS/tech/dr-methods.txt @ 23927:91ccac9cc015

Add test for GNUisms It currently tests for case ... ranges only, but other tests (like GNU extensions to libc) can be added later
author ivo
date Mon, 30 Jul 2007 18:08:26 +0000
parents f665d42cd019
children 0f1b5b68af32
DIRECT RENDERING METHODS -- by A'rpi
======================== (based on a mail to -dev-eng)

Ok. It seems none of you really knows what direct rendering means...
I'll try to explain now! :)

First of all, there are 2 different methods, both called direct rendering.
The main point is the same, but they work differently.

method 1: decoding directly to externally provided buffers.
The codec decodes macroblocks directly into the buffer provided by the
caller. As this buffer will be read back later (for MC, i.e. motion
compensation, of the next frame), it's not a good idea to place such buffers
in slow video ram. But there are many video out drivers that keep their
buffers in system ram and use some form of memcpy or DMA to blit them to
video ram at display time. For example, Xv and X11 (normal and Shm too) work
this way: the XImage is a buffer in system ram (!) and X*PutImage copies it
to video ram. Only the nvidia and ati rage128 Xv drivers use DMA, the others
just memcpy it. Also, some opengl drivers (including Matrox) use DMA to copy
from texsubimage to video ram.
The current mplayer way means: the codec allocates some buffer and decodes
the image into it. Then this buffer is copied to X11's buffer. Then the
Xserver copies that buffer to video ram. So there is one more memcpy than
required... Direct rendering can remove this extra memcpy by using the
Xserver's memory buffers as the decoding buffer. Note again: it helps only
if the external buffer is in fast system ram.

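A minimal sketch of the difference, in C. Everything here is hypothetical:
decode_frame() is a stand-in for a real codec, and the two render functions
only show where the extra full-frame memcpy comes from and how method 1
avoids it. It is not taken from the mplayer sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a codec: fills dst with a test pattern. */
static void decode_frame(unsigned char *dst, size_t size, unsigned char seed)
{
    assert(dst != NULL);
    for (size_t i = 0; i < size; i++)
        dst[i] = (unsigned char)(seed + i);
}

/* Without DR: decode into the codec's own buffer, then copy the whole
 * frame into the output driver's buffer. Returns the number of extra
 * full-frame copies that were needed. */
static int render_with_copy(unsigned char *codec_buf, unsigned char *vo_buf,
                            size_t size, unsigned char seed)
{
    decode_frame(codec_buf, size, seed);
    memcpy(vo_buf, codec_buf, size);
    return 1;
}

/* Method 1: decode straight into the buffer the output driver provided,
 * so no extra copy is needed. */
static int render_direct(unsigned char *vo_buf, size_t size,
                         unsigned char seed)
{
    decode_frame(vo_buf, size, seed);
    return 0;
}
```

Both paths leave the same pixels in vo_buf. The only difference is the
memcpy, which is exactly the copy method 1 is meant to remove.
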
method 2: decoding to internal buffers, but blitting after each macroblock,
including optional colorspace conversion.
Advantages: it can blit into video ram, as it keeps the copy in its internal
buffers for the next frame's MC. Skipped macroblocks won't be copied again to
video ram (except if the video buffer address changes between frames -> hw
double/triple buffering).
Just avoiding the blitting of skipped MBs means about 100% speedup (2 times
faster) for low bitrate (<700kbit) divxes. It even makes it possible to watch
VCD resolution divx on a p200mmx with DGA.
How does it work? The codec works as normal and decodes macroblocks into its
internal buffer. But after each decoded macroblock, it immediately copies
this macroblock to video ram. It's still in the L1 cache, so it will be fast.
Skipped macroblocks can be skipped easily -> fewer vram writes -> more
speedup. But, as it copies directly to video ram, it must do colorspace
conversion if needed (for example divx -> rgb DGA), and it cannot be used
with scaling.
Another interesting question with this kind of direct rendering is planar
formats. Eugene K. of Divx4 told me that he experienced worse performance
blitting yv12 blocks (copying 3 blocks to 3 different (Y,U,V) buffers) than
doing a (really unneeded) yv12->yuy2 conversion on-the-fly.
So, the divx4 codec (with the -vc divx4 api) converts from its internal yv12
buffer to the external yuy2.

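To make the per-macroblock blit with on-the-fly conversion more concrete,
here is a rough sketch in C. It assumes 16x16 macroblocks, 8-bit planar yv12
input (4:2:0, so one chroma sample per 2x2 luma pixels) and packed yuy2
output in Y0 U Y1 V byte order. The function name and parameters are made up
for illustration; this is not the divx4 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MB_SIZE = 16 };

/* Copy one 16x16 macroblock from planar yv12 planes into a packed yuy2
 * destination (byte order Y0 U Y1 V). All strides are in bytes, mb_x and
 * mb_y are macroblock coordinates. */
static void blit_mb_yv12_to_yuy2(uint8_t *dst, size_t dst_stride,
                                 const uint8_t *y, const uint8_t *u,
                                 const uint8_t *v,
                                 size_t y_stride, size_t c_stride,
                                 size_t mb_x, size_t mb_y)
{
    assert(dst && y && u && v);
    for (size_t row = 0; row < MB_SIZE; row++) {
        size_t line = mb_y * MB_SIZE + row;
        const uint8_t *ys = y + line * y_stride + mb_x * MB_SIZE;
        /* 4:2:0 chroma: one chroma line serves two luma lines. */
        const uint8_t *us = u + (line / 2) * c_stride + mb_x * (MB_SIZE / 2);
        const uint8_t *vs = v + (line / 2) * c_stride + mb_x * (MB_SIZE / 2);
        /* Packed output uses 2 bytes per pixel. */
        uint8_t *d = dst + line * dst_stride + mb_x * MB_SIZE * 2;
        for (size_t i = 0; i < MB_SIZE / 2; i++) {
            d[4 * i + 0] = ys[2 * i];
            d[4 * i + 1] = us[i];
            d[4 * i + 2] = ys[2 * i + 1];
            d[4 * i + 3] = vs[i];
        }
    }
}
```

The destination is written as one contiguous run per line instead of three
separate planes, which is one plausible reason for the effect Eugene K.
described.
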
method 2a:
libmpeg2 already uses a simplified variation of this: when it finishes
decoding a slice (a horizontal line of MBs) it copies it to the external
(video ram) buffer (using a callback to libvo), so at least it copies from
L2 cache instead of slow ram. For non-predictive (B) frames it can re-use
this cached memory for the next slice - so it uses less memory and has better
cache utilization: it gave me 23% -> 20% VOB decoding speedup on p3.
libavcodec supports per-slice callbacks too, but no slice-memory reuse for
B frames yet.

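The slice callback idea, including re-use of one slice-sized buffer as
described for B frames, could look roughly like this. The callback type, the
vram_target struct and the fake "decoding" (a memset per line) are all
assumptions made for this sketch; the real libmpeg2 and libvo interfaces
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SLICE_H = 16 };  /* one slice = one row of 16-line macroblocks */

/* Hypothetical callback type: the output side receives a finished slice. */
typedef void (*draw_slice_fn)(void *ctx, const unsigned char *src,
                              size_t stride, int y, int h);

struct vram_target {
    unsigned char *base;   /* stand-in for the (slow) external buffer */
    size_t stride;
    int slices_drawn;
};

/* Copies one slice into the target while it is still warm in the cache. */
static void draw_slice_to_target(void *ctx, const unsigned char *src,
                                 size_t stride, int y, int h)
{
    struct vram_target *t = ctx;
    assert(stride <= t->stride);
    for (int row = 0; row < h; row++)
        memcpy(t->base + (size_t)(y + row) * t->stride,
               src + (size_t)row * stride, stride);
    t->slices_drawn++;
}

/* Decodes a frame slice by slice into ONE reusable slice buffer and hands
 * each finished slice to the callback. Decoding is faked: every line is
 * filled with its own line number. */
static void decode_frame_by_slices(unsigned char *slice_buf, size_t stride,
                                   int height, draw_slice_fn draw, void *ctx)
{
    assert(slice_buf && draw && height % SLICE_H == 0);
    for (int y = 0; y < height; y += SLICE_H) {
        for (int row = 0; row < SLICE_H; row++)
            memset(slice_buf + (size_t)row * stride, y + row, stride);
        draw(ctx, slice_buf, stride, y, SLICE_H);
    }
}
```

Re-using a single slice buffer like this only works for frames that nothing
else predicts from, which is why the text limits it to B frames.
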
method 2b:
Some codecs (indeo vfw 3/4 using IF09, and libavcodec) can export the 'bitmap'
of skipped macroblocks - so the libvo driver can do selective blitting: copy
only the changed macroblocks to slow vram.

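Selective blitting could be sketched as below. The layout of the skip
'bitmap' (one byte per macroblock, raster order, nonzero = skipped) is an
assumption for this example, as is the single 8-bit plane; real codecs export
this information in their own formats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MB = 16 };

/* Copies only the macroblocks whose skip flag is 0 from src to dst.
 * Both buffers share the same stride. Returns how many macroblocks
 * were actually copied. */
static int blit_changed_mbs(unsigned char *dst, const unsigned char *src,
                            size_t stride, int mb_w, int mb_h,
                            const unsigned char *skipped)
{
    int copied = 0;
    assert(dst && src && skipped);
    for (int my = 0; my < mb_h; my++) {
        for (int mx = 0; mx < mb_w; mx++) {
            if (skipped[my * mb_w + mx])
                continue;               /* unchanged: no vram write */
            for (int row = 0; row < MB; row++) {
                size_t off = (size_t)(my * MB + row) * stride
                           + (size_t)mx * MB;
                memcpy(dst + off, src + off, MB);
            }
            copied++;
        }
    }
    return copied;
}
```

Note that this is only correct while dst still holds the previous frame, the
same caveat the text gives for hw double/triple buffering.
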
So, again, the main difference between method 1 and 2:
method 1 stores decoded data only once: in the external read/write buffer.
method 2 stores decoded data twice: in its internal read/write buffer (for
later reading) and in the write-only slow video ram.

Both methods can give a big speedup, depending on codec behaviour and the
libvo driver. For example, IPB mpegs could combine them, using method 2 for
I/P frames and method 1 for B frames. mpeg2dec already does this.
For I-only type video (like mjpeg) method 1 is better. For I/P type video
with MC (like divx, h263 etc) method 2 is the best choice.
I/P type videos without MC (like FLI, CVID) could use method 1 with a
static buffer or method 2 with double/triple buffering.

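The recommendations above can be written down as a small lookup, purely as a
summary. The enum names are invented for this sketch. For I/P video without
MC it picks method 1 (with a static buffer), although the text allows method
2 with double/triple buffering as well.

```c
#include <assert.h>

enum stream_kind {
    STREAM_I_ONLY,    /* e.g. mjpeg */
    STREAM_IP_MC,     /* I/P with motion compensation, e.g. divx, h263 */
    STREAM_IP_NO_MC,  /* I/P without MC, e.g. FLI, CVID */
    STREAM_IPB        /* e.g. mpeg with B frames */
};

enum dr_method { DR_METHOD_1 = 1, DR_METHOD_2 = 2 };

/* Suggested method for one frame; frame_type is 'I', 'P' or 'B'. */
static enum dr_method suggest_method(enum stream_kind kind, char frame_type)
{
    assert(frame_type == 'I' || frame_type == 'P' || frame_type == 'B');
    switch (kind) {
    case STREAM_I_ONLY:
        return DR_METHOD_1;
    case STREAM_IP_MC:
        return DR_METHOD_2;
    case STREAM_IP_NO_MC:
        return DR_METHOD_1;   /* with a static buffer */
    case STREAM_IPB:
        /* combine: method 2 for I/P frames, method 1 for B frames */
        return frame_type == 'B' ? DR_METHOD_1 : DR_METHOD_2;
    }
    return DR_METHOD_1;       /* not reached for valid kinds */
}
```
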
I hope it is clear now.
And I hope even nick understands what we are talking about...

Ah, and at the end, the abilities of codecs:
libmpeg2, libavcodec: can do method 1 and 2 (but slice level copy, not MB
    level)
vfw, dshow: can do method 2, with static or variable address external buffer
odivx, and most native codecs like fli, cvid, rle: can do method 1
divx4: can do method 2 (with old odivx api it does method 1)
xanim: they currently can't do DR, but they export their
    internal buffers. But it's very easy to implement method 1 support,
    and a bit harder but possible without any rewrite to do method 2.

So, dshow and divx4 already implement all requirements of method 2.
libmpeg2 and libavcodec implement method 1 and 2a (lavc 2b too).

anyway, in the ideal world, we need all codecs to support both methods.
anyway 2: in the ideal world, there are no libvo drivers keeping their
buffer in system ram and doing a memcpy to video ram...
anyway 3: in our really ideal world, all libvo drivers have their buffers in
fast system ram and do the blitting with DMA... :)

============================================================================
MPlayer NOW! -- The libmpcodecs way.

libmpcodecs replaced the old draw callbacks with the mpi (mplayer image)
struct. Steps of decoding with libmpcodecs:
1. codec requests an mpi from the libmpcodecs core (vd.c)
2. vd creates an mpi struct filled in according to the codec's requirements
   (size, stride, colorspace, flags, type)
3. vd asks libvo (control(VOCTRL_GET_IMAGE)) if it can provide such a buffer:
   - if it can -> do direct rendering
   - if it cannot -> allocate a system ram area with memalign()/malloc()
   Note: the codec may request an EXPORT buffer, which means buffer
   allocation is done inside the codec, so we cannot do DR :(
4. codec decodes one frame to the mpi struct (system ram or direct rendering)
5. if it isn't DR, we call libvo's draw functions to blit the image to
   video ram

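Steps 1-3 and 5 can be sketched like this. Note that struct sketch_image,
struct sketch_vo and all function names below are hypothetical and far
simpler than the real mpi struct and libvo control() interface; only the
control flow (ask the output driver first, fall back to malloc()) follows the
steps listed above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much simpler than the real mpi struct. */
struct sketch_image {
    int width, height;
    size_t stride;
    unsigned char *data;
    int direct;            /* 1 if data belongs to the output driver */
};

/* Hypothetical output driver that may or may not own a usable buffer. */
struct sketch_vo {
    unsigned char *buffer;
    size_t stride;
};

/* Stand-in for asking libvo whether it can provide the buffer.
 * Returns 1 and fills data/stride on success, 0 otherwise. */
static int sketch_vo_get_image(struct sketch_vo *vo, struct sketch_image *img)
{
    if (!vo || !vo->buffer)
        return 0;
    img->data = vo->buffer;
    img->stride = vo->stride;
    return 1;
}

/* Steps 1-3: build the image description, try DR, else use system ram.
 * Returns 1 on success, 0 if the fallback allocation failed. */
static int get_image(struct sketch_image *img, int w, int h,
                     struct sketch_vo *vo)
{
    assert(img && w > 0 && h > 0);
    img->width = w;
    img->height = h;
    img->stride = (size_t)w;
    img->data = NULL;
    img->direct = 0;
    if (sketch_vo_get_image(vo, img)) {
        img->direct = 1;                        /* direct rendering */
        return 1;
    }
    img->data = malloc(img->stride * (size_t)h);  /* system ram fallback */
    return img->data != NULL;
}

/* Step 5: only a non-DR image needs the extra blit through libvo. */
static int needs_blit(const struct sketch_image *img)
{
    return !img->direct;
}
```

An EXPORT-style codec would bypass get_image() altogether, since it brings
its own buffers, and would therefore always end up on the needs_blit() path.
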
Current possible buffer setups:
- EXPORT - codec handles buffer allocation and exports its buffer pointers.
  Used for opendivx, xanim and libavcodec.
- STATIC - codec requires a single static buffer with constant preserved
  content. Used by codecs which do partial updating of the image but don't
  require reading of the previous frame: most rle-based codecs, like cvid,
  rle8, qtrle, qtsmc etc.
- TEMP - codec requires a buffer, but it doesn't depend on the previous frame
  at all. Used for I-only codecs (like mjpeg) and for codecs supporting
  method-2 direct rendering with variable buffer address (vfw, dshow, divx4).
- IP - codec requires 2 (or more) read/write buffers. It's for codecs
  supporting method-1 direct rendering but using motion compensation (i.e.
  reading from the previous frame buffer). Could be used for libavcodec
  (divx3/4, h263). An IP buffer consists of 2 (or more) STATIC buffers.
- IPB - similar to IP, but also has one (or more) TEMP buffers for B frames.
  It will be used for libmpeg2 and libavcodec (mpeg1/2/4). An IPB buffer
  consists of 2 (or more) STATIC buffers and 1 (or more) TEMP buffers.
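
As a compact summary of the list above, the minimum number of buffers the
caller has to provide for each setup can be tabulated in code. The enum and
struct names are invented for this sketch and are not the identifiers used in
libmpcodecs. "preserved" counts STATIC-like buffers whose content must
survive between frames, "scratch" counts TEMP-like ones.

```c
#include <assert.h>

enum buf_setup { BUF_EXPORT, BUF_STATIC, BUF_TEMP, BUF_IP, BUF_IPB };

struct buf_need {
    int preserved;  /* STATIC-like: content must be kept between frames */
    int scratch;    /* TEMP-like: content may be thrown away */
};

/* Minimum buffers the caller must provide for each setup. */
static struct buf_need buffers_needed(enum buf_setup setup)
{
    struct buf_need n = { 0, 0 };
    switch (setup) {
    case BUF_EXPORT:                 /* codec allocates everything itself */
        break;
    case BUF_STATIC:
        n.preserved = 1;
        break;
    case BUF_TEMP:
        n.scratch = 1;
        break;
    case BUF_IP:                     /* 2 (or more) STATIC buffers */
        n.preserved = 2;
        break;
    case BUF_IPB:                    /* IP plus 1 (or more) TEMP buffers */
        n.preserved = 2;
        n.scratch = 1;
        break;
    default:
        assert(0 && "unknown buffer setup");
    }
    return n;
}
```
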