So, I'll describe how this stuff works.

The main modules:

1. streamer.c: this is the input layer; it reads the file, the VCD or
stdin. What it has to provide: appropriate buffering by sector, seek and
skip functions, and reading by bytes or by blocks of any size. The
stream_t structure describes the input stream (file/device).
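
The following is a minimal sketch of what such a sector-buffered reader might look like. The names (toy_stream_t, toy_device_t, stream_read_char() and the rest) and the 2048-byte sector size are assumptions for illustration, not the real stream_t API. An in-memory array stands in for the file, VCD or stdin.

```c
#include <string.h>

#define SECTOR_SIZE 2048   /* assumed sector size for this sketch */

/* Hypothetical in-memory "device" standing in for a file, VCD or stdin. */
typedef struct {
    const unsigned char *data;
    long size;
} toy_device_t;

typedef struct {
    toy_device_t *dev;
    unsigned char buffer[SECTOR_SIZE];   /* one cached sector */
    long buf_start;                      /* absolute position of buffer[0] */
    int buf_len;                         /* valid bytes in buffer */
    int buf_pos;                         /* read position inside buffer */
    int eof;
} toy_stream_t;

/* Load the sector containing absolute position 'pos'; returns 0 at end of data. */
static int fill_sector(toy_stream_t *s, long pos) {
    if (pos < 0) { s->eof = 1; return 0; }
    long start = pos - pos % SECTOR_SIZE;
    long n = s->dev->size - start;
    if (n > SECTOR_SIZE) n = SECTOR_SIZE;
    if (n <= 0 || pos - start >= n) { s->eof = 1; return 0; }
    memcpy(s->buffer, s->dev->data + start, (size_t)n);
    s->buf_start = start;
    s->buf_len = (int)n;
    s->buf_pos = (int)(pos - start);
    return 1;
}

void stream_init(toy_stream_t *s, toy_device_t *dev) {
    s->dev = dev;
    s->buf_start = 0;
    s->buf_len = s->buf_pos = 0;
    s->eof = 0;
}

long stream_tell(const toy_stream_t *s) { return s->buf_start + s->buf_pos; }

int stream_seek(toy_stream_t *s, long pos) {
    s->eof = 0;
    return fill_sector(s, pos);
}

int stream_skip(toy_stream_t *s, long len) { return stream_seek(s, stream_tell(s) + len); }

/* Read one byte; -1 at end of stream. */
int stream_read_char(toy_stream_t *s) {
    if (s->buf_pos >= s->buf_len && !fill_sector(s, stream_tell(s)))
        return -1;
    return s->buffer[s->buf_pos++];
}

/* Read a block of any size, crossing sector boundaries as needed. */
int stream_read(toy_stream_t *s, unsigned char *dst, int len) {
    int total = 0;
    while (total < len) {
        if (s->buf_pos >= s->buf_len && !fill_sector(s, stream_tell(s)))
            break;
        int n = s->buf_len - s->buf_pos;
        if (n > len - total) n = len - total;
        memcpy(dst + total, s->buffer + s->buf_pos, (size_t)n);
        s->buf_pos += n;
        total += n;
    }
    return total;
}
```

The callers never see sectors: byte reads, block reads, seeks and skips all go through the one cached sector, which is refilled only when the read position leaves it.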

2. demuxer.c: this demultiplexes the input into audio and video
channels, and handles reading them as buffered packets.
demuxer.c is basically a framework, which is the same for all the
input formats, and there are parsers for each of them (mpeg-es,
mpeg-ps, avi, avi-ni, asf); these are in the demux_*.c files.
The structure is demuxer_t. There is only one demuxer.

2.a. demux_packet_t, that is DP.
Contains one chunk (avi) or packet (asf, mpg). Because their sizes differ,
they are stored in memory as a linked (chained) list.

2.b. demuxer stream, that is DS.
Struct: demux_stream_t
Every channel (a/v) has one. It contains the packets for the stream
(see 2.a). For now, there can be 3 per demuxer:
- audio (d_audio)
- video (d_video)
- DVD subtitle (d_dvdsub)
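
A sketch of the DP chain and the per-DS queue, assuming simplified structures. The real demux_packet_t and demux_stream_t have more fields, and the helper signatures here are our own. Packets are appended at the tail and taken from the head, so the oldest packet is always read first.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for demux_packet_t and demux_stream_t. */
typedef struct demux_packet {
    int len;                     /* payload size, differs per packet */
    double pts;                  /* presentation timestamp, if known */
    unsigned char *buffer;       /* payload */
    struct demux_packet *next;   /* chain to the next (newer) packet */
} demux_packet_t;

typedef struct {
    demux_packet_t *first;       /* oldest packet: read next */
    demux_packet_t *last;        /* newest packet: append here */
    int packs;                   /* number of queued packets */
    int bytes;                   /* total queued payload bytes */
} demux_stream_t;

demux_packet_t *new_demux_packet(const unsigned char *data, int len, double pts) {
    demux_packet_t *dp = malloc(sizeof *dp);
    if (!dp) return NULL;
    dp->buffer = malloc(len > 0 ? (size_t)len : 1);
    if (!dp->buffer) { free(dp); return NULL; }
    if (len > 0) memcpy(dp->buffer, data, (size_t)len);
    dp->len = len;
    dp->pts = pts;
    dp->next = NULL;
    return dp;
}

/* Append a packet to the tail of the stream's chain. */
void ds_add_packet(demux_stream_t *ds, demux_packet_t *dp) {
    if (ds->last) ds->last->next = dp;
    else ds->first = dp;
    ds->last = dp;
    ds->packs++;
    ds->bytes += dp->len;
}

/* Detach the oldest packet (NULL if none); the caller frees it. */
demux_packet_t *ds_get_packet(demux_stream_t *ds) {
    demux_packet_t *dp = ds->first;
    if (!dp) return NULL;
    ds->first = dp->next;
    if (!ds->first) ds->last = NULL;
    ds->packs--;
    ds->bytes -= dp->len;
    dp->next = NULL;
    return dp;
}

void free_demux_packet(demux_packet_t *dp) {
    if (dp) { free(dp->buffer); free(dp); }
}
```

Keeping a packet and byte count per DS makes it cheap to notice when one stream's queue grows too large.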

2.c. stream header. There are 2 types (for now): sh_audio_t and sh_video_t.
This contains every parameter essential for decoding, such as input/output
buffers, chosen codec, fps, etc. There is one for every stream in the
file: at least one for video, another one if sound is present, and if
there are more streams, one structure for each of them.
These are filled according to the header (avi/asf), or by demux_mpg.c
(mpg) when it finds a new stream. When a new stream is found, the
====> Found audio/video stream: <id> message is displayed.

The chosen stream header and its demuxer stream are connected together
(ds->sh and sh->ds) to simplify usage, so it's enough to pass either the
ds or the sh, depending on the function.

For example: we have an asf file with 6 streams inside it, 1 audio and 5
video. While reading the header, 6 sh structs are created, 1 audio and
5 video. When it starts reading packets, it chooses the streams of the
first audio and video packets found, and sets the sh pointers of d_audio
and d_video accordingly. So later it reads only these streams. Of course
the user can force a specific stream with the -vid and -aid switches.
A good example for this is the DVD, where the English stream is not
always the first, since every VOB has different languages :)
That's when we have to use, for example, the -aid 128 switch.
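
A sketch of the stream choice and the ds<->sh link described above. The structures are minimal stand-ins, MAX_STREAMS and select_audio() are hypothetical, and forced_id plays the role of -aid (-1 meaning "not forced").

```c
#include <stddef.h>

#define MAX_STREAMS 256   /* assumed id range for this sketch */

/* Minimal stand-ins: only the fields needed for selection and linking. */
typedef struct demux_stream demux_stream_t;

typedef struct {
    int id;
    demux_stream_t *ds;   /* sh->ds */
} sh_audio_t;

struct demux_stream {
    int id;               /* chosen stream id, -1 = none chosen yet */
    sh_audio_t *sh;       /* ds->sh */
};

/* Called for every audio packet found. The first stream that has a header
   and matches forced_id (if any) is chosen and linked both ways.
   Returns 1 if the packet belongs to the chosen stream. */
int select_audio(demux_stream_t *ds, sh_audio_t *headers[], int id, int forced_id) {
    if (id < 0 || id >= MAX_STREAMS) return 0;
    if (ds->id == -1 && headers[id] && (forced_id == -1 || forced_id == id)) {
        ds->id = id;
        ds->sh = headers[id];
        headers[id]->ds = ds;
    }
    return ds->id == id;
}
```

Once the link exists, a function that receives only the ds can still reach the codec parameters through ds->sh, and vice versa.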

Now, how does this reading work?
- demuxer.c/demux_read_data() is called; it gets how many bytes we would
  like to read, where to put them (memory address), and from which DS.
  The codecs call this.
- this checks if the given DS's buffer contains something; if so, it
  reads as much as needed from there. If there isn't enough, it calls
  ds_fill_buffer(), which:
- checks if the given DS has buffered packets (DPs); if so, it moves
  the oldest one to the buffer, and reads on. If the list is empty, it
  calls demux_fill_buffer():
- this calls the parser for the input format, which reads the file
  onward, and moves the packets it finds to their buffers.
Well, if we'd like a video packet, but only a bunch of audio packets are
available, then sooner or later the
   DEMUXER: Too many (%d in %d bytes) audio packets in the buffer
error shows up.
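
The call chain above can be sketched under these assumptions: a fixed-size queue per DS instead of a DP list, and a toy "file" of interleaved chunks in place of a real format parser. All names besides the three functions from the text are hypothetical. Note how reading audio also queues the video chunks met along the way; a full queue is the situation behind the "Too many ... packets" error.

```c
#include <string.h>

#define MAX_PACKS 16   /* assumed per-stream queue limit */

typedef struct { const unsigned char *data; int len; } packet_t;

/* Simplified DS: a FIFO of packets plus the packet currently being read. */
typedef struct {
    packet_t queue[MAX_PACKS];
    int q_head, q_count;
    const unsigned char *buffer;
    int buffer_pos, buffer_size;
} demux_stream_t;

static demux_stream_t d_audio, d_video;

/* Queue a packet; returns 0 when the queue is full. */
static int ds_add_packet(demux_stream_t *ds, const unsigned char *data, int len) {
    if (ds->q_count == MAX_PACKS) return 0;
    packet_t p = { data, len };
    ds->queue[(ds->q_head + ds->q_count) % MAX_PACKS] = p;
    ds->q_count++;
    return 1;
}

/* Toy "file" of interleaved chunks: stream 0 = audio, 1 = video. */
typedef struct { int stream; const char *payload; } toy_chunk_t;
static const toy_chunk_t toy_file[] = { {1, "V1"}, {0, "aa"}, {1, "V2"}, {0, "bbb"} };
static size_t toy_pos = 0;

/* Stand-in for the format parser: reads the next chunk and queues it on the
   stream it belongs to. Returns 0 at EOF (or when a queue overflows). */
static int demux_fill_buffer(void) {
    if (toy_pos >= sizeof toy_file / sizeof toy_file[0]) return 0;
    const toy_chunk_t *c = &toy_file[toy_pos++];
    return ds_add_packet(c->stream == 0 ? &d_audio : &d_video,
                         (const unsigned char *)c->payload, (int)strlen(c->payload));
}

/* Move the oldest queued packet into the read buffer, parsing more if needed. */
static int ds_fill_buffer(demux_stream_t *ds) {
    while (ds->q_count == 0)
        if (!demux_fill_buffer()) return 0;
    packet_t p = ds->queue[ds->q_head];
    ds->q_head = (ds->q_head + 1) % MAX_PACKS;
    ds->q_count--;
    ds->buffer = p.data;
    ds->buffer_size = p.len;
    ds->buffer_pos = 0;
    return 1;
}

/* What the codecs call: copy up to len bytes of stream ds into mem. */
int demux_read_data(demux_stream_t *ds, unsigned char *mem, int len) {
    int total = 0;
    while (len > 0) {
        int x = ds->buffer_size - ds->buffer_pos;
        if (x == 0) {
            if (!ds_fill_buffer(ds)) break;   /* EOF */
            continue;
        }
        if (x > len) x = len;
        memcpy(mem + total, ds->buffer + ds->buffer_pos, (size_t)x);
        ds->buffer_pos += x;
        total += x;
        len -= x;
    }
    return total;
}
```

Reading 5 audio bytes here returns "aabbb" and leaves "V1" and "V2" waiting in d_video's queue for the next video read.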

So everything is OK till now; I want to move these into a separate lib.

Now, go on:

3. mplayer.c - ooh, he's the boss :)
Its main purpose is connecting the other modules and maintaining A/V
sync.

The given stream's actual position is in the 'timer' field of the
corresponding stream header (sh_audio / sh_video).

The structure of the playing loop:
    while(not EOF) {
        fill audio buffer (read & decode audio) + increase a_frame
        read & decode a single video frame + increase v_frame
        sleep (wait until a_frame>=v_frame)
        display the frame
        apply A-V PTS correction to a_frame
        check for keys -> pause,seek,...
    }

When playing (a/v), it increases the variables by the duration of the
played a/v.
- with audio this is played bytes / sh_audio->o_bps
  Note: i_bps = number of compressed bytes for one second of audio
        o_bps = number of uncompressed bytes for one second of audio
                (this is = bytes per sample * samplerate * channels)
- with video this is usually == 1.0/fps, but I have to note that fps
  doesn't really matter for video; for example asf doesn't have it,
  instead there is a "duration" and it can change per frame.
  MPEG2 has "repeat_count" which delays the frame by 1-2.5 ...
  Maybe only AVI and MPEG1 have fixed fps.
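
The duration bookkeeping above can be expressed as a few hypothetical helpers. For example, 16-bit (2 bytes per sample) stereo at 44100 Hz gives o_bps = 2*44100*2 = 176400, so 88200 played bytes advance a_frame by 0.5 sec.

```c
/* o_bps: uncompressed bytes per second of audio. */
int audio_o_bps(int bytes_per_sample, int samplerate, int channels) {
    return bytes_per_sample * samplerate * channels;
}

/* Time (seconds) that 'played_bytes' of decoded audio advance a_frame by. */
double audio_time(long played_bytes, int o_bps) {
    return (double)played_bytes / o_bps;
}

/* Fixed-fps video: each frame advances v_frame by 1/fps. Formats with a
   per-frame "duration" would add that value instead. */
double video_frame_time(double fps) {
    return 1.0 / fps;
}
```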

So everything works right as long as the audio and video are in perfect
synchrony: since the audio plays continuously, it gives the timing, and
when the time of a frame has passed, the next frame is displayed.
But what if these two aren't synchronized in the input file?
PTS correction kicks in. The input demuxers read the PTS (presentation
timestamp) of the packets, and with it we can see if the streams
are synchronized. Then MPlayer can correct a_frame, within a given
maximal bound (see the -mc option). The sum of the corrections can be
found in c_total.
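
The following is a sketch of a bounded correction step. It assumes the step is the measured audio-minus-video PTS difference clamped to +/-max_corr, which stands in for the -mc value. The struct and function names are hypothetical, and the real code may weigh the difference differently. Because the loop waits until a_frame>=v_frame, pushing a_frame forward when the audio is ahead makes the next frame appear sooner.

```c
/* Hypothetical bundle of the loop's audio clock and correction total. */
typedef struct {
    double a_frame;   /* audio-driven clock */
    double c_total;   /* sum of all corrections applied */
} av_sync_t;

/* One correction step: move a_frame by the audio-minus-video PTS
   difference, but never by more than max_corr (the -mc bound). */
double apply_pts_correction(av_sync_t *s, double a_pts, double v_pts, double max_corr) {
    double x = a_pts - v_pts;
    if (x > max_corr) x = max_corr;
    else if (x < -max_corr) x = -max_corr;
    s->a_frame += x;
    s->c_total += x;
    return x;
}
```

Clamping each step keeps a single bad timestamp from making the picture jump; larger drifts are absorbed over several frames.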

Of course this is not everything; several things suck.
For example the soundcard's delay, which has to be corrected by
MPlayer! The audio delay is the sum of all these:
- bytes read since the last timestamp:
      t1 = d_audio->pts_bytes/sh_audio->i_bps
- if Win32/ACM, then the bytes stored in the audio input buffer:
      t2 = a_in_buffer_len/sh_audio->i_bps
- uncompressed bytes in the audio out buffer:
      t3 = a_buffer_len/sh_audio->o_bps
- not yet played bytes stored in the soundcard's (or DMA's) buffer:
      t4 = get_audio_delay()/sh_audio->o_bps

From this we can calculate what PTS we need for the just-played
audio; then, comparing this with the video's PTS, we have the
difference!
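
A sketch of that calculation follows. The text lumps t1..t4 together as "the audio delay". This sketch assumes t1 moves forward from the last timestamp, while t2, t3 and t4 are data already read but not yet heard, so they are subtracted. The struct is a hypothetical bundle of the fields named above.

```c
/* Hypothetical bundle of the values named above. */
typedef struct {
    double last_pts;        /* PTS of the last timestamped audio packet */
    long pts_bytes;         /* d_audio->pts_bytes: bytes read since then */
    long a_in_buffer_len;   /* compressed bytes waiting (Win32/ACM only) */
    long a_buffer_len;      /* decoded bytes not yet sent to the card */
    long card_delay;        /* get_audio_delay(): bytes queued in the card */
    int i_bps, o_bps;
} audio_clock_t;

/* PTS of the audio that is being heard right now. */
double played_audio_pts(const audio_clock_t *a) {
    double t1 = (double)a->pts_bytes / a->i_bps;
    double t2 = (double)a->a_in_buffer_len / a->i_bps;
    double t3 = (double)a->a_buffer_len / a->o_bps;
    double t4 = (double)a->card_delay / a->o_bps;
    /* t1: read position past the timestamp; t2..t4: read but not yet heard. */
    return a->last_pts + t1 - (t2 + t3 + t4);
}

/* Positive result: the audio is ahead of the video frame. */
double av_difference(const audio_clock_t *a, double v_pts) {
    return played_audio_pts(a) - v_pts;
}
```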

Life didn't get simpler with AVI. There's the "official" timing
method, the BPS-based one: the header contains how many compressed
audio bytes or chunks belong to one second of frames.
In the AVI stream header there are 2 important fields, the
dwSampleSize, and the dwRate/dwScale pair:
- If dwSampleSize is 0, then it's a VBR stream, so its bitrate
  isn't constant. It means that 1 chunk stores 1 sample, and
  dwRate/dwScale gives the chunks/sec value.
- If dwSampleSize is >0, then it's constant bitrate, and the
  time can be measured this way: time = (bytepos/dwSampleSize) /
  (dwRate/dwScale) (so the sample number is divided by the
  samplerate). Now the audio can be handled as a stream, which can
  be cut into chunks, but can also be a single chunk.
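
Both AVI timing rules can be written as hypothetical helpers. As assumed example values: 16-bit stereo PCM with dwSampleSize=4 and dwRate/dwScale=44100/1 puts byte position 176400 at 1.0 sec, and a VBR MP3 stream with dwRate=44100, dwScale=1152 runs at about 38.28 chunks/sec.

```c
/* dwSampleSize > 0 (constant bitrate): sample number divided by the samplerate. */
double avi_cbr_time(long bytepos, unsigned dwSampleSize, unsigned dwRate, unsigned dwScale) {
    double sample = (double)bytepos / dwSampleSize;
    return sample / ((double)dwRate / dwScale);
}

/* dwSampleSize == 0 (VBR): one chunk = one sample, dwRate/dwScale chunks per second. */
double avi_vbr_time(long chunk_index, unsigned dwRate, unsigned dwScale) {
    return (double)chunk_index * dwScale / dwRate;
}

/* Pick the rule from dwSampleSize; 'pos' is a byte position for CBR,
   a chunk index for VBR. */
double avi_audio_time(long pos, unsigned dwSampleSize, unsigned dwRate, unsigned dwScale) {
    return dwSampleSize ? avi_cbr_time(pos, dwSampleSize, dwRate, dwScale)
                        : avi_vbr_time(pos, dwRate, dwScale);
}
```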

The other method can be used only for interleaved files: from
the order of the chunks, a timestamp (PTS) value can be calculated.
The PTS of the video chunks is simple: chunk number / fps.
The audio gets the same PTS as the previous video chunk.
We have to pay attention to the so-called "audio preload", that is,
there is a delay between the audio and video streams. This is
usually 0.5-1.0 sec, but can be totally different.
Until now the exact value had to be measured, but now demux_avi.c
handles it: at the first audio chunk after the first video chunk, it
calculates the A/V difference, and takes this as the audio preload.
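
A sketch of the interleaved-file method follows. It assumes the chunks arrive in file order and that i_bps is known. The preload measurement at the first audio chunk after the first video chunk is taken here as (audio bytes seen so far / i_bps) minus the previous video PTS, which is our reading of "the A/V difference". The struct and function names are hypothetical.

```c
/* Hypothetical state for PTS calculation in an interleaved AVI. */
typedef struct {
    double fps;
    long video_chunks;       /* video chunks seen so far */
    double last_video_pts;   /* PTS of the previous video chunk */
    long audio_bytes;        /* audio bytes seen so far */
    int i_bps;               /* compressed audio bytes per second */
    int preload_known;
    double audio_preload;    /* measured A/V offset, in seconds */
} avi_ni_state_t;

/* Video chunk number n gets PTS n / fps. */
double avi_video_chunk(avi_ni_state_t *s) {
    s->last_video_pts = (double)s->video_chunks / s->fps;
    s->video_chunks++;
    return s->last_video_pts;
}

/* Audio chunk: same PTS as the previous video chunk. The first audio chunk
   after the first video chunk measures the preload. */
double avi_audio_chunk(avi_ni_state_t *s, int len) {
    if (!s->preload_known && s->video_chunks > 0) {
        s->audio_preload = (double)s->audio_bytes / s->i_bps - s->last_video_pts;
        s->preload_known = 1;
    }
    s->audio_bytes += len;
    return s->last_video_pts;
}
```

With 8000 bytes of audio at 16000 bytes/sec stored before the first video frame, the measured preload comes out as 0.5 sec.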

3.a. audio playback:
Some words on audio playback:
The playing itself is not hard, but:
1. knowing when to write into the buffer, without blocking
2. knowing how much of what we wrote has been played
The first is needed for audio decoding, and to keep the buffer
full (so the audio will never skip). The second is needed for
correct timing, because some soundcards delay even 3-7 seconds,
which can't be ignored.
To solve this, OSS gives several possibilities:
- ioctl(SNDCTL_DSP_GETODELAY): tells how many unplayed bytes are in
  the soundcard's buffer -> perfect for timing, but not all drivers
  support it :(
- ioctl(SNDCTL_DSP_GETOSPACE): tells how much we can write into the
  soundcard's buffer without blocking. If the driver doesn't
  support GETODELAY, we can use this to find out how big the delay is.
- select(): should tell if we can write into the buffer without
  blocking. Unfortunately it doesn't say how much we could :((
  Also, it doesn't work (or works badly) with some drivers.
  Only used if none of the above works.
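
This sketch shows how the OSS answers can be turned into a delay value, in the order of preference above. To keep it compilable without <sys/soundcard.h>, ospace_t copies only the audio_buf_info fields it needs. In a real driver they would come from ioctl(fd, SNDCTL_DSP_GETOSPACE, &info), and the exact delay from ioctl(fd, SNDCTL_DSP_GETODELAY, &delay). The worst-case fallback of returning the whole buffer size follows the get_delay() advice in section 6.

```c
/* The audio_buf_info fields we need, copied so this compiles without
   <sys/soundcard.h>; real values come from SNDCTL_DSP_GETOSPACE. */
typedef struct {
    int fragstotal;   /* number of fragments in the card's buffer */
    int fragsize;     /* size of one fragment in bytes */
    int bytes;        /* bytes that can be written without blocking */
} ospace_t;

/* Bytes still waiting in the card, using the best available source. */
int estimate_delay_bytes(int have_odelay, int odelay,
                         int have_ospace, ospace_t sp, int buffer_size) {
    if (have_odelay)               /* SNDCTL_DSP_GETODELAY: exact */
        return odelay;
    if (have_ospace)               /* total buffer size minus free space */
        return sp.fragstotal * sp.fragsize - sp.bytes;
    return buffer_size;            /* select() only: assume a full buffer */
}
```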

4. Codecs. They are separate libs.
For example libac3, libmpeg2, xa/*, alaw.c, opendivx/*, loader, mp3lib.

mplayer.c doesn't call them directly, but through the dec_audio.c and
dec_video.c files, so mplayer.c doesn't have to know anything about
the codecs.
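
The indirection can be sketched as a table of codec hooks looked up by name. Everything here (audio_codec_t, find_audio_codec(), decode_audio() and the two toy codecs) is hypothetical. It only illustrates why mplayer.c never has to know which codec library does the work.

```c
#include <string.h>

/* Hypothetical codec hook: decode in_len bytes into out, return bytes produced. */
typedef struct {
    const char *name;
    int (*decode)(const unsigned char *in, int in_len, unsigned char *out, int out_max);
} audio_codec_t;

/* Toy codec 1: data is already PCM, just copy it. */
static int pcm_decode(const unsigned char *in, int in_len, unsigned char *out, int out_max) {
    int n = in_len < out_max ? in_len : out_max;
    memcpy(out, in, (size_t)n);
    return n;
}

/* Toy codec 2: unsigned 8-bit to signed 16-bit little-endian. */
static int u8_decode(const unsigned char *in, int in_len, unsigned char *out, int out_max) {
    int n = 0;
    for (int i = 0; i < in_len && n + 2 <= out_max; i++) {
        out[n++] = 0;                              /* low byte */
        out[n++] = (unsigned char)(in[i] ^ 0x80);  /* high byte: sample - 128 */
    }
    return n;
}

static const audio_codec_t codecs[] = {
    { "pcm", pcm_decode },
    { "u8",  u8_decode },
};

/* The dec_audio.c side: look a codec up by name... */
const audio_codec_t *find_audio_codec(const char *name) {
    for (size_t i = 0; i < sizeof codecs / sizeof codecs[0]; i++)
        if (strcmp(codecs[i].name, name) == 0) return &codecs[i];
    return NULL;
}

/* ...and decode through it, so the caller never touches the codec directly. */
int decode_audio(const audio_codec_t *c, const unsigned char *in, int in_len,
                 unsigned char *out, int out_max) {
    return c->decode(in, in_len, out, out_max);
}
```

Adding a codec then means adding one table entry; the player loop stays unchanged.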

5. libvo: this displays the frame.
The constants for the different pixel formats are defined in img_format.h;
using them is mandatory.

Each vo driver _has_ to implement these:

query_format() - queries if a given pixelformat is supported.
    return value: flags:
        0x1 - supported (by hardware or conversion)
        0x2 - supported (by hardware, without conversion)
        0x4 - sub/osd supported (has draw_alpha)
IMPORTANT: every vo driver must support the YV12 format,
and one (or both) of BGR15 and BGR24, with conversion if needed.
If these aren't supported, not every codec will work! The mpeg codecs
can output only YV12, and the older win32 DLLs only 15 and 24bpp.
There is a fast MMX-optimized 15->16bpp converter, so it's not a
significant speed decrease!
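
Below is a hypothetical query_format() for an imaginary 16bpp driver, using the flag values listed above. The QF_* macro names and TOY_* format codes are made up; a real driver would use the img_format.h constants. BGR16 is native, while YV12, BGR15 and BGR24 are accepted through software conversion.

```c
/* Flag values from the list above; the macro names are hypothetical. */
#define QF_SUPPORTED 0x1   /* supported (by hardware or conversion) */
#define QF_HW        0x2   /* supported by hardware, no conversion */
#define QF_OSD       0x4   /* sub/osd supported (has draw_alpha) */

enum toy_fmt { TOY_YV12, TOY_BGR15, TOY_BGR16, TOY_BGR24, TOY_YUY2 };

int query_format(int format) {
    switch (format) {
    case TOY_BGR16:                 /* the display's native mode */
        return QF_SUPPORTED | QF_HW | QF_OSD;
    case TOY_YV12:                  /* converted in software */
    case TOY_BGR15:
    case TOY_BGR24:
        return QF_SUPPORTED | QF_OSD;
    default:
        return 0;
    }
}

/* The mandatory set: YV12, plus BGR15 and/or BGR24. */
int meets_minimum(int (*qf)(int)) {
    return (qf(TOY_YV12) & QF_SUPPORTED) &&
           ((qf(TOY_BGR15) & QF_SUPPORTED) || (qf(TOY_BGR24) & QF_SUPPORTED));
}
```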

The BPP table, if the driver can't change bpp:
    current bpp     has to accept these
    15              15
    16              15,16
    24              24
    24,32           24,32

If it can change bpp (for example DGA 2, fbdev, svgalib), then if possible
we have to change to the desired bpp. If the hardware doesn't support it,
we have to change to the one closest to it, and do conversion!

init() - this is called before displaying the first frame -
    initializing buffers, etc.

draw_slice(): this displays YV12 pictures (3 planes: one full-sized that
    contains brightness (Y), and 2 quarter-sized ones which contain the
    colour info (U,V)). MPEG codecs (libmpeg2, opendivx) use this. This
    doesn't have to display the whole frame, only update small parts of it.
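
Here are the plane sizes for a w x h YV12 picture, as a small helper with hypothetical names; it assumes even width and height. In a contiguous YV12 buffer the V plane comes before U. For a 720x576 frame that gives 414720 bytes of Y plus 2 x 103680 bytes of chroma.

```c
#include <stddef.h>

/* Sizes and offsets of the three planes of a contiguous w x h YV12 image. */
typedef struct {
    size_t y_size, chroma_size, total;
    size_t v_offset, u_offset;
} yv12_layout_t;

/* Assumes even w and h: Y is full size, U and V are each a quarter of it.
   YV12 stores the V plane before the U plane. */
yv12_layout_t yv12_layout(size_t w, size_t h) {
    yv12_layout_t l;
    l.y_size = w * h;
    l.chroma_size = (w / 2) * (h / 2);
    l.v_offset = l.y_size;
    l.u_offset = l.y_size + l.chroma_size;
    l.total = l.y_size + 2 * l.chroma_size;
    return l;
}
```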

draw_frame(): this is the older interface; it displays only complete
    frames, and can do only packed formats (YUY2, RGB/BGR).
    Win32 codecs use this (DivX, Indeo, etc).

draw_alpha(): this displays subtitles and OSD.
    It's a bit tricky to use, since it's not a part of the libvo API,
    but a callback-style thing. flip_page() has to call
    vo_draw_text(), passing it the size of the screen and the
    draw_alpha() implementation for the pixelformat (a function
    pointer). vo_draw_text() checks the characters to draw, and
    calls draw_alpha() for each.
    As a help, osd.c contains draw_alpha for each pixelformat; use it
    if possible!
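
The following is a toy version of the callback arrangement. It assumes an 8-bit grey framebuffer and a draw_alpha signature modeled loosely on the description (position, size, source, alpha mask, stride). toy_draw_text() is a hypothetical stand-in for vo_draw_text(): it decides what to draw and where, while the driver-supplied function does the pixel work.

```c
typedef void (*draw_alpha_fn)(int x0, int y0, int w, int h,
                              const unsigned char *src, const unsigned char *srca,
                              int stride);

#define SCR_W 16
#define SCR_H 8
static unsigned char screen[SCR_H][SCR_W];   /* toy 8-bit grey framebuffer */

/* The driver's draw_alpha for this format: copy pixels where alpha is set. */
static void draw_alpha_grey8(int x0, int y0, int w, int h,
                             const unsigned char *src, const unsigned char *srca,
                             int stride) {
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (srca[y * stride + x])
                screen[y0 + y][x0 + x] = src[y * stride + x];
}

/* Stand-in for vo_draw_text(): decides what goes where (one 2x2 "glyph"
   per character) and leaves the pixel work to the draw_alpha callback. */
static void toy_draw_text(int dxs, int dys, const char *text, draw_alpha_fn draw_alpha) {
    static const unsigned char glyph[4] = { 255, 255, 255, 255 };
    static const unsigned char alpha[4] = { 1, 0, 0, 1 };
    for (int i = 0; text[i]; i++) {
        int x = 2 * i;
        if (x + 2 > dxs || 2 > dys) break;   /* clip to the screen size */
        draw_alpha(x, 0, 2, 2, glyph, alpha, 2);
    }
}

/* What a driver's flip_page() does before showing the frame. */
static void toy_flip_page(void) {
    toy_draw_text(SCR_W, SCR_H, "ab", draw_alpha_grey8);
}
```

The text renderer stays format-independent; only the small draw_alpha function has to exist once per pixel format.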

flip_page(): this is called after each frame; it displays the buffer for
    real. This is 'swapbuffers' when double-buffering.

6. libao2: this controls audio playback

As in libvo (see 5.), there are several drivers here, based on the same API:

static int control(int cmd, int arg);
    This is for reading/setting driver-specific and other special parameters.
    Not really used for now.

static int init(int rate,int channels,int format,int flags);
    The init of the driver: opens the device, sets the sample rate, channels
    and sample format parameters.
    Sample format: usually AFMT_S16_LE or AFMT_U8; for more definitions see
    the dec_audio.c and linux/soundcard.h files!

static void uninit();
    Guess what.
    OK, I'll help: it closes the device; not (yet) called on exit.

static void reset();
    Resets the device. To be exact, it's for deleting the buffers' contents,
    so after reset() the previously received stuff won't be output.
    (called on pause or seek)

static int get_space();
    Returns how many bytes can be written into the audio buffer without
    blocking (making the caller process wait). If the buffer is (nearly)
    full, it has to return 0!
    If it never returns 0, MPlayer won't work!

static int play(void* data,int len,int flags);
    Plays a bit of audio, which is received through the "data" memory area,
    with a size of "len". "flags" isn't used yet. It has to copy the data,
    because it can be overwritten after the call returns. It doesn't have to
    use all the bytes; it has to return how many have been used (copied to
    the buffer).

static int get_delay();
    Has to return how many bytes are in the audio buffer. Be exact, if
    possible, since the whole timing depends on this! In the worst case,
    return the size of the buffer.
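
The following toy driver implements reset(), get_space(), play() and get_delay() on top of a ring buffer, with a hypothetical toy_card_consume() simulating the card draining it. The buffer size and the simulation are assumptions, and control(), init() and uninit() are left out. A real driver would ask the device instead of a local counter. Note that get_space() returns 0 when full, as required.

```c
#define AO_BUFSIZE 4096   /* assumed buffer size for this toy driver */

static unsigned char ao_buf[AO_BUFSIZE];
static int ao_head;   /* next byte the "card" will play */
static int ao_fill;   /* bytes queued and not yet played */

/* Drop everything queued (pause/seek). */
static void reset(void) { ao_head = 0; ao_fill = 0; }

/* Free room in the buffer; 0 when full, so the caller won't block. */
static int get_space(void) { return AO_BUFSIZE - ao_fill; }

/* Copy as much as fits (the caller may reuse 'data' afterwards) and
   report how many bytes were taken. */
static int play(void *data, int len, int flags) {
    (void)flags;   /* not used yet */
    const unsigned char *src = data;
    int n = len < get_space() ? len : get_space();
    for (int i = 0; i < n; i++)
        ao_buf[(ao_head + ao_fill + i) % AO_BUFSIZE] = src[i];
    ao_fill += n;
    return n;
}

/* Bytes queued but not yet heard: the basis of A/V timing. */
static int get_delay(void) { return ao_fill; }

/* Simulates the card playing 'bytes' bytes. */
static void toy_card_consume(int bytes) {
    if (bytes > ao_fill) bytes = ao_fill;
    ao_head = (ao_head + bytes) % AO_BUFSIZE;
    ao_fill -= bytes;
}
```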

!!! Because the video is synchronized to the audio (card), it's very important
!!! that the get_space and get_delay functions are correctly implemented!