DOCS/tech/general.txt @ 782:14bac9d91e22, "audio buffering fixed"
author: arpi_esp
date: Sun, 13 May 2001 16:29:10 +0000
parents: e6263c6d377a
children: 1f26877717f1
So, I'll describe how this stuff works. The program's structure is basically logical, however it's a big hack :)

The main modules:

1. streamer.c: this is the input; it reads the file or the VCD. What it has to know: appropriate buffering by sector, seek and skip functions, and reading by bytes or by blocks of any size. The stream_t structure describes the input stream (file/device).

2. demuxer.c: this does the demultiplexing of the input into audio and video channels, and reads them in buffered packets. demuxer.c is basically a framework that is the same for all input formats, and there is a parser for each of them (mpeg-es, mpeg-ps, avi, avi-ni, asf); these are in the demux_*.c files. The structure is demuxer_t. There is only one demuxer.

2.a. demux_packet_t, that is DP. It contains one chunk (avi) or packet (asf, mpg). Because of their different sizes, they are stored in memory as a chained (linked) list.

2.b. demuxer stream, that is DS. Struct: demux_stream_t. Every channel (a/v) has one. It contains the packets for the stream (see 2.a). For now there can be 2 for each demuxer, one for the audio and one for the video.

2.c. stream header. There are 2 types (for now): sh_audio_t and sh_video_t. This contains every parameter essential for decoding, such as input/output buffers, chosen codec, fps, etc. There is one for every stream in the file: at least one for video, another one if sound is present, and if there are more streams, there will be one structure for each. These are filled according to the header (avi/asf), or by demux_mpg.c (mpg) when it finds a new stream. When a new stream is found, the
    ====> Found audio/video stream: <id>
message is displayed.

The chosen stream header and its demuxer stream are connected together (ds->sh and sh->ds) to simplify usage, so it's enough to pass either the ds or the sh, depending on the function.

For example: we have an asf file with 6 streams inside it, 1 audio and 5 video.
During the reading of the header, 6 sh structs are created, 1 audio and 5 video. When it starts reading the packets, it chooses the first audio and video streams found, and sets the sh pointers of d_audio and d_video accordingly. From then on it reads only these streams. Of course the user can force a specific stream with the -vid and -aid switches. A good example for this is DVD, where the English stream is not always the first; every VOB can have a different language order :) That's when we have to use, for example, the -aid 128 switch.

Now, how does this reading work?
 - demuxer.c/demux_read_data() is called by the codecs. It gets how many bytes we'd like to read, where to put them (memory address), and from which DS.
 - It checks whether the given DS's buffer contains something; if so, it reads as much as needed from there. If there isn't enough, it calls ds_fill_buffer(), which:
   - checks whether the given DS has buffered packets (DPs); if so, it moves the oldest one to the buffer and reads on. If the list is empty, it calls demux_fill_buffer():
     - this calls the parser for the input format, which reads the file onward and moves the packets it finds into their buffers.

Well, if we'd like an audio packet but only a bunch of video packets are available, then sooner or later the
    DEMUXER: Too many (%d in %d bytes) audio packets in the buffer
error shows up.

So far everything is OK; I want to move these into a separate lib.

Now, go on:

3. mplayer.c - ooh, he's the boss :)
The timing is solved oddly, since it has to (or is recommended to) be done differently for each format, and sometimes it can be done in many ways. There are the a_frame and v_frame float variables; they store the just-played audio/video position in seconds. A new frame is displayed if v_frame<a_frame, and sound is decoded if a_frame<v_frame. When playing (a/v), it increases the variables by the duration of the played audio/video.
In video, the duration is usually 1.0/fps, but I have to mention that fps doesn't really matter for video; for example, asf doesn't have it, instead there is a "duration" and it can change per frame. MPEG2 has "repeat_count", which delays the frame by 1-2.5 ... Maybe only AVI and MPEG1 have a fixed fps.

So everything works right as long as the audio and video are in perfect sync: the audio plays and gives the timing, and when the time of a frame has passed, the next frame is displayed. But what if these two aren't synchronized in the input file? PTS correction kicks in. The input demuxers read the PTS (presentation timestamp) of the packets, and with it we can see whether the streams are synchronized. Then MPlayer can correct a_frame, within a given maximal bound (see the -mc option). The total of the corrections is kept in c_total.

Of course this is not everything; several things suck. For example the sound card's delay, which has to be corrected by MPlayer: that's why it needs the size of the audio buffer. It can be measured with select(), which unfortunately isn't supported by every card... That's when it has to be given with the -abs option.

Then there's another problem: in MPEG, the PTS is not given per frame but per sector, and a sector can contain 10 frames, or only 0.1. So that this won't fuck up the timing, we average the PTS over 5 frames and use this when correcting.

Life didn't get simpler with AVI. There's the "official" timing method, the BPS-based one: the header contains how many compressed audio bytes belong to one second of frames. Of course this doesn't always work... why should it :) So I emulate MPEG's PTS/sector method on AVI: the AVI parser calculates a fake PTS for every chunk it reads, based on the type of the frames. This is how my timing is done, and sometimes it works better.

In AVI, usually a bigger piece of audio is stored first, and then comes the video. This needs to be taken into account in the delay; it is called "Initial PTS delay".
Of course there are 2 of them: one is stored in the header and not really used :) and the other isn't stored anywhere; it can only be measured...

4. Codecs. They are separate libs, for example libac3, libmpeg2, xa/*, alaw.c, opendivx/*, loader, mp3lib. mplayer.c calls them when a piece of audio or video needs to be played (see the beginning of 3.), and they call the appropriate demuxer to get the compressed data (see 2.). We have to pass the appropriate stream header as a parameter (sh_audio/sh_video); it should contain all the info needed for decoding (the demuxer too: sh->ds). Separating the codecs is underway: the audio is already done, the video is work-in-progress. The aim is that mplayer.c won't have to know which codecs exist and how to use 'em; instead it should just call an init/decode audio/video function.

5. libvo: this displays the frame. The constants for the different pixel formats are defined in img_format.h, and their usage is mandatory. Each vo driver _has_ to implement these:

    query_format() - queries whether a given pixel format is supported.
        return value: flags:
            0x1 - supported (by hardware or conversion)
            0x2 - supported (by hardware, without conversion)
            0x4 - sub/osd supported (has draw_alpha)
        IMPORTANT: it's mandatory that every vo driver supports the YV12 format, and one (or both) of BGR15 and BGR24, with conversion if needed. If these aren't supported, not every codec will work! The mpeg codecs can output only YV12, and the older win32 DLLs only 15 and 24bpp. There is a fast MMX-using 15->16bpp converter, so it's not a significant speed decrease!

        The BPP table, if the driver can't change bpp:
            current bpp    has to accept these
            15             15
            16             15,16
            24             24
            24,32          24,32

        If it can change bpp (for example DGA 2, fbdev, svgalib), then if possible we have to change to the desired bpp. If the hardware doesn't support it, we have to change to the closest one and do conversion!

    init() - this is called before displaying the first frame - initializing buffers, etc.
    draw_slice(): this displays YV12 pictures (3 planes: one full-sized that contains the brightness (Y), and 2 quarter-sized ones that contain the colour info (U,V)). MPEG codecs (libmpeg2, opendivx) use this. It doesn't have to display the whole frame, only update small parts of it.

    draw_frame(): this is the older interface; it displays only complete frames, and can do only packed formats (YUY2, RGB/BGR). Win32 codecs use this (DivX, Indeo, etc).

    draw_alpha(): this displays subtitles and OSD. It's a bit tricky to use, since it's not part of the libvo API, but a callback-style thing. flip_page() has to call vo_draw_text(), passing the size of the screen and the draw_alpha() implementation for the pixel format (a function pointer). vo_draw_text() checks the characters to draw, and calls draw_alpha() for each. As a help, osd.c contains draw_alpha for each pixel format; use it if possible!

    flip_page(): this is called after each frame; it actually displays the buffer. This is 'swapbuffers' when double-buffering.