Mercurial > mplayer.hg
changeset 23242:1925a60a5d56
These files are now in the separate NUT repository.
author | diego |
---|---|
date | Tue, 08 May 2007 08:37:52 +0000 |
parents | e42491f6fa84 |
children | 0ec346252484 |
files | DOCS/tech/nut.txt DOCS/tech/oggless-xiph-codecs.txt |
diffstat | 2 files changed, 0 insertions(+), 1202 deletions(-) |
--- a/DOCS/tech/nut.txt Mon May 07 23:26:40 2007 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1075 +0,0 @@ -================================== -NUT Open Container Format 20061104 -================================== - - - -Intro: -====== - -NUT is a free multimedia container format for storage of audio, video, -subtitles and related user defined streams, it provides exact timestamps for -synchronization and seeking, is simple, has low overhead and can recover -in case of errors in the stream. - -Other common multimedia container formats are AVI, Ogg, Matroska, MP4, MOV -ASF, MPEG-PS, MPEG-TS. - - -Features / goals: - (supported by the format, not necessarily by a specific implementation) - -Simplicity - Use the same encoding for nearly all fields. - Simple decoding, so slow CPUs (and embedded systems) can handle it. - -Extensibility - No limit for the possible values of all fields (using universal vlc). - Allow adding of new headers in the future. - Allow adding more fields at the end of headers. - -Compactness - ~0.2% overhead for normal bitrates. - The index is <100kb per hour. - A typical file header is about 100 bytes (audio + video headers together). - A packet header is about ~1-5 bytes. - -Error resistance - Seeking / playback is possible without an index. - Headers & index can be repeated. - Damaged files can be played back with minimal data loss and fast - resynchronization times. - -The specification is frozen. All files following the specification will be -compatible unless the specification is unfrozen. - - -Definitions: -============ - -MUST The specific part must be done to conform to this standard. -SHOULD It is recommended to be done that way, but not strictly required. - -keyframe - A keyframe is a frame from which you can start decoding, a more - exact definition is below - The nth frame is a keyframe if and only if frames n, n+1, ... 
in - presentation order (that are all frames with a pts >= frame[n].pts) can - be decoded successfully without reference to frames prior n in storage - order (that are all frames with a dts < frame[n].dts). - If no such frames exist (for example due to using overlapped transforms - like the MDCT in an audio codec), then the definition shall be extended - by dropping n out of the set of frames which must be decodable, if this - is still insufficient then n+1 shall be dropped, and so on until there is - a keyframe. - Every frame which is marked as a keyframe MUST be a keyframe according to - the definition above, a muxer MUST mark every frame it knows is a keyframe - as such, a muxer SHOULD NOT analyze future frames to determine the - keyframe status of the current frame but instead just set the frame as - non-keyframe. - (FIXME maybe move somewhere else?) -pts - Presentation time of the first frame/sample that is completed by decoding - the coded frame. -dts - The time when a frame is input into a synchronous 1-in-1-out decoder. - - -Syntax: -======= - -Since NUT heavily uses variable length fields, the simplest way to describe it -is using a pseudocode approach. - - - -Conventions: -============ - -The data types have a name, used in the bitstream syntax description, a short -text description and a pseudocode (functional) definition, optional notes may -follow: - -name (text description) - functional definition - [Optional notes] - -The bitstream syntax elements have a tagname and a functional definition, they -are presented in a bottom-up approach, again optional notes may follow and -are reproduced in the tag description: - -name: (optional note) - functional definition - [Optional notes] - -The in-depth tag description follows the bitstream syntax. -The functional definition has a C-like syntax. 
- - - -Type definitions: -================= - -f(n) (n fixed bits in big-endian order) -u(n) (unsigned number encoded in n bits in MSB-first order) - -v (variable length value, unsigned) - value=0 - do{ - more_data u(1) - data u(7) - value= 128*value + data - }while(more_data) - -s (variable length value, signed) - temp v - temp++ - if(temp&1) value= -(temp>>1) - else value= (temp>>1) - -b (binary data or string, to be use in vb, see below) - for(i=0; i<length; i++){ - data[i] u(8) - } - [Note: strings MUST be encoded in UTF-8] - [Note: the character NUL (U+0000) is not legal within - or at the end of a string.] - -vb (variable length binary data or string) - length v - value b - -t (v coded universal timestamp) - tmp v - id= tmp % time_base_count - value= (tmp / time_base_count) * time_base[id] - - -Bitstream syntax: -================= - -file: - file_id_string - while(!eof){ - if(next_byte == 'N'){ - packet_header - switch(startcode){ - case main_startcode: main_header; break; - case stream_startcode:stream_header; break; - case info_startcode: info_packet; break; - case index_startcode: index; break; - case syncpoint_startcode: syncpoint; break; - } - packet_footer - }else - frame - } - -The structure of an undamaged file should look like the following, but -demuxers should be flexible and be able to deal with damaged headers so the -above is a better loop in practice (not to mention it is simpler). -Note: Demuxers MUST be able to deal with new and unknown headers. 
- -file: - file_id_string - while(!eof){ - packet_header, main_header, packet_footer - reserved_headers - for(i=0; i<stream_count; i++){ - packet_header, stream_header, packet_footer - reserved_headers - } - while(next_code == info_startcode){ - packet_header, info_packet, packet_footer - reserved_headers - } - if(next_code == index_startcode){ - packet_header, index_packet, packet_footer - } - if (!eof) while(next_code != main_startcode){ - if(next_code == syncpoint_startcode){ - packet_header, syncpoint, packet_footer - } - frame - reserved_headers - } - } - - -Common elements: ----------------- - -reserved_bytes: - for(i=0; i<forward_ptr - length_of_non_reserved; i++) - reserved u(8) - [A demuxer MUST ignore any reserved bytes. - A muxer MUST NOT write any reserved bytes, as this would make it - impossible to add new fields at the end of packets in the future - in a compatible way.] - -packet_header - startcode f(64) - forward_ptr v - if(forward_ptr > 4096) - header_checksum u(32) - -packet_footer - checksum u(32) - -reserved_headers - while(next_byte == 'N' && next_code != main_startcode - && next_code != stream_startcode - && next_code != info_startcode - && next_code != index_startcode - && next_code != syncpoint_startcode){ - packet_header - reserved_bytes - packet_footer - } - - Headers: - -main_header: - version v - stream_count v - max_distance v - time_base_count v - for(i=0; i<time_base_count; i++) - time_base_num v - time_base_denom v - time_base[i]= time_base_num/time_base_denom - tmp_pts=0 - tmp_mul=1 - tmp_stream=0 - for(i=0; i<256; ){ - tmp_flag v - tmp_fields v - if(tmp_fields>0) tmp_pts s - if(tmp_fields>1) tmp_mul v - if(tmp_fields>2) tmp_stream v - if(tmp_fields>3) tmp_size v - else tmp_size=0 - if(tmp_fields>4) tmp_res v - else tmp_res=0 - if(tmp_fields>5) count v - else count= tmp_mul - tmp_size - for(j=6; j<tmp_fields; j++){ - tmp_reserved[i] v - } - for(j=0; j<count && i<256; j++, i++){ - if (i == 'N') { - flags[i]= FLAG_INVALID; - j--; - 
continue; - } - flags[i]= tmp_flag; - stream_id[i]= tmp_stream; - data_size_mul[i]= tmp_mul; - data_size_lsb[i]= tmp_size + j; - pts_delta[i]= tmp_pts; - reserved_count[i]= tmp_res; - } - } - reserved_bytes - -stream_header: - stream_id v - stream_class v - fourcc vb - time_base_id v - msb_pts_shift v - max_pts_distance v - decode_delay v - stream_flags v - codec_specific_data vb - if(stream_class == video){ - width v - height v - sample_width v - sample_height v - colorspace_type v - }else if(stream_class == audio){ - samplerate_num v - samplerate_denom v - channel_count v - } - reserved_bytes - - Basic Packets: - -frame: - frame_code f(8) - frame_flags= flags[frame_code] - frame_res= reserved_count[frame_code] - if(frame_flags&FLAG_CODED){ - coded_flags v - frame_flags ^= coded_flags - } - if(frame_flags&FLAG_STREAM_ID){ - stream_id v - } - if(frame_flags&FLAG_CODED_PTS){ - coded_pts v - } - if(frame_flags&FLAG_SIZE_MSB){ - data_size_msb v - } - if(frame_flags&FLAG_RESERVED) - frame_res v - for(i=0; i<frame_res; i++) - reserved v - if(frame_flags&FLAG_CHECKSUM){ - checksum u(32) - } - data - -index: - max_pts t - syncpoints v - for(i=0; i<syncpoints; i++){ - syncpoint_pos_div16 v - } - for(i=0; i<stream_count; i++){ - last_pts= -1 - for(j=0; j<syncpoints; ){ - x v - type= x & 1 - x>>=1 - n=j - if(type){ - flag= x & 1 - x>>=1 - while(x--) - has_keyframe[n++][i]=flag - has_keyframe[n++][i]=!flag; - }else{ - while(x != 1){ - has_keyframe[n++][i]=x&1; - x>>=1; - } - } - for(; j<n && j<syncpoints; j++){ - if (!has_keyframe[j][i]) continue - A v - if(!A){ - A v - B v - eor_pts[j][i] = last_pts + A + B - }else - B=0 - keyframe_pts[j][i] = last_pts + A - last_pts += A + B - } - } - } - reserved_bytes - index_ptr u(64) - -info_packet: - stream_id_plus1 v - chapter_id s (Note: Due to a typo this was v - until 2006-11-04.) 
- chapter_start t - chapter_len v - count v - for(i=0; i<count; i++){ - name vb - value s - if (value==-1){ - type= "UTF-8" - value vb - }else if (value==-2){ - type vb - value vb - }else if (value==-3){ - type= "s" - value s - }else if (value==-4){ - type= "t" - value t - }else if (value<-4){ - type= "r" - value.den= -value-4 - value.num s - }else{ - type= "v" - } - } - reserved_bytes - -syncpoint: - global_key_pts t - back_ptr_div16 v - reserved_bytes - - Complete definition: - - -Tag description: ----------------- - -file_id_string - "nut/multimedia container\0" - The very first thing in every NUT file, useful for identifying NUT files. - -*_startcode (f(64)) - all startcodes start with 'N' - -main_startcode (f(64)) - 0x7A561F5F04ADULL + (((uint64_t)('N'<<8) + 'M')<<48) - -stream_startcode (f(64)) - 0x11405BF2F9DBULL + (((uint64_t)('N'<<8) + 'S')<<48) - -syncpoint_startcode (f(64)) - 0xE4ADEECA4569ULL + (((uint64_t)('N'<<8) + 'K')<<48) - -index_startcode (f(64)) - 0xDD672F23E64EULL + (((uint64_t)('N'<<8) + 'X')<<48) - -info_startcode (f(64)) - 0xAB68B596BA78ULL + (((uint64_t)('N'<<8) + 'I')<<48) - -version (v) - NUT version. The current value is 3. All lower values are pre-freeze. - -stream_count (v) - number of streams in this file - -time_base_count (v) - number of different time bases in this file - This MUST NOT be 0. - -forward_ptr (v) - Size of the packet data (exactly the distance from the first byte - after the packet_header to the first byte of the next packet). - Every NUT packet contains a forward_ptr immediately after its startcode - with the exception of frame_code-based packets. The forward pointer - can be used to skip over the packet without decoding its contents. - -max_distance (v) - maximum distance between startcodes. 
If p1 and p2 are the byte - positions of the first byte of two consecutive startcodes, then - p2-p1 MUST be less than or equal to max_distance unless the entire - span from p1 to p2 comprises a single packet or a syncpoint - followed by a single frame. This imposition places efficient upper - bounds on seek operations and allows for the detection of damaged - frame headers, should a chain of frame headers pass max_distance - without encountering any startcode. - - Syncpoints SHOULD be placed immediately before a keyframe if the - previous frame of the same stream was a non-keyframe, unless such - non-keyframe - keyframe transitions are very frequent. - - SHOULD be set to <=32768. - If the stored value is >65536 then max_distance MUST be set to 65536. - - This is also half the maximum frame size without a checksum after the - frame header. - - -max_pts_distance (v) - Maximum absolute difference of the pts of the new frame from last_pts in - the timebase of the stream, without a checksum after the frame header. - A frame header MUST include a checksum if abs(pts-last_pts) is - strictly greater than max_pts_distance. - Note that last_pts is not necessarily the pts of the last frame - on the same stream, as it is altered by syncpoint timestamps. - SHOULD NOT be higher than 1/timebase. - -stream_id (v) - Stream identifier - stream_id MUST be < stream_count - -stream_class (v) - 0 video - 1 audio - 2 subtitles - 3 userdata - Note: The remaining values are reserved and MUST NOT be used. - A demuxer MUST ignore streams with reserved classes. - -fourcc (vb) - identification for the codec - example: "H264" - MUST contain 2 or 4 bytes, note, this might be increased in the future - if needed. - The ID values used are the same as in AVI, so if a codec uses a specific - FourCC in AVI then the same FourCC MUST be used here. 
- -time_base_num (v) / time_base_denom (v) = time_base - the length of a timer tick in seconds, this MUST be equal to the 1/fps - if FLAG_FIXED_FPS is set - time_base_num and time_base_denom MUST NOT be 0 - time_base_num and time_base_denom MUST be relatively prime - time_base_denom MUST be < 2^31 - examples: - fps time_base_num time_base_denom - 30 1 30 - 29.97 1001 30000 - 23.976 1001 24000 - There MUST NOT be 2 identical timebases in a file. - There SHOULD NOT be more timebases than streams. - -time_base_id (v) - index into the time_base table - MUST be < time_base_count. - -convert_ts - To switch from 2 different timebases, the following calculation is - defined: - - ln = from_time_base_num*to_time_base_denom - sn = from_timestamp - d1 = from_time_base_denom - d2 = to_time_base_num - timestamp = (ln/d1*sn + ln%d1*sn/d1)/d2 - Note: This calculation MUST be done with unsigned 64 bit integers, and - is equivalent to (ln*sn)/(d1*d2) but this would require a 96 bit integer. - -compare_ts - Compares timestamps from 2 different timebases, - if a is before b then compare_ts(a, b) = -1 - if a is after b then compare_ts(a, b) = 1 - else compare_ts(a, b) = 0 - - Care must be taken that this is done exactly with no rounding errors, - simply casting to float or double and doing the obvious - a*timebase > b*timebase is not compliant or correct, neither is the - same with integers, and - a*a_timebase.num*b_timebase.den > b*b_timebase.num*a_timebase.den - will overflow. One possible implementation which shouldn't overflow - within the range of legal timestamps and timebases is: - - if (convert_ts(a, a_timebase, b_timebase) < b) return -1; - if (convert_ts(b, b_timebase, a_timebase) < a) return 1; - return 0; - -msb_pts_shift (v) - amount of bits in lsb_pts - MUST be <16. - -decode_delay (v) - Size of the reordering buffer used to convert pts to dts. - Codecs which do not support B-frames normally use 0. - MPEG-1/MPEG-2-style codecs with B-frames use 1. 
- H.264-style B-pyramid uses 2. - H.264 and future codecs might need values >2. - Audio codecs generally use 0. (We are not aware of any, but it - is theoretically possible that a codec might need a value >0.) - decode_delay MUST NOT be set higher than necessary for a codec. - -stream_flags (v) - Bit Name Description - 1 FLAG_FIXED_FPS indicates that the fps is fixed - -codec_specific_data (vb) - Private global data for a codec (could be huffman tables or ...). - If a codec has a global header it SHOULD be placed in here instead of - at the start of every keyframe. - The exact format is specified in the codec specification. - For H.264 the NAL units MUST be formatted as in a bytestream - (with 00 00 01 prefixes). - codec_specific_data SHOULD contain exactly the essential global packets - needed to decode a stream, more specifically it SHOULD NOT contain packets - which contain only non essential metadata like author, title, ... - It also MUST NOT contain normal packets which cause the reference decoder - to generate any specific decoded samples. - The encoder name and version shall be considered essential as it is very - useful to work around possible encoder bugs. - The global headers MUST consist of the normal - sequence of header packets required for codec initialization, in the - order defined in the codec spec. An implementation MAY strip metadata and - other redundant information not necessary for correct playback from the - global headers as long as no incorrect values are stored and as long as - the stripped result is not less valid per codec spec as before stripping. - -frame_code (f(8)) - frame_code is an 8-bit field which exists before every frame, it can - store part of the size of the frame, the stream number, the timestamp - and some flags amongst other things. What is not directly stored - in it but is needed is stored in various fields immediately after it. - The values stored in it can be found in the main header. 
- The value 78 ('N') is forbidden to ensure that the byte is always - different from the first byte of any startcode. - A muxer SHOULD mark 0x00 and 0xFF as invalid to improve error - detection. - -flags[frame_code], frame_flags (v) - Bit Name Description - 0 FLAG_KEY If set, the frame is a keyframe. - 1 FLAG_EOR If set, the stream has no relevance on - presentation. (EOR) - 3 FLAG_CODED_PTS If set, coded_pts is in the frame header. - 4 FLAG_STREAM_ID If set, stream_id is coded in the frame header. - 5 FLAG_SIZE_MSB If set, data_size_msb at the frame header, - otherwise data_size_msb is 0. - 6 FLAG_CHECKSUM If set, the frame header contains a checksum. - 7 FLAG_RESERVED If set, reserved_count is coded in the frame header. - 12 FLAG_CODED If set, coded_flags are stored in the frame header. - 13 FLAG_INVALID If set, frame_code is invalid. - - EOR frames MUST be zero-length and must be set keyframe. - All streams SHOULD end with EOR, where the pts of the EOR indicates the - end presentation time of the final frame. - An EOR set stream is unset by the first content frames. - EOR can only be unset in streams with zero decode_delay . - FLAG_CHECKSUM MUST be set if the frame's data_size is strictly greater than - 2*max_distance or the difference abs(pts-last_pts) is strictly greater than - max_pts_distance (where pts represents this frame's pts and last_pts is - defined as below). - -last_pts - The timestamp of the last frame with the same stream_id as the current. - If there is no such frame between the last syncpoint and the current - frame then the syncpoint timestamp is used, see global_key_pts. - -stream_id[frame_code] (v) - If FLAG_STREAM_ID is not set then this is the stream number for the - frame following this frame_code. - If FLAG_STREAM_ID is set then this value has no meaning. - MUST be <250. 
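The FLAG_CHECKSUM rule in the flags description above can be sketched as a small predicate. This is only an illustration and not part of the removed file: the function name nut_needs_checksum and its parameter list are hypothetical, pts and last_pts are assumed to already be expressed in the stream's timebase, and max_distance is assumed to be the effective value (already capped at 65536 as the max_distance description requires).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if a muxer must set FLAG_CHECKSUM for a
 * frame, 0 otherwise, following the two conditions in the flags text. */
static int nut_needs_checksum(uint64_t data_size, uint64_t max_distance,
                              int64_t pts, int64_t last_pts,
                              uint64_t max_pts_distance)
{
    uint64_t pts_diff;

    /* Assumption: the caller passes the capped, effective max_distance. */
    assert(max_distance <= 65536);

    /* abs(pts - last_pts), computed in unsigned arithmetic */
    pts_diff = pts >= last_pts ? (uint64_t)pts - (uint64_t)last_pts
                               : (uint64_t)last_pts - (uint64_t)pts;

    if (data_size > 2 * max_distance)   /* strictly greater than 2*max_distance */
        return 1;
    if (pts_diff > max_pts_distance)    /* strictly greater than max_pts_distance */
        return 1;
    return 0;
}
```

Both comparisons are strict, so a frame of exactly 2*max_distance bytes, or one whose pts differs from last_pts by exactly max_pts_distance, does not require a checksum under this reading.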
- -data_size_mul[frame_code] (v) - If FLAG_SIZE_MSB is set then data_size_msb which is stored after the - frame code is multiplied with it and forms the more significant part - of the size of the following frame. - If FLAG_SIZE_MSB is not set then this field has no meaning. - MUST be <16384. - -data_size_lsb[frame_code] (v) - The less significant part of the size of the following frame. - This added together with data_size_mul*data_size_msb is the size of - the following frame. - MUST be <16384. - -pts_delta[frame_code] (s) - If FLAG_CODED_PTS is set in the flags of the current frame then this - value MUST be ignored, if FLAG_CODED_PTS is not set then pts_delta is the - difference between the current pts and last_pts. - MUST be <16384 and >-16384. - -reserved_count[frame_code] (v) - MUST be <256. - -data_size - The size of the following frame. - data_size = data_size_lsb + data_size_msb * data_size_mul ; - -coded_pts (v) - If coded_pts < ( 1 << msb_pts_shift ) then it is an lsb - pts, otherwise it is a full pts + ( 1 << msb_pts_shift ). - lsb pts is converted to a full pts by: - mask = ( 1 << msb_pts_shift ) - 1; - delta = last_pts - mask / 2 - pts = ( (pts_lsb - delta) & mask ) + delta - -lsb_pts - Least significant bits of the pts in time_base precision. - Example: IBBP display order - keyframe pts=0 -> pts=0 - frame lsb_pts=3 -> pts=3 - frame lsb_pts=1 -> pts=1 - frame lsb_pts=2 -> pts=2 - ... - keyframe msb_pts=257 -> pts=257 - frame lsb_pts=255 -> pts=255 - frame lsb_pts=0 -> pts=256 - frame lsb_pts=4 -> pts=260 - frame lsb_pts=2 -> pts=258 - frame lsb_pts=3 -> pts=259 - All pts values of keyframes of a single stream MUST be monotone. - -dts - decoding timestamp - The dts of a frame is the timestamp of the first sample which is - output by a decoder when it is fed with the frame. Note that the - data output is not necessarily what is coded in the frame, but may - be data from previous frames. 
- dts is calculated by using a decode_delay + 1 sized buffer for each - stream, into which the current pts is inserted and the element with - the smallest value is removed. This is then the current dts. - This buffer is initialized with decode_delay - 1 elements. - - pts of all frames in all streams MUST be bigger or equal to dts of all - previous frames in all streams, compared in common timebase. (EOR - frames are NOT exempt from this rule.) - dts of all frames MUST be bigger or equal to dts of all previous frames - in the same stream. - -width (v) / height (v) - Width and height of the video in pixels. - MUST be set to the coded width/height, MUST NOT be 0. - -sample_width (v) /sample_height (v) (aspect ratio) - sample_width is the horizontal distance between samples. - sample_width and sample_height MUST be relatively prime if not zero. - Both MUST be 0 if unknown otherwise both MUST be nonzero. - -colorspace_type (v) - 0 unknown - 1 ITU Rec 624 / ITU Rec 601 Y range: 16..235 Cb/Cr range: 16..240 - 2 ITU Rec 709 Y range: 16..235 Cb/Cr range: 16..240 - 17 ITU Rec 624 / ITU Rec 601 Y range: 0..255 Cb/Cr range: 0..255 - 18 ITU Rec 709 Y range: 0..255 Cb/Cr range: 0..255 - -samplerate_num (v) / samplerate_denom (v) = samplerate - The number of samples per second, MUST NOT be 0. - -crc32 checksum - Generator polynomial is 0x104C11DB7. Starting value is zero. - -checksum (u(32)) - crc32 checksum - The checksum is calculated for the area pointed to by forward_ptr - not including the checksum itself (from first byte after the - packet_header until last byte before the checksum). - For frame headers the checksum contains the framecode byte and all - following bytes up to the checksum itself. - -header_checksum (u(32)) - Checksum over the startcode and forward pointer. 
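The crc32 checksum entry above gives only the generator polynomial and the starting value. A bitwise sketch under additional assumptions follows: the text does not state the bit order, so this version assumes MSB-first processing with no bit reflection and no final XOR. The function name nut_crc32 is hypothetical, and a real implementation would more likely use a lookup table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with generator 0x104C11DB7 and starting value 0.
 * Assumption (not stated in the text): MSB-first, no reflection,
 * no final XOR. */
static uint32_t nut_crc32(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0;                           /* starting value is zero */
    size_t i;
    int bit;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;          /* feed the byte into the top bits */
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04C11DB7u; /* low 32 bits of the generator */
            else
                crc <<= 1;
        }
    }
    return crc;
}
```

With these assumptions, running the function over a block followed by its own checksum stored most significant byte first yields zero, which gives a convenient way to verify a packet in one pass.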
- -Syncpoint tags: ---------------- - -back_ptr_div16 (v) - back_ptr = back_ptr_div16 * 16 + 15 - back_ptr must point to a position up to 15 bytes before a syncpoint - startcode, relative to position of current syncpoint. The syncpoint - pointed to MUST be the closest syncpoint such that at least one keyframe - with a pts lower or equal to the current syncpoint's global_key_pts for - all streams lies between it and the current syncpoint. - - A stream where EOR is set is to be ignored for back_ptr. - -global_key_pts (t) - After a syncpoint, last_pts of each stream is to be set to: - last_pts[i] = convert_ts(global_key_pts, time_base[id], time_base[i]) - - global_key_pts MUST be bigger or equal to dts of all past frames across - all streams, and smaller or equal to pts of all future frames. - -Index tags: ------------ - -max_pts (t) - the highest pts in the entire file - -syncpoints (v) - number of indexed syncpoints - -syncpoint_pos_div16 (v) - The offset from the beginning of the file to up to 15 bytes before the - syncpoint referred to in this index entry. Relative to position of last - syncpoint. - -has_keyframe - Indicates whether this stream has a keyframe between this syncpoint and - the last syncpoint. - -keyframe_pts - The pts of the first keyframe for this stream in the region between the - 2 syncpoints, in the stream's timebase. (EOR frames are also keyframes.) - -eor_pts - Coded only if EOR is set at the position of the syncpoint. The pts of - that EOR. EOR is unset by the first keyframe after it. - -index_ptr (u(64)) - Length in bytes of the entire index, from the first byte of the - startcode until the last byte of the checksum. - Note: A demuxer can use this to find the index when it is written at - EOF, as index_ptr will always be 12 bytes before the end of file if - there is an index at all. - - -Info tags: ----------- - -stream_id_plus1 (v) - Stream this info packet applies to. If zero, packet applies to the - whole file. 
- -chapter_id (s) - The ID of the chapter this packet applies to. If zero, the packet applies - to the whole file. Positive chapter_id values represent real chapters and - MUST NOT overlap. - A negative chapter_id indicates a sub region of the file and not a real - chapter. chapter_id MUST be unique to the region it represents. - chapter_id n MUST NOT be used unless there are at least n chapters in the - file. - -chapter_start (t) - timestamp of start of chapter - -chapter_len (v) - Length of chapter in the same timebase as chapter_start. - -count (v) - number of name/value pairs in this info packet - -type - for example: "UTF8" -> string or "JPEG" -> JPEG image - "v" -> unsigned integer - "s" -> signed integer - "r" -> rational - Note: Nonstandard fields should be prefixed by "X-". - Note: MUST be less than 6 byte long (might be increased to 64 later). - -info packet types - The name of the info entry. Valid names are - "Author" - "Description" - "Copyright" - "Encoder" - The name & version of the software used for encoding. - "Title" - "Cover" (allowed types are "PNG" and "JPEG") - image of the (CD, DVD, VHS, ..) cover (preferably PNG or JPEG) - "Source" - "DVD", "VCD", "CD", "MD", "FM radio", "VHS", "TV", "LD" - Optional: Appended PAL, NTSC, SECAM, ... in parentheses. - "SourceContainer" - "nut", "mkv", "mov", "avi", "ogg", "rm", "mpeg-ps", "mpeg-ts", "raw" - "SourceCodecTag" - The source codec ID like a FourCC which was used to store a specific - stream in its SourceContainer. - "CaptureDevice" - "BT878", "BT848", "webcam", ... (or more precise names) - "CreationTime" - "2003-01-20 20:13:15Z", ... - (ISO 8601 format, see http://www.cl.cam.ac.uk/~mgk25/iso-time.html) - Note: Do not forget the timezone. - "Keywords" - "Language" - An ISO 639-2 (three-letter) language code, optionally followed by an - ISO 3166-1 country code that is separated from the language - code by a hyphen. 
All codes defined in ISO 639-2 are allowed, - including "und" (Undetermined), "mul" (Multiple languages). - See http://www.loc.gov/standards/iso639-2/ - and http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1/en_listp1.html - the language code - A demuxer MUST ignore unknown language and country codes instead of - treating them as an error. - "Disposition" - "original", "dub" (translated), "comment", "lyrics", "karaoke" - Note: If someone needs some others, please tell us about them, so we - can add them to the official standard (if they are sane). - Note: Nonstandard fields should be prefixed by "X-". - Note: Names of fields SHOULD be in English if a word with the same - meaning exists in English. - Note: MUST be less than 64 bytes long. - -value - value of this name/type pair - -stuffing - 0x80 can be placed in front of any type v entry for stuffing purposes. - Exceptions are the forward_ptr and all fields in the frame header where - a maximum of 8 stuffing bytes per field are allowed. - - -Structure: ----------- - -The headers MUST be in exactly the following order (to simplify demuxer design). - -main header -stream_header (id=0) -stream_header (id=1) -... -stream_header (id=n) - -Headers may be repeated, but if they are, then they MUST all be repeated -together and repeated headers MUST be identical. - -Each set of repeated headers not at the beginning or end of the file SHOULD -be stored at the earliest possible position after 2^x where x is an integer -and the end of the file. So the headers may be repeated at 4102 if that is -the closest position after 2^12=4096 at which the headers can be placed. - -Note: This allows an implementation reading the file to locate backup -headers in O(log filesize) time as opposed to O(filesize). - -Headers MUST be placed at least at the start of the file and immediately before -the index or at the end of the file if there is no index. -Headers MUST be repeated at least twice (so they exist three times in a file). 
- -There MUST be a syncpoint immediately before the first frame after any headers. - - -Index: ------- - -Note: With realtime streaming, there is no end, so no index there either. -Index MAY only be repeated after main headers. -If an index is written anywhere in the file, it MUST be written at end of -file as well. - - -Info: ------ - -If an info packet is stored anywhere then a muxer MUST also store an identical -info packet after every main-stream-header set. - -If a demuxer has seen several info packets with the same chapter_id and -stream_id then it MUST ignore all but the one with the highest position in -the file. - -Demuxers SHOULD NOT search the whole file for info packets. - -demuxer (non-normative): ------------------------- - -In the absence of a valid header at the beginning, players SHOULD search for -backup headers starting at offset 2^x; for each x players SHOULD end their -search at a particular offset when any startcode (including a syncpoint) is -found. - - -Seeking without an index (non-normative): ------------------------------------------ -A. backward seeking - 1. Perform a binary search on the syncpoint timestamps finding the one - which is largest and <= the target timestamp. -B. forward seeking - 1a. Perform a binary search on the syncpoint timestamps finding the one - which is smallest and >= the target timestamp. - 1b. Perform a binary search on the syncpoint back pointers finding the - smallest one which has a back ptr >= the position of what was found in 1. -2. Follow the back pointer to the corresponding syncpoint. - -Seeking with an index (non-normative): --------------------------------------- -The demuxer only has to find the appropriate keyframe in the index and -start demuxing from the previous syncpoint. - -Note, more complicated seeking methods exist which are capable of quickly -seeking to the optimal point in the presence of an index even if only a -subset of all streams is active. 
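Step A.1 of seeking without an index, above, is a "largest value not above the target" binary search. The sketch below shows that search over an in-memory array, which is a simplification: a real demuxer bisects over byte positions in the file and reads syncpoints as it goes. The function name nut_find_syncpoint is hypothetical, and the timestamps are assumed to be non-decreasing and already converted to one common timebase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the last entry in ts[0..n-1] that is <= target,
 * or -1 if every entry is greater than target.
 * ts[] must be sorted in non-decreasing order. */
static long nut_find_syncpoint(const int64_t *ts, size_t n, int64_t target)
{
    size_t lo = 0, hi = n;          /* search window is [lo, hi) */

    assert(ts != NULL || n == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ts[mid] <= target)
            lo = mid + 1;           /* ts[mid] qualifies; look for a later one */
        else
            hi = mid;
    }
    return (long)lo - 1;
}
```

For forward seeking (step B.1a) the same loop applies with the comparison changed to find the smallest timestamp that is >= the target.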
- -A muxer SHOULD place syncpoints so that that simple low complexity seeking -works with fine granularity. That is, syncpoints should be placed prior -to keyframes instead of non-keyframes and with high enough frequency -(once per second unless there are no keyframes between this and the previous -syncpoint). - -Encoders SHOULD place keyframes so that the number of points where all -streams have a keyframe at the same time is maximized. This ensures that -seeking (complicated or not) does not need to demux and decode significant -amounts of data to reach a point where a presentable frame for each stream -is available after seeking. - - -Semantic requirements: -====================== - -If more than one stream of a given stream class is present, each one SHOULD -have info tags specifying disposition, and if applicable, language. -It often highly improves usability and is therefore strongly encouraged. - -A demuxer MUST NOT demux a stream which contains more than one stream, or which -is wrapped in a structure to facilitate more than one stream or otherwise -duplicate the role of a container. Any such file is to be considered invalid. -For example Vorbis in Ogg in NUT is invalid, as is -mpegvideo + mpegaudio in MPEG-PS/TS in NUT or dvvideo + dvaudio in DV in NUT. 
-
-
-
-Sample code (Public Domain, & untested):
-========================================
-
-typedef BufferContext{
-    uint8_t *buf;
-    uint8_t *buf_ptr;
-}BufferContext;
-
-static inline uint64_t get_bytes(BufferContext *bc, int count){
-    uint64_t val=0;
-
-    assert(count>0 && count<9);
-
-    for(i=0; i<count; i++){
-        val <<=8;
-        val += *(bc->buf_ptr++);
-    }
-
-    return val;
-}
-
-static inline void put_bytes(BufferContext *bc, int count, uint64_t val){
-    uint64_t val=0;
-
-    assert(count>0 && count<9);
-
-    for(i=count-1; i>=0; i--){
-        *(bc->buf_ptr++)= val >> (8*i);
-    }
-
-    return val;
-}
-
-static inline uint64_t get_v(BufferContext *bc){
-    uint64_t val= 0;
-
-    for(; space_left(bc) > 0; ){
-        int tmp= *(bc->buf_ptr++);
-        if(tmp&0x80)
-            val= (val<<7) + tmp - 0x80;
-        else
-            return (val<<7) + tmp;
-    }
-
-    return -1;
-}
-
-static inline int put_v(BufferContext *bc, uint64_t val){
-    int i;
-
-    if(space_left(bc) < 9) return -1;
-
-    val &= 0x7FFFFFFFFFFFFFFFULL; // FIXME: Can only encode up to 63 bits ATM.
-    for(i=7; ; i+=7){
-        if(val>>i == 0) break;
-    }
-
-    for(i-=7; i>0; i-=7){
-        *(bc->buf_ptr++)= 0x80 | (val>>i);
-    }
-    *(bc->buf_ptr++)= val&0x7F;
-
-    return 0;
-}
-
-static int64_t get_dts(int64_t pts, int64_t *pts_cache, int delay, int reset){
-    if(reset) memset(pts_cache, -1, delay*sizeof(int64_t));
-
-    while(delay--){
-        int64_t t= pts_cache[delay];
-        if(t < pts){
-            pts_cache[delay]= pts;
-            pts= t;
-        }
-    }
-
-    return pts;
-}
-
-
-
-Authors:
-========
-
-Folks from the MPlayer developers mailing list (http://www.mplayerhq.hu/).
-Authors in alphabetical order: (FIXME! Tell us if we left you out)
-    Beregszaszi, Alex (alex@fsn.hu)
-    Bunkus, Moritz (moritz@bunkus.org)
-    Diedrich, Tobias (ranma+mplayer@tdiedrich.de)
-    Felker, Rich (dalias@aerifal.cx)
-    Franz, Fabian (FabianFranz@gmx.de)
-    Gereoffy, Arpad (arpi@thot.banki.hu)
-    Hess, Andreas (jaska@gmx.net)
-    Niedermayer, Michael (michaelni@gmx.at)
-    Shimon, Oded (ods15@ods15.dyndns.org)
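The sample code in the removed nut.txt is marked untested and does not compile as written: the typedef lacks the struct keyword, the loop counter i is undeclared in get_bytes and put_bytes, put_bytes shadows its val parameter and returns a value from a void function, and space_left is never defined. A compilable variant of the get_v and put_v pair is sketched below. It is not the authors' code: the buf_end field, the space_left helper and the changed get_v signature (status return plus output parameter, so that an all-ones value cannot be confused with an error) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct BufferContext {
    uint8_t *buf;       /* start of the buffer */
    uint8_t *buf_ptr;   /* current read/write position */
    uint8_t *buf_end;   /* one past the last usable byte (hypothetical field) */
} BufferContext;

/* Hypothetical helper: bytes remaining between buf_ptr and buf_end. */
static int space_left(const BufferContext *bc)
{
    return (int)(bc->buf_end - bc->buf_ptr);
}

/* Reads a "v" value; returns 0 on success, -1 if the buffer ends early. */
static int get_v(BufferContext *bc, uint64_t *out)
{
    uint64_t val = 0;

    while (space_left(bc) > 0) {
        int tmp = *bc->buf_ptr++;
        val = (val << 7) + (uint64_t)(tmp & 0x7F);
        if (!(tmp & 0x80)) {            /* more_data bit clear: last byte */
            *out = val;
            return 0;
        }
    }
    return -1;
}

/* Writes a "v" value of up to 63 bits; returns 0 on success, -1 if fewer
 * than 9 bytes (the worst case) are available. */
static int put_v(BufferContext *bc, uint64_t val)
{
    int i;

    assert(val <= 0x7FFFFFFFFFFFFFFFULL);
    if (space_left(bc) < 9)
        return -1;

    for (i = 7; val >> i; i += 7)
        ;                               /* i = 7 * number of 7-bit groups */
    for (i -= 7; i > 0; i -= 7)
        *bc->buf_ptr++ = (uint8_t)(0x80 | ((val >> i) & 0x7F));
    *bc->buf_ptr++ = (uint8_t)(val & 0x7F);
    return 0;
}
```

As in the removed sample, put_v is limited to 63-bit values. get_v accepts leading 0x80 stuffing bytes without special handling, because they contribute zero to the accumulated value.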
--- a/DOCS/tech/oggless-xiph-codecs.txt Mon May 07 23:26:40 2007 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,127 +0,0 @@
-Title   Embedding xiph codecs like vorbis in containers other than ogg
-Version 2006-07-30 (draft)
-Status  this is not a standard or otherwise accepted by xiph or any other
-        group, one day when we have a fully working implementation, did
-        enough testing and so on we might submit it to IETF? to become an
-        RFC ...
-        furthermore this document has been submitted to vorbis-dev and
-        so far has been ignored, maybe they were too busy, maybe xiph
-        wants to prevent their open codecs from being used in containers
-        other than their own?
-Author  Michael Niedermayer (michaelni at gmx dot at)
-License GPL + GFDL + anything needed to turn this into an open standard
-        like an RFC
-
-Minimum container requirements:
-This appendix only explains how to store xiph codecs in containers which
-support at least one global header per stream, can separate individual codec
-packets and in principle support the codec, so for example in the case of
-vorbis that would be variable bitrate and variable number of samples/packet
-Storage in other containers is outside the scope of this appendix
-
-
-FIXME non vorbis
-Global header:
-If the container can store 3 headers per stream in an unambiguous and ordered
-way then they shall be stored in that way, if OTOH the container is only
-capable of storing a single global header then the 3 codec headers shall be
-concatenated without any additional header, footer or separator between them
-to recover the 3 headers from such a global header the following procedure
-shall be used:
-
-1) search for the 1st occurrence of 01,'v','o','r','b','i','s'
-   the found match and the following 23 bytes are the 1st header packet
-2) search for the 1st occurrence of 03,'v','o','r','b','i','s' after here
-    3) read an unsigned integer of 32 bits and skip that many bytes
-    4) [user_comment_list_length] = read an unsigned integer of 32 bits
-    5) iterate [user_comment_list_length] times {
-        6) read an unsigned integer of 32 bits and skip that many bytes
-       }
-    7) skip 1 byte
-8) the match in 2) and what follows until here is the 2nd header packet
-9) search for the 1st occurrence of 05,'v','o','r','b','i','s' after here
-   the matching part and what follows is the 3rd header packet
-if the container needs an identifier for the global header, for example a 4cc
-for a global header chunk then glbl shall be used
-
-
-Storing packets:
-Each codec packet shall be stored in exactly one "container packet"
-and one "container packet" must not contain more than one codec packet
-"container packet" here means the smallest separable unit of data in the
-container
-
-
-Codec Identifier:
-xiph-codec      4-cc id         long id
-Vorbis          vrbs            vorbis
-Theora          ther            theora
-Tarkin          trkn            tarkin
-Flac            flac            flac
-Speex           spex            speex
-
-if the container uses 4-character codes, the 4-cc identifier from the table
-above shall be used
-if the container uses arbitrary length strings as identifiers then the long
-id from the table above shall be used
-
-
-Examples and Discussions about specific containers
-What follows are some notes about specific containers, these notes are just
-informative as they just repeat what is written above or in the
-specification of the specific container
-
-
-Example and Discussion of the avi container
-avi supports everything needed to store vorbis, this does not mean that all
-applications will support vorbis in avi as vorbis is rather different from
-other audio codecs commonly stored in avi ...
-avi supports a single global header like wav does, the 3 vorbis headers
-shall be stored in it and only in it as described above
-dwSampleSize must be set to zero as vorbis is vbr, many applications do
-this incorrectly for other vbr codecs and consequently vbr audio in avi
-becomes problematic
-avi does not have timestamps but each chunk has a constant duration, while
-vorbis packets can have one of 2 durations, if now the avi header is set up
-so that each avi chunk has the same duration as the smaller duration of
-the 2 possibilities in vorbis then simply inserting empty avi chunks will
-allow every avi chunk to have the correct duration, this is of course
-not the most beautiful solution but it is the only way to keep things
-exact, additionally note that empty chunks have been used for ages
-in avi to lengthen the duration of video chunks
-
-
-Example and Discussion of the asf container
-asf supports a single global header per stream and has timestamps so
-storing xiph codecs in it should be possible but asf is patented and
-microsoft has already threatened individuals so we strongly urge you to
-avoid this container
-
-
-Example and Discussion of the matroska container
-matroska supports storing 3 headers using a codec specific
-format, which should be used for storing the 3 headers
-Note, the above procedure to split one header into 3 works with the
-vorbis-matroska specific format too
-
-
-Example and Discussion of the nut container
-nut supports a single global header per stream so the 1<->3 merge/split
-procedure above must be used, apart from that there is nothing special about
-storing xiph codecs in nut
-
-
-Example and Discussion of mpeg-ps / mpeg-ts container
-These containers neither support a global header nor provide the necessary
-packet separation / framing, so storing xiph codecs in them is outside the
-scope of this appendix
-
-
-Example and Discussion of wav container
-wav does not provide the necessary packet separation / framing, so storing
-xiph codecs in it is outside the scope of this appendix
-
-
-Example and Discussion of the mov container
-a single glbl atom shall be placed in the stsd atom in which the global
-header shall be stored
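

Sketch of the global header split (not part of the removed file):
The 9-step procedure under "Global header" in oggless-xiph-codecs.txt can
be sketched in C roughly as below. This is an untested-in-production
illustration, not the author's implementation. The names HeaderPacket,
find_header, read_u32, skip_bytes and split_vorbis_headers are hypothetical.
The text does not state the byte order of the 32 bit integers; the sketch
assumes little-endian, which is what the Vorbis I specification uses for
the comment header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one recovered header packet. */
typedef struct HeaderPacket {
    const uint8_t *data;
    size_t size;
} HeaderPacket;

/* Find the first <type>,'v','o','r','b','i','s' at or after pos.
 * Returns its offset, or len if there is no match. */
static size_t find_header(const uint8_t *buf, size_t len, size_t pos,
                          uint8_t type)
{
    for (; pos + 7 <= len; pos++)
        if (buf[pos] == type && !memcmp(buf + pos + 1, "vorbis", 6))
            return pos;
    return len;
}

/* Read a 32 bit unsigned integer (assumed little-endian) and advance *pos. */
static int read_u32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *val)
{
    if (len - *pos < 4)
        return -1;
    *val = (uint32_t)buf[*pos]           | (uint32_t)buf[*pos + 1] << 8 |
           (uint32_t)buf[*pos + 2] << 16 | (uint32_t)buf[*pos + 3] << 24;
    *pos += 4;
    return 0;
}

/* Advance *pos by n bytes, failing if that would run past the buffer. */
static int skip_bytes(size_t len, size_t *pos, uint32_t n)
{
    if (len - *pos < n)
        return -1;
    *pos += n;
    return 0;
}

/* Split a concatenated global header into the 3 Vorbis header packets.
 * Returns 0 on success, -1 if the buffer does not match the procedure. */
int split_vorbis_headers(const uint8_t *buf, size_t len, HeaderPacket out[3])
{
    size_t p1, p2, p3, pos;
    uint32_t n, count;

    assert(buf && out);

    /* 1) the match (7 bytes) plus the following 23 bytes */
    p1 = find_header(buf, len, 0, 0x01);
    if (len - p1 < 30)
        return -1;

    /* 2) first 03,"vorbis" after the 1st header packet */
    p2 = find_header(buf, len, p1 + 30, 0x03);
    if (p2 == len)
        return -1;
    pos = p2 + 7;

    /* 3) length-prefixed field: read the length, skip that many bytes */
    if (read_u32(buf, len, &pos, &n) || skip_bytes(len, &pos, n))
        return -1;
    /* 4) user_comment_list_length */
    if (read_u32(buf, len, &pos, &count))
        return -1;
    /* 5) and 6) skip each length-prefixed comment */
    while (count--)
        if (read_u32(buf, len, &pos, &n) || skip_bytes(len, &pos, n))
            return -1;
    /* 7) skip 1 byte */
    if (skip_bytes(len, &pos, 1))
        return -1;

    /* 9) first 05,"vorbis" after the 2nd header packet, runs to the end */
    p3 = find_header(buf, len, pos, 0x05);
    if (p3 == len)
        return -1;

    out[0].data = buf + p1; out[0].size = 30;
    out[1].data = buf + p2; out[1].size = pos - p2;   /* 8) */
    out[2].data = buf + p3; out[2].size = len - p3;
    return 0;
}
```

A caller would pass the container's single global header (for example the
contents of a glbl chunk) as buf and hand the three returned packets to the
Vorbis decoder in order. Every read is bounds-checked so that a damaged or
truncated global header makes the function return -1 instead of reading
past the buffer.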