view DOCS/tech/formats.txt @ 11594:9cc0e44fd623

10l found by atmos
author alex
date Mon, 08 Dec 2003 21:53:54 +0000
parents 6ecc0b5c08cb
children 7e22b762e1a8
line wrap: on
line source

1. Input layer, supported devices, methods:
  - plain file, with seeking
  - STDIN, without seeking backward
  - network streaming (currently plain wget-like HTTP and MMS (.asx))
  - VCD (Video CD) track, by direct CDROM device access (not requires mounting disc)
  - DVD titles using .IFO structure, by direct DVD device access (not requires mounting disc)
  - DVD titles using menu navigation (experimental/alpha, not yet finished!!!)
  - CDDA - raw audio from audio CD-ROM discs (using cdparanoia libs)
  - RTP streaming (mpeg-ps over multicast only)
  - Live.com streaming - support SDP/RTSP (using the live.com libs)
  - SMB - file access over samba (experimental)

2. Demuxer/parser layer, supported file/media formats:

  - MPEG streams (ES,PES,PS. no TS support yet)
    note: mpeg demuxer silently ignore non-mpeg content, and find mpeg packets
    in arbitrary streams. it means you can play directly VCD images (for example
    CDRwin's .BIN files) without extracting mpeg files first (with tools like vcdgear)
    It accepts all PES variants, including files created by VDR.
    Note: VOB (video object) is simple mpeg stream, but it usually has 01BD
    packets which may contain subtitles and non-mpeg audio. Usually found on DVD discs.
    
    Headers: mpeg streams has no global header. each frame sequence (also called GOP,
    group of pictures) contains an sequence header, it describes that block.
    In normal mpeg 1/2 content there are groups of 12-15 frames (24/30 fps).
    It means you can freely seek in mpeg streams, and even can cut it to
    small parts with standard file tools (dd, cut) without destroying it.
    
    Codecs: video is always mpeg video (mpeg1, mpeg2 or mpeg4).
    audio is usually mpeg audio (any layer allowed, but it's layer 2 in most files)
    but 01BD packets may contain AC3, DTS or LPCM too.
    
    FPS: mpeg2 content allows variable framerate, in form of delayed frames.
    It's mostly used for playback 24fps content at 29.97/30 fps (NTSC) rate.
    (so called Telecine or 3:2 pulldown effect)
    It means you see 30 frames per second, but there are only 24 different
    pictures and some of them are shown longer to fill 30 frame time.
    If you encode such files with mencoder, using -ofps 24 or -ofps 23.976
    is recommended.

  - AVI streams.
    Two kind of RIFF AVI files exists:
    1. interleaved: audio and video content is interleaved. it's faster and
       requires only 1 reading thread, so it's recommended (and mostly used).
    2. non-interleaved: audio and video aren't interleaved, i mean first come
       whole video followed by whole audio. it requires 2 reading process or
       1 reading with lots of seeking. very bad for network or cdrom.
    3. badly interleaved streams: mplayer detects interleaving at startup and
       enables -ni option if it finds non-interleaved content. but sometimes
       the stream seems to be interleaved, but with bad sync so it should be
       played as non-interleaved otherwise you get a-v desync or buffer overflow.

    MPlayer supports 2 kind of timing for AVI files:
    - bps-based: it is based on bitrate/samplerate of video/audio stream.
      this method is used by most players, including avifile and wmp.
      files with broken headers, and files created with VBR audio but not
      vbr-compliant encoder will result a-v desync with this method.
      (mostly at seeking).
    - interleaving-based: note: it can't be used togethwer with -ni
      it doesn't use bitrate stuff of header, it uses the relative position
      of interleaved audio and video chunks. makes some badly encoded file
      with vbr audio playable.

    Headers: AVI files has a mandatory header at the begin of the file,
    describing video parameters (resolution, fps) and codecs. Optionally
    they have an INDEX block at the end of the file. It's optional, but
    most files has such block, because it's REQUIRED for seeking.
    Btw usually it can be rebuilt from file content, mplayer does it with
    the -idx switch. Mplayer can recreate broken index blocks using -forceidx.
    As AVI files needs index for random access, broken files with no index
    are usually unplayable.
    Of course, cutting/joining AVI files needs special programs.

    Codecs: any audio and video codecs allowed, but I note that VBR audio is
    not well supported by most players. The file format makes it possible to
    use VBR audio, but most players expect CBR audio and fails with VBR,
    as VBR is unusual, and Microsoft's AVI specs only describe CBR audio.
    I also note, that most AVI encoders/multiplexers create bad files if
    using VBR audio. only 2 exception (known by me): NaNDub and MEncoder.
    
    FPS: only constant framerate allowed, but it's possible to skip frames.

  - ASF streams:
    ASF (active streaming format) comes from Microsoft. they developed two
    variant of ASF, v1.0 and v2.0. v1.0 is used by their media tools (wmp and
    wme) and v2.0 is published and patented :). of course, they differ,
    no compatibility at all. (it's just a legality game)
    MPlayer supports only v1.0, as nobody ever seen v2.0 files :)
    Note, that .ASF files are nowdays come with extension .WMA or .WMV.
    UPDATE: MS recently released the ASF v1.0 specs too, but it has some
    restrictions making it illegal to read by us :)

    Headers: Stream headers (codecs parameters) can be everywhere (in theory),
    but all files i've seen had it at the beginning of the file.
    Asf uses fixed packet size, so it is seekable without any INDEX block,
    and broken files are playable well.
    
    Codecs: video is mostly microsoft's mpeg4 variants: MP42, MP43 (aka DivX),
            WMV1 and WMV2. but any codecs allowed.
            audio is usually wma or voxware, sometimes mp3, but any codecs allowed.

    FPS: no fixed fps, every video frame has an exact timestamp instead.
    I've got stream with up to 3 sec frame display times.

  - QuickTime / MOV files:
    They come from Mac users, usually with .mov or .qt extension, but as
    MPEG Group chose quicktime as recommended file format for MPEG4,
    sometimes you meet quicktime files with .mpg or .mp4 extension.
  
    At first look, it's a mixture of ASF and AVI.
    It requires INDEX block for random access and seeking, and even for
    playback, like AVI, but uses timestamps instead of constant framerate
    and has more flexible stream options (including network stuff) like ASF.
    
    Headers: header can be placed at the beginning or at the end of file.
    About half of my files has it at the begining, others has it at the end.
    Broken files are only playable if they have header at the beginning!
    
    Codecs: any codecs allowed, both CBR and VBR.
    Note: most new mov files use Sorenson video and QDesign Music audio,
    they are patented, closed, secret, (TM)-ed etc formats, only Apple's
    quicktime player is able to playback these files (on win/mac only).

  - VIVO files:
    They are funny streams. They have a human-readable ascii header at
    the beginning, followed by interleaved audio and video chunks.
    It has no index block, has no fixed packetsize or sync bytes, and most
    files even has no keyframes, so forget seeking!
    Video is standard h.263 (in vivo/2.0 files it's modified, non-standard
    h.263), audio is either standard g.723 or Vivo Siren codec.
    
    Note, that microsoft licensed vivo stuff, and included in their netshow
    v2.0 program, so there are VfW/ACM codecs for vivo video and audio.

  - RealMedia files:
    A mixture of AVI and ASF features. It has mandatory headers at the
    beginning and an optional INDEX (missing in most files).
    The file is constructed of variable size chunks, with small header
    telling the stream ID, timestamp, flags (keyframe...) and size.
    But it has some features found in ASF files:
    The video is actually double-muxed, the video chunks are really
    appended fragments of the video frame. RV30+ supports B frames, so
    you have to parse some bits of the first fragment to get the real PTS.
    The audio frames are fixed size (CBR) but using the same scrambling
    (out-of-order interleaving) as in the ASF files. 
    
    Codecs: Audio is either COOK(er), SIPR(o), ATRAC3 or DNET.
    The DNET is actually a byte-swapped low-bitrate Dolby AC3 variant :)
    Video is RV10 (h263 variant), RV20 (rp G2), RV30 (rp v8) or RV40 (rp v9).
    
    FPS: variable, just like in ASF.
    
    Note, that similarity of real and asf has some background - they worked
    together on the (never finished/used) ASF v2 spec for some time.

  - GIF files:
    The GIF format is a common format for web graphics that supports
    animation.  These are read through libungif or compatible library.
    Variable frame delays are supported, but seeking is not supported.
    Seeking will be supported once an index of gif frames can be built.