Media Foundation
Microsoft Media Foundation (MF) is a COM-based multimedia framework, pipeline and infrastructure platform for digital media in Windows Vista. It is the intended replacement for Microsoft DirectShow, the Windows Media SDK, DirectX Media Objects (DMOs) and other legacy multimedia APIs such as the Audio Compression Manager (ACM) and Video for Windows (VfW). DirectShow is intended to be replaced by Media Foundation step by step, starting with a few features, so the two technologies will coexist for some time. Media Foundation is not available for earlier Windows versions, including Windows XP.
The first release, present in Windows Vista, focuses on audio and video playback quality, high-definition content (e.g. HDTV), content protection and a more unified approach to digital data access control for digital rights management (DRM) and its interoperability. It integrates DXVA 2.0 to offload more of the video processing pipeline to hardware for better performance. Video is processed in the colorspace it was encoded in and handed off to the hardware, which composes the image in its native colorspace; this avoids intermediate colorspace conversions and improves performance. MF includes a new video renderer, the Enhanced Video Renderer (EVR), the next iteration of VMR-7 and VMR-9, with better support for playback timing and synchronization. It uses the Multimedia Class Scheduler Service (MMCSS), a new service that prioritizes real-time multimedia processing, to reserve the resources required for glitch-free, tear-free playback.
Architecture
The MF architecture is divided into the Control layer, the Core layer and the Platform layer. The Core layer encapsulates most of the functionality of Media Foundation. It consists of the Media Foundation pipeline, which has three kinds of components: media sources, media sinks and Media Foundation Transforms (MFTs). A media source is an object that acts as the source of multimedia data, either compressed or uncompressed. It can encapsulate various data sources, such as a file, a network server or even a camcorder, with source-specific functionality abstracted behind a common interface. A source object can use a source resolver object, which creates a media source from a URI, file or byte stream; support for non-standard protocols can be added by writing a source resolver for them. A source object can also use a sequencer object to play a sequence of sources (a playlist) or to coalesce multiple sources into a single logical source.

A media sink is the recipient of processed multimedia data. A media sink can either be a renderer sink, which renders the content on an output device, or an archive sink, which saves the content to a persistent storage system such as a file. A renderer sink takes uncompressed data as input, whereas an archive sink can take either compressed or uncompressed data, depending on the output type. The data flowing from media sources to sinks is acted upon by MFTs, components that transform the data into another form; MFTs include multiplexers and demultiplexers, codecs, and DSP effects such as reverb.

The Core layer uses services such as file access, networking and clock synchronization to time the multimedia rendering. These are part of the Platform layer, which provides the services necessary for accessing source and sink byte streams, presentation clocks, and an object model that lets the core layer components work asynchronously; it is generally implemented as operating system services. Pausing, stopping, fast-forwarding, reversing or time-compressing playback can be achieved by controlling the presentation clock.
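As an illustration of the source resolver, the following C++ sketch creates a media source from a file path; the path, the helper name and the reduced error handling are purely illustrative:

    #include <windows.h>
    #include <mfapi.h>
    #include <mfidl.h>
    // Link against mfplat.lib, mf.lib and mfuuid.lib.

    // Ask the source resolver to create whatever media source matches the URL.
    IMFMediaSource* CreateSourceFromUrl(const wchar_t* url)
    {
        IMFSourceResolver* resolver = nullptr;
        IUnknown* unknown = nullptr;
        IMFMediaSource* source = nullptr;
        MF_OBJECT_TYPE objectType = MF_OBJECT_INVALID;

        if (SUCCEEDED(MFCreateSourceResolver(&resolver)) &&
            SUCCEEDED(resolver->CreateObjectFromURL(url, MF_RESOLUTION_MEDIASOURCE,
                                                    nullptr, &objectType, &unknown)))
        {
            unknown->QueryInterface(IID_PPV_ARGS(&source));
        }
        if (unknown)  unknown->Release();
        if (resolver) resolver->Release();
        return source;                       // nullptr on failure
    }

    int wmain()
    {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);
        MFStartup(MF_VERSION, MFSTARTUP_FULL);           // initialise Media Foundation

        // Hypothetical clip; any local file or network URL that an installed
        // source resolver understands would work here.
        IMFMediaSource* source = CreateSourceFromUrl(L"C:\\clips\\example.wmv");
        if (source)
        {
            // ... hand the source to a topology / Media Session here ...
            source->Shutdown();
            source->Release();
        }

        MFShutdown();
        CoUninitialize();
        return 0;
    }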
However, the media pipeline components are not automatically connected; they are merely presented as discrete components. An application running in the Control layer has to choose which source types, transforms and sinks are needed for the particular media processing task at hand, and set up the "connections" between the components (a topology) to complete the data flow pipeline. For example, to play back a compressed audio/video file, the pipeline will consist of a file source object, a demultiplexer for the specific file container format that splits the audio and video streams, codecs that decompress those streams, DSP processors for audio and video effects, and finally the EVR renderer, in sequence. For a video capture application, the camcorder acts as the video and audio source, codec MFTs compress the data and feed it to a multiplexer that coalesces the streams into a container, and finally a file sink or a network sink writes it to a file or streams it over a network. The application also has to co-ordinate the flow of data between the pipeline components: the Control layer has to "pull" (request) samples from one pipeline component and pass them on to the next component to achieve data flow within the pipeline. This is in contrast to DirectShow's "push" model, in which a pipeline component pushes data to the next component. Media Foundation allows content protection by hosting the pipeline within a protected execution environment, called the Protected Media Path.

The Control layer components are required to propagate data through the pipeline at a rate such that rendering stays synchronized with the presentation clock. The rate (or time) of rendering is embedded as metadata in the multimedia stream; the source objects extract the metadata and pass it on. Metadata is of two types: coded metadata, which is information about bit rate and presentation timings, and descriptive metadata, such as title and author names. Coded metadata is handed to the object that controls the pipeline session, and descriptive metadata is exposed for the application to use if it chooses to.
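As a sketch of how such a topology is assembled in code, the fragment below connects the first stream of an already-created media source (such as one returned by the resolver sketch above) to an output node hosting the EVR; the function name is illustrative, the audio branch is omitted, and error handling and object release are reduced to comments:

    #include <mfapi.h>
    #include <mfidl.h>
    #include <evr.h>
    // Link against mf.lib, mfplat.lib and mfuuid.lib.

    IMFTopology* BuildSingleStreamTopology(IMFMediaSource* source, HWND hwnd)
    {
        IMFTopology* topology = nullptr;
        IMFPresentationDescriptor* pd = nullptr;
        IMFStreamDescriptor* sd = nullptr;
        IMFTopologyNode* sourceNode = nullptr;
        IMFTopologyNode* outputNode = nullptr;
        IMFActivate* rendererActivate = nullptr;
        BOOL selected = FALSE;

        MFCreateTopology(&topology);
        source->CreatePresentationDescriptor(&pd);
        pd->GetStreamDescriptorByIndex(0, &selected, &sd);   // first stream only, for brevity

        // Source-stream node: identifies the media source and the stream to read.
        MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &sourceNode);
        sourceNode->SetUnknown(MF_TOPONODE_SOURCE, source);
        sourceNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pd);
        sourceNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, sd);

        // Output node: an activation object that will create the EVR sink.
        MFCreateVideoRendererActivate(hwnd, &rendererActivate);
        MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &outputNode);
        outputNode->SetObject(rendererActivate);

        topology->AddNode(sourceNode);
        topology->AddNode(outputNode);
        sourceNode->ConnectOutput(0, outputNode, 0);   // partial topology; decoders are inserted later

        // ... release pd, sd, sourceNode, outputNode and rendererActivate here ...
        return topology;
    }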
Media Foundation provides a Media Session object that can be used to set up topologies and facilitate data flow without the application doing it explicitly. It exists in the Control layer and exposes a topology loader object. The application specifies the required pipeline topology to the loader, which then creates the necessary connections between the components. The Media Session object manages the job of synchronizing with the presentation clock: it creates the presentation clock object, passes a reference to it to the sinks, and then uses the timer events from the clock to propagate data along the pipeline. It also changes the state of the clock to handle pause, stop or resume requests from the application.
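Continuing the sketch above, a Media Session could then resolve and play that partial topology roughly as follows (error handling and the session's event loop are omitted):

    #include <mfapi.h>
    #include <mfidl.h>
    #include <propidl.h>
    // Link against mf.lib, mfplat.lib, mfuuid.lib and ole32.lib.

    void PlayTopology(IMFTopology* topology)
    {
        IMFMediaSession* session = nullptr;
        MFCreateMediaSession(nullptr, &session);

        // Handing the session a partial topology makes it invoke the topology
        // loader, which inserts the decoder and converter MFTs that are needed.
        session->SetTopology(0, topology);

        // Start playback; the session creates the presentation clock and
        // distributes it to the media sinks.
        PROPVARIANT start;
        PropVariantInit(&start);                 // VT_EMPTY = start at the current position
        session->Start(nullptr, &start);         // null time format = GUID_NULL
        PropVariantClear(&start);

        // A real application would now pump the session's events
        // (IMFMediaEventGenerator) and call Stop, Close and Shutdown when done.
    }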
Media Foundation Transform
Media Foundation Transforms (MFTs) represent a generic model for processing media data. They are used in the Media Foundation pipeline primarily to implement decoders, encoders, mixers and digital signal processors (DSPs) that sit between media sources and media sinks. Media Foundation Transforms are an evolution of the transform model first introduced with DirectX Media Objects (DMOs); hybrid DMO/MFT objects can also be created. Applications can use MFTs inside the Media Foundation pipeline or directly as stand-alone objects. Compared to DMOs, MFTs also support hardware-accelerated video processing and their behavior is more clearly specified. MFTs can be any of the following types:
- Audio and video codecs
- Audio and video effects
- Multiplexers and demultiplexers
- Tees
- Color-space converters
- Sample-rate converters
- Video scalers
Microsoft recommends that developers write a Media Foundation Transform rather than a DirectShow filter for Windows Vista.[1] For video editing and video capture, however, Microsoft recommends using DirectShow, as these scenarios are not the primary focus of Media Foundation in Windows Vista.
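As an example of using an MFT as a stand-alone object, the sketch below enumerates the registered decoders that accept WMV3 video and instantiates the first one; the category and format GUIDs are the real registered identifiers, while the helper name and the omission of error handling are illustrative:

    #include <windows.h>
    #include <mfapi.h>
    #include <mftransform.h>
    // Link against mfplat.lib, mfuuid.lib and ole32.lib.

    IMFTransform* CreateWmvDecoder()
    {
        // Look for decoders whose input is WMV3 video; a null output type means "any".
        MFT_REGISTER_TYPE_INFO input = { MFMediaType_Video, MFVideoFormat_WMV3 };
        CLSID* clsids = nullptr;
        UINT32 count = 0;
        IMFTransform* mft = nullptr;

        if (SUCCEEDED(MFTEnum(MFT_CATEGORY_VIDEO_DECODER, 0, &input, nullptr,
                              nullptr, &clsids, &count)) && count > 0)
        {
            // MFTs are ordinary COM objects, so a stand-alone application can
            // co-create one and drive it through SetInputType/SetOutputType,
            // ProcessInput and ProcessOutput.
            CoCreateInstance(clsids[0], nullptr, CLSCTX_INPROC_SERVER,
                             IID_PPV_ARGS(&mft));
        }
        CoTaskMemFree(clsids);
        return mft;                              // nullptr if no decoder was found
    }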
Enhanced Video Renderer
Media Foundation uses the Enhanced Video Renderer (EVR) for rendering video content; it also acts as a mixer. It can mix up to 16 simultaneous streams, the first of which is the reference stream. All streams other than the reference stream can carry per-pixel transparency information and can be assigned any z-order. The reference stream cannot have transparent pixels and has a fixed z-order position behind all other streams. The final image is composited onto a single surface by coloring each pixel according to the color and transparency of the corresponding pixel in all streams.
Internally, the EVR uses a mixer object to mix the streams. It can also deinterlace the output and apply color correction, if required. Composited frames are handed off to a presenter object, which schedules them for rendering onto a Direct3D device that it shares with the DWM and other applications using the device. The frame rate of the output video is synchronized with the frame rate of the reference stream. If any of the other streams (called substreams) has a different frame rate, the EVR discards the extra frames (if the substream has a higher frame rate) or uses the same frame more than once (if it has a lower frame rate).
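For applications that feed substreams into the EVR, the z-order and placement of a substream can be adjusted through the IMFVideoMixerControl service. The sketch below assumes evr points to the EVR media sink obtained from the resolved topology; the stream numbers and output rectangle are illustrative:

    #include <mfapi.h>
    #include <mfidl.h>
    #include <evr.h>

    void PositionSubstream(IMFMediaSink* evr)
    {
        IMFVideoMixerControl* mixer = nullptr;
        // MR_VIDEO_MIXER_SERVICE exposes the EVR's internal mixer object.
        if (SUCCEEDED(MFGetService(evr, MR_VIDEO_MIXER_SERVICE,
                                   IID_PPV_ARGS(&mixer))))
        {
            // Stream 0 is the reference stream and keeps its fixed place at the
            // back; put substream 1 directly in front of it.
            mixer->SetStreamZOrder(1, 1);

            // Show substream 1 in the top-left quarter of the output frame
            // (coordinates are normalized to the 0.0 - 1.0 range).
            MFVideoNormalizedRect rect = { 0.0f, 0.0f, 0.5f, 0.5f };
            mixer->SetStreamOutputRect(1, &rect);

            mixer->Release();
        }
    }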
Supported media formats
Windows Media Audio, Windows Media Video and MP3 are the default supported formats. Format support is extensible; developers can add support for other formats by writing decoder MFTs and/or custom media sources. MIDI playback is not yet supported through Media Foundation.
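A third-party decoder MFT becomes discoverable to the pipeline by registering it under the appropriate category. The sketch below registers a hypothetical Vorbis audio decoder; the CLSID, input subtype GUID and friendly name are all invented for illustration:

    #include <mfapi.h>
    #include <mftransform.h>
    // Link against mfplat.lib and mfuuid.lib.

    // Hypothetical identifiers; a real component would define and publish
    // its own CLSID and input subtype GUID.
    static const CLSID CLSID_MyVorbisDecoder =
        { 0x6e1b6a6a, 0x0000, 0x4000, { 0x80,0x00,0x00,0xaa,0x00,0x38,0x9b,0x71 } };
    static const GUID MFAudioFormat_MyVorbis =
        { 0x6e1b6a6b, 0x0000, 0x4000, { 0x80,0x00,0x00,0xaa,0x00,0x38,0x9b,0x71 } };

    HRESULT RegisterDecoder()
    {
        MFT_REGISTER_TYPE_INFO input  = { MFMediaType_Audio, MFAudioFormat_MyVorbis };
        MFT_REGISTER_TYPE_INFO output = { MFMediaType_Audio, MFAudioFormat_PCM };

        // After registration the topology loader can insert the decoder
        // automatically when resolving a topology that needs this format.
        return MFTRegister(CLSID_MyVorbisDecoder, MFT_CATEGORY_AUDIO_DECODER,
                           const_cast<LPWSTR>(L"Example Vorbis Decoder"),  // friendly name
                           0,                    // flags
                           1, &input,
                           1, &output,
                           nullptr);             // no extra attributes
    }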
Benefits over DirectShow
Media Foundation offers the following benefits:
- Is scalable for high-definition content and DRM-protected content.
- Allows DirectX Video Acceleration to be used outside of the DirectShow infrastructure. DXVA 2.0 is available to user-mode components without using the DirectShow video renderer.
- Provides better resilience to CPU, I/O, and memory stress for low-latency, glitch-free playback of audio and video. Video tearing has been minimized. The improved video processing support also enables high color spaces and enhanced full-screen playback. The Enhanced Video Renderer (EVR), which is also available for DirectShow, offers better timing support and improved video processing.
- Media Foundation extensibility enables different content protection systems to operate together.
- Media Foundation uses the Multimedia Class Scheduler Service (MMCSS), a new system service in Windows Vista. MMCSS enables multimedia applications to ensure that their time-sensitive processing receives prioritized access to CPU resources.
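For reference, a worker thread opts into MMCSS scheduling through the AVRT API; a minimal sketch using the standard "Playback" task profile (with the actual media work left out) might look like this:

    #include <windows.h>
    #include <avrt.h>
    // Link against avrt.lib.

    DWORD WINAPI RenderThread(LPVOID)
    {
        DWORD taskIndex = 0;
        // Associate this thread with the "Playback" MMCSS task so the scheduler
        // gives its time-sensitive work prioritized access to the CPU.
        HANDLE task = AvSetMmThreadCharacteristicsW(L"Playback", &taskIndex);

        // ... per-frame decoding / rendering work would run here ...

        if (task)
            AvRevertMmThreadCharacteristics(task);   // leave the MMCSS task
        return 0;
    }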
Media Foundation accompanies two other technologies, Direct3D 10 and Windows Presentation Foundation, in keeping pace with evolving graphics and multimedia hardware and with demanding multimedia applications.
Application support
In this initial release in Windows Vista, Media Foundation is used mainly in media playback applications. So far, it is mostly internal or bundled Windows services and applications that use Media Foundation.
- Windows Protected Media Path (PMP), for instance, relies completely on Media Foundation.
- Windows Media Player in Windows Vista relies on Media Foundation for playing ASF (WMA and WMV) content and protected content, but can also use DirectShow or the Windows Media Format SDK instead. In the case of WMV9 playback, this also implies using DXVA 2.0 instead of DXVA 1.0 when the video hardware supports WMV9/VC-1 decoding acceleration.
- DirectX Video Acceleration (DXVA) 2.0, the hardware video acceleration pipeline for Windows Vista, is also based on Media Foundation.
See also
References
External links
- Microsoft Media Foundation SDK
- Media Foundation Development Forum
- Media Source Metadata
- Media Foundation Pipeline
- Media Foundation Architecture
- About the Media Session
- About the Media Foundation SDK
- Enhanced Video Renderer

