Tuesday, July 29, 2008

DM6437 Memory Map

The DaVinci family of processors has a large, byte-addressable address space; some limitations on byte addressing are determined by how peripherals are interconnected to the DM6437 device. Program code and data can be placed anywhere in the unified address space. Address sizes vary depending on the hardware implementation.

The memory map shows the address space of a DM6437 processor on the left with specific details of how each region is used on the right. By default, the internal memory sits at the beginning of the address space. Portions of memory can be remapped in software as L2 cache rather than fixed RAM.

The part incorporates a dual EMIF interface: one dedicated EMIF interfaces directly to the DDR2 memory. The Flash, NAND Flash, or SRAM is mapped into CS2 space and is selectable via JP2. When CS2 is used for daughter-card interfacing, JP2 must be set appropriately.


Thursday, July 24, 2008

DaVinci TMS320DM6437 Processor Ideal for Cost-Sensitive Digital Media Applications

DaVinci TMS320DM6437 processors are optimized for cost-sensitive digital media applications and include special features that make them suitable for automotive vision applications, such as lane departure and collision avoidance, as well as machine-vision systems, robotics, video security and video telephony. DM6437 processors feature the video-optimized programmable TMS320C64x+ DSP core plus video, memory and network interfaces, providing the most flexible and cost-effective DSP for networked video and vision applications. Designed for applications in which either an entire system runs on the DSP or a separate microprocessor runs the application and networking, these processors deliver leading-edge video performance while leaving headroom for networking, user interface and other tasks. Features of the TMS320DM6437 processor:
  • Improved video performance, up to H.264 video encode at D1 resolution, and a 50 percent cost reduction over previous DSP digital media processors.
  • The new TMS320C64x+™ core operates at up to 600 MHz.
  • 80 KB L1D, 32 KB L1P cache/SRAM and 128 KB L2 cache/SRAM memory
  • Two 32-bit, 133-MHz external memory interfaces (EMIFs)
  • 10/100 Ethernet media access controller (MAC), two UARTs, I2C, SPI, GPIO, McASP and three PWMs

Tuesday, July 22, 2008

Video Terminologies

  • Pixel - Represents each point of information in a picture
  • Resolution - Describes the number of pixels horizontally and vertically
  • Color Depth - How many bits are used to represent the color of each pixel
  • Frame rate - How many frames are shown per second, which determines how long each frame (and therefore each pixel value) stays on screen
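As a rough illustration of how these four terms combine, the uncompressed data rate is pixels per frame times bits per pixel times frames per second. This Python sketch is not from the original post; the 720 x 480, 24-bit, 30 frames/s figures are assumed example values.

```python
def raw_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    pixels_per_frame = width * height                    # resolution
    bits_per_frame = pixels_per_frame * bits_per_pixel   # color depth
    return bits_per_frame * frames_per_second            # frame rate


# Assumed example: 720 x 480 pixels, 24-bit color, 30 frames/s
rate = raw_bitrate(720, 480, 24, 30)
print(f"{rate / 1e6:.1f} Mbit/s")  # prints "248.8 Mbit/s"
```

Numbers of this size are the reason video is compressed before it is stored or transmitted.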

Monday, July 21, 2008

Code Composer Studio IDE Platinum Edition

Development tools continue to grow in importance when choosing a processor platform. And, because of this, TI is providing the most robust, dependable development tools available. The new Code Composer Studio Platinum Edition is TI’s latest development tool and integrates everything programmers need for application development from start to finish.

The v3.3 Platinum edition of CCStudio offers many new and enhanced features that will increase debug visibility and meet the evolving needs of developers.

CCStudio v3.3 Supports All Platforms in One Easy-to-Use IDE

As embedded applications continue to become more complex, the need to use multiple processors and platforms increases. CCStudio Platinum simplifies this process by offering a fully merged IDE that supports all TI platforms, including the C6000, C5000, C2000, OMAP and DaVinci platforms.

Benefits include:
  • Parallel debug capabilities for inter-processor visualization
  • Simplified migration of software from one DSP platform to another under a common IDE
  • One-time installation of the IDE for all platforms
  • Integration simplifies tool maintenance and updates
  • SoC enhancements for DaVinci processors: a new MMU Page Table viewer for ARM OS memory management and a new status bar to view various ARM processor states.
Simulation and Debug enhancements offer more visibility, ease of use and depth of analysis
  • The new Unified Breakpoint Manager saves developers time by letting them manage both software and hardware breakpoints from a single interface.
  • Simulator-based debug features give deeper insight into application behavior. Watchpoints track memory corruption, and the interrupt latency checker helps you meet real-time deadlines predictably. Simulation analysis integration offers advanced code coverage features to quickly identify unexecuted code and pinpoint CPU-intensive code for further optimization.
A free 120-day trial version of CCStudio IDE v3.3 Platinum Edition is available for download.

Thursday, July 10, 2008

Hybrid Video Coding - Video Compression

A hybrid video encoding algorithm typically proceeds as follows. Each picture is split into blocks. The first picture of a video sequence (or of a “clean” random access point into a video sequence) is typically coded in Intra mode, which often uses some prediction from region to region within the picture but has no dependence on other pictures. For all remaining pictures of a sequence, or between random access points, inter-picture coding modes are typically used for most blocks. The encoding process for Inter prediction, called motion estimation (ME), consists of choosing motion data comprising the selected reference picture and the motion vector (MV) to be applied to all samples of each block. The motion and mode decision data, which are transmitted as side information, are used by the encoder and decoder to generate identical Inter prediction signals using motion compensation (MC).

The residual of the Intra or Inter prediction, which is the difference between the original block and its prediction, is transformed by a frequency transform. The transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the prediction side information.
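The residual, transform and quantization steps can be sketched for a single block as follows. This is a minimal Python illustration, assuming a floating-point orthonormal 2-D DCT and a single uniform step size `qstep` standing in for the scaling and quantization stage; real codecs such as H.264/AVC use integer transforms and more elaborate scaling, and entropy coding is left out.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def code_residual(original, prediction, qstep):
    """Residual -> separable 2-D DCT (C * R * C^T) -> uniform quantization to integer levels."""
    residual = [[o - p for o, p in zip(row_o, row_p)]
                for row_o, row_p in zip(original, prediction)]
    c = dct_matrix(len(residual))
    coeffs = matmul(matmul(c, residual), transpose(c))
    return [[round(x / qstep) for x in row] for row in coeffs]


original = [[104] * 4 for _ in range(4)]
prediction = [[100] * 4 for _ in range(4)]
levels = code_residual(original, prediction, qstep=2)
```

Here the residual is a constant 4, so all of its energy lands in the DC coefficient (value 16), which quantizes to level 8; every other level is zero and would cost very few bits to entropy code.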

Decoder inside the Encoder: The encoder duplicates the decoder processing so that both generate identical predictions for subsequent data. Therefore, the quantized transform coefficients are reconstructed by inverse scaling and are then inverse transformed to duplicate the decoded prediction residual. The residual is then added to the prediction, and the result of that addition may then be fed into a deblocking filter to smooth out block-edge discontinuities induced by the block-wise processing. The final picture (which is also displayed by the decoder) is then stored for the prediction of subsequent encoded pictures. In general, the order of the encoding or decoding processing of pictures often differs from the order in which they arrive from the source, necessitating a distinction between the decoding order and the output order for a decoder.
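The reason for the decoder inside the encoder shows up even in a one-dimensional toy coder. In this hypothetical Python sketch (a simple closed-loop predictive coder, not a full hybrid codec), each sample is predicted from the previously reconstructed sample, which is exactly what the decoder will have. If the encoder predicted from the original samples instead, the decoder would form different predictions and quantization errors could accumulate (drift).

```python
def encode_closed_loop(samples, step):
    """Return (levels, reconstruction) for a 1-D closed-loop predictive coder.

    Each sample is predicted from the previously *reconstructed* sample (initially 0),
    mirroring what the decoder can compute, so both sides stay in sync.
    """
    levels, recon, prev = [], [], 0
    for s in samples:
        level = round((s - prev) / step)  # quantized prediction residual
        levels.append(level)
        prev = prev + level * step        # decoder-in-the-encoder reconstruction
        recon.append(prev)
    return levels, recon


def decode(levels, step):
    """Decoder: rebuild samples from the quantized residual levels."""
    out, prev = [], 0
    for level in levels:
        prev = prev + level * step
        out.append(prev)
    return out


samples = [100, 103, 107, 112, 111, 109]
levels, enc_recon = encode_closed_loop(samples, step=4)
```

Because the encoder tracks the decoder's reconstruction, `decode(levels, 4)` reproduces `enc_recon` exactly, and each reconstructed sample stays within step/2 of the original.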

The design and operation of an encoder involves the optimization of many decisions to achieve the best possible tradeoff between rate and distortion given the constraints on delay and complexity. There has been a large amount of work on this optimization problem. One particular focus has been on Lagrangian optimization methods. Some studies have developed advanced encoder optimization strategies with little regard for encoding complexity, while others have focused on how to achieve a reduction in complexity while losing as little as possible in rate-distortion performance.
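One common form of Lagrangian optimization chooses, for each block, the coding option that minimizes J = D + λR, where D is the distortion, R is the rate in bits and λ sets the trade-off. This Python sketch assumes the distortion and bit counts of each candidate mode have already been measured; the candidate numbers are hypothetical.

```python
def choose_mode(candidates, lam):
    """Pick the candidate minimizing the Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["bits"])


# Hypothetical per-block measurements: distortion (e.g. SSD) and estimated bits
candidates = [
    {"mode": "skip", "distortion": 900, "bits": 1},
    {"mode": "inter", "distortion": 200, "bits": 40},
    {"mode": "intra", "distortion": 150, "bits": 120},
]

for lam in (0, 1, 20):
    print(lam, choose_mode(candidates, lam)["mode"])
```

Raising λ penalizes bits more heavily: with these numbers, λ = 0 picks Intra, λ = 1 picks Inter and λ = 20 picks Skip.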


Tuesday, July 8, 2008

Video Source Coding Basics

A digital image or a frame of digital video typically consists of three rectangular arrays of integer-valued samples, one array for each of the three components of a tristimulus color representation for the spatial area represented in the image. Video coding often uses a color representation having three components called Y, Cb, and Cr. Component Y is called luma and represents brightness. The two chroma components Cb and Cr represent the extent to which the color deviates from gray toward blue and red, respectively. Because the human visual system is more sensitive to luma than chroma, often a sampling structure is used in which the chroma component arrays each have only one-fourth as many samples as the corresponding luma component array (half the number of samples in both the horizontal and vertical dimensions). This is called 4:2:0 sampling. The amplitude of each component is typically represented with 8 bits of precision per sample for consumer-quality video.
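The effect of 4:2:0 sampling on frame size follows directly from the halving in both dimensions. A small Python sketch, assuming even picture dimensions and 8 bits (one byte) per sample; the 720 x 480 size is an example value.

```python
def yuv420_plane_sizes(width, height):
    """Samples per plane (Y, Cb, Cr) for 4:2:0 sampling; assumes even width and height."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # half horizontally and half vertically
    return luma, chroma, chroma


y, cb, cr = yuv420_plane_sizes(720, 480)
frame_bytes = y + cb + cr  # 8-bit samples: one byte each
```

The whole frame needs 1.5 bytes per pixel instead of the 3 bytes per pixel a full-resolution 4:4:4 frame would take.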

The two basic video formats are progressive and interlaced. A frame array of video samples can be considered to contain two interleaved fields, a top field and a bottom field. The top field contains the even-numbered rows 0, 2, ..., H - 2 (with 0 being the top row number for a frame and H being its total number of rows), and the bottom field contains the odd-numbered rows 1, 3, ..., H - 1 (starting with the second row of the frame). When interlacing is used, rather than capturing the entire frame at each sampling time, only one of the two fields is captured. Thus, two sampling periods are required to capture each full frame of video. We will use the term picture to refer to either a frame or a field. If the two fields of a frame are captured at different time instants, the frame is referred to as an interlaced frame; otherwise it is referred to as a progressive frame.
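The field structure described above amounts to taking every other row. A minimal Python sketch, assuming a frame stored as a list of rows with an even row count H:

```python
def split_fields(frame):
    """Split a frame into (top, bottom) fields: even rows 0, 2, ..., H-2 and odd rows 1, 3, ..., H-1."""
    return frame[0::2], frame[1::2]


def weave(top, bottom):
    """Interleave two fields back into a frame (top field row first)."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame
```

For a progressive frame, splitting and weaving are exact inverses; for an interlaced frame the two fields come from different time instants, which is why moving edges show "combing" when the fields are simply woven together.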

Techniques for Digital Compression

Prediction: A process that creates a set of prediction values used to predict the input samples, so that only the differences from the predicted values (the residual values, which are typically easier to encode) need to be represented. The prediction is often based in part on an indication sent by the encoder of how to form it, chosen by analyzing the input samples among the types of prediction that the system design allows.
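As a small illustration of why residuals are easier to encode, this Python sketch predicts each sample from its left neighbour. The starting prediction of 128 (mid-gray for 8-bit samples) is an assumption made for the example, not part of any particular standard.

```python
def residuals_left_prediction(row, first_prediction=128):
    """Residual = sample minus its left neighbour (first sample predicted as first_prediction)."""
    predictions = [first_prediction] + row[:-1]
    return [s - p for s, p in zip(row, predictions)]


def reconstruct(residuals, first_prediction=128):
    """Invert the prediction by accumulating the residuals."""
    out, prev = [], first_prediction
    for r in residuals:
        prev += r
        out.append(prev)
    return out


row = [130, 132, 133, 133, 135, 136]
res = residuals_left_prediction(row)
```

The samples themselves span 130 to 136, but the residuals are all small numbers clustered near zero (2, 2, 1, 0, 2, 1), which an entropy coder can represent with fewer bits; without quantization, the reconstruction is exact.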

Transformation: A process (also referred to as subband decomposition) that is closely related to prediction, consisting of forming a new set of samples from a combination of input samples, often using a linear combination. Simplistically speaking, a transformation can prevent the need to repeatedly represent similar values and can capture the essence of the input signal by using frequency analysis. A typical benefit of transformation is a reduction in the statistical correlation of the input samples, so that the most relevant aspects of the set of input samples are typically concentrated into a small number of variables. Two well-known examples of transformation are the Karhunen-Loève transform (KLT), which is an optimal decorrelator, and the discrete cosine transform (DCT), which has performance close to that of a KLT when applied to highly correlated auto-regressive sources.
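The energy-compaction effect of the DCT on highly correlated samples can be checked numerically. This Python sketch applies an orthonormal 1-D DCT-II to a smooth, assumed example ramp of eight samples and measures how much of the total energy falls into the first two coefficients.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II (energy-preserving)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out


samples = [100, 102, 104, 106, 108, 110, 112, 114]  # smooth, highly correlated ramp
coeffs = dct_ii(samples)
energy = [c * c for c in coeffs]
share_first_two = sum(energy[:2]) / sum(energy)
```

Because the transform is orthonormal, total energy is preserved, yet more than 99.9% of it ends up in two of the eight coefficients; the remaining six are close to zero and quantize away cheaply.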

Quantization: A process by which the precision used for the representation of a sample value (or a group of sample values) is reduced in order to reduce the amount of data needed to encode the representation. Such a process is directly analogous to intuitively well-understood concepts such as the rounding off of less significant digits when writing the value of some statistic. Often the rounding precision is controlled by a step size that specifies the smallest representable value increment. Among the techniques listed here for compression, quantization is typically the only one that is inherently noninvertible—that is, quantization involves some form of many-to-few mapping that inherently involves some loss of fidelity. The challenge is to minimize that loss of fidelity in relation to some relevant method of measuring distortion.
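A uniform scalar quantizer with step size `step` is the simplest instance of this idea. A minimal Python sketch with assumed example values:

```python
def quantize(x, step):
    """Map a value to an integer index (Python's round() rounds exact halves to even)."""
    return round(x / step)


def dequantize(index, step):
    """Reconstruction value the decoder uses for an index."""
    return index * step


values = [3.2, 4.9, 5.1, 7.4, 12.0]
step = 4
indices = [quantize(v, step) for v in values]
recon = [dequantize(i, step) for i in indices]
```

Three different inputs (3.2, 4.9 and 5.1) share index 1 and all come back as 4: the many-to-few mapping that makes quantization noninvertible. The reconstruction error never exceeds step/2, so the step size directly trades bits for fidelity.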

Entropy coding: A process by which discrete-valued source symbols are represented in a manner that takes advantage of the relative probabilities of the various possible values of each source symbol. A well-known type of entropy code is the variable-length code (VLC), which establishes a tree-structured code table that uses short binary strings to represent highly likely symbol values and longer binary strings to represent less likely ones. The best-known method of designing VLCs is the Huffman code method, which produces an optimal VLC. A somewhat less well-known method of entropy coding, which can typically compress better than VLC coding and can more easily be designed to adapt to varying symbol statistics, is the newer technique referred to as arithmetic coding.
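The Huffman method repeatedly merges the two least probable entries into one tree node. A compact Python sketch using the standard `heapq` module, with symbol counts taken from an example string:

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from the symbol frequencies in `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code suffix so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # least likely subtree
        w2, _, c2 = heapq.heappop(heap)  # second least likely subtree
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


data = "aaaaaaaabbbbccd"
code = huffman_code(data)
encoded = "".join(code[s] for s in data)
```

With counts a:8, b:4, c:2, d:1 the code lengths come out as 1, 2, 3 and 3 bits, so the 15 symbols take 25 bits instead of the 30 a fixed 2-bit code would need.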

One way of compressing video is simply to compress each picture separately. This is how much of the compression research started in the mid-1960s. Today, the most prevalent syntax for such use is JPEG. The most common “baseline” JPEG scheme consists of segmenting the picture arrays into equal-size blocks of 8x8 samples each. These blocks are transformed by a DCT, and the DCT coefficients are then quantized and transmitted using variable-length codes. We refer to this kind of coding scheme as intra-picture or Intra coding, since the picture is coded without referring to other pictures in a video sequence. In fact, such Intra coding (often called motion JPEG) is in common use for video coding today in production-quality editing systems.
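The block segmentation step of baseline JPEG can be sketched as a simple tiling. This Python sketch assumes picture dimensions that are multiples of 8 (real JPEG encoders pad the edges otherwise); each tile would then go through the DCT, quantization and variable-length coding described above.

```python
def split_into_blocks(plane, block=8):
    """Yield (row, col, tile) for each block x block tile; assumes dimensions divisible by block."""
    height, width = len(plane), len(plane[0])
    for by in range(0, height, block):
        for bx in range(0, width, block):
            yield by, bx, [row[bx:bx + block] for row in plane[by:by + block]]


plane = [[0] * 720 for _ in range(480)]  # example 720 x 480 luma plane
blocks = list(split_into_blocks(plane))
```

A 720 x 480 luma plane yields 90 x 60 = 5400 blocks, each coded without reference to any other picture.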

However, improved compression performance can be achieved by taking advantage of the large amount of temporal redundancy in video content. This was recognized at least as long ago as 1929. Usually, much of the depicted scene is essentially just repeated in picture after picture without any significant change, so video can be represented more efficiently by sending only the changes in the video scene rather than coding all regions repeatedly. We refer to such techniques as inter-picture or Inter coding. This ability to use temporal redundancy to improve coding efficiency is what fundamentally distinguishes video compression from the Intra compression exemplified by JPEG standards.

Conditional Replenishment: A simple method of improving compression by coding only the changes in a video scene is called conditional replenishment (CR), and it was the only temporal redundancy reduction method used in the first version of the first digital video coding international standard, ITU-T Recommendation H.120. CR coding consists of sending signals to indicate which areas of a picture can just be repeated, and sending new information to replace the changed areas. Thus, CR allows a choice between one of two modes of representation for each area, which we call Skip and Intra. However, CR has a significant shortcoming, which is its inability to refine the approximation given by a repetition.
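The Skip/Intra choice of conditional replenishment can be sketched per block. In this Python sketch the change detector (mean absolute difference against a `threshold`) and the block size are hypothetical choices for illustration; H.120 itself is not modeled.

```python
def cr_encode(prev, cur, block=8, threshold=2.0):
    """Per-block CR decisions: (y, x, 'skip', None) or (y, x, 'intra', new_block_samples)."""
    decisions = []
    for by in range(0, len(cur), block):
        for bx in range(0, len(cur[0]), block):
            diff = [abs(cur[y][x] - prev[y][x])
                    for y in range(by, by + block) for x in range(bx, bx + block)]
            if sum(diff) / len(diff) < threshold:
                decisions.append((by, bx, "skip", None))  # repeat the previous picture
            else:
                samples = [row[bx:bx + block] for row in cur[by:by + block]]
                decisions.append((by, bx, "intra", samples))  # send new data
    return decisions


def cr_decode(prev, decisions, block=8):
    """Start from a repetition of the previous picture and overwrite the replenished blocks."""
    out = [row[:] for row in prev]
    for by, bx, mode, samples in decisions:
        if mode == "intra":
            for dy, row in enumerate(samples):
                out[by + dy][bx:bx + block] = row
    return out
```

Only changed blocks carry sample data. The limitation noted above is visible here: a block that changed slightly must either be repeated as-is or be re-sent completely, because there is no mode for sending a small correction.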

Motion Prediction: Often the content of an area of a prior picture can be a good starting approximation for the corresponding area in a new picture, but this approximation could benefit from some minor alteration to make it a better representation. Adding a third type of “prediction mode,” in which a refinement difference approximation can be sent, results in a further improvement of compression performance—leading to the basic design of modern hybrid codecs (using a term coined by Habibi with a somewhat different original meaning). The naming of these codecs refers to their construction as a hybrid of two redundancy reduction techniques—using both prediction and transformation. In modern hybrid codecs, regions can be predicted using inter-picture prediction, and a spatial frequency transform is applied to the refinement regions and the Intra-coded regions.

Motion Compensation & Estimation: One concept for the exploitation of statistical temporal dependencies that was missing in the first version of H.120 was motion-compensated prediction (MCP). Most changes in video content are typically due to the motion of objects in the depicted scene relative to the imaging plane, and a small amount of motion can result in large differences in the values of the samples in a picture, especially near the edges of objects. Often, predicting an area of the current picture from a region of the previous picture that is displaced by a few samples in spatial location can significantly reduce the need for a refining difference approximation. This use of spatial displacement motion vectors (MVs) to form a prediction is known as motion compensation (MC), and the encoder’s search for the best MVs to use is known as motion estimation. The coding of the resulting difference signal for the refinement of the MCP is known as MCP residual coding.
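Motion estimation is often introduced as an exhaustive block-matching search. This Python sketch uses the sum of absolute differences (SAD) as the matching criterion, integer-sample accuracy, a small search window and the convention that the MV (dy, dx) points from the current block to its match in the reference picture; all of these are illustrative assumptions, and practical encoders use much faster search strategies.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between an n x n block of cur at (cy, cx) and of ref at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j]) for i in range(n) for j in range(n))


def full_search(cur, ref, cy, cx, n=4, search=2):
    """Exhaustive integer-sample search; returns ((dy, dx), best_sad).

    Candidate positions whose block would fall outside the reference picture are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                cost = sad(cur, ref, cy, cx, ry, rx, n)
                if best is None or cost < best[1]:
                    best = ((dy, dx), cost)
    return best
```

When the current picture is the reference shifted by one row down and two columns right, the search recovers the MV (1, -2) with zero residual, so MC would leave nothing to code for that block.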

It should be noted that the subsequent improvement of MCP techniques has been the major reason for coding efficiency improvements achieved by modern standards when comparing them from generation to generation. The price for the use of MCP in ever more sophisticated ways is a major increase in complexity requirements.

Fractional-sample-accurate MCP: This term refers to the use of spatial displacement MV values that have more than integer precision, thus requiring the use of interpolation when performing MCP. Intuitive reasons include having a more accurate motion representation and greater flexibility in prediction filtering (as full-sample, half-sample, and quarter-sample interpolators provide different degrees of low-pass filtering which are chosen automatically in the ME process). Half-sample-accuracy MCP was considered even during the design of H.261 but was not included due to the complexity limits of the time. Later, as processing power increased and algorithm designs improved, video codec standards increased the precision of MV support from full-sample to half-sample (in MPEG-1, MPEG-2, and H.263) to quarter-sample (for luma in MPEG-4’s advanced simple profile and H.264/AVC) and beyond (with eighth-sample accuracy used for chroma in H.264/AVC).
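Half-sample positions have to be interpolated from the integer samples around them. This Python sketch uses bilinear averaging with rounding in the style of MPEG-1/MPEG-2 half-sample MCP; it is not the H.264/AVC luma interpolator (which uses a longer 6-tap filter), and it assumes non-negative positions that stay inside the picture.

```python
def half_sample(ref, y2, x2):
    """Sample `ref` at (y2 / 2, x2 / 2), with coordinates given in half-sample units."""
    y0, x0 = y2 // 2, x2 // 2
    fy, fx = y2 % 2, x2 % 2  # 1 where the position is halfway between integer samples
    a = ref[y0][x0]
    b = ref[y0][x0 + fx]
    c = ref[y0 + fy][x0]
    d = ref[y0 + fy][x0 + fx]
    if fy and fx:
        return (a + b + c + d + 2) // 4  # centre of four samples
    if fx:
        return (a + b + 1) // 2          # horizontal half-sample
    if fy:
        return (a + c + 1) // 2          # vertical half-sample
    return a                             # integer position
```

The averaging is also a mild low-pass filter, which is one of the intuitive reasons given above for fractional-sample MCP helping compression.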

MVs over picture boundaries: This approach solves the problem of motion representation for samples at the boundary of a picture by extrapolating the reference picture. The most common method is simply to replicate the boundary samples for extrapolation.
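Replicating boundary samples is equivalent to clamping each reference coordinate into the picture. A minimal Python sketch:

```python
def ref_sample(ref, y, x):
    """Fetch a reference sample, replicating boundary samples for out-of-picture positions."""
    h, w = len(ref), len(ref[0])
    return ref[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def fetch_block(ref, y, x, n):
    """n x n prediction block whose top-left corner may lie outside the picture."""
    return [[ref_sample(ref, y + i, x + j) for j in range(n)] for i in range(n)]
```

With this, an MV may point partly outside the reference picture and MC still has a well-defined prediction for every sample.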

Bipredictive MCP: The averaging of two MCP signals. One prediction signal has typically been formed from a picture in the temporal future, with the other formed from the past relative to the picture being predicted (hence, it has often been called bidirectional MCP). Bipredictive MCP was first put in a standard in MPEG-1, and it has been present in all other succeeding standards. Intuitively, such bipredictive MCP particularly helps when the scene contains uncovered regions or smooth and consistent motion.
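The averaging step itself is small. This Python sketch assumes equal weights and round-half-up integer averaging, (a + b + 1) >> 1, on two already motion-compensated prediction blocks; weighted prediction as found in some later standards is not modeled.

```python
def bipredict(pred0, pred1):
    """Average two motion-compensated prediction blocks sample by sample, rounding halves up."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(pred0, pred1)]
```

When the past and future predictions err in opposite directions (for example with smooth motion or noise), the average is closer to the current picture than either one alone.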

Variable block size MCP: The ability to select the size of the region (ordinarily a rectangular block-shaped region) associated with each MV for MCP. Intuitively, this provides the ability to effectively trade off the accuracy of the motion field representation with the number of bits needed for representing MVs.

Multipicture MCP: MCP using more than just one or two previous decoded pictures. This allows the exploitation of long-term statistical dependencies in video sequences, as found with backgrounds, scene cuts, and sampling aliasing.
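With multiple reference pictures, the encoder also has to choose which picture to predict from. This Python sketch simply compares candidate blocks taken from several reference pictures by SAD; the candidates are assumed to be already motion compensated, and the cost of signalling the reference index is ignored.

```python
def best_reference(cur_block, ref_blocks):
    """Return (reference_index, sad) of the candidate block closest to cur_block."""
    def sad(a, b):
        return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

    costs = [sad(cur_block, r) for r in ref_blocks]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]
```

For example, when background reappears after an object moves away, an older reference picture that still shows the background can match far better than the most recent one.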


Thursday, July 3, 2008

MPEG-2 System

Introduction

The increasing need for audio and video compression for storage and transmission led to MPEG (Moving Picture Experts Group). MPEG-1 (ISO/IEC 11172) is the first standard finalized by the MPEG committee, in 1993, and is mainly targeted at digital storage applications. MPEG-2 (ISO/IEC 13818), the successor to MPEG-1, was finalized in 1998 and is mainly targeted at digital broadcast and television applications. It addresses video compression at up to 15 Mbps, audio compression at 128 kbps, and methods for multiplexing audio-visual data. The MPEG-2 standard consists of the following subparts: system, audio and video. The audio and video subparts detail the respective compression methods. The system part covers methods for combining one or more audio, video or data streams into one or more system streams, audio-visual synchronization, and buffer management.

The MPEG-2 system layer enhances the MPEG-1 system by introducing a second way to multiplex and by adding features to the existing one. MPEG-2 systems define two ways to multiplex, addressing different kinds of applications:
  • Program Stream(PS)
  • Transport Stream(TS)
Program stream is targeted at single-program applications in error-free environments, e.g. digital storage media. It is based on the MPEG-1 system stream; the popular DVD format uses the MPEG-2 program stream for movies. Transport stream is targeted at simultaneous delivery of multiple programs over error-prone networks, e.g. broadcast environments such as DVB.

Terms and Definitions

1. Program: - A program denotes a channel or broadcast service.
2. Presentation Unit: - Denotes an uncompressed raw YUV picture frame or a raw PCM audio frame.
3. Access Unit: - A compressed presentation unit. E.g. an audio access unit is an encoded audio frame and a video access unit is an encoded video frame.
4. Elementary stream: - A generic term for a succession of audio or video access units.
5. Audio Elementary stream: - Denotes consecutive audio access units; in other words, the output of an audio encoder.
6. Video Elementary stream: - Denotes consecutive video access units; in other words, the output of a video encoder.

MPEG-2 System Overview

The MPEG-2 system layer supports five basic functions:
1. Synchronization of multiple compressed streams on playback
2. Interleaving of multiple compressed streams into single or multiple system streams
3. Initialization of the playback buffer at startup
4. Continuous buffer management
5. Time identification


As with the audio and video subparts, the system subpart does not specify the architecture or implementation of an encoder or decoder. However, the standard specifies the bitstream syntax, and any implementation has to comply with it. Even with bitstream compliance, there is a lot of freedom in design and implementation.


MPEG-2 System Architecture

The MPEG-2 system layer consists of two layers, namely:
1. PES Packet Layer
2. Multiplex layer
The PES packet layer handles synchronization by means of time stamps, while the multiplex layer handles interleaving of data and data continuity. The PES packet layer is common to transport and program streams, but the multiplexing layer differs: for program streams it is the pack layer, while for transport streams it is the transport packet layer. Conversion between program streams and transport streams is possible at the PES packet level. A transport stream carrying multiple programs converts into multiple program streams, one per program.

PES Packet Layer

Presentation time stamp (PTS): - Denotes the time instant at which an access unit is removed from the decoder buffer, decoded, and presented to the viewer. In the STD (system target decoder) model, decoding is assumed to be instantaneous; any real system must compensate for its actual decoding delay.

Decoding time stamp (DTS): - Denotes the time instant at which an access unit is removed from the decoder buffer and decoded, but not yet presented to the viewer; the decoded picture is stored for display at a later time. DTS is present for video sequences containing B-frames, because the video frames are rearranged for encoding and the encoding order differs from the display order. A DTS always comes with a PTS for a video access unit; a DTS cannot appear alone. The DTS value is always less than the corresponding PTS. For audio, and for video sequences without B-frames, there is only one time stamp, the PTS; DTS is not applicable to these kinds of sequences.
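The reordering and the resulting time stamps can be sketched for a simple GOP. This Python sketch assumes a 90 kHz clock at 25 frames/s (3600 ticks per frame), B-frames that are never used as references, and a one-frame reorder delay (PTS = (display index + 1) x frame period); these are illustrative choices, not values mandated by the standard.

```python
def decode_order(display_types):
    """Return display indices in decode order: each B-frame waits until the next anchor (I/P)."""
    order, pending_b = [], []
    for idx, ftype in enumerate(display_types):
        if ftype == "B":
            pending_b.append(idx)
        else:
            order.append(idx)        # the anchor is decoded first...
            order.extend(pending_b)  # ...then the B-frames that precede it in display order
            pending_b = []
    order.extend(pending_b)          # B-frames with no later anchor (uncommon)
    return order


def timestamps(display_types, frame_ticks=3600):
    """Return (display_index, type, PTS, DTS) in decode order.

    DTS is reported as None when it equals the PTS, since it is then not transmitted.
    """
    result = []
    for d, p in enumerate(decode_order(display_types)):
        dts = d * frame_ticks
        pts = (p + 1) * frame_ticks
        result.append((p, display_types[p], pts, None if dts == pts else dts))
    return result


for entry in timestamps(list("IBBPBBP")):
    print(entry)
```

In this sketch only the I- and P-frames end up with a DTS smaller than their PTS, while every B-frame is displayed as soon as it is decoded and carries a PTS only.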

The PES packet layer consists of a packet header followed by access unit data. Packets can be of fixed or variable length. Some important fields of the packet header are shown below.



Stream_ID associates the packet data with its audio or video elementary stream. The significance of PTS and DTS is described above.
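In the bitstream, PTS and DTS are 33-bit values spread over five bytes with marker bits in between. A minimal Python parser sketch, assuming `data` starts at the packet start code and belongs to an audio or video stream (which carries the optional PES header); padding and private_stream_2 packets, which lack that header, are not handled.

```python
def parse_timestamp(b):
    """Decode a 33-bit PTS/DTS from its 5-byte PES encoding, skipping the marker bits."""
    return ((((b[0] >> 1) & 0x07) << 30) | (b[1] << 22) | ((b[2] >> 1) << 15)
            | (b[3] << 7) | (b[4] >> 1))


def parse_pes_header(data):
    """Parse start code, stream_id, PES_packet_length and optional PTS/DTS of a PES packet."""
    if data[0:3] != b"\x00\x00\x01":
        raise ValueError("missing packet_start_code_prefix")
    stream_id = data[3]
    pes_packet_length = (data[4] << 8) | data[5]
    pts_dts_flags = (data[7] >> 6) & 0x03  # '10' = PTS only, '11' = PTS and DTS
    header_data_length = data[8]
    pts = parse_timestamp(data[9:14]) if pts_dts_flags & 0x02 else None
    dts = parse_timestamp(data[14:19]) if pts_dts_flags == 0x03 else None
    return {
        "stream_id": stream_id,
        "pes_packet_length": pes_packet_length,
        "pts": pts,
        "dts": dts,
        "payload_offset": 9 + header_data_length,  # where the elementary stream data begins
    }
```

Time stamps count ticks of a 90 kHz clock, so dividing a PTS by 90000 gives seconds.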

Transport Packet Layer

The transport stream is mainly targeted at error-prone transmission. It consists of short, fixed-length transport packets that are always 188 bytes long: a 4-byte header followed by an adaptation field, a payload, or both. A PES packet is divided into many transport packets under the following constraints:

  • The first byte of a PES packet must become the first byte of a transport packet payload.
  • Only data from one PES packet may be carried in a transport packet.
A PES packet is unlikely to fill an integer number of transport packet payloads. In that case, the excess space in the last transport packet is filled by an adaptation field of appropriate length. The resulting transport packets are output sequentially to form the MPEG-2 transport stream. Some important fields of the transport packet header are as follows.
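The packetization rules above can be sketched as follows. This Python sketch assumes the adaptation field carries only stuffing (a flags byte of 0x00 followed by 0xFF bytes, with no PCR or other optional fields), no scrambling, and a valid 13-bit PID; a real multiplexer would also interleave packets from other PIDs.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
MAX_PAYLOAD = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes


def packetize_pes(pes, pid, start_cc=0):
    """Split one PES packet into 188-byte transport packets, stuffing the last one."""
    packets, cc, offset = [], start_cc, 0
    while offset < len(pes):
        chunk = pes[offset:offset + MAX_PAYLOAD]
        pusi = 1 if offset == 0 else 0  # the PES packet starts in this payload
        stuffing_needed = MAX_PAYLOAD - len(chunk)
        if stuffing_needed == 0:
            afc, adaptation = 0b01, b""  # payload only
        else:
            afc = 0b11                   # adaptation field followed by payload
            af_length = stuffing_needed - 1  # the length byte does not count itself
            if af_length == 0:
                adaptation = bytes([0])
            else:
                adaptation = bytes([af_length, 0x00]) + b"\xff" * (af_length - 1)
        header = bytes([
            0x47,                               # sync_byte
            (pusi << 6) | ((pid >> 8) & 0x1F),  # PUSI + upper 5 PID bits
            pid & 0xFF,                         # lower 8 PID bits
            (afc << 4) | (cc & 0x0F),           # adaptation_field_control + continuity_counter
        ])
        packet = header + adaptation + chunk
        assert len(packet) == TS_PACKET_SIZE
        packets.append(packet)
        cc = (cc + 1) & 0x0F
        offset += len(chunk)
    return packets
```

A 400-byte PES packet, for instance, becomes three transport packets carrying 184, 184 and 32 payload bytes, with 152 bytes of adaptation field padding out the last one.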



Sync_byte (0x47) is repeated every 188 bytes, which helps in error recovery. The Packet Identifier (PID) associates data with the corresponding elementary stream. Payload_unit_start_indicator is set when a PES packet (or a program specific information section) starts in the payload of a given transport packet. Continuity_counter is a 4-bit counter that increments for successive packets of the same PID carrying payload, which helps in detecting packet loss. Adaptation_field_control indicates whether an adaptation field, a payload, or both follow the header; the adaptation field is what fills the last transport packet of a PES packet. Some transport packets with specific PIDs carry service information, program specific information, or null packets.
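These header fields can be extracted with a few shifts and masks. This Python sketch also shows a simplified continuity check for one PID; it assumes the counter advances only on packets that carry payload and ignores duplicate packets and the discontinuity_indicator, which a complete demultiplexer would have to handle.

```python
def parse_ts_header(packet):
    """Decode the 4-byte transport packet header fields."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not an aligned 188-byte transport packet")
    return {
        "transport_error_indicator": (packet[1] >> 7) & 1,
        "payload_unit_start_indicator": (packet[1] >> 6) & 1,
        "transport_priority": (packet[1] >> 5) & 1,
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "transport_scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


def count_cc_errors(packets, pid):
    """Count continuity_counter jumps for one PID (simplified: payload-carrying packets only)."""
    errors, last = 0, None
    for pkt in packets:
        h = parse_ts_header(pkt)
        if h["pid"] != pid or not (h["adaptation_field_control"] & 0x01):
            continue  # other PID, or adaptation field only (counter does not advance)
        cc = h["continuity_counter"]
        if last is not None and cc != (last + 1) & 0x0F:
            errors += 1  # a gap suggests one or more lost packets
        last = cc
    return errors
```

Because the counter wraps from 15 back to 0, a sequence such as 14, 15, 0, 2 contains exactly one gap.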