A digital image or a frame of digital video typically consists of three rectangular arrays of integer-valued samples, one array for each of the three components of a tristimulus color representation for the spatial area represented in the image. Video coding often uses a color representation having three components called Y, Cb, and Cr. Component Y is called luma and represents brightness. The two chroma components Cb and Cr represent the extent to which the color deviates from gray toward blue and red, respectively. Because the human visual system is more sensitive to luma than chroma, often a sampling structure is used in which the chroma component arrays each have only one-fourth as many samples as the corresponding luma component array (half the number of samples in both the horizontal and vertical dimensions). This is called 4:2:0 sampling. The amplitude of each component is typically represented with 8 b of precision per sample for consumer-quality video.
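To illustrate 4:2:0 sampling concretely, the following sketch (assuming NumPy, and simple 2×2 averaging as the downsampling filter, one of several reasonable choices) derives the quarter-size chroma arrays from full-resolution chroma planes:

```python
import numpy as np

def downsample_420(chroma):
    """Downsample a full-resolution chroma plane to 4:2:0 by
    averaging each 2x2 block (one common choice of filter)."""
    h, w = chroma.shape
    assert h % 2 == 0 and w % 2 == 0
    blocks = chroma.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

# For a 1080p frame, luma is 1920x1080 and each chroma plane becomes
# 960x540, i.e., one-fourth as many samples as the luma array.
cb_full = np.random.randint(0, 256, (1080, 1920)).astype(np.float64)
cb_420 = downsample_420(cb_full)
print(cb_420.shape)  # (540, 960)
```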
The two basic video formats are progressive and interlaced. A frame array of video samples can be considered to contain two interleaved fields, a top field and a bottom field. The top field contains the even-numbered rows 0, 2, ..., H - 2 (with 0 being the top row number of a frame and H being its total number of rows), and the bottom field contains the odd-numbered rows 1, 3, ..., H - 1 (starting with the second row of the frame). When interlacing is used, rather than capturing the entire frame at each sampling time, only one of the two fields is captured. Thus, two sampling periods are required to capture each full frame of video. We will use the term picture to refer to either a frame or a field. If the two fields of a frame are captured at different time instants, the frame is referred to as an interlaced frame, and otherwise it is referred to as a progressive frame.
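Field extraction and re-interleaving map directly onto array slicing; a minimal sketch in NumPy (function names are illustrative):

```python
import numpy as np

def split_fields(frame):
    """Split a frame of H rows into its two interleaved fields."""
    top = frame[0::2, :]     # even-numbered rows 0, 2, ..., H-2
    bottom = frame[1::2, :]  # odd-numbered rows 1, 3, ..., H-1
    return top, bottom

def weave_fields(top, bottom):
    """Re-interleave two fields into a full frame."""
    frame = np.empty((top.shape[0] + bottom.shape[0], top.shape[1]),
                     dtype=top.dtype)
    frame[0::2, :] = top
    frame[1::2, :] = bottom
    return frame
```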
Techniques for Digital Compression
Prediction: A process by which a set of prediction values is created and used to predict the values of the input samples, so that only the differences from the predicted values need to be represented; these differences, called the residual values, are typically easier to encode. The prediction is often based in part on an indication sent by the encoder of how to form it, chosen by analyzing the input samples and the types of prediction that can be selected in the system design.
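As a concrete (and deliberately simple) instance of prediction, the following sketch applies left-neighbor prediction to a row of samples, assuming an agreed-upon starting prediction value; the predictor choice is illustrative only:

```python
import numpy as np

def dpcm_residual(samples, seed=128):
    """Predict each sample from its left neighbor; return residuals."""
    pred = np.empty_like(samples)
    pred[0] = seed              # agreed-upon starting prediction
    pred[1:] = samples[:-1]     # left-neighbor prediction
    return samples - pred       # residual values to be encoded

def dpcm_reconstruct(residual, seed=128):
    """Invert the prediction: rebuild samples from residuals."""
    samples = np.empty_like(residual)
    prev = seed
    for i, r in enumerate(residual):
        prev = prev + r
        samples[i] = prev
    return samples

row = np.array([100, 102, 101, 105, 110], dtype=np.int32)
res = dpcm_residual(row)        # [-28, 2, -1, 4, 5], mostly small values
assert np.array_equal(dpcm_reconstruct(res), row)
```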
Transformation: A process (also referred to as subband decomposition) that is closely related to prediction, consisting of forming a new set of samples, often as a linear combination of the input samples. Simplistically speaking, a transformation can prevent the need to repeatedly represent similar values and can capture the essence of the input signal by using frequency analysis. A typical benefit of transformation is a reduction in the statistical correlation of the input samples, so that the most relevant aspects of the set of input samples are typically concentrated into a small number of variables. Two well-known examples of transformation are the Karhunen-Loève transform (KLT), which is an optimal decorrelator, and the discrete cosine transform (DCT), which has performance close to that of a KLT when applied to highly correlated auto-regressive sources.
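A brief sketch of the 2-D DCT applied to an 8×8 block, using SciPy; the block content (a smooth ramp) is chosen to show the energy concentration described above:

```python
import numpy as np
from scipy.fft import dctn, idctn

block = np.full((8, 8), 100.0)
block += np.arange(8)  # smooth horizontal ramp, typical of natural imagery

coeffs = dctn(block, norm='ortho')   # forward 2-D DCT
# Energy concentrates in the low-frequency corner: the DC coefficient and
# the first horizontal AC coefficients dominate; most others are near zero.
print(np.round(coeffs[:2, :3], 1))

recon = idctn(coeffs, norm='ortho')  # the transform itself is invertible
assert np.allclose(recon, block)
```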
Quantization: A process by which the precision used for the representation of a sample value (or a group of sample values) is reduced in order to reduce the amount of data needed to encode the representation. Such a process is directly analogous to intuitively well-understood concepts such as the rounding off of less significant digits when writing the value of some statistic. Often the rounding precision is controlled by a step size that specifies the smallest representable value increment. Among the techniques listed here for compression, quantization is typically the only one that is inherently noninvertible—that is, quantization involves some form of many-to-few mapping that inherently involves some loss of fidelity. The challenge is to minimize that loss of fidelity in relation to some relevant method of measuring distortion.
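A minimal sketch of uniform scalar quantization controlled by a step size, showing the inherent loss of fidelity:

```python
import numpy as np

def quantize(values, step):
    """Map values to integer level indices (the many-to-few mapping)."""
    return np.round(values / step).astype(np.int64)

def dequantize(levels, step):
    """Reconstruct approximate values; the rounding loss is permanent."""
    return levels * step

x = np.array([3.7, -12.2, 0.4, 25.0])
levels = quantize(x, step=4.0)          # [ 1, -3,  0,  6]
x_hat = dequantize(levels, step=4.0)    # [ 4.0, -12.0, 0.0, 24.0]
# The per-sample distortion is bounded by half the step size.
assert np.all(np.abs(x - x_hat) <= 2.0)
```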
Entropy coding: A process by which discrete-valued source symbols are represented in a manner that takes advantage of the relative probabilities of the various possible values of each source symbol. A well-known type of entropy code is the variable-length code (VLC), which involves establishing a tree-structured code table that uses short binary strings to represent symbol values that are highly likely to occur and longer binary strings to represent less likely symbol values. The best-known method of designing VLCs is the Huffman code method, which produces an optimal VLC. A somewhat less familiar method of entropy coding that can typically achieve better compression than VLC coding, and can also more easily be designed to adapt to varying symbol statistics, is the newer technique referred to as arithmetic coding.
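A compact sketch of Huffman VLC design over a toy symbol probability table (the alphabet and probabilities are assumed for illustration):

```python
import heapq

def huffman_code(probs):
    """Build an optimal VLC: likely symbols get short binary strings."""
    # Heap entries: (probability, tie-breaker, {symbol: codeword-so-far}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # two least likely subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125})
print(code)  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```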
One way of compressing video is simply to compress each picture separately. This is how much of the compression research started in the mid-1960s. Today, the most prevalent syntax for such use is JPEG. The most common “baseline” JPEG scheme consists of segmenting the picture arrays into equal-size blocks of 8×8 samples each. These blocks are transformed by a DCT, and the DCT coefficients are then quantized and transmitted using variable-length codes. We refer to this kind of coding scheme as intra-picture or Intra coding, since the picture is coded without referring to other pictures in a video sequence. In fact, such Intra coding (often called motion JPEG) is in common use for video coding today in production-quality editing systems.
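Putting these pieces together, a simplified sketch of such a baseline Intra pipeline for one 8×8 block; a single flat quantization step is assumed here, whereas JPEG actually uses a frequency-dependent quantization table, and the VLC stage is omitted:

```python
import numpy as np
from scipy.fft import dctn, idctn

def intra_code_block(block, step=16.0):
    """Transform an 8x8 block, quantize the coefficients, reconstruct."""
    coeffs = dctn(block.astype(np.float64), norm='ortho')
    levels = np.round(coeffs / step)      # quantized levels to entropy-code
    recon = idctn(levels * step, norm='ortho')
    return levels, recon

block = np.random.randint(0, 256, (8, 8))
levels, recon = intra_code_block(block)
# Many levels are zero, which the entropy coding stage exploits.
print(int(np.count_nonzero(levels)), 'nonzero coefficients of 64')
```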
However, improved compression performance can be achieved by taking advantage of the large amount of temporal redundancy in video content. This was recognized at least as long ago as 1929. Usually, much of the depicted scene is essentially just repeated in picture after picture without any significant change, so video can be represented more efficiently by sending only the changes in the video scene rather than coding all regions repeatedly. We refer to such techniques as inter-picture or Inter coding. This ability to use temporal redundancy to improve coding efficiency is what fundamentally distinguishes video compression from the Intra compression exemplified by JPEG standards.
Conditional Replenishment: A simple method of improving compression by coding only the changes in a video scene is called conditional replenishment (CR), and it was the only temporal redundancy reduction method used in the first version of the first digital video coding international standard, ITU-T Recommendation H.120. CR coding consists of sending signals to indicate which areas of a picture can just be repeated, and sending new information to replace the changed areas. Thus, CR allows a choice between one of two modes of representation for each area, which we call Skip and Intra. However, CR has a significant shortcoming: its inability to refine the approximation given by a repetition.
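A sketch of the CR decision for one block, assuming a sum-of-absolute-differences change detector and an illustrative threshold:

```python
import numpy as np

SKIP, INTRA = 0, 1

def cr_mode(curr_block, prev_block, threshold=200):
    """Conditional replenishment: repeat the area if it is essentially
    unchanged, otherwise send a fresh Intra representation of it."""
    change = np.abs(curr_block.astype(np.int32)
                    - prev_block.astype(np.int32)).sum()
    return SKIP if change < threshold else INTRA
```

Note that Skip offers no way to refine the repeated content, which is exactly the shortcoming noted above.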
Motion Prediction: Often the content of an area of a prior picture can be a good starting approximation for the corresponding area in a new picture, but this approximation could benefit from some minor alteration to make it a better representation. Adding a third type of “prediction mode,” in which a refinement difference approximation can be sent, results in a further improvement of compression performance—leading to the basic design of modern hybrid codecs (using a term coined by Habibi with a somewhat different original meaning). The naming of these codecs refers to their construction as a hybrid of two redundancy reduction techniques—using both prediction and transformation. In modern hybrid codecs, regions can be predicted using inter-picture prediction, and a spatial frequency transform is applied to the refinement regions and the Intra-coded regions.
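The hybrid structure can be sketched in a few lines: predict a region, then transform-code the residual. Here the predictor is plain repetition of the co-located block of the previous picture, the simplest inter-picture prediction:

```python
import numpy as np
from scipy.fft import dctn

def hybrid_code_block(curr_block, pred_block, step=16.0):
    """Hybrid coding: prediction followed by transform coding of the
    refinement difference (the residual)."""
    residual = curr_block.astype(np.float64) - pred_block
    coeffs = dctn(residual, norm='ortho')
    return np.round(coeffs / step)  # quantized residual coefficients
```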
Motion Compensation & Estimation: One concept for the exploitation of statistical temporal dependencies that was missing in the first version of H.120 was motion-compensated prediction (MCP). Most changes in video content are typically due to the motion of objects in the depicted scene relative to the imaging plane, and a small amount of motion can result in large differences in the values of the samples in a picture, especially near the edges of objects. Often, predicting an area of the current picture from a region of the previous picture that is displaced by a few samples in spatial location can significantly reduce the need for a refining difference approximation. This use of spatial displacement motion vectors (MVs) to form a prediction is known as motion compensation (MC), and the encoder’s search for the best MVs to use is known as motion estimation. The coding of the resulting difference signal for the refinement of the MCP is known as MCP residual coding.
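A minimal full-search motion estimation sketch using SAD as the matching criterion (block size and search range are illustrative parameters):

```python
import numpy as np

def motion_estimate(curr, ref, y, x, bsize=16, srange=8):
    """Find the MV minimizing SAD between the current block and a
    displaced block in the reference picture (full search)."""
    block = curr[y:y + bsize, x:x + bsize].astype(np.int32)
    best = (0, 0), np.inf
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + bsize > ref.shape[0] \
                    or rx + bsize > ref.shape[1]:
                continue  # candidate lies outside the reference picture
            cand = ref[ry:ry + bsize, rx:rx + bsize].astype(np.int32)
            sad = np.abs(block - cand).sum()
            if sad < best[1]:
                best = (dy, dx), sad
    return best  # ((dy, dx), sad): the MV and its matching cost

# Motion compensation then simply copies the displaced reference block;
# the remaining difference is the MCP residual to be transform coded.
```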
It should be noted that the subsequent improvement of MCP techniques has been the major source of the coding efficiency gains achieved by modern standards from generation to generation. The price for using MCP in ever more sophisticated ways is a major increase in complexity.
Fractional-sample-accurate MCP: This term refers to the use of spatial displacement MV values that have finer than integer precision, thus requiring the use of interpolation when performing MCP. Intuitive reasons include having a more accurate motion representation and greater flexibility in prediction filtering (as full-sample, half-sample, and quarter-sample interpolators provide different degrees of low-pass filtering that are chosen automatically in the ME process). Half-sample-accuracy MCP was considered even during the design of H.261 but was not included due to the complexity limits of the time. Later, as processing power increased and algorithm designs improved, video codec standards increased the precision of MV support from full-sample to half-sample (in MPEG-1, MPEG-2, and H.263) to quarter-sample (for luma in MPEG-4’s advanced simple profile and H.264/AVC) and beyond (with eighth-sample accuracy used for chroma in H.264/AVC).
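A sketch of half-sample interpolation for MCP using bilinear averaging; note that standards specify particular filters (H.264/AVC, for example, uses a 6-tap filter for luma half-sample positions), so the bilinear filter here is only a simplification:

```python
import numpy as np

def half_sample_block(ref, y, x, half_y, half_x, bsize=8):
    """Prediction block at integer position (y, x) plus optional
    half-sample displacements half_y, half_x in {0, 1}."""
    a = ref[y:y + bsize + 1, x:x + bsize + 1].astype(np.float64)
    # Average horizontal neighbors for a half-sample shift in x.
    h = a if half_x == 0 else 0.5 * (a[:, :-1] + a[:, 1:])
    # Average vertical neighbors for a half-sample shift in y.
    v = h if half_y == 0 else 0.5 * (h[:-1, :] + h[1:, :])
    return np.rint(v[:bsize, :bsize])
```

The averaging acts as the low-pass filtering mentioned above, in addition to refining the motion representation.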
MVs over picture boundaries: This approach solves the problem of representing motion for samples at the boundary of a picture by extrapolating the reference picture beyond its edges. The most common method is simply to replicate the boundary samples.
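NumPy's edge padding performs exactly this replication; a minimal sketch:

```python
import numpy as np

def extend_reference(ref, pad):
    """Extrapolate the reference picture by replicating boundary samples,
    so displaced blocks up to `pad` samples outside remain defined."""
    return np.pad(ref, pad, mode='edge')

ref = np.arange(9).reshape(3, 3)
print(extend_reference(ref, 2))
# Each edge row/column repeats outward; corners repeat the corner sample.
```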
Bipredictive MCP: The averaging of two MCP signals. One prediction signal has typically been formed from a picture in the temporal future, with the other formed from the past relative to the picture being predicted (hence, it has often been called bidirectional MCP). Bipredictive MCP was first put in a standard in MPEG-1, and it has been present in all succeeding standards. Intuitively, such bipredictive MCP particularly helps when the scene contains uncovered regions or smooth and consistent motion.
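The averaging step itself is simple, given the two MCP blocks already fetched from a past and a future reference picture (equal weighting is assumed here, which is the usual default):

```python
import numpy as np

def bipredict(pred_past, pred_future):
    """Average two MCP signals; the +1 rounds the halved sum upward,
    the convention used for integer sample arithmetic in codecs."""
    return (pred_past.astype(np.int32)
            + pred_future.astype(np.int32) + 1) // 2
```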
Variable block size MCP: The ability to select the size of the region (ordinarily a rectangular block-shaped region) associated with each MV for MCP. Intuitively, this provides the ability to trade off the accuracy of the motion-field representation against the number of bits needed to represent MVs.
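A sketch of this tradeoff for one 16×16 region, comparing one MV against four MVs for its 8×8 sub-blocks; the fixed per-MV bit cost is a crude stand-in for a real rate-distortion criterion, and `motion_estimate` refers to the full-search sketch above:

```python
LAMBDA_BITS = 100  # assumed cost charged per coded MV, in SAD units

def choose_block_size(curr, ref, y, x):
    """Pick 16x16 (one MV) vs. four 8x8 blocks (four MVs) for a region."""
    _, sad16 = motion_estimate(curr, ref, y, x, bsize=16)
    sad8 = sum(motion_estimate(curr, ref, y + dy, x + dx, bsize=8)[1]
               for dy in (0, 8) for dx in (0, 8))
    cost16 = sad16 + 1 * LAMBDA_BITS   # one MV to transmit
    cost8 = sad8 + 4 * LAMBDA_BITS     # four MVs to transmit
    return '16x16' if cost16 <= cost8 else '8x8'
```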
Multipicture MCP: MCP using more than just one or two previous decoded pictures. This allows the exploitation of long-term statistical dependencies in video sequences, as found with backgrounds, scene cuts, and sampling aliasing.
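Extending the earlier full-search sketch to a list of previously decoded reference pictures (the chosen reference index would also need to be transmitted along with the MV):

```python
def multipicture_estimate(curr, ref_list, y, x, bsize=16):
    """Search several previously decoded pictures; return the best
    (reference index, MV, SAD) triple."""
    best = None
    for ref_idx, ref in enumerate(ref_list):
        mv, sad = motion_estimate(curr, ref, y, x, bsize=bsize)
        if best is None or sad < best[2]:
            best = (ref_idx, mv, sad)
    return best
```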