Many market segments, including video broadcasting, military, medical imaging, and base stations, benefit from programmable high-density FIFO devices. Compared with an SDRAM + FPGA architecture, such a device can significantly reduce cost and improve video quality, and system-level programming makes a high-density FIFO design simpler and cheaper. In this article, we first introduce a few video applications to understand their data paths and the nature of the data that needs to be processed. Next, we estimate the complexity of manipulating that data in the video processing pipeline. We then introduce the programmable high-density FIFO and its capabilities, and show how it can be a more efficient alternative to the traditional frame buffer implementation using SDRAM and an FPGA.
Video application overview:
Figure 1 shows the system block diagram of IPTV. The input transport stream can be in any format, such as DVB-ASI, MPEG2 or SDI, and is converted (i.e., decoded or re-encoded) into an H.264 transport stream by passing it through a multi-format CODEC. The encoded transport stream is encapsulated with channel information and sent over Ethernet. In the receive path, the incoming transport stream is decoded and then post-processed before display, with steps such as noise reduction, color enhancement, scaling, and de-interlacing.
Figure 2 shows a system block diagram of an HD (high definition) professional camera used in film production and studios. The captured image goes through an image processor for color processing, brightness enhancement, digital scaling, frame rate conversion, and more. The image processing unit typically uses an FPGA-based design and changes frequently since most of the image processing is proprietary. The application processor manages communication with other devices and compresses and stores captured content to mass storage (HDD). The application processor also has a graphics engine for on-screen display (OSD), which is mixed with incoming video and displayed.
From the above examples, we can see that the data handling falls into two types:
1) Frame synchronization:
Frame synchronization is required in some tasks (e.g., when transmitting and receiving over Ethernet, where the stream rate varies but the decoder needs a constant-rate transport stream). While the memory requirement for synchronization may seem small, it can become significant when multiple streams are involved. This synchronization can be achieved with an asynchronous FIFO.
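As a rough illustration of why such a FIFO needs headroom, the sketch below models a rate-smoothing FIFO as a simple occupancy counter: a bursty source writes into it and a decoder drains it at a constant rate. The function name, the arrival pattern, and the prefill threshold are all hypothetical, and the model ignores the clock-domain crossing that a real asynchronous FIFO also handles.

```python
def simulate_sync_fifo(arrivals_per_tick, read_per_tick, prefill):
    """Model a rate-smoothing FIFO as an occupancy counter.

    arrivals_per_tick: words written by the bursty source in each tick.
    read_per_tick:     words the decoder consumes per tick once started.
    prefill:           occupancy to reach before the reader starts.
    Returns (peak_occupancy, underflow_ticks).
    """
    occupancy = 0
    peak = 0
    underflows = 0
    reading = False
    for words_in in arrivals_per_tick:
        occupancy += words_in
        peak = max(peak, occupancy)
        if not reading and occupancy >= prefill:
            reading = True
        if reading:
            if occupancy >= read_per_tick:
                occupancy -= read_per_tick
            else:
                underflows += 1  # decoder starved during this tick
    return peak, underflows


# Hypothetical bursty stream averaging 2 words per tick.
bursty = [4, 0, 4, 0, 3, 1, 0, 4, 2, 2]
shallow = simulate_sync_fifo(bursty, read_per_tick=2, prefill=4)
deeper = simulate_sync_fifo(bursty, read_per_tick=2, prefill=6)
print(shallow, deeper)
```

With this made-up pattern, starting the reader later removes the starved tick at the cost of a deeper buffer, and that depth is needed once per stream.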
2) Frame storage:
Frame storage is required wherever temporal processing takes place, such as frame rate conversion, digital zooming, or de-interlacing. The number of frames stored increases with the amount of temporal information required. When the video data is processed in its original order, the frame buffer must also be "first in, first out".
From the above discussion, we can say that all storage and synchronization can be implemented using FIFOs. So how big should the ideal FIFO be? A typical 1080p frame in 10-bit 4:2:2 format requires 39.55M bits of memory (pixels per line * lines per frame * bits per pixel = 1920*1080*20). The total capacity is this figure multiplied by the number of frames that need to be stored. A typical video processing algorithm needs to store 2 to 3 frames, which means a total capacity of about 120M bits. Since a FIFO of this size cannot be built from on-chip SRAM, the general approach is to buffer the data in DRAM.
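The sizing arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the helper name is hypothetical, and it assumes the article's "M bits" are binary megabits (2^20 bits), which is the unit that reproduces the 39.55 figure.

```python
MBIT = 2 ** 20  # assumption: "M bits" in the text means 2^20 bits


def frame_bits(width, height, bits_per_pixel):
    """Raw storage needed for one uncompressed frame, in bits."""
    return width * height * bits_per_pixel


# 10-bit 4:2:2 carries one luma and one chroma sample per pixel: 20 bits.
one_frame = frame_bits(1920, 1080, 20)
three_frames = 3 * one_frame

print(f"one frame:    {one_frame / MBIT:.2f} Mbit")
print(f"three frames: {three_frames / MBIT:.2f} Mbit")
```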
High Density FIFO – Traditional implementation and its complexity.
Frame buffers are high-density FIFOs, traditionally implemented using external DDR SDRAM. The following example of a typical video processing application shows how these FIFOs are implemented.
Figure 3 shows the data path for a typical situation in which 4 video streams from different sources need to be shown on the same display. Four HD cameras capturing video at 1080p60 (24-bit RGB) are connected to the system through a Camera Link interface. After color space conversion (from RGB to YCbCr) and chroma downsampling (from 4:4:4 to 4:2:2), the frames are scaled down horizontally and vertically and stored in DDR2 SDRAM. Stored frames are read back and positioned as required, and the resulting merged frames are then chroma upsampled and color space converted to drive the panel via an LVDS connection.
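The first two stages of this data path can be sketched in Python as follows. The article does not say which conversion matrix is used, so the full-range BT.709 coefficients below are an assumption, as are the function names and the choice of averaging neighbouring chroma samples.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (assumed full-range BT.709)."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    cb = (b - y) / 1.8556 + 128
    cr = (r - y) / 1.5748 + 128
    # Round and clamp each component to the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def subsample_422(pixels):
    """Reduce a row of (Y, Cb, Cr) pixels from 4:4:4 to 4:2:2.

    Each pair of pixels keeps both luma samples and shares one averaged
    Cb and one averaged Cr. The row length must be even.
    """
    out = []
    for i in range(0, len(pixels), 2):
        (y0, cb0, cr0), (y1, cb1, cr1) = pixels[i], pixels[i + 1]
        out.append((y0, y1, (cb0 + cb1 + 1) // 2, (cr0 + cr1 + 1) // 2))
    return out


row_rgb = [(255, 255, 255), (0, 0, 0), (255, 255, 255), (0, 0, 0)]
row_444 = [rgb_to_ycbcr(*p) for p in row_rgb]
row_422 = subsample_422(row_444)

samples = sum(len(group) for group in row_422)
bits_per_pixel = samples * 8 // len(row_rgb)
print(row_422, bits_per_pixel)
```

Four 8-bit samples per pair of pixels is where the 16 bits per pixel of the stored 4:2:2 format comes from, compared with 24 bits for the RGB input.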
Let’s look at the memory size and bandwidth requirements:
(i) Size requirements:
Although there is no temporal processing involved here, two frames from each source must be stored, so that one frame can be read back while the next is being written and the output never mixes parts of two different frames. Each source is scaled to a quarter of the full frame, so the size of two frames for all four sources is ((1920 * 1080 * 16) / 4) * 4 * 2 ~= 63.3M bits.
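The 63.3M-bit figure can be checked as follows. The sketch assumes that each of the four sources is scaled to a quarter of a 1080p frame and that two such frames are kept per source, which is the reading that reproduces the quoted number; "M bits" is again taken as 2^20 bits.

```python
MBIT = 2 ** 20  # assumption: "M bits" in the text means 2^20 bits

NUM_SOURCES = 4
FRAMES_PER_SOURCE = 2  # one frame being written, one being read back

full_frame_bits = 1920 * 1080 * 16           # 16-bit 4:2:2 at 1080p
scaled_frame_bits = full_frame_bits // 4     # quarter-size frame per source

buffer_bits = scaled_frame_bits * NUM_SOURCES * FRAMES_PER_SOURCE
print(f"{buffer_bits / MBIT:.1f} Mbit")
```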
(ii) Bandwidth requirements
Since the read and write paths are multiplexed, the required bandwidth is the sum of the read and write path bandwidths.
Write Path Frequency = (Frequency per Client) * (Number of Clients) = (148.5/4) * 4 = 148.5MHz
Read Path Frequency = Output Frame Resolution Frequency = 148.5MHz.
The actual clock frequency is ((read frequency + write frequency) / 2) plus overhead. The division by two reflects that the interface operates at double data rate, and the overhead comes from DRAM refresh cycles, bank address switching, and so on. Assuming 80% efficiency, the interface will operate at about 185MHz.
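The clock estimate can be written out as a short calculation. This is a sketch of the article's arithmetic, not a DDR2 timing model; the function name is hypothetical, and the 80% efficiency is the article's own assumption.

```python
def shared_ddr_clock_mhz(write_mhz, read_mhz, efficiency):
    """Clock needed when reads and writes share one double-data-rate bus."""
    combined_rate = write_mhz + read_mhz      # both paths use the same pins
    ideal_clock = combined_rate / 2           # two transfers per clock cycle
    return ideal_clock / efficiency           # refresh, bank switching, etc.


write_mhz = (148.5 / 4) * 4   # four quarter-rate clients
read_mhz = 148.5              # output pixel rate
ddr2_clock = shared_ddr_clock_mhz(write_mhz, read_mhz, efficiency=0.8)
print(f"{ddr2_clock:.1f} MHz")
```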
(iii) Memory interface size and I/O requirements:
When pictures are stored in 16-bit 4:2:2 format, a 16-bit interface is sufficient. The FPGA then needs a total of 46 I/Os, broken down as follows:
Clock pins (2 for differential clock, 1 for clock enable) = 3 pins
Command pins (Chip Select, RAS, CAS, WE) = 4 pins
Address pins (14 address lines, 3 bank address lines) = 17 pins
Data pins (x16 interface) = 16 pins
Data strobe and mask (4 pins for 2 differential DQS pairs, 2 for data mask) = 6 pins
High Density FIFO – Discrete Memory:
Now let's look at an implementation based on discrete programmable high-density FIFOs, and at the features such a device needs so that the DDR2 SDRAM can be replaced by a simple data store.
(i) Multi-queue features:
If the FIFO memory is defined as a single block of memory, it is not possible to write multiple video streams into it. Therefore, the FIFO must be configurable and divisible into multiple queues. In the above example, four different frames are being written while, at the same time, four previously stored frames are being read from different queues. Therefore, our application requires at least eight queues.
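A small behavioral model makes the queue count concrete. The class, its method names, and the mapping of sources to queues are all hypothetical; they are not the device's interface, only one way to picture eight independently selected queues shared by four sources.

```python
from collections import deque


class MultiQueueFifo:
    """Toy model of one memory split into independently selected queues."""

    def __init__(self, num_queues, words_per_queue):
        self.words_per_queue = words_per_queue
        self.queues = [deque() for _ in range(num_queues)]

    def write(self, queue_sel, word):
        """Append a word to the selected queue; return False if it is full."""
        queue = self.queues[queue_sel]
        if len(queue) >= self.words_per_queue:
            return False
        queue.append(word)
        return True

    def read(self, queue_sel):
        """Pop the oldest word from the selected queue, or None if empty."""
        queue = self.queues[queue_sel]
        return queue.popleft() if queue else None


def queue_for(source, frame_number):
    """Hypothetical mapping: two queues per source, alternating per frame."""
    return source * 2 + (frame_number % 2)


fifo = MultiQueueFifo(num_queues=8, words_per_queue=4)
# Frame 0 of every source goes to the even queue of its pair.
for source in range(4):
    fifo.write(queue_for(source, 0), f"src{source}-frame0")
# While frame 1 is written to the odd queues, frame 0 is read back.
for source in range(4):
    fifo.write(queue_for(source, 1), f"src{source}-frame1")
read_back = [fifo.read(queue_for(source, 0)) for source in range(4)]
print(read_back)
```

Four sources with two alternating queues each account for all eight queue indices, matching the minimum stated above.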
(ii) Mark and retransmit:
In a standard FIFO, data is lost from the FIFO once it has been read. Here the FIFO read pointer can be reprogrammed, allowing any frame to be read out as many times as needed.
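The effect of a reprogrammable read pointer can be modelled in a few lines. The class and the method names `mark` and `retransmit` are illustrative only, and the model keeps everything in a Python list rather than imitating how the device manages its memory.

```python
class ReplayFifo:
    """Toy FIFO whose read pointer can be rewound to a saved position."""

    def __init__(self):
        self._data = []
        self._read_ptr = 0
        self._marked_ptr = 0

    def write(self, word):
        self._data.append(word)

    def read(self):
        """Return the next word, or None when the reader has caught up."""
        if self._read_ptr >= len(self._data):
            return None
        word = self._data[self._read_ptr]
        self._read_ptr += 1
        return word

    def mark(self):
        """Remember the current read position (e.g. the start of a frame)."""
        self._marked_ptr = self._read_ptr

    def retransmit(self):
        """Rewind the read pointer to the remembered position."""
        self._read_ptr = self._marked_ptr


fifo = ReplayFifo()
for word in ["header", "line0", "line1"]:
    fifo.write(word)

fifo.read()                      # consume the header
fifo.mark()                      # the frame starts here
first_pass = [fifo.read(), fifo.read()]
fifo.retransmit()                # replay the same frame
second_pass = [fifo.read(), fifo.read()]
print(first_pass, second_pass)
```

Repeating a frame in this way is what frame rate conversion needs when the output runs faster than the input.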
Figure 4 shows the block diagram of the Cypress CYFX072VXXX HD-FIFO.
Figure 5 shows an application example of using Cypress HDFIFOs to replace DDR2 chips.
Let’s take another look at the memory size and bandwidth requirements:
(i) Size requirements:
The memory size is the same as with DDR2 SDRAM, namely two quarter-size frames for each of the four sources: ((1920 * 1080 * 16) / 4) * 4 * 2 ~= 63.3M bits.
(ii) Bandwidth requirements:
Since the read and write paths are separate, read and write operating frequencies can be different. This is a big advantage over DDR2 SDRAM.
Write Path Frequency = (Frequency per Client)*(Number of Clients)=(148.5/4)*4 = 148.5MHz
Read Path Frequency = Output Frame Resolution Frequency = 148.5MHz.
The actual operating frequency for both read and write is 148.5MHz at single data rate; there is no additional overhead such as DRAM refresh cycles or bank address switching.
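To put the two interfaces side by side, the sketch below compares how often each data pin toggles. It reuses the figures from the DDR2 estimate earlier in the article (a shared double-data-rate bus at 80% efficiency), and the variable names are hypothetical.

```python
PIXEL_RATE_MHZ = 148.5

# DDR2: read and write share one bus, two transfers per clock, 80% efficiency.
ddr2_clock_mhz = (PIXEL_RATE_MHZ + PIXEL_RATE_MHZ) / 2 / 0.8
ddr2_transfers_per_pin = 2 * ddr2_clock_mhz    # million transfers per second

# HD-FIFO: separate read and write ports, one transfer per clock.
fifo_transfers_per_pin = PIXEL_RATE_MHZ

ratio = fifo_transfers_per_pin / ddr2_transfers_per_pin
print(f"DDR2:    {ddr2_transfers_per_pin:.2f} MT/s per data pin")
print(f"HD-FIFO: {fifo_transfers_per_pin:.2f} MT/s per data pin")
print(f"ratio:   {ratio:.2f}")
```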
(iii) Memory interface size and I/O requirements:
When pictures are stored in 16-bit 4:2:2 format, a 16-bit interface is sufficient. The FPGA then needs a total of 48 I/Os, broken down as follows:
Clock pins (1 for write clock, 1 for read clock) = 2 pins
Command pins (write enable, read enable, input enable, output enable, 3 pins to select which of the 8 queues to write, 3 pins to select which of the 8 queues to read, 1 pin for mark, 1 pin for retransmit) = 12 pins
Data pins (16 pins for writing data, 16 pins for reading data) = 32 pins
Flags (1 pin for empty flag, 1 pin for full flag) = 2 pins
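The two pin budgets can be tallied as a quick check. The dictionary keys are shorthand for the groups listed in the article, and the pin counts are taken directly from the text.

```python
# DDR2 SDRAM interface, as listed earlier in the article.
ddr2_pins = {
    "clock (differential pair + clock enable)": 3,
    "command (CS, RAS, CAS, WE)": 4,
    "address (14 address + 3 bank address)": 17,
    "data (x16)": 16,
    "strobe group (2 differential DQS pairs + 2 more)": 6,
}

# HD-FIFO interface, as listed above.
hd_fifo_pins = {
    "clock (write + read)": 2,
    "command (4 enables + 3 write select + 3 read select + 2 pointer)": 12,
    "data (16 write + 16 read)": 32,
    "flags (empty + full)": 2,
}

ddr2_total = sum(ddr2_pins.values())
hd_fifo_total = sum(hd_fifo_pins.values())
print(ddr2_total, hd_fifo_total)
```

The HD-FIFO costs two more pins because its read and write data buses are separate, which is also what removes the bus sharing from the bandwidth calculation.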
Advantages of discrete HD-FIFO over traditional implementations:
(i) Since the read and write paths are separate and there is no refresh or bank-switching overhead, the interface switching rate can be reduced by more than half (single data rate at 148.5MHz instead of double data rate at about 185MHz), which is a significant advantage.
(ii) Since no SDRAM controller is used, no arbitration mechanism is required, and the internal logic of the FPGA becomes simpler.
(iii) The signal switching frequency is reduced by more than half, allowing increased timing margin, without the strict output timing requirements of DDR2.
(iv) The number of clock domains in the design is reduced, which reduces the problems associated with timing closure and clock domain crossing.
(v) The lower signal switching frequency also reduces switching noise on the circuit board.
(vi) The I/O of the HD-FIFO can use any LVCMOS interface, which has greater noise margin than the SSTL_18 signaling of DDR2 SDRAM.
Using HD FIFOs in high-end FPGA solutions can save FPGA resources as follows:
(i) The SDRAM controller is no longer needed, reducing the required memory, I/O, and logic
(ii) Video processing functions, which can be implemented on HD FIFO using multi-queue features, such as:
a. Interlacing/De-interlacing of the video signal
b. Picture-in-picture (PIP) implementation
c. Cross signal processing
Using high-density FIFOs can save logic elements, registers, memory, and I/O, and can help developers replace a high-end FPGA with a smaller FPGA, which can save 20 to 30% of the cost.
The high-density FIFO is based on SRAM technology, providing high data reliability and low latency, and its easy-to-use bus interface reduces implementation and debugging effort. Density can reach 144 Mb and speed can reach 150 MHz. Value-added features such as memory segmentation, multi-queue operation, and an optional memory architecture help developers design faster and more efficiently, making the device suitable for a wide range of applications. It is a proven solution that accelerates time-to-market while reducing the associated design effort. The device also offers a wide range of expansion options, suiting it to video broadcasting, military, medical imaging, and base station (network) equipment, in applications such as:
• Standard HD format frame buffer (720p, 1080i, 1080p): stores four 1080p resolution frames
• HDTV/SDTV frame synchronization
• Switcher or format converter boxes
• High-end digital video cameras
• High-density cache in military radar
• Medical Imaging
• Base Station – 3G, 4G and Network
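As a closing check on the frame buffer entry in the list above, the sketch below estimates how many whole 1080p frames fit in a 144 Mb device. It assumes that "Mb" means 2^20 bits and that frames are stored uncompressed; the 16-bit and 20-bit pixel sizes are the two 4:2:2 formats used earlier in the article.

```python
MBIT = 2 ** 20  # assumption: "Mb" means 2^20 bits

capacity_bits = 144 * MBIT
frame_bits_16 = 1920 * 1080 * 16   # 8-bit 4:2:2
frame_bits_20 = 1920 * 1080 * 20   # 10-bit 4:2:2

frames_at_16 = capacity_bits // frame_bits_16
frames_at_20 = capacity_bits // frame_bits_20
print(frames_at_16, frames_at_20)
```

Under these assumptions the four-frame figure holds for the 16-bit format, while the 10-bit format leaves room for three whole frames.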