Large-scale, parallel embedded applications: A hardware design model for software engineers

International Journal of Electrical Engineering Education, Oct 2001 by Fleury, M, Self, R P, Downton, A C

PPF (Pipelined Processor Farms) is suitable for continuous-flow, data-driven (rather than control-driven), soft real-time applications, and allows throughput, latency, and ordering constraints to be accommodated. It has been used to illustrate top-down design on realistically complex, large-scale embedded systems (5,000-50,000 lines of code). The value of this approach is that the system is viewed as a whole before any small part of it is implemented in detail. This ensures that resources and effort are not misdirected into optimizing a small part of the application that has little or no impact on overall performance.
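The point about not optimizing the wrong part of the system can be made concrete with a simple, idealised model of a pipeline of farms. The sketch below is a minimal illustration under stated assumptions: the `Stage` class, the stage names, and the timings are hypothetical, each farm is assumed to scale perfectly with its worker count, and communication and queueing costs are ignored. Under these assumptions, throughput is set by the slowest stage, while per-item latency is the sum of the stage times.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str               # hypothetical stage label
    time_per_item: float    # seconds for one processor to handle one item (assumed)
    workers: int = 1        # processors in this stage's farm


def stage_interval(s: Stage) -> float:
    # Idealised farm: n workers can accept a new item every t/n seconds.
    return s.time_per_item / s.workers


def throughput(stages: List[Stage]) -> float:
    # The slowest stage (longest interval) limits items per second.
    return 1.0 / max(stage_interval(s) for s in stages)


def latency(stages: List[Stage]) -> float:
    # Farming does not speed up an individual item, so latency is the sum
    # of per-item times (queueing and communication are ignored).
    return sum(s.time_per_item for s in stages)


if __name__ == "__main__":
    pipeline = [
        Stage("pre-process", 0.010),
        Stage("main", 0.080, workers=4),
        Stage("post-process", 0.005),
    ]
    print(throughput(pipeline), latency(pipeline))

    # Halving the cost of a non-bottleneck stage leaves throughput unchanged.
    faster_post = [pipeline[0], pipeline[1], Stage("post-process", 0.0025)]
    print(throughput(faster_post))
```

In this made-up pipeline the four-worker main farm accepts an item every 0.02 s and so limits throughput to about 50 items/s. Halving the post-processing time does not change that figure, whereas doubling the workers in the main farm would raise it to about 100 items/s.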

Extending PPF

PPF is aimed at software-based systems built from medium-grained, homogeneous processors. However, a pipeline may contain algorithms that are better suited to fine-grained parallelism and that can become a bottleneck. Two examples are: (1) cellular-array algorithms, such as the relaxation algorithms used in optical-flow motion detection for target tracking; and (2) finely synchronous algorithms, such as beam-forming of the adaptive- or delay-filter varieties. The problem has been that, in recent times, producers of fine-grained SIMD hardware have not established themselves in the marketplace. With the advent of high-density FPGAs, that situation has changed.
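To show why cellular-array algorithms suit fine-grained hardware, the sketch below performs a generic synchronous (Jacobi-style) relaxation on a 2D grid. It is an illustrative assumption, not the optical-flow relaxation referred to above: each interior cell is simply replaced by the mean of its four neighbours, with boundary cells held fixed. Because every new value depends only on the previous sweep, all cells could in principle be updated in lockstep, one per processing element of a SIMD array or FPGA; the nested loops here merely serialise that step.

```python
from typing import List

Grid = List[List[float]]


def relax_step(g: Grid) -> Grid:
    """One synchronous relaxation sweep over the interior of the grid."""
    rows, cols = len(g), len(g[0])
    new = [row[:] for row in g]  # copy: boundary cells keep their values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Reads come only from the previous grid `g`, never from `new`,
            # so every cell's update is independent of the others.
            new[i][j] = 0.25 * (g[i - 1][j] + g[i + 1][j] + g[i][j - 1] + g[i][j + 1])
    return new


def relax(g: Grid, sweeps: int) -> Grid:
    """Apply a fixed number of synchronous sweeps."""
    for _ in range(sweeps):
        g = relax_step(g)
    return g


if __name__ == "__main__":
    # 5x5 grid: boundary fixed at 1.0, interior starts at 0.0.
    grid = [[1.0 if i in (0, 4) or j in (0, 4) else 0.0 for j in range(5)] for i in range(5)]
    result = relax(grid, 100)
    print(result[2][2])
```

With a constant boundary the interior values converge towards that constant. The per-cell work is tiny and identical everywhere, which is the kind of regular, fine-grained parallelism that medium-grained processor farms handle poorly.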

When the Karhunen-Loeve Transform (KLT) is partitioned to form a PPF, the solution is not entirely satisfactory [23] if it is confined to the available medium-grained hardware. Fig. 7 is an 'ill-conditioned' pipeline because the central eigenvector calculations, being performed on a low-dimensional matrix, cannot be farmed effectively and so remain a bottleneck. Because of the large data flows between the two farmers, it is preferable to keep one array or ensemble of images within the same processor, which prevents a physical partition of the pipeline. However, the covariance and transform farms are embarrassingly parallel, which suggests that FPGAs could be deployed. The eigenvector calculations could then be mapped to a RISC processor, while the covariance and transform stages are mapped to FPGAs, balancing the time taken by the eigenvector calculation against the covariance and transform processing. Once such a coprocessor model is introduced, it immediately becomes necessary to partition the application between the software parts running on the RISC and the hardware parts running on the FPGA.
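The three KLT stages can be sketched in Python to show where the parallelism lies. This is a minimal illustration under assumptions not taken from the text: the ensemble is represented as a list of pixel vectors (one value per image band), and only the leading eigenvector is found, by power iteration, in place of the full eigendecomposition a real KLT would use. The covariance accumulation and the transform touch each pixel independently, so they could be farmed or moved to FPGA logic; the eigenvector step works on a small B x B matrix and stays sequential, which is the bottleneck described above.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def covariance(pixels: List[Vector]) -> Tuple[Vector, Matrix]:
    """Stage 1 (farmable): mean vector and B x B covariance across bands.

    Each pixel contributes independently, so the sums could be split
    across workers and the partial results added together.
    """
    n, b = len(pixels), len(pixels[0])
    mean = [sum(p[k] for p in pixels) / n for k in range(b)]
    cov = [[0.0] * b for _ in range(b)]
    for p in pixels:
        d = [p[k] - mean[k] for k in range(b)]
        for i in range(b):
            for j in range(b):
                cov[i][j] += d[i] * d[j]
    return mean, [[c / (n - 1) for c in row] for row in cov]


def leading_eigenvector(cov: Matrix, iters: int = 200) -> Vector:
    """Stage 2 (sequential): power iteration on the small covariance matrix.

    Assumes the all-ones start vector is not orthogonal to the leading eigenvector.
    """
    b = len(cov)
    v = [1.0] * b
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(b)) for i in range(b)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def transform(pixels: List[Vector], mean: Vector, v: Vector) -> Vector:
    """Stage 3 (farmable): project each mean-centred pixel onto v."""
    return [sum((p[k] - mean[k]) * v[k] for k in range(len(v))) for p in pixels]


if __name__ == "__main__":
    # Hypothetical two-band "ensemble" of five pixels.
    pixels = [[2.0, 0.0], [0.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    mean, cov = covariance(pixels)
    v = leading_eigenvector(cov)
    print(v, transform(pixels, mean, v))
```

Only `leading_eigenvector` needs the whole covariance matrix at once; the other two functions are per-pixel loops that map naturally onto a farm or onto replicated FPGA logic.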

Pipelined examples

This section considers examples of the type of embedded application design for which training is required.

Radar systems [24] provide large-scale embedded applications that illustrate the problems new engineers will face. Sample rates are now approaching 500 MHz, shrinking the time available per sample for a real-time response. Multidimensional and non-linear signal-processing techniques are being used for 2D and 3D imaging of terrain or objects in Synthetic Aperture Radar (SAR), and target-tracking radar is now required to estimate the velocity of moving targets. Algorithm changes within an application are endemic, though there are a few staples such as the FFT. I/O bandwidth may be critical, and airborne systems will require compact, low-power solutions.
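The effect of a 500 MHz sample rate on the real-time budget can be shown with a short back-of-envelope calculation. The sketch below is an illustration only: the 1,024-point FFT block size is a hypothetical choice, it assumes a radix-2 FFT needing (N/2) log2 N butterflies per block, and it ignores I/O and any overlap between blocks.

```python
import math


def sample_period(rate_hz: float) -> float:
    """Time between successive samples, in seconds."""
    return 1.0 / rate_hz


def radix2_butterflies(n: int) -> int:
    """Butterfly operations in a radix-2 FFT of size n (n a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return (n // 2) * int(math.log2(n))


def required_butterfly_rate(rate_hz: float, n: int) -> float:
    """Butterflies per second needed to finish each n-sample block
    before the next one arrives (no overlap, no I/O cost assumed)."""
    block_time = n * sample_period(rate_hz)
    return radix2_butterflies(n) / block_time


if __name__ == "__main__":
    fs = 500e6          # sample rate quoted in the text
    n = 1024            # hypothetical FFT block size
    print(sample_period(fs), n * sample_period(fs), required_butterfly_rate(fs, n))
```

At 500 MHz a sample arrives every 2 ns, so a 1,024-sample block lasts about 2.05 µs; its 5,120 butterflies must therefore be completed at roughly 2.5e9 butterflies per second, which indicates the scale of the processing problem before any other algorithm in the chain is considered.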
