Large-scale, parallel embedded applications: A hardware design model for software engineers

International Journal of Electrical Engineering Education, Oct 2001 by Fleury, M, Self, R P, Downton, A C

A two-level computer model has been proposed [25], consisting of an invariant network-level structure and a variant node-level structure. However, this model represents only the target hardware. A third level is likely to be necessary: one that can emulate the parallel logic present, both for algorithmic development and to retain a core prototype of the system that can be projected onto varying target hardware. The parallel structure might be captured within a software component. An embodiment of the two-level computer is given in Fig. 8. The first level consists of a Sparc host with a Myrinet [26] NIC, itself controlled by a LANai communication coprocessor.
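To make the separation concrete, here is a minimal Python sketch, assuming a three-stage pipeline whose stage names are borrowed loosely from the radar example later in this article. The ordered stage list plays the role of the invariant network-level structure, and the dictionary of per-stage functions is the variant node level. The software-emulation functions are hypothetical placeholders standing in for real algorithms, not the authors' code.

```python
from typing import Callable, Dict, Iterable, List

# Invariant network level: the ordered stages of the pipeline (hypothetical names).
PIPELINE: List[str] = ["preprocess", "focus_of_attention", "second_level_detection"]

# A node-level implementation maps one work item to the next stage's input.
NodeImpl = Callable[[object], object]


def run_pipeline(pipeline: List[str], node_impls: Dict[str, NodeImpl],
                 items: Iterable[object]) -> List[object]:
    """Pass each item through every stage in order, using the supplied node implementations."""
    missing = set(pipeline) - set(node_impls)
    if missing:
        raise ValueError(f"no node implementation for stages: {sorted(missing)}")
    results = []
    for item in items:
        for stage in pipeline:
            item = node_impls[stage](item)
        results.append(item)
    return results


# Variant node level, option 1: a pure-software emulation for algorithm development.
# These placeholder functions are illustrative only.
software_emulation: Dict[str, NodeImpl] = {
    "preprocess": lambda frame: [abs(s) for s in frame],             # e.g. sample magnitudes
    "focus_of_attention": lambda frame: [s for s in frame if s > 1],  # keep "interesting" samples
    "second_level_detection": lambda rois: len(rois),                 # count candidate detections
}

print(run_pipeline(PIPELINE, software_emulation, [[-3, 0, 2], [1, 1]]))  # [2, 0]
```

Retargeting to hardware would mean supplying a different dictionary with the same keys (for example, functions wrapping a device driver) while leaving `PIPELINE` and `run_pipeline` untouched. No such driver interface is shown, because it depends entirely on the target.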

The host is responsible for initializing its computation coprocessor and for subsequent message handling. The computation coprocessor actually consists of four FPGAs with an on-board NIC. The FPGAs are connected to the host through an 8-way crossbar to form what Myrinet calls a System Area Network. (The FPGAs can also be connected locally in a ring topology for the exchange of global results.) Higher-level connectivity is via a Myrinet LAN (160 Mbytes/s in 1998), which replicates, for commodity processors, a form of interconnect common on supercomputers such as the Cray T3D.
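The local ring lets the four FPGAs combine partial results without involving the host. One simple way to do this, assumed here rather than taken from the article, is a naive ring all-reduce: on each of p - 1 steps every node forwards the value it last received to its right-hand neighbour and folds that value into its running result, so after p - 1 steps every node holds the global result. The Python below only simulates the message pattern; it does not model the Myrinet or FPGA hardware.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def ring_allreduce(partials: List[T], op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Simulate a naive ring all-reduce among len(partials) nodes.

    Node i starts with partials[i]; on each step every node simultaneously sends
    one value to node (i + 1) % p. `op` must be associative and commutative.
    """
    p = len(partials)
    acc = list(partials)        # running global result held at each node
    in_flight = list(partials)  # value each node forwards on the next step
    for _ in range(p - 1):
        # Node i receives what its left-hand neighbour (i - 1) % p sent.
        received = [in_flight[(i - 1) % p] for i in range(p)]
        acc = [op(a, r) for a, r in zip(acc, received)]
        in_flight = received
    return acc


# Four nodes, as in the four-FPGA coprocessor.
print(ring_allreduce([1, 2, 3, 4]))       # [10, 10, 10, 10]
print(ring_allreduce([3, 9, 2, 5], max))  # [9, 9, 9, 9]
```

Each link carries one value per step, which suits small global results such as a best match score; for long vectors a bandwidth-optimal reduce-scatter followed by all-gather would be the usual choice.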

Figure 9 shows one node with driver software installed for a particular application, automatic target detection by radar [27]. Fig. 10 shows where this node fits within the processing pipeline. After data capture and preprocessing of the synthetic aperture radar (SAR) images, stage two, Focus of Attention (FOA), extracts regions of interest within the images that may contain potential targets, namely moving vehicles. Second-level detection (SLD) then performs simple time-domain template matching, which is embarrassingly parallel. Apart from its use of fine-grained hardware, this application has all of the characteristics of a PPF; it is, in fact, more aptly described as a PPC.
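The article does not specify the SLD matching criterion, so the following Python sketch assumes a sum-of-absolute-differences (SAD) score between each region of interest and a small bank of same-length templates; the templates and regions are hypothetical. The point is the independence of the work: each region is scored without reference to any other, so regions can be handed to as many workers as are available. A thread pool keeps the example self-contained; in CPython, CPU-bound work like this would need processes or native code to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length sample sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_region(region: Sequence[float],
                 templates: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (index of best-matching template, its SAD score) for one region."""
    scores = [sad(region, t) for t in templates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def second_level_detection(regions, templates, workers: int = 4):
    # No data flows between regions, so this map is embarrassingly parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: match_region(r, templates), regions))


templates = [[0, 1, 2, 1, 0], [2, 2, 2, 2, 2]]            # hypothetical target signatures
regions = [[0, 1, 2, 1, 0], [2, 2, 1, 2, 2], [0, 0, 2, 0, 0]]
print(second_level_detection(regions, templates))  # [(0, 0), (1, 1), (0, 2)]
```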

The same processing structure can be applied to sonar beamforming. However, detailed studies [28, 29] of time-delay and frequency-space beamforming established that there are non-obvious trade-offs between DSPs and FPGAs: for DSPs, memory-access blockages caused by irregular addressing patterns; for FPGAs, a future bottleneck caused by the limited number of I/O pins available on a device. Moreover, the frequency-space implementation runs contrary to the view that silicon compilation will solve many problems, as the CORDIC algorithm, a hardware-oriented method of calculating trigonometric functions, was needed to manipulate the phase and magnitude components.
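CORDIC computes trigonometric quantities using only shifts, adds and a small table of constants, which is why it suits FPGA fabric. In vectoring mode it rotates a complex sample (x, y) onto the positive x-axis through a sequence of fixed angles atan(2^-i), accumulating the total rotation (the phase) while x grows to the magnitude multiplied by a known gain K of about 1.6468. The Python sketch below uses floating point for readability; a hardware version would use fixed-point registers, where multiplying by 2^-i is an arithmetic right shift. The iteration count of 16 is an arbitrary choice for the example.

```python
import math
from typing import Tuple

N_ITER = 16                                     # more iterations -> more bits of accuracy
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_vectoring(x: float, y: float) -> Tuple[float, float]:
    """Return (magnitude, phase in radians) of the sample x + jy using CORDIC."""
    # CORDIC converges only for angles within about +/-99.9 degrees, so first
    # rotate samples in the left half-plane by 180 degrees and remember the offset.
    offset = 0.0
    if x < 0:
        offset = math.pi if y >= 0 else -math.pi
        x, y = -x, -y
    z = 0.0
    for i in range(N_ITER):
        t = 2.0 ** -i                           # a right shift by i bits in fixed point
        if y > 0:                               # rotate clockwise by ANGLES[i]
            x, y = x + y * t, y - x * t
            z += ANGLES[i]
        else:                                   # rotate anticlockwise by ANGLES[i]
            x, y = x - y * t, y + x * t
            z -= ANGLES[i]
    return x / GAIN, offset + z


mag, phase = cordic_vectoring(3.0, 4.0)
print(round(mag, 3), round(phase, 3))  # 5.0 0.927
```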

Interestingly, an alternative proposal [30] uses an SMP, apparently with fewer beams but with interpolation of samples. A 12-processor Sun Ultra Enterprise achieved a speed-up of eleven at a throughput of 4.8 GFLOP/s. Word-level SIMD parallelism from the SPARC VIS instruction set was used together with multithreading, and advanced software techniques such as compiler-directed data prefetching were also applied. Therefore, to reach optimal designs, even the software-based approach must go beyond generic code and exploit features of a particular instruction set and microarchitecture.
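Word-level SIMD of the VIS kind packs several narrow values into one 64-bit register and operates on all of them with a single instruction. The idea can be illustrated without VIS itself using the portable SWAR ("SIMD within a register") trick in the Python sketch below, which adds four 16-bit lanes at once with lane-wise wrap-around. Masking off the top bit of each lane before the add stops carries from crossing lane boundaries, and the top bits are then restored with an XOR. This emulates the concept only; it is not the VIS instructions or their exact semantics.

```python
LANE_BITS = 16
LANES = 4
WORD_MASK = (1 << (LANE_BITS * LANES)) - 1      # 64-bit register
HIGH = 0x8000_8000_8000_8000                     # top bit of every 16-bit lane
LOW = WORD_MASK & ~HIGH                          # remaining 15 bits of every lane


def pack16(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFFFF) << (LANE_BITS * i)
    return word


def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (LANE_BITS * i)) & 0xFFFF for i in range(LANES)]


def swar_add16(a: int, b: int) -> int:
    """Add four 16-bit lanes in parallel, modulo 2**16 per lane."""
    # Adding only the low 15 bits of each lane cannot carry into the next lane.
    partial = (a & LOW) + (b & LOW)
    # Top bit of each lane = carry out of bit 14 XOR the two original top bits.
    return partial ^ ((a ^ b) & HIGH)


a = pack16([1, 2, 0xFFFF, 0x8000])
b = pack16([1, 3, 0x0001, 0x8000])
print(unpack16(swar_add16(a, b)))  # [2, 5, 0, 0]
```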

 
