Processor subsystem interconnect architecture for a large symmetric multiprocessing system

IBM Journal of Research and Development, May-Jul 2004, by Mak, P, Strait, G E, Blake, M A, Kark, K W, et al.

Integral to the significant capacity growth of the IBM eServer(TM) z990 (the eighth-generation zSeries CMOS-based server) from its predecessor z900 system is the interconnect architecture, which tightly couples 48 customer CPUs in the system. A major attribute of this architecture is a new "hot swap" feature which improves zSeries system availability for customers by permitting the substitution or addition of a field-replaceable unit (FRU) in the processor subsystem without requiring the system to be powered down. The novel two-level interconnect architecture contains a distributed switch which connects up to four processor-memory nodes in book packages. The book packages, which are also FRUs, are connected in a dual concentric ring topology at the second-level (L2) interconnect. This architecture also contains an integrated 32-MB L2 cache and central switch connecting up to eight dual-core processor chips in a star topology at the first-level interconnect inside one of these nodes. This paper describes the bus protocol on the second-level interconnect, the cache coherency management throughout the storage hierarchy, and the ring topology reconfiguration for hot swap. Also described is a memory power management scheme to support the power demand from the 48 CPUs and up to 256 GB of memory.

Introduction

The z990 is the latest in the zSeries* line of enterprise servers. The z990 shares much with its predecessors, but also introduces significant advances. Some of the most significant advances are made possible by a new system package.

The z990 system comprises one to four book packages; each book package is a pluggable unit containing up to 12 processors and up to 64 GB of memory, I/O adapters, and a centralized switch and coherency manager known as the system control element (SCE), through which the processors and I/O connect. The SCE includes a second-level cache (L2 cache) and a pipelined switch that manages data routing and maintains strong storage coherency across the multiprocessing system. The shared L2 cache is interposed in the storage hierarchy between a private L1 cache dedicated to each processor and the fully shared, fully coherent memory of the z990. The minimum system configuration consists of one book; additional books may be plugged into a system and configured online without stopping the previously installed books, until the system reaches its maximum capacity of four books.
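The capacity limits in this paragraph can be checked with simple arithmetic. The following is a minimal sketch and is not taken from the paper: the constant and function names are hypothetical, and only the per-book limits (12 processors, 64 GB) and the one-to-four-book range come from the text.

```python
# Per-book limits and book-count range as stated in the text.
# All names here are hypothetical, chosen for illustration only.
MAX_PROCESSORS_PER_BOOK = 12
MAX_MEMORY_GB_PER_BOOK = 64
MIN_BOOKS, MAX_BOOKS = 1, 4


def max_capacity(books: int) -> tuple[int, int]:
    """Return (max processors, max memory in GB) for a given number of books."""
    if not MIN_BOOKS <= books <= MAX_BOOKS:
        raise ValueError(
            f"a system holds {MIN_BOOKS} to {MAX_BOOKS} books, got {books}"
        )
    return books * MAX_PROCESSORS_PER_BOOK, books * MAX_MEMORY_GB_PER_BOOK


for n in range(MIN_BOOKS, MAX_BOOKS + 1):
    cpus, mem_gb = max_capacity(n)
    print(f"{n} book(s): up to {cpus} processors, up to {mem_gb} GB")
```

For a fully populated four-book system this gives 48 processors and 256 GB, which agrees with the totals quoted in the abstract.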

Figure 1 illustrates the system storage hierarchy. Main storage memory consists of one contiguous address space, physically located in processor memory arrays (PMAs) spread across the installed books. Each storage address corresponds to one physical location in a PMA on one of the books. Each book contains a shared 32-MB L2 cache that is used by all of the processors and I/O on the book. The contents of memory may be cached in one or multiple books. The processors on each node contain their own 512-KB L1 caches, represented in the figure in groups of two corresponding to the packaging of two processor cores on one chip. Each book contains up to twelve processors, though only six are shown in the figure; the ellipsis represents additional instances. The contents of the L2 cache may also be held in one or more L1 caches on the book. The L2 cache maintains a full subset rule for all of the L1s; i.e., it contains a copy of the data stored in each of the L1s on the book. The L2 cache may contain additional data that is not in any L1 cache because of the removal of data from the L1 cache by the least recently used (LRU) replacement algorithm. This may also happen because of pre-fetching of additional data from memory beyond what is requested by a processor or use of the L2 cache by I/O devices (data is not cached within z990 I/O devices, but may be held in the L2 cache for use by I/O devices). This L2 full subset rule is represented in the figure by showing the L1 caches contained within the L2 space. All changes made by processors to their respective L1 caches are immediately duplicated to the shared L2 cache. This cache structure continues the evolution from prior binodal zSeries systems, which used dual shared L2 caches [1, 2], to up to four shared L2 caches in the z990, which define new intervention master (IM) and multicopy (MC) states that extend the previous modified, exclusive, shared, invalid (MESI) cache management scheme.
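The full subset rule and the immediate duplication of L1 changes to L2 can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the z990 implementation: the `Book` class, its method names, the tiny cache size, the LRU bookkeeping, the write to a memory stand-in on L2 eviction, and the handling of other L1 copies on a store are all hypothetical simplifications. The IM and MC states and cross-book coherency are not modeled.

```python
from collections import OrderedDict


class Book:
    """Toy model of one book: private L1 caches under one shared L2.

    Sizes and policies are illustrative assumptions, not z990 parameters.
    """

    def __init__(self, num_l1: int, l2_lines: int):
        self.l1 = [dict() for _ in range(num_l1)]  # per-processor: addr -> data
        self.l2 = OrderedDict()                    # addr -> data, oldest first
        self.l2_lines = l2_lines
        self.memory = {}                           # stand-in for main storage

    def _l2_install(self, addr, data):
        """Place a line in L2, evicting the least recently used line if full."""
        if addr not in self.l2 and len(self.l2) >= self.l2_lines:
            victim, victim_data = self.l2.popitem(last=False)
            self.memory[victim] = victim_data  # assumption: victim is written back
            # Full subset rule: a line leaving L2 must leave every L1 as well.
            for cache in self.l1:
                cache.pop(victim, None)
        self.l2[addr] = data
        self.l2.move_to_end(addr)  # mark as most recently used

    def load(self, cpu: int, addr):
        """Fetch a line through L2 into the requesting processor's L1."""
        if addr in self.l2:
            self.l2.move_to_end(addr)
        else:
            self._l2_install(addr, self.memory.get(addr, 0))
        self.l1[cpu][addr] = self.l2[addr]
        return self.l1[cpu][addr]

    def store(self, cpu: int, addr, data):
        """Update the L1 and immediately duplicate the change to L2."""
        self._l2_install(addr, data)
        self.l1[cpu][addr] = data
        # Assumption: stale copies in other L1s are simply dropped here.
        for other, cache in enumerate(self.l1):
            if other != cpu:
                cache.pop(addr, None)

    def subset_rule_holds(self) -> bool:
        """True if every L1 line is present in L2 with identical data."""
        return all(
            addr in self.l2 and self.l2[addr] == data
            for cache in self.l1
            for addr, data in cache.items()
        )


book = Book(num_l1=2, l2_lines=2)
book.store(0, 0x100, 1)
book.load(1, 0x200)
book.load(1, 0x300)  # L2 is full: 0x100 is evicted from L2 and from CPU 0's L1
assert book.subset_rule_holds()
```

In this model the L2 can hold lines that no L1 holds, but never the reverse, which is the property the text calls the full subset rule.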

The single largest advance in the z990 is the increased capacity made possible by the four-book ring-connected system structure, as compared with the binodal structure of the predecessor z900 and G6 [3]. Not only does this structure allow for more processors, memory, and I/O than before; it also, for the first time, allows additional books, containing additional processors, cache, memory, and I/O ports, to be hot-plugged and configured online into an operating system, a capability previously supported only for hot-plugging of I/O cards and for enabling processors and memory that were already installed but not configured online [4]. Table 1 summarizes some of the z990 advances in comparison with prior zSeries models. This paper describes design features that make the z990 system possible.

 
