Clocking and clocked storage elements in a multi-gigahertz environment

IBM Journal of Research and Development, Sep-Nov 2003 by Oklobdzija, Vojin G

Clocking considerations and the design of clocked storage elements are discussed in this paper. We present a systematic approach for deriving a clocked storage element suitable for "time borrowing" and absorption of clock uncertainties. We explain how to compare different clocked storage elements with each other, and discuss issues related to power consumption and low-power designs. Finally, results of comparisons among representative designs are presented.

Introduction

Deciding on the clocking strategy is one of the most important decisions in designing a digital system. If the wrong strategy is employed, system bring-up and diagnostics can be very costly, and system operation will remain unreliable throughout its lifetime. The importance of clocking is gaining recognition as clock speeds rapidly increase, traditionally doubling every three years and, lately, every two years.

As clock speeds increase, the number of logic levels in the critical path diminishes. In today's high-speed processors, instructions are executed in one cycle, driven by a single-phase clock. In addition, the number of pipeline stages has increased to 15 or 20 to accommodate the increase in clock speed. Today, ten levels of logic in the critical path are common, and, as shown in Figure 1 [1], this number is expected to decrease further. The diminishing amount of logic placed between two pipeline stages is responsible in large part for the recent and rapid increase in clock frequency, an increase that has surpassed the traditional trend in technology scaling. This decrease in the amount of logic between two pipeline stages is occurring at about half the rate at which clock frequency is increasing, so that the amount of logic per pipeline stage is roughly halved every six years. However, this trend cannot be expected to continue much longer, because a minimal amount of logic (at least two stages) is necessary to make the pipeline stage meaningful. With deeper pipelines, any overhead associated with the clock system and clocking mechanism directly and adversely affects machine performance and is therefore critically important.
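
The arithmetic behind this trend can be sketched in a few lines. The example below reads the trend described above as the logic depth per stage halving roughly every six years, starting from ten levels, and models it as a smooth exponential. The exponential model and the function names are illustrative assumptions, not something taken from the paper.

```python
import math

def logic_depth(years_from_now, start_levels=10.0, halving_period_years=6.0):
    """Logic levels per stage after some years, assuming exponential halving."""
    return start_levels * 0.5 ** (years_from_now / halving_period_years)

def years_until_depth(target_levels, start_levels=10.0, halving_period_years=6.0):
    """Years until the assumed trend reaches target_levels."""
    return halving_period_years * math.log2(start_levels / target_levels)

if __name__ == "__main__":
    for t in (0, 6, 12):
        print(f"year +{t}: {logic_depth(t):.1f} levels")
    # How long until the two-level floor mentioned in the text is reached?
    print(f"two-level floor reached after {years_until_depth(2.0):.1f} years")
```

Under these assumptions the two-level floor would be reached in roughly 14 years, which is one way to see why the trend cannot continue for long.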

At today's frequencies, the ability to absorb clock skew and to use faster clocked storage elements (CSEs) yields direct performance improvements comparable to those obtained through difficult implementations of architectural or microarchitectural techniques.

As the clock frequency reaches 5-10 GHz, traditional clocking techniques will be stretched to their limits, because three to five gates per stage would be barely useful. Beyond that frequency, traditional CSEs would be using as much logic as the pipeline stage. With power continuing to grow, requirements for low power would necessitate more efficient clocking solutions. Thus, new ideas and new ways of designing digital systems are required.

Clocking considerations in sequential systems

Clock distribution

The two most important timing parameters that affect the clock signal are clock skew and clock jitter.

Clock skew is a spatial variation of the clock signal as distributed through the system. It is caused by the various resistive/capacitive (RC) characteristics of the clock paths to the various points in the system and the different loading of the clock signal at different points on the chip. Further, we can distinguish global clock skew and local clock skew, which are equally important in the design of high-performance systems.

Clock jitter is a temporal variation of the clock signal with regard to the reference transition (reference edge) of the clock signal. It can be classified as either of two types: long-term jitter, or edge-to-edge clock jitter, which is defined as the clock-signal variation between two consecutive clock edges. In high-speed logic design, we are more concerned about edge-to-edge clock jitter, because it affects the time available for the logic operation.
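
To make the effect of skew and jitter on the time available for logic concrete, here is a minimal sketch. It assumes a simple worst-case budget in which the CSE overhead, the clock skew, and the edge-to-edge jitter are each subtracted from the clock period. That budget, the function names, and all numeric values are illustrative assumptions rather than figures from the paper.

```python
def period_ps(frequency_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / frequency_ghz

def time_for_logic_ps(period, cse_overhead, skew, jitter):
    """Time left for logic in one cycle under the assumed worst-case budget (ps)."""
    return period - cse_overhead - skew - jitter

if __name__ == "__main__":
    # Hypothetical numbers: 2 GHz clock, 80 ps CSE overhead, 30 ps skew, 20 ps jitter.
    t = period_ps(2.0)
    usable = time_for_logic_ps(t, cse_overhead=80.0, skew=30.0, jitter=20.0)
    print(f"period {t:.0f} ps, usable for logic {usable:.0f} ps "
          f"({100.0 * usable / t:.0f}% of the cycle)")
```

Because the overhead terms do not shrink in proportion to the period, the same absolute skew and jitter consume a growing share of the cycle as the frequency rises.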

Typically, the clock signal has to be distributed to several hundred thousand CSEs. Therefore, the clock signal has the largest fanout of any node in the design and requires several levels of amplification. As a consequence, the clock system by itself can consume up to 40-50% of the power of the entire VLSI chip [2]. We must also ensure that every CSE receives the clock signal at precisely the same moment in time.
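
A rough feel for "several levels of amplification" can be had from an idealized buffer tree. The sketch below assumes that every buffer drives the same number of loads and counts the levels needed to reach a given number of CSEs. The uniform fanout of 4 and the count of 300,000 CSEs are hypothetical values chosen for illustration, and real clock networks are sized quite differently.

```python
def buffer_levels(num_sinks, fanout_per_buffer):
    """Smallest number of buffer levels in an idealized uniform tree
    whose last level can drive num_sinks clocked storage elements."""
    if num_sinks < 1 or fanout_per_buffer < 2:
        raise ValueError("need num_sinks >= 1 and fanout_per_buffer >= 2")
    levels = 0
    reachable = 1  # loads reachable from the single clock source
    while reachable < num_sinks:
        reachable *= fanout_per_buffer
        levels += 1
    return levels

if __name__ == "__main__":
    # Hypothetical design: 300,000 CSEs, each buffer driving 4 loads.
    print(buffer_levels(300_000, 4))
```

With these assumed numbers the tree needs ten levels, since nine levels of fanout 4 reach only 262,144 loads.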

There are several methods of distributing the on-chip clock signal while minimizing clock skew and limiting the power dissipated by the clock system [3, 4]. Two typical cases are an RC-matched tree and a grid [5].

Given superior computer-aided design tools, a perfect and uniform process, and the ability to route wires and balance loads with a high degree of flexibility, an RC-matched delay clock distribution tree would be preferable to a grid. In practice, however, we have neither a perfect and uniform process nor a high degree of flexibility in routing and balancing loads. As a result, a grid is used when clock distribution on the chip has to be controlled very precisely, as is the case with high-performance systems. However, because the clock consumes more power when using a grid arrangement, and because local variations in device geometry and supply voltage are important components of the clock skew, it is necessary to use more sophisticated clock distribution than simple RC-matched or grid-based schemes. Active schemes with adaptive digital de-skewing typically reduce the clock skew of simple passive clock networks by an order of magnitude, allowing tighter control of the clock period and higher clock rates [6].


 

