# Interconnected Rings and Oscillators as Gigahertz Clock Distribution Nets

Manuel Salim Maza and Mónico Linares Aranda. Instituto Nacional de Astrofísica, Optica y Electrónica, INAOE Luis Enrique Erro 1. Sta. Ma. Tonantzintla Puebla, Pue. México. Apdo. Postal 51 y 216. 011 52 (222) 2663100 ext. 1108 and 1420.

msalim@susu.inaoep.mx, mlinares@inaoep.mx

### ABSTRACT

The performance of interconnected rings and oscillators working as clock distribution networks is analyzed and compared across several configurations. The use of interconnected 3-inverter rings as globally asynchronous, locally synchronous clock distribution networks is proposed for chip lengths from 4 to 24 mm. In this approach, modularity and the basic cell properties are preserved, while the power consumption is directly proportional to the number of blocks. Typical 3.3 V AMS $0.35\,\mu m$ CMOS N-well process parameters were used for the analysis. With regard to area expansion, we show that interconnected rings are a more robust approach than interconnected oscillators.

#### **Categories and Subject Descriptors**

B.7.1 [Integrated Circuits]: Types and Design styles – *microprocessors and microcomputers, VLSI (very large scale integration)* 

#### **General Terms**

Measurement, Performance, Design.

### Keywords

Clock distribution networks, ring oscillators, GALS.

### **1. INTRODUCTION**

The maximum distance that a switching signal can travel across a region, in which the time of flight (TOF) does not limit the signal propagation, circumscribes a region known as the isochronous region. The size of this region decreases for an increasing relative length, chip area and operation frequency [1].

New techniques have been proposed to solve this problem in clock distribution networks (CDNs); some techniques offer solutions at the process or fabrication levels such as the flip-chip package [1]; or at the architecture level as asynchronous communication between blocks [2]; or the use of interconnected rings as the CDNs [3]. This last approach showed several advantages such as good performance regarding scalability, low clock skew, and high-speed clocking. Moreover, CDNs have become a more dominant source of power consumption in high performance microprocessors as well as in portable devices [4]. Hence, low-power clock distribution design has become a major issue in the development of large-scale integration and portable device design.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

GLSVLSI'03, April 28-29, 2003, Washington, DC, USA.

Copyright 2003 ACM 1-58113-677-3/03/0004...\$5.00.

This paper presents interconnected rings and interconnected oscillators as globally asynchronous, locally synchronous (GALS) CDNs for chip lengths from 4 to 24 mm, which are a trend in VLSI projects such as MCM and SoC designs [1].

### 2. INTERCONNECTED RINGS

Interconnected rings of 3 inverter stages have been studied before [5], but only for a triangular configuration, and not for large chip areas and large loads. Figure 1 presents the interconnected rings proposed in this work: 2 stages covering a $4 \times 4\ \text{mm}^2$ chip area are shown. The height of the interconnected 3-inverter-stage rings at 60° is reduced by a factor of sin 60° = $\sqrt{3}/2$, since equilateral triangles were used. Figure 2 presents the topology 3 *inv.* $\pm 45^\circ$ 1:1. This is an improved version of the approach presented in figure 1-c, using only one ring per sink.

In figure 3, the wire length per stage is plotted for different approaches of interconnected rings and for global CDNs such as the H-tree or the Quaternary tree with stage connecting (SC) [7]. Topology 3 *inv.* $\pm 45^{\circ}$ 1:1 stands out as having the smallest number of inverters per stage and the second-lowest wire length, after the 5-inverter-stage ring topology. Therefore, the topology 3 *inv.* $\pm 45^{\circ}$ 1:1 was selected to be analyzed in section 3.

Figure 3 also shows the exponential cost growth of global CDNs (even algorithm-based minimum-cost global CDNs), and the linear cost growth of the GALS networks versus number of stages.



Figure 1. 2 stages of interconnected rings showing their sinks: a) 6 rings at 60°; b) 8 rings at 45°; and c) 16 rings at ±45°, four rings per sink (4:1).



Figure 2. 3-inverter stage rings at ±45°, one ring per sink (1:1): a) 16 interconnected rings; b) 16 interconnected rings with buffers at each sink.



Figure 3. Wire length comparison between different global and local clock distribution topologies.

This is very important as chip lengths are growing and more stages will be required [1]. Furthermore, global CDNs will have more restrictions and limitations than GALS approaches [3].

The frequency of the array is close to that of a single expanded cell; the difference is due to border effects in the lattice. Consider node 13 at the upper right corner of the array in figure 2-b. It has one inverter stage feeding it and two inverter stages as loads, which decreases the swing at that node and slows the array. Now consider node 12 at the center of the upper side of the same array: it also has two inverter stages as loads but three inverter stages feeding it, which increases the swing at that node and speeds up the array. Now suppose that all the inverters are reversed: node 13 then has two inverter stages feeding only one inverter, and node 12 has two inverter stages feeding three. This second option is preferable because its slow case has a 2:3 relation (sources vs sinks), whereas the slow case in the original diagram is 1:2, which is more severe. All the other nodes in the array have a relation of 4:4 or 6:6, so reversing the stages has no effect on them.

The oscillation conditions are very easily met using an odd number N of digital inverters since they have very large gain. The interconnection length represents load to a stage and limits its speed and output swing; we obtain

$$\text{Freq}_{\text{array}} \propto \frac{\beta \gamma}{C_{\text{LOAD}}} \qquad (1)$$

where Freq<sub>array</sub> is the global oscillation frequency, $\beta$ is the beta-ratio of the inverter stage (current-drive strength due to W/L ratios), $\gamma$ is the source vs sink relation in the array ($\gamma$ tends to 1 if the array is large), and C<sub>LOAD</sub> is the pure capacitive load seen from a single inverter stage, including the expanded interconnection (as a first approximation).
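As a rough numerical illustration of eq. (1), the sketch below compares the border-node cases discussed above. The function names, and the choice to treat $\gamma$ at a node simply as the ratio of driving stages to loading stages, are assumptions made for illustration only; $\beta$ and C<sub>LOAD</sub> are held fixed, so only the relative effect of $\gamma$ is shown.

```python
from fractions import Fraction


def gamma(sources: int, sinks: int) -> Fraction:
    """Source-vs-sink relation at a node (assumed here to be a plain ratio)."""
    return Fraction(sources, sinks)


def relative_frequency(beta: float, g: Fraction, c_load: float) -> float:
    """Right-hand side of eq. (1), up to an unknown proportionality constant."""
    return beta * float(g) / c_load


# Border nodes of the array in figure 2-b, with the ratios given in the text.
original = {"node 13": gamma(1, 2), "node 12": gamma(3, 2)}
reversed_ = {"node 13": gamma(2, 1), "node 12": gamma(2, 3)}

# The slowest border node sets the more severe case in each orientation.
worst_original = min(original.values())
worst_reversed = min(reversed_.values())

print("slowest border node, original orientation:", worst_original)
print("slowest border node, reversed orientation:", worst_reversed)
```

Under this reading, the slowest border node improves from 1/2 to 2/3 when the inverters are reversed, which matches the preference stated in the text.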

### 3. MODULARITY IN INTERCONNECTED RINGS

There is a notable trend toward large chip lengths in large-scale integration projects such as MCM and SoC designs [1]. For this reason, the modularity of the 3 *inv.* $\pm 45^{\circ}$ 1:1 configuration is explored at chip lengths from 4 mm to 24 mm, as shown in figure 4. Figure 4b depicts the topology of figure 2 in a simplified way.

All topologies were extracted from the layout using *Virtuoso* from *Cadence* and were simulated in *HSPICE* for the AMS $0.35\,\mu$m technology. The $6\pi$-RLC model was used for the interconnections. Power consumption, operation frequency, clock skew between sinks, and time of plateau were measured for 4, 16, 36, 64, 100 and 144 interconnected rings with loads of 0, 50, 100 and 200 fF at the sinks.
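The simulation campaign described above can be enumerated as a simple parameter sweep. The sketch below is illustrative only: the 2 mm pitch per ring is an assumption inferred from the stated correspondence between 144 rings and a 24 mm chip length (and the 4 mm to 24 mm range), and the names `RING_COUNTS`, `SINK_LOADS_FF` and `chip_length_mm` are hypothetical.

```python
from itertools import product
from math import isqrt

RING_COUNTS = [4, 16, 36, 64, 100, 144]  # array sizes listed in the text
SINK_LOADS_FF = [0, 50, 100, 200]        # sink loads in fF listed in the text
PITCH_MM = 2  # assumption: inferred from 144 rings <-> 24 mm chip length


def chip_length_mm(rings: int) -> int:
    """Side length of a square array of rings, under the assumed pitch."""
    side = isqrt(rings)
    if side * side != rings:
        raise ValueError("expected a square number of rings")
    return side * PITCH_MM


# One entry per simulated case: (rings, chip length in mm, sink load in fF).
sweep = [
    (rings, chip_length_mm(rings), load)
    for rings, load in product(RING_COUNTS, SINK_LOADS_FF)
]

for rings, length, load in sweep:
    print(f"{rings:3d} rings, {length:2d} mm chip, {load:3d} fF per sink")
```

Six array sizes and four sink loads give 24 cases, each of which would be measured for power, frequency, clock skew and time of plateau.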



Figure 4. Modular representation of the 3 inv.  $\pm 45^{\circ}$  1:1 topology. a) 4 rings; b) 16 rings; c) 36 rings; d) 64 rings.



Figure 5. Output Waveforms from post-layout simulation for 16 interconnected rings.



Figure 6. Merit figures for interconnected 3 *inv.* ±45° 1:1 rings with 200fF load at final sinks.

The time of plateau is the time during which the signal is below 10% or above 90% of the power supply Vdd. The closer this time is to 50% of the signal period, the closer the output waveform is to a square wave and the more directly it can be used by digital circuitry.
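To make the time-of-plateau figure concrete, the sketch below estimates it from uniformly sampled waveforms. It reads the definition as the fraction of one period spent at each level (above 90% of Vdd, or below 10% of Vdd); that reading, the function name `plateau_fractions`, and the ideal test waveforms are assumptions for illustration and are not taken from the simulations.

```python
import math


def plateau_fractions(samples, vdd):
    """Fractions of the samples above 90% and below 10% of vdd.

    Assumes the samples are uniformly spaced over a whole number of periods.
    """
    n = len(samples)
    high = sum(1 for v in samples if v > 0.9 * vdd) / n
    low = sum(1 for v in samples if v < 0.1 * vdd) / n
    return high, low


VDD = 3.3      # supply voltage used in the paper
N = 10_000     # samples per period (arbitrary choice)

# Ideal rail-to-rail sine and ideal square wave over one period.
sine = [VDD / 2 * (1 + math.sin(2 * math.pi * k / N)) for k in range(N)]
square = [VDD if k < N // 2 else 0.0 for k in range(N)]

sine_high, sine_low = plateau_fractions(sine, VDD)
square_high, square_low = plateau_fractions(square, VDD)

print(f"sine:   high {sine_high:.3f}, low {sine_low:.3f}")
print(f"square: high {square_high:.3f}, low {square_low:.3f}")
```

An ideal square wave spends half of the period at each level, while a full-swing sine spends only about a fifth, so values approaching 50% indicate a more square-like clock.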

Figure 5 shows the output waveforms from post-layout simulations for 16 interconnected rings. The array generates sine-like waveforms at the GHz range with a very short settling time (< 1ns).

Figure 6 shows the merit figures for interconnected rings. The operation frequency and the power consumption per ring are almost constant, and the clock skew is always below 4% of the signal period, even for 144 interconnected rings, which corresponds to a 24 mm chip length, with a 200 fF load at every sink. Time of plateau improves with the number of rings.

The interconnected rings' performance has been shown: By repeating the basic ring, the array will keep the basic cell properties satisfactorily, and the power consumption will be proportional to the number of rings.

### 4. INTERCONNECTED RELAXATION OSCILLATORS

There are great benefits in moving from 5-inverter-stage rings [3] to 3-inverter-stage rings, such as lower power consumption and higher operation frequency. It is therefore worth investigating the performance of 1-inverter-stage rings, or equivalent oscillators, in interconnected arrays. It was observed that oscillators of this kind require careful design to ensure oscillation and stability; moreover, the output swing is reduced. To show this, an array of interconnected oscillators was built and analyzed.

The relaxation oscillator shown in figure 7 was considered because it is fast and simple and presents an acceptable output swing [6]. It was divided into four sub-modules, which is preferable for obtaining a square lattice. There are three different kinds of sub-modules: M57 and M68 are the same sub-module, while M12 and M34 have different sizes and are therefore considered two different sub-modules. This sizing is necessary to meet the oscillation conditions. Sub-modules M12 and M34 are considered sources because they connect to Vdd, and sub-modules M57 and M68 are considered sinks because they connect to ground. Symmetry is applied to obtain an array. One 2x2 lattice is depicted in figure 8, and its simplified form is shown in figure 9-a. In this array the corner nodes have a 1:1 source vs sink relation; node 22 has 2:2; nodes 12 and 32 have 2:1; and nodes 21 and 23 have 1:2. Considering this, eq. (1) can be used with $\gamma$ as the source vs sink relation seen by every node in the array, which tends to 1 if the array is large; $\beta$ as the beta-ratio of the oscillator as a whole after expansion, not per sub-module; and C<sub>LOAD</sub> as the equivalent pure capacitive load added by the interconnection.

In figure 8, notice that there are only two M34 sub-modules in the whole array, that is, one M34 sub-module for every two basic oscillators; this may require redesigning that sub-module.

The same analysis as in Section 3 was performed. The modular representations of 4, 16, 36 and 64 interconnected relaxation oscillators are shown in figure 9.



Figure 7. Relaxation oscillator divided in 4 sub-modules (M12, M34, M57 and M68) for interconnection.



Figure 8. 2x2 lattice of relaxation oscillators obtained after applying symmetry.

(Figure 9 dimension labels: 1 mm scale marker; panel widths 8 mm (b), 12 mm (c), 16 mm (d).)

Figure 9. Modular representation of the interconnected relaxation oscillator topology. a) 4 osc; b) 16 osc; c) 36 osc; d) 64 osc.



Figure 10. Output Waveforms from post-layout simulation for 16 interconnected relaxation oscillators.



Figure 11. Merit figures for interconnected oscillators with 200fF load at final sinks.

Figure 10 shows the output waveforms from post-layout simulations of 16 interconnected relaxation oscillators. The array generates sine-like waveforms in the GHz range. Compared with interconnected rings (fig. 5), interconnected oscillators present lower swing and frequency, and more clock skew.

Figure 11 shows the merit figures for interconnected relaxation oscillators. Operation frequency and power consumption per oscillator do not vary much, but the clock skew rises to 10% of the signal period for an array of 36 interconnected oscillators on a 12x12 mm chip. For 16 sinks in an area of 8x8 mm<sup>2</sup>, global approaches consume about 2 mW at 250 MHz [7], whereas our local proposals consume 113 mW (interconnected rings) and 57 mW (interconnected oscillators), both at frequencies around 1.2 GHz. Further research is needed to decrease the power consumption of the array.
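Because the three approaches run at different frequencies, one way to put the quoted figures on a common footing is energy per clock cycle (power divided by frequency) and power per sink. The sketch below does this arithmetic with the numbers quoted above; the metric itself and the names used are assumptions for illustration, and since the 1.2 GHz figure is stated only as approximate, the results are rough estimates.

```python
# (power in W, frequency in Hz) for 16 sinks on 8x8 mm^2, as quoted in the text.
designs = {
    "global CDN [7]": (2e-3, 250e6),
    "interconnected rings": (113e-3, 1.2e9),        # "around 1.2 GHz"
    "interconnected oscillators": (57e-3, 1.2e9),   # "around 1.2 GHz"
}
SINKS = 16


def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    """Energy drawn per clock period, in picojoules."""
    return power_w / freq_hz * 1e12


summary = {}
for name, (power_w, freq_hz) in designs.items():
    summary[name] = {
        "pJ_per_cycle": energy_per_cycle_pj(power_w, freq_hz),
        "mW_per_sink": power_w * 1e3 / SINKS,
    }

for name, row in summary.items():
    print(f"{name:28s} {row['pJ_per_cycle']:7.1f} pJ/cycle "
          f"{row['mW_per_sink']:6.3f} mW/sink")
```

On these numbers the global network draws about 8 pJ per cycle, against roughly 94 pJ for the rings and 47.5 pJ for the oscillators, so the gap narrows compared with raw power but remains about an order of magnitude.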

The degraded performance of interconnected relaxation oscillators in comparison with interconnected rings has been shown. Interconnected oscillators are more sensitive to expansion because analog signals must travel through long interconnections among sub-modules, which represent an undesirable load on the oscillator. As in a semiconductor lattice structure, the borders of the arrays do not behave exactly like the rest of the array. For this reason, quaternary nodes were selected to feed the sinks in both topologies, and the borders were left as dummy structures.

### 5. CONCLUSIONS

Simulation results show that the proposed configuration of interconnected 3-inverter-stage rings is more robust to array expansion than its equivalent built from single-stage relaxation oscillators. Interconnected oscillators present approximately three times more clock skew than interconnected rings, but about half the power consumption. In both configurations the power consumption is linearly proportional to the number of rings or oscillators.

Ring and oscillator arrays have been sent to fabrication. There is a strong effort to reduce the power consumption of interconnected rings. Further results will be published in future work.

### 6. ACKNOWLEDGMENTS

This work was partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT-MEXICO) under grant no. 34557-A and scholarship 129236.

### 7. REFERENCES

- [1] "Special Issue: Interconnections addressing the next challenge of IC Technology". Proceedings of the IEEE, April and May 2001. pp. 478, 484.
- [2] Thomas Meincke et al. "Globally asynchronous locally synchronous architecture for large high-performance ASICs", Proceedings of the IEEE ISCAS, May 30-June 2, 1999, Orlando, Florida, pp. 512-515.
- [3] Lars Bengtsson and Bertil Svensson, "A Globally Asynchronous, Locally Synchronous SIMD Processor", Proceedings of the Third International Conference on Massively Parallel Computing Systems (MPCS'98), Colorado, USA, April 2-5, 1998.
- [4] Jatuchai Pangjun and Sachin S. Sapatnekar. "Low-Power Clock Distribution Using Multiple Voltages and Reduced Swings". IEEE Trans. on VLSI Systems, Vol. 10, No. 3, June 2002, pp. 309-318.
- [5] G. Moon, H. Kim, M. Ismail, and C. Hwang, "A New GHz CMOS Cellular Oscillator Network", IEEE ISCAS, Monterey, CA, USA, May 1998.
- [6] A. Ahmed, K. Sharaf, H. Haddara, H. F. Ragai. "CMOS VCO-Prescaler Cell-based Design for RF PLL Frequency Synthesizers". IEEE ISCAS, May 28-31, 2000, Geneva, Switzerland.
- [7] M. Salim and M. Linares. "Analysis of Clock Distribution Networks in the Presence of Cross Talk and Groundbounce". Proceedings of IEEE ICECS 2001, Malta.