

# AN INVESTIGATION OF ORDERING, tearing, and latency algorithms FOR THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS 

PING YANG

## UNIVERSITY OF ILLINOIS - URBANA, ILLINOIS 830131 102

UNCLASSIFIED
security elassification of this page (men Dato Enierod)

seeumity elassification of this magerman Dace Encemod)
simulation of integrated circuits which can alleviate the problems of excessive compuration time and high storage requirements. A new ordering scheme for the modified nodal approach was developed, and some new algorithms for the dc and transient analysis of logic circuits were studied. Different tearing methods and sparsity considerations for the node tearing method were theoretically and experimentally studied. Latency at the subcircuit and the network levels was investigated. Different latency criteria were proposed and studied. The resul: of this research is a new general purpose circuit simulation program SLATE.

AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS FOR THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS
by
Ping Yang

This work was supported by the Joint Services Electronics Program under Contract N00014-79-C-0424.
 the United States Government.


Approved for public release. Distribution unlimited.

1
1


| Accession For |
| :--- |
| NTIS GRA\&I |
| DTIC TAB |
| Unannounced |
| Justification |
| By |
| Distribution/ |
| Avallability Codes |
| Dist Avail and/or |

AN INVESTIGATION OF ORDERING, TEARING, AND LATENCY ALGORITHMS FOR THE TIME-DOMAIN SIMULATION OF LARGE CIRCUITS

BY
PING YANG
B.S., National Taiwan University, 1974 M.S., University of Illinois, 1978

THESIS
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering
in the Graduate College of the University of Illinois at Urbana-Champaign, 1980

Thesis Adviser: Professor Timothy N. Trick and
Professor Ibrahim N. Hajj

Urbana, Illinois

Ping Yang, Ph.D.

Coordinated Science Laboratory and Department of Electrical Engineering University of Illinois at Urbana-Champaign, 1979

Many circuit simulation prugrams have been available for the design of integrated circuits. However, these conventional circuit simulation programs calculate all of the node voltages or branch voltages and currents at each iteration and each timepoint. Even with sparse matrix techniques the simulation of modern large-scale integrated (LSI) circuits is not possible in many situations due to the excessive computation time and high storage requirements.

The goal of this research was to investigate new approaches to the simulation of integrated circuits which can alleviate the problems of excessive computation time and high storage requirments. A new ordering scheme for the modified nodal approach was developed, and some new algorithms for the dc and transient analysis of logic circuits were studied. Different tearing methods and sparsity considerations for the node tearing method were theoretically and experimentally studied. Latency at the subcircuit and the network levels was investigated. Different latency criteria were proposed and studied. The result of this research is a new general purpose circuit simulation program SLATE.

## ACKNOWLEDGEMENTS

First, I would like to thank Professor T. N. Trick, my dissertation advisor. With his thorough knowledge of the field, he provided valuable guidance and help throughout the course of this research. Also, with tis great personality, he provided support and encouragement in all respects of my life throughout my graduate study. I would also like to thank Professor I. N. Hajj, my dissertation co-advisor, for the many helpful discussions with him and suggestions. I am also grateful to Professor M. R. Lightner, who invented the name 'SLATE' for my program and offered a lot of advice and encouragement. I also wish to thank Professor W. R. Perkins for being member of my dissertation committee and his support.

I am also indebted to my wife, Jau-Yuann, for her help in preparing this dissertation and her understanding throughout my graduate study.

Finally, I would like to thank my parents, I-Lueh and Queh-Chu Yang, for their love and support throughout the past years.

## TABLE OF CONTENTS

Chapter Page
I. INTRODUCTION ..... 1
II. NEW REORDERING STRATEGY FOR THE MODIFIED NODAL APPROACH ..... 6
2.1 Problems with Previous Methods ..... 7
2.2 New Partitioning and Ordering Strategy ..... 16
2.3 Theorems and Examples ..... 21
2.4 Results ..... 27
2.5 Discussion ..... 33
III. MODIFIED NEWTON METHOD AND PIECEWISE-NONLINEAR APPROACH ..... 34
3.1 The Newton-Raphson Algorithm ..... 35
3.2 Problems with the Newton-Raphson Algorithm ..... 35
3.3 Piecewise Nonlinear Approach ..... 37
3.4 A New Modified Newton-Raphson Method for Bipolar Devices ..... 42
3.5 Piecewise Nonlinear Approach for Bipolar Devices Including Avalanche Effects ..... 67
3.6 Piecewise Non1inear Approach for MOS FET ..... 68
3.7 Discussion ..... 70
IV. NUMERICAL INTEGRATION ..... 77
4.1 Problems with Previous Work ..... 78
4.2 Algorithm ..... 91
V. TEARING METHODS AND SPARSITY CONSIDERATIONS FOR NODE TEARING METHOD ..... 94
5.1 Derivation of the Branch Tearing Method ..... 100
5.2 Derivation of the Node Tearing Method ..... 105
5.3 Comparison of the Branch Tearing Method with the Node Tearing Method ..... 109
5.4 Constructing the Node Tearing Matrix from Subnetworks ..... 111
5.5 Sparsity Considerations for the Node Tearing Method ..... 116
5.6 Implementation of the Node Tearing Method ..... 129
5.7 Circuit Interpretation of the Tearing Methods ..... 133
5.8 Discussıun and Conclusion ..... 136
VI. LATENCY EXPLOITATION ..... 137
6.1 Latency Exploitation at the Subnetwork Level ..... 138
6.2 Latency Exploitation at the Network Level ..... 160
6.3 Discussion ..... 164
VII. CONCLUSIONS ..... 165
REFERENCES ..... 171
VITA ..... 176

## I. INTRODUCTION

The design of integrated circuits requires an accurate method of predicting circuit performance. The traditional breadboard method is not able to satisfy the above requirement because of the fact that the parasitic components that are present in the breadboard are entirely different from the parasitic components that are present in integrated circuits, so a circuit simulation program is a must. Conventional circuit simulation programs [1-11] possess two serious limitations: a computer storage requirement and a computing time requirement, so the size of the circuit that can be simulated is limited. With the advances of circuit simulation techniques, the size of the circuit that can be simulated has increased; but the simulation of large scale integrated (LSI) circuits is still beyond the capabilities of present circuit simulation programs.

The goal of this research was to study new approaches to the simulation of integrated circuits which can alleviate the above two limitations, namely the repetitiveness and latency properties of digital integrated circuits. Since a DEC-10 version of SPICE2 was available to us, it was decided that this program would serve as a vehicle for testing our algorithms. However, in the initial phases of our research, it was found that our version of SPICE2 had several deficiencies in the implementation of some of its algorithms which occasionally caused numerical difficulties. In order to resolve these difficulties a new reordering scheme for the modified nodal approach was developed, a new concept - a piecewise nonlinear approach

- for the Newton-Raphson iteration was proposed, and two problems with the numerical integration algorithm were resolved. The new reordering scheme for the modified nodal approach not only avoids zero diagonal pivot elements which increases numerical accuracy, but it also significantly reduces the number of fills in the matrix which reduces the computational cost. The piecewise nonlinear approach reduces the number of iterations needed to find the solution of a nonlinear circuit and improves the global convergence property of Newton-Raphson method. The resolution of the two problems with the numerical integration algorithm provides more efficiency and accuracy. All of these new developments result in a modified version of SPICE2 (YSPICE), which is 2 to 5 times faster than SPICEZ.

Although YSPICE is more efficient and more accurate than SPICE2, it is still not powerful enough to handle LSI circuits simulation problems. Experience has shown that LSI circuits possess properites which can be exploited to improve the storage and computing time requirements. The two properties are the repetitiveness of a limited number of subcircuits and the latency that may exist within parts of the circuits during an analysis. Conventional circuit simulation programs do not exploit these two properties, so all of the node voltages or branch voltages and currents are calculated at each iteration and each timepoint. In order to increase the capabilities of circuit simulation programs substantially, these two properties must be fully exploited. When the first property is exploited both computer storage requirements and computing time can be reduced in several ways.

First, only one subcircuit description for each type of repetitive subcircuit need be stored; secondly, only one set of small submatrix sparse matrix pointers for each type of repetitive subcircuit is needed so that both storage and preprocessing time can be saved; thirdly, if one type of subcircuit is linear, then the LU factorization of that type of subcircuit need be found once only. When the second property is exploited, we only need to solve for the active parts of the circuit and this reduces the computational effort considerably. Tearing methods, first introduced by Kron [12], are well suited for the exploitation of these two properites as well as the sparsity of the network. Recently, the use of tearing methods and latency [13-24] has been studied to exploit these two properties, but in order to fully exploit these two properties more research effort is needed.

In the second stage of our research, these two problems were studied extensively and the result of our investigations is a new general purpose circuit simulation program SLATE (a Simulator with Latency and Tearing). SLATE evolved from YSPICE, so it has all the good features of YSPICE: in addition, several new approaches are used. First, the new reordering strategy for the modified nodal approach is used at both the subcircuit and interconnection levels; secondly, ways of exploiting sparsity that exist at the subcircuit and interconnection levels were theoretically and experimentally studied and the most efficient way is used; thirdly, node tearing is used such that the program is more efficient and the final equation formulation is suitable for latency exploitation and parallel processing; fourthly,
latency in he Newton-Raphson iterations is exploited not only at device and subcircuit levels, but also at interconnection levels; fifthly, latency in the time domain is exploited not only at device and subcircuit levels, but also at interconnection levels; sixthly, three latency in time criteria schemes were studied thoroughly in relation to the spread of the time constants in the subcircuits and the best scheme was determined; and lastly, the interconnection matrix formulation method is general enough to accommodate the situation when there are no subcircuits specified in the network or when the interconnection circuits consist of more than tearing nodes.

Both YSPICE and SLATE are written in FORTRAN and have a SPICE-like input language for user convenience. If no subcircuits are used, then the methods of analysis of SLATE is equivalent to that of YSPICE, that is, YSPICE is a subset of this new program SLATE. Simulation results indicate that the speed of SLATE is about an order of magnitude faster than SPICE2, and the output results are either the same as or more accurate than those of SPICE2.

The new reordering scheme for the modified nodal approach is described in Chapter 2, and the comparison between this new scheme and that used in SPICE2 is given The piecewise nonlinear approach is explained in Chapter 3 and simulation results are given. The two problems with numerical integration are detailed in Chapter 4 , and the solution is given. Chapter 5 introduces the concept of tearing methods and gives the sparsity consideration for the node tearing method. Chapter 6 describes three latency criteria and gives the simulation
results of these three schemes. Finally, in Chapter 7 a summary of SLATE performance is given, the conclusions are presented, and areas for future work are described.


#### Abstract

The modified nodal approach (MNA) [25] has been widely used in many computer-aided circuit analysis programs $[1,11,26,27]$ for formulating circuit equations. It is well known, however, that while the more restrictive nodal approach in general produces nonzero diagonal elements for pivoting, the modified nodal approach, although more general, may produce zero diagonal entries in the network matrix. This occurs, for example, when the circuit contains voltage sources, short-circuits, inductors at zero frequency (dc solution) and some types of controlled sources. When sparse matrix techniques with diagonal pivoting are used for solving these types of circuit equations, extreme care should be taken so as not to choose a zero-valued pivot. Two methods have been proposed for avoiding pivoting on these zero diagonal entries. One method (method 1) involves ordering the rows and columns with zero diagonal entries last, in the hope that they will be filled before becoming candidates for pivoting [1,11]. Another method (method 2) involves rearranging and/or combining rows and columns in order to obtain nonzero diagonal elements [25]. However, as we show below, there are two problems with these methods. First, even if all the zero diagonal elements which exist in the network matrix at the formulation stage are avoided or filled during the elimination stage, it is possible to generate zero diagonal elements during the Gaussian elimination process regardless of the values of the circuit elements; Secondly, these methods usually are not efficient. For example, forcing the zero-diagonal entries to be last usually increases the number of fills considerably.


In this chapter a new reordering scheme for the modified nodal approach is described which avoids zero diagonal pivots in essentially all practical cases and is very efficient. In Section 2.1, the problems with previous methods are illustrated and explained. In Section 2.2, the partitioning of the circuit variables is detailed and the ordering strategy is introduced. In Section 2.3, theorems and examples are given. The implementation of this new scheme resulted in YSPICE. The simulation results from YSPICE are given in Section 2.4. In this Section examples are given which caused computational problems in our DEC-10 version of SPICE2 due to pivoting on zero diagonal elements, but which were successfully analyzed by $\operatorname{ISPICE}$. Also the number of fills produced by YSPICE is much less than that produced by SPICE2. In Section 2.5, a discussion of this new ordering strategy is given.
2.: Problems with Previous Methods

The MNA matrix can in general be expressed in the form [25]

$$
\left[\begin{array}{ll}
\underset{\sim}{Y} R & \underset{\sim}{B}  \tag{2,1}\\
\underset{\sim}{C} & \underset{\sim}{D}
\end{array}\right]\left[\begin{array}{l}
\underset{\sim}{V} \\
\underset{\sim}{I}
\end{array}\right]=\left[\begin{array}{l}
\underset{\sim}{J} \\
\underset{\sim}{E}
\end{array}\right]
$$

where $\underset{\sim}{V}$ is the set of node-to-datum voltages and $I$ is the set of branch currents which are chosen as additional circuit variables. ${\underset{\sim}{R}}^{Y}$ is a reduced form of the nodal matrix excluding the contributions due to voltage sources, current controlling elements, etc. B contains partial derivatives of the Kirchhoff current equations with respect to the additional current variables and thus contains $\pm 1$ 's for the elements whose branch relations
are introduced. The branch constitution relations, differentiated with respect to the unknown vector are represented by the matrices $\mathbb{C}$ and $\underset{\sim}{D} . \mathbb{N}$ and $\underset{\sim}{E}$ are the excitations.

As mentioned above, when sparse matrix techniques with diagonal pivoting are used for solving Eq. (2.1), zero diagonal elements may be encountered. Previously, two methods have been proposed for avoiding pivoting on these zero diagonal elements. However, there are still two problems with these previous methods: (1) zero diagonal elements may be generated during the Gaussian elimination process, and (2) the methods may not be the most efficient. In this section we consider the zero diagonal problem, and in Section 2.4 we discuss the efficiency problem.

Method 1 orders the rows and columns with zero diagonal entries last, in the hope that they will be filled before becoming candidates for pivoting. Even if all the zero diagonal elements which exist in the network matrix at the formulation stage are filled during the elimination stage, cutsets of branches whose currents are declared as network variables in a modified nodal formulation will generate zero diagonal elements during the Gaussian elimination process regardless of the values of the circuit elements. This problem is proved and illustrated by Theorem 2.1, Example 2.1, and Example 2.2.

Theorem 2.1. For any network which has cutsets of branches whose currents are declared as circuit variables in a modified nodal formulation, if these current variables are ordered last, then zero diagonal elements will be generated during the Gaussian elimination process, regardless of circuit element values.

Proof: Since we assume that all the current variables are ordered last and they form outsets, therefore floating subnetworks are created. The admittance matrices of these floating subnetworks are singular, therefore the ${\underset{\sim}{R}}^{R}$ in Eq. (2.1) is singular, so zero diagonal elements will be generated during the Gaussian elimination of Eq. (2.1).

The following Example 2.1 illustrates Theorem 2.1.

Example 2.1: A cutset of current variables (Fig. 2.1)

If the method which orders all the current variables last is used to formulate the modified nodal equations of the circuit shown in Fig. 2.1, the resulting equations will be as follows:
$\left[\begin{array}{ccccc}G_{1} & - & 0 & 1 & 0 \\ -G_{1} & G_{1} & 0 & 0 & -1 \\ 0 & 0 & G_{2} & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0\end{array}\right]\left[\begin{array}{l}V_{1} \\ V_{2} \\ V_{3} \\ I_{E} \\ I_{L}\end{array}\right]=\left[\begin{array}{l}0 \\ 0 \\ 0 \\ E \\ 0\end{array}\right]$

During the course of Gaussian elimination due to the resulting floating subnetwork, a zero diagonal element will be produced at location $(2,2)$.

Remark: For any subnetwork which has cutsets of branches whose currents are declared as circuit variables in a modified nodal formulation, if the rows corresponding to current variables which have zero-diagonal elements


Fig. 2.1 Circuit used in Example 2.1.
are ordered last until a diagonal entry is filled, before it is considered as a pivot, then zero diagonal elements may be generated during the Gaussian elimination process, regardless of circuit element values.

The proof of this remark is the same as that of Theorem 2.1. In the following, Example 2.2 illustrates this remark.

Example 2.2: A cutset of current variables (Fig. 2.2)

If the reordering strategy mentioned in the previous remark is used to formulate the equations of the circuit shown in Fig. 2.2, the matrix formulated is:
$\left[\begin{array}{cccc}G_{2} & 0 & 0 & 0 \\ 0 & G_{1} & -G_{1} & 1 \\ 0 & -G_{1} & G_{1} & 0 \\ 0 & 1 & 0 & 0\end{array}\right]\left[\begin{array}{l}V_{3} \\ V_{1} \\ V_{2} \\ I_{E}\end{array}\right]=\left[\begin{array}{l}0 \\ 0 \\ 0 \\ E\end{array}\right]$

During the course of Gaussian elimination due to the resulting floating subnetwork, a zero diagonal element will be generated at location $(3,3)$.

Method 2 interchanges rows in order to obtain nonzero diagonal elements. Even if all the zero diagonal elements which exist in the network matrix at the formulation stage are avoided before the elimination, if there are loops of branches whose currents are declared as network variables in the modified nodal formulation, then zero diagonal elements


Fig. 2.2 Circuit used in Example 2.2.
may be generated during the Gaussian elimination process regardless of the values of the circuit elements. This problem is proved and illustrated by Theorem 2.2 and Example 2.3.

Let us define the branch whose current is declared as a current variable in the modified nodal formulation as current branch. Let us define the 'positive' node as follows: Assuming that the datum node can not be chosen as 'positive' and that the datum node is not contained in any loop formed by current branches, then we can always choose one of the two nodes of a current branch as 'positive' for that current branch and there is a one-to-one correspondence between these 'positive' nodes and the current branches. An algorithm for choosing 'positive' nodes is given in Section 2.2.

Theorem 2.2. For any network with a loop of branches whose currents are declared as network variables in a modified nodal formulation, and the reference node is not contained in the loop and there is no coupling among the voltages of the branches in the loop, then if all the rows corresponding to the current variables are interchanged with the corresponding 'positive' node voltage rows, zero diagonal elements will be generated during the Gaussian elimination process, regardless of circuit element values.

Proof: Let us assume that after the rows corresponding to the current variables are interchanged with the corresponding 'positive' node voltage rows, the rows corresponding to the current variables are ordered first, then the MNA matrix equation (2.1) is transformed into

The submatrix being eliminated first is the node-to-branch incidence matrix for the 'positive' nodes and the current variable branches [28], that is , the ${\underset{\sim}{1}}_{1}$ in Eq. (2.2). Since we assume that the reference node is not contained in the loop and there is no coupling among the voltages of the branches of the loop, then there is a one-to-one correspondence between each branch of the loop and the corresponding 'positive' node and each column in ${\underset{\sim}{-}}^{\text {contains exactly } a+1}$ and $a-1$, therefore, $\underset{\sim}{B}$ is singular and zero diagonal elements will be generated during the Gaussian elimination.

The following Example 2.3 illustrates Theorem 2.2.

Example 2.3: A loop of current variables (Fig. 2.3)

The circuit equations formulated by method 1 for the circuit shown in Fig. 2.3 in a transient analysis using a backward Euler Formula with timestep $n$ have the following form:


Transient Analysis
50-5736

Fig. 2.3 Circuit used in Example 2.3.


The submatrix ${\underset{\sim}{1}}^{1}$ is singular, therefore during the Gaussian elimination a zero diagonal element will be generated at location (4,4).

### 2.2. New Partitioning and Ordering Strategy

From the previous section, we conciude that the topological reasons for zero diagonal elements being generated in the modified nodal approach are: (1) cutsets of current variables and (2) loops of current variables. Here we present a new partitioning and ordering strategy which has the following good features:
(1) zero diagonal elements are avoided before the Gaussian elimination and during the Gaussian elimination in essentially all practical circuits;
(2) it is efficient and the number of fills is less than that of previous
methods;
(3) it is easy to implement and the partitioning and ordering are done in the preprocessing phase, so it is well suited for the use of sparse matrix techniques.

Consider a linear (or linearized) circuit which contains independent current and voltage sources, two terminal resistors, capacitors, inductors and all types of controlled sources. We assume that the circuit contains neither loops of only (independent and dependent) voltage sources and inductors nor cutsets of only (independent and dependent) current sources and capacitors.


#### Abstract

In the modified nodal approach, the circuit variables consist of node-to-datum voltage $\underset{\sim}{V}$ n together with a subset of branch currents $I_{b}$. (Henceforth those branches are referred to as current branches.) In the proposed ordering strategy, the node voltages $V_{n}$ are partitioned into two subsets, ${\underset{V}{1}}$ and ${\underset{\sim}{V}}_{2}$, and $I_{b}$ is partitioned into three subsets, $I_{1}, I_{2}$ and I3. The components of $I_{1}$ consist of the currents in the (dependent and independent) voltage sources, and are in turn partitioned as follows:


IV Ebranch currents of the independent voltage sources.
$I_{V C V} \equiv b r a n c h$ currents of the voltage-controlled voltage sources.
$I_{C C V} \equiv b r a n c h$ currents of the current-controlled voltage sources.

The components of $I_{\sim}$ and $I_{2}$ consist of the remaining currents which are circuit variables.

Let a graph $G_{I}$ (possibly disconnected) be first constructed to include all the current branches, with all the other branches removed. If GI contains loops, then a tree (or forest) is chosen, with only finite-valued resistors as links. This is always possible since by assumption no loops of only voltage sources, inductors and zero-valued resistors exist in the circuit. Let $I_{3}$ be the set of currents in the links of $\bar{Y} I$, then these links can not form cutsets [28]. The components of $I_{2}$ consist of currents in the inductors and the remaining currents of the current resistors.

The components of ${\underset{\sim}{V}}_{1}$ consist of the following:
$V_{V} \quad$ Eset of 'positive' node voltages of the independent voltage sources.
$\underline{V}_{V C y}$ Eset of 'positive' node voltages of the voltage-controlled voltage sources.
${ }_{\sim}^{V} C C V$ Eset of 'positive' node voltages of the current-controlled voltage sources.
${\underset{\sim}{b C}}$ Eset of 'positive' node voltages of the the $I_{2}$ branches.

The components of ${\underset{\sim}{V}}_{2}$ consist of the remaining node voltages.

The following algorithm is followed in selecting the 'positive' node voltages defined above:

## Algorithm

(1) The ungrounded nodes of all grounded current branches belonging to $I_{1}$ or $I_{2}$ are chosen first as 'positive';
(2) Let $b_{j}$ be the number of branches whose currents belong to $I_{\sim}$ or $I_{2}$ and which are incident at node $j$. Whenever a node of a current branch is chosen as 'positive', the number $\underset{\sim}{b}$ at $i t s$ 'negative' node $k$ is reduced by one.
(3) If the $b_{k}$ value of node $k$ of a current branch is one and that node has not been previously selected as 'positive', then node $k$ is selected 'positive' for that particular current branch. If more than one node have their $b_{k}$ value equal to one and if some of these nodes do not is' 3 a conductance (i.e., a resistance whose current is not a circuit varizble) connected to them, then one of these nodes is chosen 'positive' first. Otherwise, any one of the nodes that has its $b_{k}$ value equal to one is chosen 'positive'.

Step (2) and (3) are repeated until all the branches corresponding to $I_{\sim}$ and $I_{2}$ have been processed. Note that up to this point there is always at least one node whose $b_{k}$ value is one. This is because $I_{-1}$ and $I_{2}$ do not form loops. Note also that the number of positive nodes is equal to the number of elements in $I_{\sim}$ and $I_{2}$. The polarities of the currents in the current branches are associated with the positive node assignments.

Partitioning and ordering the arcuit variables in the order of $I_{1}$, $I_{2}, \quad V_{1}, \quad V_{2}$, and $I_{3}$, and writing the modified nodal equations in the usual way $[25]$, we get the following equation structure:


Where the $A_{i}$ 's contain the partial derivatives of the Kirchhof: current equations with respect to the circuit current variables, $I_{1}, I_{2}$, $I_{3}$, and thus contain $0,+1,-1$ only.

By interchanging the rows corresponding to $V_{1}$ with the rows corresponding to $I_{1}$ and $I_{2}$, Eq. (2.3) can be written in the following form (This interchange is equivalent to off-diagonal pivoting and is done in practice by a simple change in the pointer system rather than a physical interchange of data in the rows.)

(1) $I_{\sim}^{1}$ and ${\underset{\sim}{2}}_{2}$, (2) $\underset{\sim}{V}$, and (3) the remaining variables. The Markowitz scheme [29] is used to minimize the number of operations within each subgroup. After reordering, Eq. (2.4) can now be solved by Gaussian elimination or LU factorization.

### 2.3 Theorems and Examples

If there are no current-controlled current sources or if the current-controlled current sources are not incident at the 'positive' nodes, then within the first two subgroups all the diagonal elements remain 1's and all the nonzero off-diagonal elements are -1's during the Gaussian elimination process, so the leading part of the elimination can be done simply by addition. The proof is given below in Theorem 2.3. Let us consider the first subgroup, the submatrix associated is the node-to-branch incidence matrix $\underset{\sim}{A}$ for the 'positive' nodes and the currents belong to ${\underset{\sim}{I}}^{1}$ and $I_{2}$. Let us denote the directed graph of those nodes and currents by
$\mathrm{G}_{\mathrm{I}}$. Due to our partitioning and ordering strategy, there are no loops in $G_{I}$, so $A_{a}$ has +1's on the diagonal, 0 's or -1 's on the off-diagonal and $A_{a}$ is square and nonsingular.

Theorem 2.3. For any diagonal pivoting the $L U$ factors of $A$ have the following special properties: all the diagonal elements remain +1 's and all the nonzero off-diagonal elements are -1's.

Proof: Let $A_{a}$ be formulated with current $I_{k}$ chosen as the first pivot where $I_{k}$ flows in branch $b_{k}$, which is connected between node $i$ and node $f$. After row and column interchange the first row and column of $A_{a}$ will have the following form:

where node $j$ is assumed to be in $G_{I}$; otherwise column one would be all zeros below the diagonal. Note that the entry $a_{1 j}=0$ because $G_{I}$ does not have any loops and $a_{1 i} 1=2,3, \ldots, n$ are either zero or -1 . Pivoting on $a_{11}$ amounts simply to adding row 1 to row $j$. Since adding any two rows in the incidence matrix of a directed graph produces a row with $0,-1$ or +1
enties, row $j$ will then contain 0 and -1 's with +1 on the diagonal because $a_{1 j}=0$.

Let the submatrix generated by pivoting on $a_{11}$ be denoted by $\hat{A}_{a}$. $\hat{A}_{a}$ can be considered as the incidence matrix of a directed graph $\hat{G}_{I}$ where $\hat{G}_{I}$ is derived from $G_{I}$ by removing branch $b_{k}$ and merging node 1 with node $j$. Thus $\hat{A}_{\mathbf{a}}$ has the same properties as ${\underset{a}{a}}^{a}$, and pivoting on its first diagonal entry will produce a submatrix with ones on the diagonal and 0 and -1 's elsewhere. This proves the theorem.

The reasoning for the second subgroup is similar to Theorem 2.3.

Now we would like to present the main result.

Main Result. For any network which has a unique solution, if the partitioning and orderinng strategy proposed here is used to solve the modified nodal equations, then no zero diagonal elements will be encountered during the Gaussian elimination process, except for the case when controlled sources or negative-valued elements with some specific set of circuit element values result in perfect cancellation.

Proof: There are two kinds of zero diagonal elements which may be encountered. One type is due to the formulation method [25] and occurs in the network matrix before the elimination process starts. These zero diagonal elements are avoided by interchanging the rows corresponding to the 'positive' node voltages with the rows corresponding to $I_{I}$ and $I_{2^{2}}$. During the elimination, topologically, the zero diagonal elements are caused either by ordering loops of current variables first or by a floating subnetwork which results by ordering a cutset of current variables last.

Both of these situations are prevented by partitioning $I_{3}$ away from $I_{2}$ and ordering $I_{1}$ and $I_{2}$ first, so no loops can be formed by ${\underset{\sim}{1}}^{1}$ and $I_{\sim}$. Since $I_{\sim}$ consists of currents in the links, so $I_{3}$ will not form outsets, and thus no floating subnetworks will result.

Alternatively, this theorem can be proved as follows: Since all the currents are ordered first and eliminated first, from Theorem 2.3, we know that the elimination of these currents will not generate zero diagonal elements. After all these currents are eliminated, if ${\underset{\sim}{3}}_{3}$ is empty, we are left with nodal matrix equations, then no zero diagonal elements will be generated; if $I_{3}$ is not empty, since $I_{3}$ can not form loops or cutsets, so no zero diagonal elements will be generated.

A more rigorous and general proof can be found in [30].

In the following we would like to use the new ordering scheme to solve those examples used in Section 2.1.

Example 2.1:

If our approach is used, initially, the matrix formulated is:

After interchanging rows, the resulting matrix is:

$$
\left[\begin{array}{ccccc}
0 & 0 & v_{1} & 0 & -G_{1} \\
0 & 1 & j & 0 & G_{2} \\
0 & 0 & 1 & 2 & 0 \\
0 & 0 & 0 & 1 & -1 \\
0 & -i & -G_{1} & 0 & G_{1}
\end{array}\right]\left[\begin{array}{c}
c_{2} \\
L_{2} \\
v_{1} \\
v_{3} \\
v_{2}
\end{array}\right]=\left[\begin{array}{l}
0 \\
0 \\
\vdots \\
0 \\
0
\end{array}\right]
$$

No zero diagonal elements will be encountered during the course of Gaussian elimination.

Example 2.2:

If our approach is used, initially, the matrix formulated is:

$$
\left[\begin{array}{cccc}
0 & 1 & 0 & 0 \\
1 & G_{1} & -G_{1} & 0 \\
0 & -G_{1} & G_{1} & 0 \\
0 & 0 & 0 & G_{2}
\end{array}\right]\left[\begin{array}{l}
I_{E} \\
V_{1} \\
v_{2} \\
V_{3}
\end{array}\right]=\left[\begin{array}{l}
E \\
0 \\
0 \\
0
\end{array}\right]
$$

After interchanging rows, the resulting matrix is:

$$
\left[\begin{array}{cccccc}
1 & \vdots & - & 0 & - & -5 \\
0 & \vdots & 0 & 0 & v_{1} & = \\
0 & -B_{1} & u_{1} & 0 & v_{2} & 0 \\
0 & 0 & 0 & \sigma_{2} & v_{3} & 0
\end{array}\right.
$$

No zero diagonal elements will be encountered during the course of Gaussian elimination.

Example 2.3:

If our approach is used, initially, the matrix formulated is:

After interchanging rows, the resulting matrix is:

No zero diagonal elements will be encountered during the course of Gaussian elimination.

These examples show that our approach indeed can avoid zero diagonal elements before the elimination and during the elimination. However, as mentioned before, if there are controlled sources or negative valued elements with specific set of element values, zero diagonal elements may be produced due to perfect cancellation.

### 2.4. Results

The implementation of this new algorithm into the DEC-10 version of SPICE2 has resulted in YSPICE. In YSPICE, the 'positive' nodes are first determined by the algorithm presented in Section 2.2. The network matrix is constructed using the element stamps as in [31]. The sparse matrix reordering is carried out using the Markowitz criterion [29,32] with diagonal pivoting. The row interchange is done by one extra set of pointers.

Examples which caused computational problems in the original version of SPICE2 due to pivoting on zero diagonal elements were successfully analyzed using YSPICE. Furthermore, the results we obtained show that in many cases the number of fills produced by our ordering strategy is far lower than that produced by previous methods, resulting in less computational cost, and at the same time, more accurate solutions.

Here a small selection of the examples analyzed by YSPICE is presented and the results are compared with those obtained by SPICE2.

Example 2.4: The two circuits shown in Figs. $2.4(\mathrm{a})$ and (b) were analyzed using SPICE2 and YSPICE. The CPU times required by the equation solving subroutines in both programs for both circuits are given in Table 2. 1.

Table 2.1 Simulation Data.

| Circuit | CPU time <br> for the equation <br> solving subroutine | number of <br> variables | number of <br> operations <br> per iteration |
| :--- | :---: | :---: | :---: |
| 2.4 (a) YSPICE | 0.9090 sec. | 7 | 16 |
| SPICE2 | 1.9740 sec. | 7 | 71 |
| 2.4 (b)YSPICE <br> SPICE2 | 0.031 sec. | 10 | 30 |



Fig. 2.4 Example Circuits.

The difference in the number of operations between YSPICE and SPICE2 in Table 2.1 can be explained as follows: In SPICE2, the matrix formulated by the modified nodal approach for the circuit in Fig. 2.4(a) is as shown in $\mathrm{Fig} 2.5(\mathrm{a})$. It can be seen that although the number of off-diagonal elements of the rows and columns corresponding to $I_{1}, I_{2}$, and $I_{4}$ is small, they are not chosen as pivots until their corresponding zero diagonal entries are filled. The delay causes the number of fills to increase greatly. In YSPICE, the matrix formulated for the oircuit in Fig. 2.4(a) is as shown in Fig. $2.5(b)$. It can be seen that the number of fills is now zero due to the off-diagonal pivoting, and consequently, the number of operations is reduced.

Example 2.5: The circuit shown in Fig. 2.5 was also analyzed using both SPICE2 and YSPICE. The results of the de analysis are shown in Table 2.2.

Table 2.2 Simulation Data.

| Node | 2 | 3 | 4 | 5 | 6 |
| :--- | :---: | :---: | :---: | :---: | :---: |
| YSPICE <br> node voltages | 2.000 V | 4.000 V | 4.000 V | 0.000 V | 0.000 V |
| SPICE2 <br> node voltages | 2.000 V | 2.324 V | 2.324 V | -1.676 V | 1675.9999 V |

$$
\begin{aligned}
& {\left[\begin{array}{lllllll}
3 & 1 & i_{1} & 2 & i_{2} & 4 & i_{4} \\
x & x & 0 & x & 0 & x & 0 \\
x & x & 1 & x & 0 & x & 0 \\
0 & 1 & \otimes & \otimes & 0 & \otimes & 0 \\
x & x & \otimes & x & 1 & x & 0 \\
0 & 0 & 0 & 1 & \otimes & \otimes & 0 \\
x & x & \otimes & x & \otimes & x & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \otimes
\end{array}\right]} \\
& \\
& {\left[\begin{array}{cccccccc}
i_{1} & i_{2} & i_{4} & 1 & 2 & 4 & 3 \\
1 & 0 & 0 & x & x & x & x \\
0 & 1 & 0 & x & x & x & x \\
0 & 0 & 1 & x & x & x & x \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & x & x & x & x
\end{array}\right]}
\end{aligned}
$$

Fig. 2.5(a) Structure of the Network Matrix for the Circuit in Fig. 2.4 (a) formulated by SPICE2.
(b) Structure of the Network Matrix for the Circuit in Fig. 2.4 (a) formulated by YSPICE.


Fig. 2.6 Circuit used in Example 2.5.


#### Abstract

Our approach gave correct results for this circuit wile SPICE2 gave inaccurate results. These inaccuracies can be explained as follows: In SPICE2, if a diagonal element becomes too small, then it is replaced by $1.0 \times 10^{-12}$. In this circuit this approach is equivalent to connecting a $1.0 \times 10^{12} \Omega$ resistor from node 4 to ground. In this circuit the diode is reverse biased, the equivalent resistance used in SPICE2 for this diode is $0.721 \times 10^{12} \Omega$, as a result the computed $I_{I}$ in SPICE2 is $2.324 \times 10^{-12} \mathrm{~A}$, instead of the correct value, which should be 0.0 A . This inaccuracy in computing $I_{1}$ makes $V_{5}=-1.6760 \mathrm{~V}$ and $V_{6}=1675.9999 \mathrm{~V}$ instead of 0.0 V .


### 2.5 Discussion

In this chapter we have presented an ordering stategy to be followed when the modified nodal approach is used. When this new strategy is used, the possibility of selecting zero diagonal pivots is reduced. The new strategy eliminates the need for having to continuously check the pivot and to replace it by a nonzero value in case a zero is generated, as is done in some existing strategies, which is both time consuming and inaccurate.

In addition, if the currents through the voltage sources are not needed, our ordering scheme provides a convenient way of reducing computation by performing the backward substitution step only partially to obtain the required variables.

Although by performing off-diagonal pivoting, the circuit matrix loses its symmetry and increases the complexity of the program, however, this is not a serious drawback. In fact, in many of the examples which we have analyzed, we have observed that by using off-diagonal pivoting, the number of fills is much less than that produced by other methods.

In a computer-aided circuit simulation program, if a circuit contains nonlinear elements, then a nonlinear solution method is required so solve the nonlinear algebraic equations in both dc analysis and transient analysis of the circuit. There are many nonlinear solution methods available, but the one most widely used is the Newton-Raphson method. This method has the desirable property that its rate of convergence is quadratic in the neighborhood of the solution.

Although The Newton-Raphson method has excelient local convergence proparties, it has problems [1,33j when the initial guess is not close $=0$ the solution, such as numerical overflow, slow convergence, or no convergence. Several modified Newton-Raphson methods have been proposed $=0$ try to resolve the above problems, and the performance of the basia Newton-Raphson method has been improved to some extent. Here a new nethod - the piecewise nonlinear approach - is presented, and examples are given which show even further improvement. This method evolved from the piecewise linear method and previous modified Newton-Raphson methods, so it has the advantages of both methods. However, this new method is still at the experimental stage, no definite conclusion about it has been obtained.

This chapter begins with the introduction of the Newton-Raphson method. In Section 3.2, problems with the Newton-Raphson method are illustrated. In Section 3.3 the piecewise nonlinear approach is presented. In Section 3.4, a new modified Newton-Raphson method for bipolar devices is detailed and the piecewise nonlinear approach for bipolar devices including the avalanche effect is given in Section 3.5. In Section 3.6, the
piecewise nonlinear approach for the MOSFET is described. In Section 3.7, a discussion of the piecewise nonlinear approach is given.
3.1. The Newton-Raphson Algorithm

Let the set of nonlinear equations be

$$
\begin{equation*}
\underset{\sim}{F}(\underline{X})=\underset{\sim}{0} \tag{3.1}
\end{equation*}
$$

If $X_{k}$ is the solution at the kth iteration, from Taylor series expansion, we have

$$
\begin{equation*}
\underset{\sim}{F}(\underset{\sim}{X})=\underset{\sim}{F}\left({\underset{\sim}{x}}^{x}\right)+\underset{\sim}{J}(\underset{\sim}{X})(\underset{\sim}{X}-\underset{\sim}{X})+\text { higher order terms } \tag{3.2}
\end{equation*}
$$

Eq. (3.2) is used to obtain a solution to Eq. (3.1) under the assumption that the higher order terms are negligible. Thus, we write

$$
\begin{equation*}
\underset{\sim}{F}({\underset{X}{k}})+\underset{\sim}{J}({\underset{\sim}{k}})\left({\underset{\sim}{x}}_{k+1}-{\underset{\sim}{x}}_{k}\right)=\underset{\sim}{0} \tag{3.3}
\end{equation*}
$$

Solving Eq. (3.3) for $X_{k+1}$ we obtain

$$
\begin{equation*}
\left.{\underset{-k+1}{X}}^{X_{-k}}-\left[\underset{-}{X}\left(X_{k}\right)\right]^{-1} \underset{F}{\left(X_{k}\right.}\right) \tag{3.4}
\end{equation*}
$$

Eq. (3.4) is called the Newton-Raphson iteration algorithm.

### 3.2. Problems with the Newton-Raphson Algorithm

The problems of numerical overflow and slow convergence can be illustrated by a simple diode circuit shown in Fig. 3.1. The branch constraint for the typical semiconductor diode has the form $i=I_{s}\left(e^{40 v}-1\right)$ Given an initial estimate $V_{0}$ to the solution for this circuit as shown in Fig. 3.1, it is not uncommon for the solution $V_{1}$ to the next


Fig. 3.1 Overflow and Slow Convergence Problem with the Simple Diode Circuit.

Newton-Raphson iterate to be in the neighborhood of $V_{D D}$ as shown in $F i g$. 3.1. If the exponent in the diode equation is too large, overflow may occur. Even if overflow does not occur, convergence will be extremely slow because of the very large slope of the diode characteristic in this region. One modified Newton-Raphson algorithm which has proved successful in avoiding the above problems was proposed by Colon [33]. In this algorithm iteration on current is employed if $V_{k+1}$ exceeds a reference junction voltage $V_{\text {REF }}$, this is illustrated in Fig. 3.2. This algorithm is used in the SPICE2 program.

Another problem with the Newton-Raphson algorithm is the lack of convergence. This is illustrated in Fig. 3.3. The iterate solutions will oscillate between $V_{0}$ and $V_{1}$ and never converge to the solution $V^{*}$.

### 3.3. Piecewise Nonlinear Approach

This is a new approach which has the advantages of the piecewise linear approach and the modified Newton-Raphson methods. However, this method is still at the experimental stage, the proof of global convergence or conditions for global convergence has not been obtained. We restrict our discussion to two terminal elements. In this approach, first, a set of breakpoints is chosen and the device characteristic is partitioned into several nonlinear pieces. The partition must satisfy the following constraints:
(1) each piece must be monotonic and the first derivative must be monotonic too;


Fig. 3.2 Comparison of Current Iteration and Voltage Iteration.


Fig. 3.3 Example of a Tunnel Diode Circuit.
(2) the piece must be chosen to be suitable for the current/voltage iteration to avoid numerical overflow and to hasten convergence;
(3) the number of pieces must be kept as small as possible to avoid the possibility of slow convergence.

After the partitioning, the following algorithm is used to perform the iteration:
(1) choose an initial guess $V_{0}$;
(2) linearize the circuit by Newton-Raphson method and find the iterate solution $V_{k+1}(k=0)$;
(3) if $V_{k+1}$ is within the original piece, then use the modified Newton-Raphson method to choose $V_{k+1}$, and continue the iteration; otherwise, if the next breakpoint in the direction of change has not been chosen before, choose $V_{k+1}$ equal to it; otherwise go into the adjacent piece, if the derivative is not continuous at this breakpoint, then choose this breakpoint as $V_{k+1}$ again but use the new derivative; otherwise, choose the other breakpoint as $V_{k+1}$ and continue the iteration.

This approach is illustrated in the graphical solution that is given in Fig. 3.4. Here the tunnel diode characteristic is partitioned into four pieces. The initial guess is $V_{o}$ located in piece $I$. The solution of the linearized circuit is $\hat{V}_{1}$ which is not in piece $I$, so $V_{1}$ is chosen to be equal to breakpoint 1 . The solution of the new linearized circuit is $\hat{v}_{2}$ which is still not in piece I. Enter piece II and choose breakpoint 2 as $V_{2}$. The iterate solution $\dot{V}_{3}$ is not in piece $I I$, go into piece III and choose breakpoint 3 as $V_{3}$, the iterate solution $\dot{V}_{4}$ is not in piece III.


Fig. 3.4 Example of the Piecewise Nonlinear Approach.

Enter piece $I V$ and choose $\dot{V}_{4}$ as $V_{4}$, this time, the solution $\hat{V}_{5}$ is in piece IV. Continue the iteration by modified Newton-Raphson method until the convergence is obtained.
3.4. A New Modified Newt on-Raphson Method for Bipolar Devices

In the piecewise nonlinear approach, the diode characteristic is partitioned into three pieces as shown in Fig. 3.5. In region III, in order to avoid numerical overflow and to compensate for large higher order terms, the modified Newt on-Raphson method must be used $[1,33,34]$.

Consider the simple diode circuit shown in Fig. 3.6. The nodal equation is

$$
\begin{equation*}
F(V)=\frac{V-V_{D D}}{R}+I_{s}\left(e^{V / V_{t}}-1\right)=0 \tag{3.5}
\end{equation*}
$$

By a Taylor series expansion we obtain

$$
\begin{aligned}
F\left(V_{k+1}\right)= & F\left(V_{k}\right)+F^{\prime}\left(V_{k}\right)\left(V_{k+1}-V_{k}\right)+\frac{F^{\prime \prime}\left(V_{k}\right)}{2}\left(V_{k+1}-V_{k}\right)^{2} \\
& + \text { higher order terms }
\end{aligned}
$$

where $F^{\prime}\left(V_{k}\right)\left(V_{k+1}-V_{k}\right)=\left(\frac{1}{R}+\frac{I_{s}}{V_{t}} e^{V_{k} / V_{t}}\right)\left(V_{k+1}-V_{k}\right) \quad$ and

$$
\frac{F^{\prime \prime}\left(V_{k}\right)}{2}\left(V_{k+1}-V_{k}\right)^{2}=\frac{I_{s}}{2 * V_{t}^{2}} e^{V_{k} / V_{t}}\left(V_{k+1}-V_{k}\right)^{2}
$$

If we assume that $R$ is sufficiently large, then the ratio of the third term to the second term in Eq. (3.6) is

$$
\begin{equation*}
\frac{\left(V_{k+1}-V_{k}\right)}{2 * V_{t}} \tag{3.7}
\end{equation*}
$$



Fig. 3.5 Diode Static I-V Characteristic.

(b)

Fig. 3.6(a) Simple Diode Circuit.
(b) Newton-Raphson Iteration Solutions for (a).

If ( $V_{k+1}-V_{k}$ ) is not small compared to $2 V_{t}$, then the assumption that the higher order terms in Eg. (3.2) are small and can be neglected is not true, so the correction term $\Delta V_{k}=V_{k+1}-V_{k}$ obtained by the Newton-Raphson method may not be good.

Let $\Delta V_{k}^{\prime}$ be the modified correction term such that

$$
\begin{equation*}
F\left(V_{k}+\Delta V_{k}^{\prime}\right)=\frac{V_{k}+\Delta V_{k}^{\prime}-V_{D D}}{R}+I_{S}\left(e^{\left(V_{k}+\Delta V_{k}^{\prime}\right) / V_{t}}-1\right)=0 \tag{3.8}
\end{equation*}
$$

Because $\Delta V_{k}$ satisfies

$$
\begin{equation*}
F\left(V_{k}\right)+F^{\prime}\left(V_{k}\right) \Delta V_{k}=0 \tag{3.9}
\end{equation*}
$$

so

$$
\begin{equation*}
F\left(V_{k}\right)+F^{\prime}\left(V_{k}\right) \Delta V_{k}=F\left(V_{k}+\Delta V_{k}^{\prime}\right) \tag{3.10}
\end{equation*}
$$

From Eq. (3.10) we obtain

$$
\begin{equation*}
\frac{\left(\Delta V_{k}^{\prime}-\Delta V_{k}\right)}{R}+I_{s} e^{\left(V_{k+} \Delta V_{k}^{\prime}\right) / V_{t}}=I_{S} e^{V_{k} / V_{t}}\left(1+\frac{\Delta V_{k}}{V_{t}}\right) \tag{3.11}
\end{equation*}
$$

From Eq. (3.11) we obtain
$1+\frac{\Delta V_{k}}{V_{t}}-e^{\Delta V_{k}^{\prime} / V_{t}}=\frac{\Delta V_{k}^{\prime}-\Delta V_{k}}{\frac{R}{I_{S} e^{V_{k} / V_{t}}}}$

If $\left(\Delta V_{k}^{\prime}-\Delta V_{k}\right) / R$ is sufficiently small compared to the exponential term $I_{s} e^{V_{k} / V_{t}}$, then the eq:ation

$$
\begin{equation*}
\Delta V_{k}^{\prime}=V_{t} \ln \left(1+\frac{\Delta V_{k}}{V_{t}}\right) \tag{3.13}
\end{equation*}
$$

yields a good approximation to the true solution.

Now we would like to find out the relationship of Eq. (3.13) to current iteration. For current iteration, after obtaining

$$
\begin{equation*}
\hat{V}_{k+1}=V_{k}+\Delta V_{k} \tag{3.14}
\end{equation*}
$$

we can obtain

$$
\hat{I}_{k+1}=I_{s}\left(e^{V_{k} / V_{t}}-1\right)+\frac{I_{s}}{V_{t}} e^{V_{k} / V_{t}} \Delta V_{k}
$$

$$
\text { If } \hat{I}_{k+1}>-I_{s} \text {, then there is a point } V_{k+1}=V_{k}+\Delta \hat{V}_{k} \text { on the diode }
$$ charateristic whose current is $\hat{I}_{k+1}$.

$$
\begin{equation*}
I_{s}\left(e^{\left(V_{k}+\hat{V}_{k}\right) / V_{t}}-1\right)=I_{s}\left(e^{V_{k} / V_{t}}-1\right)+\frac{I_{s}}{V_{t}} e^{V_{k} / V_{t}} \Delta V_{k} \tag{3.16}
\end{equation*}
$$

We obtain

$$
\begin{equation*}
\Delta \hat{v}_{k}=v_{t} \ln \left(1+\frac{\Delta V_{k}}{V_{t}}\right) \tag{3.17}
\end{equation*}
$$

We can see that Eq. (3.17) is identical to Eq. (3.13), also we can see that the condition for Eq. (3.16) to have a solution is

$$
\begin{equation*}
1+\frac{\Delta V_{k}}{V_{t}}>0 \tag{3.18}
\end{equation*}
$$

Since if $\Delta V_{k} \geq 0$, then Eq. (3.18) is satisfied, so we only need to consider the situation when $\Delta \mathrm{V}_{\mathrm{k}}<0$. This condition can be explained graphically in Figs. 3.7(a), (b) and (c). From Eq. (3.15) we see that $-I_{s} \leqslant \hat{I}_{k+1} \leqslant I_{i}$. for $-V_{c} \leqslant \Delta V_{k} \leqslant 0$. Thus, if the current intercept of the load line with the linearized diode curve lies in this range, then $\left|\Delta V_{k}\right|$ cannot exceed $V_{t}$ and convergence can be quite slow if voltage iteration is




Fig. 3.7 Three Cases with the Simple Diode Circuit.
used.

For Example in Fig. $3.7(a)$ we see that $\hat{I}_{k+1}>-I_{s}$ and so

$$
-V_{t}<\Delta V_{k} \leq 0
$$

In Fig. $3.7(b) \hat{I}_{k+1}=-I_{s}$ and so

$$
\Delta V_{k}=-V_{t}
$$

In Fig. $3.7(c) \hat{\mathrm{I}}_{k+1}<-\mathrm{I}_{s}$ and so

$$
\Delta V_{k}<-V_{t}
$$

In cases (a) and (b) $\left|\wedge v_{k}\right| \leqslant v_{t}$, therefore $\#$ of iterations $\geqslant\left|\frac{v^{*}-v_{k}}{v_{t}}\right| \quad$ if
voltage iteration is used.

Let us consider the conditions for cases (a) and (b) to be true. From Eq. (3.9), we obtain

$$
\begin{align*}
\frac{-\Delta V_{k}}{V_{t}} & =\frac{\left(V_{k}-V_{D D}\right) / R+I_{s}\left(e^{\left.V_{k} / V_{t}-1\right)}\right.}{1 / R+\frac{I_{s}}{V_{t}} e^{V_{k} / V_{t}}}  \tag{3.19}\\
& =\frac{1 / R+\frac{I_{s}}{V_{t}} e^{V_{k} / V_{t}}+\frac{V_{k}-V_{D D}-V_{t}-R * I_{s}}{R * V_{t}}}{1 / R+\frac{I_{s}}{V_{t}} e^{V_{k} / V_{t}}} \tag{3.20}
\end{align*}
$$

$$
\begin{equation*}
\frac{v^{*}-V_{D D}}{R}+I_{s}\left(e^{v^{*} / V_{t}}-1\right)=0 \tag{3.21}
\end{equation*}
$$

so from Eqs. (3.19), (3.20) and (3.21), we obtain the following conclusions:
(1) If $V_{k} \geq V^{*}$ then $\Delta V_{k} \leq 0$.
(2) If $V_{k} \geq V^{*}$ and $R \geq \frac{V_{k}-V_{D D}-V_{t}}{I_{s}}$, then $-V_{t} \leq \Delta V_{k} \leq 0$.
(3) If $R$ and $V_{k}$ are chosen such that $V_{k} \xrightarrow{ } V^{*}$ and $R \geq \frac{V_{k}-V_{D D}-V_{t}}{I_{s}}$, and voltage iteration is used in the forward region, then the number of iteration is lowerbounded by $\left|\frac{v^{*}-V_{k}}{v_{t}}\right|$.
For example, if $V_{k}-V^{*}=1.0 \mathrm{~V}$ and $V_{t}=0.025 \mathrm{~V}$, then $\left|\frac{v^{*}-v_{k}}{v_{t}}\right|=40$.

The above conclusions show why current iteration must be used in the forward region.

Now we would like to examine under what conditions current iteration should be used and if there is a $V_{\text {REF }}$ (such as the $V_{R E F}$ used in SPICE2) to determine whether current iteration or voltage iteration should be used.

Let us consider the simple diode circuit in Fig. 3.8. $V_{k}, V^{*}$ and $\hat{V}_{k}$ satisfy Eq. (3.23)
$I_{s}\left(e^{V^{*} / V_{t}}-e^{\hat{V}_{k} / V_{t}}\right)=\frac{V_{k}-V^{*}}{R}$
Let us consider the limiting case when $V^{*}-V_{k} \ll V_{t}$, then Eq.
b- can be rewritten as:

$$
\begin{equation*}
\frac{R^{*} I_{s}}{V_{t}} e^{V^{*} / V_{t}}\left(V^{*}-\hat{V}_{k}\right)=V_{k}-V^{*} \tag{3.24}
\end{equation*}
$$



Fig. 3.8 Comparison of Current Iteration to Voltage Iteration.
so if $\frac{R * I}{V_{t}} e^{V^{*} / V_{t}} \gg 1$, then current iteration is preferred; else if $\frac{R^{*} I_{s}}{V_{t}} e^{*} / V_{t} \ll 1$, then voltage iteration is preferred. These two conditions are illustrated in Figs. 3.9(a) and (b).

From Eq. (3.24) we can conclude that $V_{\text {REF }}$ must satisfy

$$
\begin{equation*}
\frac{R * I_{S}}{V_{t}} e^{V_{R E F} / V_{t}}=1 \tag{3.25}
\end{equation*}
$$

Since the value of $R$ is not a constant, there is no universal $V_{R E F}$. Experiments of the simple diode circuit with different values of $R$ and $V_{D D}$ were done to test the above conclusion, and the data are given in Figs. 3.10(a), (b), (c), and (d). These data confirm Eq. (3.25). In Fig 3.10(a), $\frac{R * I_{s}}{V_{t}} e^{*} / V_{t}$ is always much larger than one, this explains why current iteration is always better than voltage iteration; in Fig. $3.10(b)$, when $V_{D D}$ is less than $0.7 V, \frac{R * I_{s}}{V_{t}} e^{*} / V_{t}$ is less than one, voltage iteration is better than current iteration; in Fig. 3.10(c), when $V_{D D}$ is less than $0.5 V, \frac{R * I_{s}}{V_{t}} e^{*} / V_{t}$ is less than one, so voltage iteration is better than current iteration; in Fig. $3.10(d)$, because $\frac{\Delta V_{k}}{V_{t}}$ may be less than -1 , strict current iteration in region III can not be done. Whenever $\frac{\Delta V_{k}}{V_{t}}$ is less than -1 , the next guess is reset to zero, and current iteration is resumed. for this approach, current iteration is always better than voltage iteration.

In the conventional current/voltage iteration approach, such as the one used in SPICE2, there is a universal $V_{\text {REF }}$, if $V_{k+1}$ exceeds $V_{\text {REF }}$, then current iteration is used; otherwise, voltage iteration is used. In SPICE2, this $V_{R E F}$ is set to the point of minimum radius of curvature:

(a)

(b)

FR. 7020
Fig. 3.9(a) Situation When Current Iteration Is Better Than Voltage Iteration.
(b) Situation When Voltage Iteration Is Better Than Current Iteration.

1


Fig. 3.10(a) Comparison of Iteration Methods for the Simple Diode Circuit.
(b)


Fig. 3.10(b) Comparison of Iteration Methods for the Simple Diode Circuit.
(c)


Fig. 3.10(c) Comparison of Iteration Methods for the Simple Diode Circuit.
(d)


Fig. 3.10(d) Comparison of Iteration Methods for the Simple Diode Circuit.

$$
\begin{equation*}
V_{R E F}=V_{t} \ln \left(\frac{V_{t}}{I_{s} \sqrt{2}}\right) \tag{3.26}
\end{equation*}
$$

From the above analysis, we can see that this $V_{R E F}$ does not provide any guarantee of fast convergence. The simple diode circuit was used again to test the approach used in SPICE2, $V_{D D}=5 \mathrm{~V}, \mathrm{R}=1000 \mathrm{~K}$, the number of iterations used by SPICE2 is 12 , while the number of iterations for strict current iteration is only 3.

There is another problem associated with the conventional current/voltage iteration used in SPICE2. This problem is illustrated in Fig. 3.11. Let us assume that the initial guess is $V_{0}$ and the first iterate solution is $V_{1}$. Now we do not know which load lines we are encountering, because both load lines will give us $V_{1}$. If it is load line 1, then voltage iteration should be used; if it is load line 2 , then voltage iteration is too slow. In SPICE2, because $V_{1}$ is less than $V_{R E F}$, voltage iteration is used for both cases. Experimental results show that for $V_{D D}=-5 V$ and $R=1000 \mathrm{~K}$ the number of iterations used by SPICE2 is 12. This problem can be solved by using the piecewise nonlinear approach. Whenever this situation occurs, then the next guess is changed to zero. If it is load line 1, the next iterate solution is in the first quadrant and voltage iteration is used to obtain the solution. If it is load line 2 , the next iterate solution is in the third quadrant. The number of Iterations used by recognizing that the load line 2 is being used and changing the next guess to zero is 3 .

Also let us examine Eq. (3.13) again. When $\frac{\Delta V_{k}}{V_{t}}$ is positive and much smaller than 1 , then $\Delta V_{k}^{\prime} \approx \Delta V_{k}$. If the difference between $\Delta V_{k}^{\prime}$ and $\Delta V_{k}$ is small compared to the iteration error tolerance, then there is no need to


Fig. 3.11 Another Problem with the Colon Method.
do the transformation. Let us assume the error tolerance is $10^{-6} \mathrm{~V}$, then when $\frac{\Delta V_{k}}{V_{t}}$ is less than 0.01 , there is no need to do the transformation.

The result of all the above analysis is a new iteration scheme. The flowchart for this new approach is shown in Fig. 3.12(a), the experimental data for the simple diode circuit are also given in Figs. 3.10(a), (b), (c), and (d). These data show that this algorithm works well for resistor load diode circuits. However, when transistor circuits are solved, the load line generated by linearization changes during the iterations and the algorithm goes into limit cycle for some circuits. If the piecewise nonlinear method presented in Section 3.3 (it corresponds to a Katzenelson's type algorithm [40] for the piecewise linear approach) is used, then probably the limit cycle problem will not occur. But the piecewise nonlinear method only allows one diode to change regions at 2 given iteration, so the convergence rate is slow; also the plecewise nonlinear method requires a linear search to accomplish the task that only one diode changes regions. So instead of using a strict piecewise nonlinear approach, the algorithm in Fig. 3.12(a) was modified to eliminate the limit cycle problem. The flow chart for the modified algorithm is given in Fig 3.12(b).

Three test circuits were used to test this new iteration scheme. These three circuits are given in Fig. 3.13(a), (b) and (c), and the data are given in Table 3.1. These test results show that the new iteration scheme is superior to the Colon method used in SPICE2.

Fig. 3.12(a) Flowchart for the New Iteration Scheme.
(a)


Fig. 3.12 (b) Modified Flowchart for the New Iteration scheme.
(b)


Fig. 3.13 Example Circuits (a) One Transistor Amplifier.
(b) TTL NAND Gate.
(c) Differential Amplifier.


Table 3.1 Comparison of the Results between the New Approach and SPICE2.

| Circuit | Number of iterations <br> (New approach) | Number of iterations <br> (SPICE2) |
| :--- | :---: | :---: |
| Fig. 3.13(a) | 3 | 6 |
| Fig. $3.13(\mathrm{~b})$ | 7 | 17 |
| Fig. $3.13(\mathrm{c})$ | 6 | 7 |

### 3.5. Piecewise Nonlinear Approach for Bipolar Devices Including Avalanche Effects

Although the avalanche charateristic of diode should consist of two separate exponential functions [35], in order to simplify the analysis, here the avalanche characteristic is chosen to consist of only one exponential function. The diode $I-V$ static characteristic used here is shown in Fig. 3.5.

In the forward biased region the equation for the diode current is

$$
\begin{equation*}
I_{d}=I_{s}\left(e^{V_{d} / V_{t}}-1\right) \tag{3.27}
\end{equation*}
$$

The reverse-biased current before breakdown is

$$
\begin{equation*}
I_{d}=G_{d} V_{d} \tag{3.28}
\end{equation*}
$$

The avalanche current is

$$
\begin{equation*}
I_{d}=-e^{A\left(V_{B}-B * V_{d}\right)} \tag{3.29}
\end{equation*}
$$

The constants $A$ and $B$ are determined from the $I-V$ characteristic curve, where $V_{B}$ is the breakdown voltage and $V_{d}$ is the junction voltage.

If, in order to hasten the convergence, strict current iteration is used in pieces $I$ and III, then divergence may be encountered as shown in Fig. 3.5. Therefore, the piecewise nonlinear approach for bipolar devices with avalanche modeling is as follows:
(1) choose $V_{0}$ equal to $V_{R E F}$ and piece III;
(2) find iterate solution $V_{k+1}$ by the new modified Newt on method. If $V_{k+1}$ is within piece III, repeat this step.
(3) otherwise go into piece II, choose $V_{k+1}=0$ and use the new derivative, if $V_{k+2}$ is within piece II, then solution is fo:
(4) otherwise, go into piece $I$, choose $V_{k+1}=V_{B}$, and use the new modified Newton method.

Remark: Only one nonlinear device is allowed to change its region at one iteration, otherwise, limit cycle problems may occur.

### 3.6. Piecewise Nonlinear Approach for MOSFET

Let us consider the simple resistor load MOS inverter which are shown in Fig. 3.14(a). The nodal equation is

$$
\begin{equation*}
F(V)=\frac{V-V_{D D}}{R}+\beta\left[\left(V_{I N}-V_{T}\right) V-\frac{v^{2}}{2}\right] \tag{3.30}
\end{equation*}
$$

By Taylor series expansion we obtain

$$
\begin{aligned}
F\left(V_{k+1}\right)= & F\left(V_{k}\right)+F^{\prime}\left(V_{k}\right)\left(V_{k+1}-V_{k}\right)+\frac{F^{\prime \prime}\left(V_{k}\right)}{2}\left(V_{k+1}-V_{k}\right)^{2} \\
& + \text { higher order terms }
\end{aligned}
$$

where $F^{\prime}\left(V_{k}\right)\left(V_{k+1}-V_{k}\right)=\left[\frac{1}{R}+8\left(V_{I N}-V_{T}-V_{k}\right)\right]\left(V_{k+1}-V_{k}\right)$ and

$$
\frac{F^{\prime \prime}\left(V_{k}\right)}{2}\left(V_{k+1}-V_{k}\right)^{2}=\frac{-\beta}{2}\left(V_{k+1}-V_{k}\right)^{2}
$$

If we assume $R$ is sufficiently large, then

$$
\begin{equation*}
\frac{\text { the third term in Eq. }(3.31)}{\text { the second term in Eq. }(3.31)}=\frac{-\left(V_{k+1}-V_{k}\right)}{2\left(V_{I N}-V_{T}-V_{k}\right)} \tag{3.32}
\end{equation*}
$$

If $\left(V_{k+1}-V_{k}\right)$ is not small compared to $2\left(V_{I N}-V_{T}-V_{k}\right)$, then the assumption that higher order terms in Eq. (3.12) are small and can be

(a)

(b)

(c)

Fig. 3.14(a) Resistor Load MOS Inverter Circuit.
(b) Saturated Load MOS Inverter Circuit.
(c) Depletion Load MOS Inverter Circuit.
neglected is not true, so the correction term $\Delta V_{k}=V_{k+1}-V_{k}$ obtained by the Newton-Raphson method is not good. This may result in very slow convergence. Fig 3.15(a) illustrates this problem. If the initial guess is $V_{0}$, then the first iterate solution is $V_{1}$, which is far away from the exact solution $V^{*}$. Because the derivative is large when $V_{k}$ is negative, so it requires a large number of iteration to converge to $\mathrm{V}^{\star}$. Fig. 3.15(b) and (c) illustrate the slow convergence problem with a saturated load MOS inverter circuit and a depletion load MOS inverter circuit as shown in Fig. 3.14(b) and (c) respectively.

The above slow convergence problem can be resolved by using the piecewise nonlinear approach. For example, for the resistor load MOS inverter circuit, first, the MOSFET characteristic is partitioned into two pieces as shown in Fig. 3.16, the first piece is from $-\infty$ to zero, the second piece is from zero to $+\infty$, then the circuit can be solved as illustrated in Fig. 3.16. The initial guess $V_{0}$ is in piece $I I$, the first iterate solution $\hat{V}_{1}$ is in piece $I$, so $V_{1}$ is chosen to be the breakpoint zero. Since $\hat{V}_{2}$ is within piece II, the Newton-Raphson method is used to find the solution $V^{*}$.

Remark: In the iteration scheme for MOS circuits, the piecewise nonlinear approach is used for $V_{D S}$. The change of $V_{G S}$ and $V_{G D}$ are limited by iV at each iteration.

### 3.7. Discussion

Two large circuits were used to test the piecewise nonlinear approach, one is a bipolar circuit as shown in Fig. 3.17, the other is a MOS circuit as shown in Fig.3.18. The data are given in Table 3.2. The results show that this method improves the convergence property of the basic


Fig. 3.15 The Slow Convergence Problem with the Circuits in Fig. 3.14




Fig. 3.17 Binary-to-Octal Decoder.

$$
\stackrel{\Phi}{\square}
$$

Table 3.2 Comparison of the Results between the Piecewise Nonlinear Approach (PWNL) and SPICE2.

| Circuit | Number of iterations <br> (PWNL) | Number of iterations <br> (SPICE2) |
| :--- | :---: | :---: |
| Fig. 3.17 | 34 | 48 |
| Fig. 3.18 | 10 | 28 |

Newton-Raphson method; however, the proof of the global convergence property or the conditions for global convergence have not yet been derived. More research work on this topic is needed.

## IV. NUMERICAL INTEGRATION

A numerical integration method is required to determine the transient response of a circuit. In order to make the numerical integration more accurate and efficient, some method of dynamically varying the timestep is needed, this is usually accomplished by a local truncation error (LTE) timestep control.

Let us denote the upperbound on the local truncation error by ET. In previous work, ET was established as follows [36]. First, a maximum allowable global truncation error $G_{\max }$ and the solution interval $T$ are specified. An assumption that this global error is distributed uniformly Within $T$ is made, then the maximum allowable ET per timestep ( $h$ ) is given by

$$
\begin{equation*}
E T=\frac{\operatorname{GE}_{\max }}{T} * \mathrm{~h} \tag{4.1}
\end{equation*}
$$

The LTE timestep control with trapezoidal integration is implemented as follows. First, the timestep $h$ and $t_{n+1}=t_{n}+h_{n}$ are determined, the solution at the timepoint $t_{n+1}$ is found, then the local truncation error (LTE) is evaluated by Eq. (4.2).
$L T E=\frac{h^{3}}{12} * \dddot{x}(T) \approx \frac{h^{3}}{2}$ DD3
where $D D 3$ is the 3 rd divided difference [1] and $t_{n} \leq \tau \leq t_{n+1}$. The kth divided difference is defined by the recursive relation

$$
\begin{equation*}
D D k=\frac{D D_{k-1}\left(t_{n+1}\right)-D D_{k-1}\left(t_{n}\right)}{\sum_{i=1}^{k} h_{n+1-i}}, \quad D D L=\frac{x\left(t_{n+1}\right)-x\left(t_{n}\right)}{h_{n}} \tag{4.3}
\end{equation*}
$$

If LTE > ET, then the timestep is considered too large, $h_{n}$ is rejected, a new $h_{g}$ is computed using

$$
\begin{equation*}
h_{\mathrm{n}}=\sqrt[2]{\frac{2 * G E_{\max }}{\mathrm{T} * \mathrm{DD} 3}} \tag{4.4}
\end{equation*}
$$

and then a new timepoint $t_{n+1}$ is determined. If, on the other hand, LTE < ET, then the local truncation error at timepoint $t_{n+1}$ is considered satisfactory, and the timestep $h_{n+1}$ is computed using

$$
\begin{equation*}
n_{n+1}=\sqrt[2]{\frac{2 * G E_{\max }}{T * D D 3}} \tag{4.5}
\end{equation*}
$$

### 4.1. Problems with Previous Work

When the above strategy is applied to determine the transient response of a circuit, there are two problems:
(1) Since $D D 3$ is only an approximation of $\ddot{\mathbf{x}}(\tau)$, whenever the timestep is changed or the input signal changes abruptly, our investigation shows that DD3 becomes an inaccurate estimate of LTE. This inaccuracy results in the following unwanted situations. One situation is that at the timepoint $t_{n+1}$, if LTE < ET, the timestep $h_{n+1}$ is increased, but at the next timepoint $t_{n+2}$, due to the inaccuracy, LTE is now found to be larger than ET, so this timepoint is rejected and the timestep is reduced. The other situation is even worse. If, at the timepoint $t_{n}$, the input changes abruptly, then due to the inaccuracy of the DD3 approximation to $\dddot{x}(T)$, LIE may be greater than ET and the timestep is reduced. Sometimes this happens repeatedly until the timestep becomes too small and the program terminates. These two situations are explained in detail later.
(2) For digital circuits, the total solution time $T$ may consist of several switching intervals. If a stable numerical integration method is used, initially in an interval the local truncation error accumulates and the global truncation error (GE) increases, but as the solution nears
steady state in a given switch interval the global error decreases, and as the solution approaches the steady state the global error goes to zero, so that the upperbound ET given by Eq. (4.1) is too conservative. This is illustrated in Fig. 4.1.

Now we will consider the above two problems in more detail. The first situation of the first problem can be illustrated by a simple RC circuit as shown in Fig. 4.2. In order to simplify the analysis, the backward Euler method is used. The exact solution for this circuit is

$$
\begin{equation*}
v(t)=5 e^{-t / \tau} \tag{4.6}
\end{equation*}
$$

the solution obtained by the backward Euler method is

$$
\begin{equation*}
v_{n+1}=\frac{v_{n}}{1+h_{n / \tau}} \tag{4.7}
\end{equation*}
$$

where $v_{0}=5 \mathrm{~V}$.

The local truncation error estimates at timepoints $t_{n+1}$ and $t_{n}$ are

$$
\begin{align*}
& \operatorname{LTE}_{n+1}=h_{n}^{2} * D D 2_{n+1}  \tag{4.8}\\
& \operatorname{LTE}_{n}=h_{n-1}^{2} * D D 2_{n} \tag{4.9}
\end{align*}
$$

where $D D 2_{n+1}=\frac{v_{n+1}-v_{n}}{h_{n}}-\frac{v_{n}-v_{n-1}}{h_{n-1}}+h_{n-1} \quad$

$$
D D 2_{n}=\frac{v_{n}-v_{n-1}}{h_{n-1}}-\frac{v_{n-1}-v_{n-2}}{h_{n-2}} h_{n-1}+h_{n-2} \quad
$$



Fig. 4.1 Numerical Integration Solution v.s. Exact Solution.

(a)

(b)

FP-7007

Fig. 4.2(a) Simple RC Circuit.
(b) Waveform of the Simple RC Circuit.

Let us consider the situation when $h_{n-2}=h_{n-1}=h$ and $h_{n}=a h$, where a is a ratio constant. From Eqs. (4.7), (4.8) and (4.9), we obtain

$$
\begin{equation*}
\frac{\operatorname{LTE}_{n+1}}{\operatorname{LTE}_{n}}=\frac{D D 2{ }_{n+1}}{D D 2_{n}} \frac{h_{n}^{2}}{h_{n-1}^{2}}=\frac{2 a}{a+1} * a^{2} *\left[\frac{1}{1+\frac{a h}{\tau}}\right] \tag{4.10}
\end{equation*}
$$

If both LTE ${ }_{n+1}$ and $\operatorname{LTE}_{\mathrm{n}}$ are good approximations of the true local truncation errors, then from Eq. (4.6), (4.7) and the definition of local truncation error [36] the ratio of LTE ${ }_{n+1}$ over LTE ${ }_{n}$ should be

$$
\begin{equation*}
\frac{\operatorname{LTE}_{n+1}}{\operatorname{LTE}_{n}} \approx a^{2} \star\left[\frac{1}{1+\frac{a h}{\tau}}\right] \tag{4.11}
\end{equation*}
$$

Comparison of Eq. (4.11) with Eq. (4.10) shows that the ratio computed by Eq. (4.10) is wrong by a factor of $\frac{2 a}{a+1}$. When $a=1$, that is, the timestep is constant, then the estimation by Eq. (4.8) is good. When a is different from 1, then the estimation by Eq. (4.8) is not good. Table 4.1 gives the simulation results of the simple $R C$ circuit, which confirms the above conclusion. The ET used is $10^{-3} \mathrm{~V}$. At the first three timepoiats, the timesteps are kept constant, so the estimation by Eq. (4.8) is good. At the fourth timepoint the timestep is increased by a factor of two. The true local truncation error is $0.8930 \mathrm{E}-3$, which is an acceptable error; but the estimation by Eq. (4.8) is $0.1207 \mathrm{E}-2$, which is larger than ET, so the timepoint is rejected.

Now we would like to see if this inaccuracy can be explained by the above conclusion. The ratio of the estimation at the fourth tiaepoint over the estimation at the third timepoint is

Table 4.1 Simulation Results of the Simple RC Circuit.

| t(ns) | h(timestep) | LTE (estimate) | True local <br> truncation error |
| :--- | :--- | :--- | :--- | :--- |
| 1.5 | 0.25 | $0.2355 \mathrm{E}-3$ | $0.2339 \mathrm{E}-3$ |
| 1.75 | 0.25 | $0.2332 \mathrm{E}-3$ | $0.2314 \mathrm{E}-3$ |
| 2.00 | 0.25 | $0.2309 \mathrm{E}-3$ | $0.2293 \mathrm{E}-3$ |
| 2.50 | 0.50 | $0.1207 \mathrm{E}-2$ | $0.8930 \mathrm{E}-3$ |

$$
\begin{equation*}
\frac{0.1207 \mathrm{E}-2}{0.2309 \mathrm{E}-3}=5.227 \tag{4.12}
\end{equation*}
$$

instead of 4 as predicted by Eq. (4.11). However, note that for $a=2$

$$
\begin{equation*}
\frac{2 a}{a+1}=\frac{4}{3}=1.333 \approx \frac{5.227}{4} \tag{4.13}
\end{equation*}
$$

Eq. (4.13) shows that the estimation of local truncation error is really wrong by a factor of $\frac{2 a}{a+1}$ as shown in Eq. (4.10).

Now let us consider the second situation of the first problem. This situation can be illustrated by a simple RC circuit as shown in Fig. 4.3. Again the backward Euler method is used here for the simplicity of the analysis. The exact solution for this circuit is

$$
v(t)= \begin{cases}5 e^{-t / \tau} & t \leq t_{n}  \tag{4.14}\\ v_{n} e^{-t / \tau}+5\left(1-e^{-\left(t-t_{n}\right) / \tau}\right) & t>t_{n}\end{cases}
$$

$$
\begin{align*}
& \text { the solution obtained by backward Euler method is } \\
& v_{k+1}=\left\{\begin{array}{ll}
\frac{v_{k}}{1+h_{k / \tau}} & k<n \\
\frac{v_{k}}{1+h_{k / \tau}}+\frac{5 \frac{h_{k}}{\tau}}{1+h_{k / \tau}} & k \geq n
\end{array} .\right. \tag{4.15}
\end{align*}
$$

where $v_{0}=5 \mathrm{~V}$.

In the early version of SPICE2, when $t_{n}$ exceeded a source breakpoint, then $h_{n-1}$ was reduced such that the value $t_{n}$ coincides with the breakpoint. The timestep was reduced to a small value and then the iteration was

(a)

(b)

(c)

Fig. 4.3(a) Simple RC Circuit.
(b) Wave form of $v_{i n}$.
(c) Wave form of $v$.
continued. Let us consider the situation when $h_{n-2}=h, h_{n-1}=a h$ and $h_{n}=b h$, where $a$ and $b$ are ratio constants. From Eqs. (4.8) and (4.15), we obtain the following estimate of the local truncation error:

$$
\begin{equation*}
\operatorname{LTE}_{n+1}=b^{2} h^{2}\left[\frac{5}{\tau(a+b) h}+\frac{b\left(v_{n+1}-5\right)}{\tau^{2}(a+b)}\right] \tag{4.16}
\end{equation*}
$$

Let us also assume that bh is not small enough, so that

$$
\begin{equation*}
\mathrm{LTE}_{\mathrm{n}+1}>\mathrm{ET} \tag{4.17}
\end{equation*}
$$

Then the timestep was reduced and a new timestep ch was computed by

$$
\begin{equation*}
c^{2} h^{2}\left[\frac{5}{T(a+b) h}+\frac{b\left(v_{n+1}-5\right)}{T^{2}(a+b)}\right]=E T \tag{4.18}
\end{equation*}
$$

where $c<b$.

The local truncation error for this new timestep ch is estimated by

$$
\begin{equation*}
\operatorname{LTE}_{n+1}^{\prime}=c^{2} h^{2}\left[\frac{5}{\tau(a+c) h}+\frac{c\left(v_{n+1}-5\right)}{\tau^{2}(a+c)}\right] \tag{4.19}
\end{equation*}
$$

It follows that

$$
\begin{equation*}
\frac{\text { LTE }_{n+1}^{\prime}}{E T}=\frac{a+b}{a+c} \frac{\left[1+\frac{\operatorname{ch}\left(v_{n+1}-5\right)}{5 T}\right]}{\left[1+\frac{b h\left(v_{n+1}-5\right)}{5 \tau}\right]} \tag{4.20}
\end{equation*}
$$

Since $c<b$ then $(a+c)<(a+b)$ and for $a$ small enough a the ratio in Eq. (4.20) could be greater than one. If this is the case, then the step
size will be reduced again. This could happen again and again. This was the case in early version of the SPICE2 program, and frequently after abrupt clock or signal changes the program would not converge and the job would be terminated prematurely.

Remark: In Eq. (4.16), the second term is an approximation to the true local truncation error, the first term, although it is dominant, is a parasitic term which is generated by the use of voltages at the timepoints of the previous switching interval.

The second problem is detailed as follows. Let $G_{\max }$ denote the maximum global truncation error at $t_{n}$ (Fig. 4.1). Assume that the trapezoidal method is used and that we are dealing with an exponentially decaying waveform. The local truncation error at the timepoint $t_{n+1}$ is given by

$$
\begin{equation*}
\operatorname{LTE}_{n+1}=\frac{h_{n}^{3}}{12} \dddot{x}\left(t^{\prime}\right) \quad t_{n} \leq t^{\prime} \leq t_{n+1} \tag{4.21}
\end{equation*}
$$

and for this example

$$
\begin{equation*}
\operatorname{LTE}_{\mathrm{n}+1} \approx \frac{h_{\mathrm{n}}^{3}}{12 \tau^{3}} V_{\mathrm{n}, \max } \tag{4.22}
\end{equation*}
$$

From Eq. (4.22) and the definition of local truncation error, we obtain

$$
\begin{equation*}
G E_{n+1} \approx G E_{\max } e^{-h_{n} / \tau}+\frac{h_{n}^{3}}{12 \tau^{3}} V_{n, \max } \leq G E_{\max } \tag{4.23}
\end{equation*}
$$

where $G E_{n+1}$ is the global truncation error at the timepoint $t_{n+1}$. Eq. (4.23) can be reduced to

$$
\begin{equation*}
\frac{h_{n}^{3}}{12 \tau^{3}} V_{n, \max } \leq G_{\max }\left(\frac{h_{n}}{T}\right) \tag{4.24}
\end{equation*}
$$

The local truncation error timestep control requires that

$$
\begin{equation*}
L I E_{n+1}=\frac{h_{n}^{3}}{12 \tau^{3}} V_{n, \max } \leq E T \tag{4.25}
\end{equation*}
$$

Assuming equality in Eq. (4.25) and eliminating $h_{n} / \tau$ in Eq. (4.24), we obtain the following upper bound on the local truncation error.

$$
\begin{equation*}
E T \leq \sqrt{\frac{12\left(G E_{\max }\right)^{3}}{V_{\mathrm{n}, \max }}} \tag{4.26}
\end{equation*}
$$

In order to check the above bound which was derived for $R C$ circuits, a number of digital circuits were simulated and the following empirical bound on the local truncation error was Cetermined.

$$
\begin{equation*}
E T=10 \sqrt{\frac{\left(G E_{\text {max }}\right)^{3}}{V_{\mathrm{DD}}}} \tag{4.27}
\end{equation*}
$$

where $V_{D D}$ is the voltage swing which is the supply voltage in this example. Eq. (4.27) also holds for exponentially rising waveforms. Given $G E_{\text {max }}$ and $V_{D D}$, then ET can be determined by Eq. (4.27), and then Eq. (4.2) can be used to control the timestep.

Eq. (4.26) shows that ET is proportional to (GEmax $)^{3 / 2}$ for RC circuits if the trapezoidal method is used. In general, for RC circuits, if a stable numerical integration method of order $\cap$ is used, similar derivation as used above can show that

$$
\begin{equation*}
\operatorname{ET} \propto\left(G_{\max }\right)^{(n+1) / n} \tag{4.28}
\end{equation*}
$$

Eq. (4.28) was verified experimentally for the backward Euler method and the trapezoidal method. The RC circuit as shown in Fig. 4.2 was used. The simulation results are given in Figs. 4.4 and 4.5 .



MICROCOPY RESOLUTION TEST CHART National burent of s:andafes : ac..a


Fig. 4.4 Simulation Data of the Backward Euler Method Which show that ET $\propto\left(\mathrm{GE}_{\max }\right)^{2}$.


Fig. 4.5 Simulation Data of the Trapezoidal Method Which Show that ET $\propto\left(\mathrm{GE}_{\max }\right)^{3 / 2}$.

### 4.2. Algorithm

The following algorithm for variable timestep control was developed based on the discussion in the previous section. The trapezoidal method is assumed. ET is computed by Eq. (4.27).

First let us derive an expression for the estimation of the local truncation error which is used in the algorithm. Let us assume that $h_{n-2}=h, h_{n-1}=$ ah and $h_{n}=b h$. The solution obcained by the trapezoidal method for an exponentially decaying waveform is

$$
\begin{equation*}
v_{n+1}=v_{n} \frac{1-h_{n} / 2 \tau}{1+h_{n} / 2 \tau} \tag{4.29}
\end{equation*}
$$

the 3 rd divided difference is given by

$$
\begin{gather*}
{ }^{D D} 3_{n+1}=\frac{D D 2_{n+1}-D D 2_{n}}{h_{n-2}+h_{n-1}+h_{n}} \\
=\frac{3(1+b)}{2(1+a+b)} * \frac{-v_{n}}{6 \tau^{3}\left(1-\frac{h}{2 \tau}\right)\left(1-\frac{a h}{2 \tau}\right)\left(1+\frac{b h}{2 \tau}\right)} \tag{4.30}
\end{gather*}
$$

where $\frac{3(1+b)}{2(1+a+b)}$ is the error factor in the estimate of the third derivative caused by varying the timestep.

By taking into account the effect of different timesteps, the expression of local truncation error is given by

$$
\begin{equation*}
\text { LTE }_{n+1}=\frac{h_{n}^{3}}{2} * \text { DD }_{n+1} * \frac{2(1+a+b)}{3(1+b)} \tag{4.31}
\end{equation*}
$$

or we can define a new quantity DD $3^{\prime}$ which is given by

$$
\begin{equation*}
\text { DD3 }{ }_{n+1}^{\prime}=\frac{D D 2_{n+1}-D D 2_{n}}{\left(h_{n-2}+h_{n}\right)} \tag{4.32}
\end{equation*}
$$

then Eq. (4.31) can be reduced to
$\operatorname{LTE}_{n+1}=\frac{h_{n}^{3}}{3} * \operatorname{DD}^{\prime}{ }_{n+1}^{\prime}$

## Algorithm:

(1) Record the initial time $t_{0}$, final time $t_{f}$, minimum stepsize $h_{m i n}$, maximum stepsize $h_{\text {max }}$, and source breakpoints.
(2) Set the initial timestep $h=h_{\text {min }}$.
(3) Compute $X_{1}$ at $t_{1}=t_{0}+h, X_{2}$ at $t_{2}=t_{0}+2 h$, and $x_{3}$ at $t_{3}=t_{0}+3 h$.
(4) Set $n=3$ and compute LTE by Eq. (4.33).
(5) Compute $h_{n}=n \sqrt[3]{\frac{E T}{L T E}}$. If $h_{n}<0.6 h$, then $h=h_{n}$ and go to (3); ntherwise, continue.
(6) Compute $t_{n+1}=t_{n}+h_{n}$. If $t_{n+1}$ does not exceed a source breakpoint, then go to (7). If $t_{n+1}$ exceeds a source breakpoint, then $h_{n}$ is reduced such that the value $t_{n+1}$ coincides with the breakpoint. Compute $X_{n+1}$ for this breakpoint. Compute LTE by Eq. (4.33), compute $n_{n+1}$, if $h_{n+1}<0.6 n_{n}$, then $h_{n}=n_{n+1}$ and go to (6); otherwise, set $h=h_{\text {min }}$ and $t_{0}=t_{n+1}$, then go to (3).
(7) Compute $X_{n+1}$. Compute LTE by Eq. (4.33), compute $n_{n+1}$, if $n_{n+1}<0.6 n_{n}$, then $h_{n}=h_{n+1}$ and go to (6); otherwise, continue.
(8) If $t_{n+1}>t_{f}$, then stop; if not, then $n=n+1$, and go to (6).

Remark: The above algorithm has been derived for a fixed order variable stepsize method which uses the trapezoidal rule. Our simulation results show that the problems we mentioned before in this chapter are resolved by this algorithm. If other fixed order methods are to be used, then the corresponding equations should be modified.
V. TEARING METHODS AND SPARSITY CONSIDERATIONS FOR NODE TEARING METHOD

There are two kinds of tearing methods - the branch tearing method and the node tearing method. The idea of branch tearing was first introduced by Kron [12]. Recently Chua and Chen [16] have shown that the branch tearing is just a special case of generalized hybrid analysis. The main idea of branch tearing is to select a set of tearing branches first, then the given network is torn apart into several subnetworks by removing these tearing branches (Fig. 5.1), analyzing each subnetwork separately, and obtaining the solution of the entire network by combining the solutions of the subnetworks via the tearing branches. Algebraically, this method is equivalent to a particular ordering of the hybrid analysis equations such that the resulting matrix has a bordered block-diagonal structure (Fig. 5.2). Each block corresponds to a subnetwork, and the border corresponds to the interconnections of the subnetworks.

The idea of node tearing was first introduced by Sangiovanni-Vincentelli, Chen and Chua [20]. The main idea is to select a set of tearing nodes first, then the network is torn apart into several subnetworks by removing these tearing nodes (Fig. 5.3), each subnetwork is analyzed separately, and the solution of the entire network is obtained via the tearing nodes. Algebraically, this method is equivalent to a particular ordering of nodal analysis equations such that the resulting matrix has a bordered block-diagonal structure (Fig. 5.4). Each block corresponds to a subnetwork, and the border corresponds to the interconnections of the subnetworks.

-     -         -             -                 -                     -                         -                             -                                 -                                     -                                         -                                             - 

1


Fig. 5.2 Bordered Block-Diagonal Matrix Formulated by Branch Tearing.


Fig. 5.3 Example of a Network Partitioned into Three Subnetworks by the Node Tearing Method.

| $\underset{\sim}{\text { Y }} 11$ |  | 0 | $\stackrel{Y}{\sim}_{\sim}^{\text {L }}$ | $\underset{\sim}{0}$ |
| :---: | :---: | :---: | :---: | :---: |
| 0 | $\stackrel{\mathrm{Y}}{\sim}{ }_{22}$ |  | $\stackrel{Y}{\sim}{ }_{\text {L }}$ 2 |  |
|  |  | ${\underset{\sim}{Y}}^{\text {S }}$ | ${\underset{\sim}{\sim}}_{\sim}{ }^{\text {3 }}$ |  |
| ${ }_{\sim}^{\text {Y }}$ It | $\stackrel{Y}{\sim} 2 t$ | $\underset{\sim}{\underset{\sim}{3}}$ ( | $\underset{\sim}{\text { Y }}$ t | $\stackrel{Y}{\sim} \mathrm{tr}$ |
| 0 |  |  | $\underset{\sim}{Y} \mathrm{r}$ t | $\underset{\sim}{\mathrm{Y}} \mathrm{rr}$ |


| $v^{v}$ |
| :--- |
| $\sim 1$ |
| $\stackrel{v}{\sim} 2$ |
| ${\underset{\sim}{v}}^{2}$ |
| $\underset{\sim}{v}$ |
| $\underset{\sim}{v}$ |

Fig. 5.4 Bordered Block-Diagonal Matrix Formulated by Node Tearing.

Recently, because tearing methods possess several advantages over conventional circuit analysis methods, a lot of effort has been devoted to the study of tearing methods for the analysis of large scale circuits. The advantages of tearing methods are as follows: First, tearing methods are suitable for the exploitation of the repetitiveness of a limited number of subnetworks; secondly, tearing methods are suitable for the exploitation of latency; thirdly, tearing methods are suitable for parsllel processing. In order to solve the network by tearing methods one must specify a partitioning strategy, and also one must specify a technique for solving the partitioned equations. In the literature mostly the branch tearing method has been used to solve large networks [22-24]. The solution strategy has been to estimate the current or voltage at each tearing port and to excite the torn subnetworks with independent sources at these ports. The remaining port responses are computed, and are substituted into the interconnection equations. If these equations are not satisfied, then another estimate is made of the variables chosen as port excitations. This iterative procedure continues until convergence is achieved. If the subnetworks are nonlinear a multilevel iteration scheme is used, such as a Gauss-Seidel [22], Newton-SOR [24] or a multilevel Newton iteration [42]. However, the first two iteration schemes do not have second-order convergence while the third scheme requires the computation of an additional Jacobian matrix.

Because the above approach introduces new variables, such as tearing branch currents, the complexity of the problem is increased. Also, a multilevel iteration scheme is required. Another disadvantage of the branch tearing method is that each subnetwork must contain the datum node for the network, or else a local datum node must be chosen for each subnetwork. In the program SLATE a different approach is used, and the
internal subnetwork variables are eliminated from the tearing node equations. Then the tearing node voltages are computed. Only a one level Newton iteration is required, and the internal variables for each subnetwork can be eliminated using parallel processing aethods. The elimination of the internal variables is equivalent to replacing the subnetworks by Norton equivalent circuits at the tearing nodes.

In our study of tearing methods it was assumed that the user specifies the subnetworks, and any part of the network not specified as a subnetwork is automatically included in the subnetwork called rest of netnork in Figs. 5.1 and 5.3. The subnetworks are processed first in the solution algorithm, and the tearing branches or nodes along with the rest of network equations are processed last. Thus, the branch tearing method and the node tearing method described in Section 5.1 and Section 5.2 are somewhat different from those found in the Iiterature [12-14]. In Section 5.3, a comparison between branch tearing and node tearing is given. In Section 5.4, the derivation of the construction of the node tearing matrix from subnetworks is detailed. The sparsity considerations for the node tearing method are presented in Section 5.5. The implementation of node tearing is described in Section 5.6. The circuit interpretation of the tearing methods is given in Section 5.7. Some conclusions are given in Section 5.8.
5.1. Derivation of the Branch Tearing Merhod

Let $N$ be a connected network having ( $n+1$ ) nodes: the datum node $n_{0}$ and the nondatum nodes $\alpha=\left\{n_{1}, n_{2}, \ldots \ldots, n_{n}\right\}$, and $b$ branches, $\beta=\left\{b_{1}, b_{2}, \ldots \ldots b_{b}\right\}$. Let the branch voltages, branch currents and the node-to-datum voltages be denoted by $\underset{\sim}{e}=\left(e_{1}, e_{2}, \ldots . e_{b}\right)^{T}$,
$\underset{\sim}{i}=\left(i_{1}, i_{2}, \ldots \ldots i_{b}\right)^{T}$ and $\underset{\sim}{v}=\left(v_{1}, v_{2}, \ldots \ldots v_{n}\right)^{T}$, respectively. Let the interconnection be defined as the remaining part of the network when all the subnetworks are removed, that is, the set of tearing branches and the rest of the network in Fig. 5.1. Let us assume that a proper set of tearing branches has been chosen such that there is no mutual coupling either among the torn subnetworks or between the torn subnetworiks and the interconnection, and that all subnetworks contain the common datum node. This latter assumption is made in order to avoid floating subnetworks which result in singular submatrices. The more general case when all subnetnorks do not contain a common datum node is discussed in [17]. Subscripts $s$, $t$ and $r$ are used to denote quantities pertaining to the subnetworks, the tearing branches and the remaining branches, respectively, so the bransin set $\beta$ is partitioned into three subsets $\beta_{s}, \rho_{t}$ and $\rho_{r}$ (Fig. 5.1), and the node set $\alpha$ is partitioned into two subsets $\alpha_{s}$ and $\alpha_{r}$. This yields the following special stractures for the reduced incidence matrix $A$ and the branch sonductance matrix $\underset{\sim}{G}$ of network $N$ :

Let the torn network have $k$ subnetworks $\quad N_{1}, N_{2}, \ldots, \ldots, N_{k}$. Let $\alpha_{s}$ and $\beta_{s}$ be partitioned correspondingly into $k$ subsets $\alpha_{1}, \alpha_{2}, \ldots, \ldots, \alpha_{k}$ and $B_{1}, B_{2}, \ldots \ldots, \beta_{k}$ respectiveiy. With this partitioning, the node-to-brancn incidence matrix $\underset{\sim}{A}$ can be written as

the branch conductance matrix $\underset{\sim}{G}$ can be written as:

The network variables are constrained by the Kirchhoff's current law (KCL), Kirchhoff's voltage law (KVL) and the branch constraint relations (BC) [28].

$$
\begin{align*}
& \text { (KCL) } \underset{\sim}{A_{s}} \underset{s}{i}+\underset{\sim}{A} \underset{\sim}{i} t=\underset{\sim}{0}  \tag{5.5}\\
& {\underset{\sim}{r}}^{A} \underset{\sim}{i}+\underset{\sim}{A} \underset{\sim}{i}=\underset{\sim}{i}  \tag{5.6}\\
& \text { (KVL) } \underset{\sim}{\underset{\sim}{e}}={\underset{\sim}{A_{V}}}_{\mathbf{A}_{s}} \tag{5.7}
\end{align*}
$$

$$
\begin{align*}
& {\underset{\sim}{e}}_{r}={\underset{\sim}{A}}_{\underline{T}}^{\underline{v}} \underset{r}{ }  \tag{5.9}\\
& \text { (BC) } \tag{5.10}
\end{align*}
$$

Substituting Eqs. (5.7), (5.8) and (5.9) into Eqs. (5.10) and (5.12), and then substituting the results into Eqs. (5.5) and (5.6), we obtain

Substituting Eq. (5.8) into Eq. (5.11), we obtain

and $\underset{\sim}{E} t s{\underset{\sim}{t s}}_{\underset{\sim}{e}}-\underset{\sim}{\underset{\tau}{j}}{ }_{t s}$.
Eqs. (5.13), (5.14) and (5.15) can be rewritten in the following form:

Let us now examine the term $A_{A_{g}} A_{s}^{T}$ in Eq. (5.16). Substituting $A_{s}$ of Eq. (5.2) and $\underset{\sim}{G}$ s of Eq. (5.3) into $\underset{\sim}{A} \underset{\sim}{A}{\underset{S}{s}}^{T}$, we obtain


so Eq (5.16) can be rewritten as:
 method, which has the desirable bordered block-diagonal structure and it is a particular ordering of the hybrid analysis equations.

### 5.2. Derivation of the Node Tearing Method

Let the interconnection be defined as the remaining part of the network when all the subnetworks are removed, that is, the set of tearing nodes and the rest of network in Fig. 5.3. Let us assume that a proper set of tearing nodes has been chosen such that no coupling exists either among the torn subnetworks or between the torn subnetworks and the interconnection. The node set $\alpha$ is partitioned into three subsets $\alpha_{s}, \alpha_{t}$ and $\alpha_{r}$, and the branch set $\beta$ is partitioned into two subsets $\beta_{s}$ and $\beta_{r}$ (Fig. 5.3). This yields the following special structures for the reduced incidence matrix $\underset{\sim}{A}$ and the branch conductance matrix $\underset{\sim}{G}$ of the network $N:$

Let the torn subnetwork have $k$ subnetworks $\quad N_{1}, N_{2}, \ldots, \ldots, N_{k}$ Let $\alpha_{s}$ and $\beta_{s}$ be partitioned correspondingly into $k$ subsets $\alpha_{1}, \alpha_{2}, \ldots, \ldots, \alpha_{k}$ and $\beta_{1}, B_{2}, \ldots \ldots, \beta_{k}$ respectively. With this partitioning, the node-branch incidence matrix $\underset{\sim}{A}$ can be written as:


The branch conductance matrix $\underset{\sim}{G}$ can be written as:

The network variables are constrained by the Kirchhoff's current law (KCL), Kirchhoff's voltage law (KVL) and the branch constraint relations (BC).

$$
\begin{align*}
& \text { (KCL) } \underset{\sim}{A} \underset{\sim}{1} s=\underset{\sim}{0} \tag{5.23}
\end{align*}
$$

$$
\begin{align*}
& {\underset{\sim}{x}}^{i} \underset{r}{ }=0 \tag{5.25}
\end{align*}
$$



$$
\underset{\sim}{e} r=\underset{\sim}{A_{V}^{T}} \underset{\sim}{r}+\underset{\sim}{A_{V}^{T}}{ }_{v}
$$

(BC) $\underset{\sim}{i} s={\underset{\sim}{s s}}^{I}+\underset{\sim}{G} \underset{s}{e}-\underset{\sim}{G} e_{s s}$

Substituting Eqs. (5.26) and (5.27) into Eqs. (5.28) and (5.29), and then substituting the results into Eqs. (5.23), (5.24) and (5.25), we obtain



Eqs. (5.30), (5.31) and (5.32) can be rewritten as:
 Eq. (5.21) and $\underset{\sim}{G}$ s of Eq. (5.22) into $\underset{\sim}{A} G_{s} \mathbb{A}_{s}^{T}$, we obtain
 , so Eq. (5.33) can be rewritten as:





and


Eq. (5.35) is the resulting matrix by node tearing method, which has the desirable bordered block-diagonal structure and is a particular ordering of the nodal analysis equations. In the above derivation the modified nodal method could have been used. In this case the vectors $v$ and $\underset{\sim}{v}$ consist of hoth node voltages and currents of branches for which an admittance description presents difficulties. This is actually the formulation used in the program SLATE.
5.3. Comparison of the Branch Tearing Method with the Node Tearing Method As described above, branch tearing is equivalent to a particular ordering of the hybrid equations, node tearing is equivalent to a particular ordering of the nodal equations, so branch tearing requires the use of tearing branch currents as extra variables. As a result of the
above property, node tearing possesses the following advantages over branch tearing:
(1) the dimension of the matrix formulated by node tearing is smaller than that formulated by branch tearing;
(2) the number of nonzero entries in the matrix formulated by node tearing is smaller than that formulated by branch tearing [20];
(3) for passive networks, node tearing generates a diagonally-dominant matrix, so any application of the Gaussian elimination method with diagonal pivoting is stable, while this is not the the case in the branch tearing method;
(4) usually, in the analysis of large scale circuits, node tearing preserves the identities of the resulting torn subnetworks, while branch tearing sometimes destroys the identities of the torn subnetworks. However, one can generate examples in which the opposite is true, but these situations were not encountered in our examples.

The above conclusions are not conclusive; although the dimension of the matrix formulated by branch tearing is larger than that of node tearing, the extra nonzero entries are either +1 or -1 . If this property is fully explofted, then node tearing may not be so advantageous. But full exploitation of the property that extra entries are either +1 or -1 requires a much more complicated sparse matrix technique. So node tearing is preferred and is used in the program SLATE.

### 5.4. Constructing the Node Tearing Matrix from Subnetworks

In Section 5.2, the derivation of node tearing is given; however, in the real implementation we do not want to solve the whole matrix equation at one time. We would like to process each subnetwork separately, and then obtain the solution of the entire network by combining the results of subnetwork process.

In the following the procedure of constructing the node tearing matrix from subnetworks is detailed. Let us consider one subnetwork $N_{i}$. Let the tearing nodes which are connected to $N_{i}$ be denoted by $\alpha_{t i}$, the node voltage of $\alpha_{t i}$ be denoted by $\underset{\sim}{ }{\underset{t i}{ }}$, the nodes of $N_{i}$ be denoted by $\alpha_{i}$, the node voltages of $\alpha_{i}$ be denoted by $\underset{\sim}{v}{ }_{S i}$, and the currents which represents the relationship of the rest of the network with this subnetwork be denoted by


$$
\begin{equation*}
{\underset{\sim}{v}}_{v}^{\Xi} \bigcup_{i}{\underset{\sim}{v}}_{v i} \tag{5.36}
\end{equation*}
$$

$$
\begin{equation*}
\sum_{i}{\underset{\sim}{L}}=0 \quad \text { (by KCL) } \tag{5.37}
\end{equation*}
$$



Fig. 5.5 One Subnetwork.

The node equations for this subnetwork have the following form:

By augmenting with appropriate zeros to match the dimension and summing Eq. (5.38) for all the subnetworks and the interconnection, we obtain

We can see that Eq. (5.39) is identical to Eq. (5.35).

Eq. (5.39) is solved by first eliminating all the $\mathcal{Y}_{t s i}$ to obtain the interconnection matrix equations.

$$
\left[\begin{array}{ll}
{\underset{\sim}{Y}}^{*} & \underset{\sim}{Y}  \tag{5.40}\\
\underset{\sim r}{Y} \\
\underset{\sim r}{ } & \underset{\sim}{Y} r r
\end{array}\right)\left[\begin{array}{c}
\underset{\sim}{v} \\
\underset{\sim}{v}
\end{array}\right]=\binom{{\underset{\sim}{v}}_{*}^{*}}{\underset{\sim r s}{J}}
$$




Eq. (5.40) is solved to obtain solutions for ${\underset{\sim}{t}}$ and ${\underset{\sim}{\Sigma}}$, and then the solution $\underset{\sim}{v}$ si can be obtained by using backward substitution.

The above solution procedure can be modified to enable us to process each subnetwork separately to obtain the interconnection matrix equations. Let us consider Eq. (5.38) again, after eliminating $\underset{\sim}{Y}$ ti, we have


$$
\left.{\underset{\sim}{Y} t i}_{*}^{*}={\underset{\sim}{Y} t i i}-{\underset{\sim}{\mathcal{Y}}}_{t s i}{\underset{\sim}{(Y}}_{s i}\right)^{-\mathcal{Y}_{s t i}}
$$

and

The partial interconnection matrix equations obtained from Eq. (5.41) are

$$
\left[{\underset{\sim}{Y}}_{t t i}^{*}\right]\left[{\underset{\sim}{v}}_{t i}\right]=\left[\begin{array}{c}
J_{t s i}^{*}  \tag{5.42}\\
{\underset{\sim}{t i}}
\end{array}\right]+[{\underset{\sim}{t i}}]
$$

By augmenting the above arrays with appropriate zeros to match the dimension of $\underset{\sim}{\underset{\sim}{*}} \underset{t}{*}$ (this is oniy conceptual, because sparse matrix techniques are used and every elment is put in the appropriate location in a one dimensional array) and summing up Eq. (5.42) for all the subnetworks and the interconnection, we obtain Eq. (5.40) again.

Since $\sum_{i}{\underset{\sim}{\sim}}^{J} i=\underset{\sim}{\sim}$ by $K C L$, therefore $\underset{\sim}{J}{ }_{t i}$ does not appear in the final matrix equations (5.39) and (5.40), so we can neglect $\underset{\sim}{J} i$ in both Eqs. (5.38) and (5.42).

So the modified solution procedure is as follows:
(1) Formulate the simplified Eq. (5.38) for each subnetwork, i.e.
(2) Process Eq. (5.43) to obtain the simplified Eq. (5.42) for each subnetwork, i.e.

$$
\left[\begin{array}{c}
\mathscr{L}_{t t i}^{*}
\end{array}\right]\left[\begin{array}{c}
\underline{v}_{t i}
\end{array}\right]=\left[\begin{array}{c}
J_{t s i}^{*} \tag{5.44}
\end{array}\right]
$$

(3) Sum up Eqs. (5.44) for all subnetworks and the interconnection with appropriate dimension match to obtain Eq. (5.40);
(4) Solve Eq. (5.40) to obtain $\underset{\sim}{v}$ and $\underset{\sim}{v}$;
(5) Solve the upper part of Eq. (5.41) by backward substitution to obtain $\underset{\sim}{\underset{\sim}{V} i}$ for each subnetwork $\underset{\sim}{\underset{\sim}{N}}$.

### 5.5. Sparsity Considerations for the Node Tearing Method

Now let us compare different ways of sparsity exploitation for processing Eq. (5.43). For the solution procedure discussed in the previous section, both LU factorization and substitution procedures are required. In SLATE the modified nodal equation formulation and the new reordering strategy described in Chapter 2 are used at the subnetwork level. After the current variables and the corresponding 'positive' node voltage variables are eliminated, the final subnetwork matrix is structurally symmetric. So here we assume that the subnetwork matrix is structurally symmetric, under this assumption, there are two possible $L U$ factorization procedures we would like to compare. there are other procedures described in [38], but these procedures are either equivalent to them or are less efficient.

The two LU factorization procedures are denoted by $F_{1}$ and $\bar{F}_{2}$ [38], and are given in Table 5.1.


$$
{\underset{\sim}{W}}^{T}=\underset{\sim}{\underset{\sim}{t}}{ }_{s k}{\underset{\sim}{U}}^{-1}, \quad \underset{\sim}{\hat{V}}={\underset{\sim}{U}}_{\underline{U}}^{-1} v,
$$

and

Table 5.1 Possible Factorization Procedures.


In $\bar{F}_{2}$, only those rows of $\underset{\sim}{\hat{V}}(\underset{\sim}{\hat{V}}{\underset{r e q}{ }}$ ) which are required to compute $\underset{\sim}{\hat{V}} \underset{p}{ }$ are computed, $\underset{\sim}{\underset{V}{V}}$ consists of those rows of $\underset{\sim}{\hat{v}}$ corresponding to the nonzero columns of $\underset{t s k}{ }{ }^{\text {. }}$

Let $|$.$| denote the number of nonzero elements in a vector or a matrix,$ and $M(. j)$ and $M(j$.$) the j t h$ column and the $j t h$ row of matrix $M$, respectively. Let $n_{t}$ denote the number of tearing nodes, and $B(k)$ satisfies the following relation.

$$
B(k)= \begin{cases}0 & k=0  \tag{5.45}\\ 1 & k \geq 1, k \text { integer }\end{cases}
$$

The following lemmas are used to compare the number of operations between $F_{1}$ and $\vec{F}_{2}$.

Lemma 5.1. Suppose the subnetwork matrix is structurally symmetric, then


Proof: From the definition of ${\underset{\sim}{\mathcal{V}}}_{p}$, we know that any nonzero row of $\underset{\sim}{V}$ which is not a row of ${\underset{\sim}{V}}_{p}$ must consist of only fill-ins. Suppose it is row i, then there must be a nonzero row $j$ of $\underset{\sim}{\hat{V}}, j<i$, and a nonzero ${\underset{\sim}{L}}^{\mathcal{L}}, i j$. Row $j$ together with ${\underset{\sim}{s k, i j}}$ creates the fill-ins of row i. Due to structure symmetry, there also must be a nonzero $\underset{\sim}{\underset{\sim}{U}}{ }_{\mathrm{sk}, \mathrm{ji}}$ So in order to evaluate the row $j$ of ${\underset{\sim}{\underset{P}{p}}}$, it is necessary to evaluate the row 1 of $\underset{\sim}{\hat{V}}$.

Lemma 5.2. Suppose the subnetwork matrix is structurally symmetric and $\underset{\sim}{\text { Y }}$ sk is mxm, then the difference in the number of operations between $F_{1}$ and $\bar{F}_{2}$ (DNE) is:

$$
D N F=\sum_{j=1}^{m}|\underset{\sim}{V}(j \cdot)||\underset{\sim}{V}(j \cdot)|+\sum_{j=1}^{m}\left(\left|{\underset{\sim}{V}}^{\mathrm{V}}(\mathrm{j} \cdot)\right|-1\right)|\underset{\sim}{V}(\mathrm{j} \cdot)|
$$

$-\sum_{j=2}^{m} \sum_{i=j+1}^{m}\left|{\underset{\sim}{U}}^{m}(j i)\right||\underset{\sim}{\hat{V}}(i \cdot)| B\left(|\underset{\sim}{V}(i \cdot)|-\sum_{j=1}^{m}|\underset{\sim}{Y} \underset{\sim}{ }(\cdot j)||\underset{\sim}{\underset{V}{V}}(j \cdot)| B(|\underset{\sim}{V}(j \cdot)|)\right.$
If $\hat{\sim}_{\text {req }}$ is full, then

$$
\begin{align*}
D N F= & \sum_{j=1}^{m}|\underset{\sim}{V}(j \cdot)||\underset{\sim}{V}(j \cdot)|-\sum_{j=1}^{m}\left(\left|\underset{\sim}{U_{s k}}(j \cdot)\right|-1\right)\left(n_{t}-|\underset{\sim}{V}(j \cdot)|\right) B(|\underset{\sim}{V}(j \cdot)|) \\
& -n_{t}|{\underset{\sim}{t} t s k}| \tag{5.47}
\end{align*}
$$



From structure symmetry, we obtain

$$
\begin{align*}
& |\underset{\sim}{\underset{\sim}{W}}(j \cdot)|=|\underset{\sim}{V}(j \cdot)|  \tag{5.49}\\
& |\underset{\sim}{\underset{W}{W}}(\cdot j)|=|\underset{\sim}{V}(j \cdot)|  \tag{5.50}\\
& |\underset{\sim}{\underset{\sim}{U}} \underset{s k}{T}(\cdot j)|=\left|\underset{\sim}{\underset{U}{U}}{ }_{s k}(j \cdot)\right| \tag{5.51}
\end{align*}
$$

From Lemma 5.1, we obtain

$$
\begin{equation*}
|{\underset{\sim}{\underset{V}{r e q}}}(\mathrm{j} \cdot)|=|\underset{\sim}{\hat{V}}(\mathrm{j} \cdot)| B(|\underset{\sim}{V}(\mathrm{j} \cdot)|) \tag{5.52}
\end{equation*}
$$

Substituting Eqs. (5.49), (5.50), (5.51), and (5.52) into Eq. (5.48), we obtain Eq. (5.46).

$$
\begin{align*}
& \text { If } \underset{\sim}{\hat{v}} \\
& \mid \underset{\sim}{\hat{V}}(j \cdot)  \tag{5.53}\\
& \text { is full, then } \\
& =n_{t}
\end{align*}
$$

Substituting Eq. (5.53) into Eq. (5.46), we obtain Eq. (5.47).

Associated with $F_{1}$ and $\bar{F}_{2}$, there are five possible substitution procedures denoted by $S_{1}, S_{1}^{*}, S_{1}^{* *}, S_{2}^{*}$, and $S_{2}^{* *}$ [38] which are given in Table 5.2, where $\underset{\sim}{b}$ req consists of the rows of $\underset{\sim}{b}$ which are required to compute ${\underset{\sim}{p}}_{p},{\underset{\sim}{p}}_{p}$ are those rows of $\underset{\sim}{b}$ corresponding to the nonzero columns of ${\underset{\text { tsk }}{ }}$.

Let $C($.$) denote the number of operations required in performing a$ given procedure $S$. By comparing the entries of the five substitution procedures in Table 5.2, the following equation is obtained.

$$
\begin{equation*}
C\left(S_{1}^{*}\right)-C\left(S_{1}^{* *}\right)=C\left(S_{2}^{*}\right)-C\left(S_{2}^{* *}\right) \tag{5.54}
\end{equation*}
$$

In large scale integrated circuits, most of devices are nonlinear. Due to the current sources generated by Newton-Raphson iteration and numerical integration, it is reasonable to assume that the source vectors $J_{s s k}$ and $J_{t s k}$ are full. Also, since $V_{s k}$ and $V_{t k}$ are the required node voltages, it is reasonable to assume that they are full. Under the above assumptions, we obtain the following lemmas.

Lemma 5.3. $\underset{\sim}{a}, \underset{\sim}{b}, \underset{\sim}{y}$, and $\underset{\sim}{a}$ are full.

Lemma 5.4. Suppose that the subnetwork matrix is structurally symmetric, then the number of rows of $b_{\text {req }}$ equals the number of nonzero rows of $V$.

Lemma 5.5. Suppose that that subnetwork matrix is structurally symmetric and ${\underset{\sim}{*}}_{\underset{*}{\mathbf{y}}}$ is mxm, then the difference in the number of operations between $S_{1}$ and $S_{1}$ (DNS1) is:

| *ャN |  |  |  |
| :---: | :---: | :---: | :---: |
| *か |  |  |  |
| * ${ }_{\text {* }}$ |  |  |  |
| * |  |  |  |
| $\mathrm{cos}^{-1}$ |  |  |  |

$$
\begin{aligned}
& \text { DNS } 1=C\left(S_{1}\right)-C\left(S_{1}^{*}\right) \\
& =\sum_{j=1}^{\mathbb{E}}|\underset{\sim}{V}(j \cdot)| \sum_{j=1}^{m}(|\underset{\sim s k}{U}(j \cdot)|-1) B(|\underset{\sim}{V}(j \cdot)|)-\sum_{j=1}^{\mathbb{E}}|\underset{\sim}{Y} t s k(\cdot j)|
\end{aligned}
$$

$$
\begin{align*}
& =(\text { number of fill-ins in } \underset{\sim}{V})-\sum_{j=1}^{\mathrm{m}}(|\underset{\sim}{\underset{s k}{ }}(\mathrm{j} \cdot)|-1) \mathrm{B}(|\underset{\sim}{\mathrm{~V}}(\mathrm{j} \cdot)|) \tag{5.55}
\end{align*}
$$

Lemma 5.6. Suppose that the subnetwork matrix is structurally symmetric and ${\underset{\sim}{s k}}^{Y}$ is $m \times m$, then the difference of number of operations between $S_{1}$ and $S_{1}^{* *}$ (DNS2) is:

$$
\begin{aligned}
& \text { DNS2 }=C\left(S_{1}\right)-C\left(S_{1}^{* *}\right) \\
& =\text { DNS } 1+\sum_{j=1}^{n_{t}}|V(\cdot j)|-\sum_{j=1}^{n_{t}}|\underset{\sim}{Y} s t k(\cdot j)| \sum_{j=1}^{m}\left|{\underset{v}{s k}}^{v}(\cdot j)\right| B(|\underset{\sim}{V}(j \cdot)|)
\end{aligned}
$$

$$
\begin{align*}
& =2 \text { ©DNS } 1-\sum_{j=1}^{\mathbb{M}} B(|\underset{\sim}{V}(j \cdot)|) \tag{5.56}
\end{align*}
$$

Remark. From Lemma 5.6 we can conclude that if $C\left(S_{1}\right) \leq C\left(S_{1}^{*}\right)$ then $\mathrm{C}\left(\mathrm{S}_{1}\right) \leq \mathrm{C}\left(\mathrm{S}_{1}^{* *}\right)$, that is, only when $\mathrm{C}\left(\mathrm{S}_{1}\right)>\mathrm{C}\left(\mathrm{S}_{1}^{*}\right)$ do we need to compare $\mathrm{S}_{1}$ with $S_{1}^{\star *}$.

Lemma 5.7. Suppose that the subnetwork matrix is structurally symmetric and ${\underset{\sim}{s k}}_{\mathbf{Y}}^{\text {is }} \mathrm{mxm}$, then

$$
\begin{align*}
& C\left(S_{1}^{*}\right) \leq c\left(S_{2}^{*}\right)  \tag{5.57}\\
& c\left(S_{1}^{* *}\right) \leq c\left(S_{2}^{* *}\right) \tag{5.58}
\end{align*}
$$

Proof: From Eq. (5.54), we obtain

$$
\begin{equation*}
C\left(S_{1}^{*}\right)-C\left(S_{2}^{*}\right)=C\left(S_{1}^{* *}\right)-C\left(S_{2}^{* *}\right) \tag{5.59}
\end{equation*}
$$

So we only need to prove $C\left(S_{1}^{*}\right)-C\left(S_{2}^{*}\right)<0$

$$
C\left(S_{1}^{*}\right)-C\left(S_{2}^{*}\right)=\sum_{j} \sum_{1}^{m}\left(\left|{\underset{\sim}{U}}^{s k}(j \cdot)\right|-1\right) B(|\underset{\sim}{v}(j \cdot)|)-\sum_{j=1}^{m}(|{\underset{U}{S k}}(j \cdot)|-1)
$$

$$
+\sum_{j=1}^{m}\left(\left|{ }_{\sim s k}^{U}(j \cdot)\right|-1\right)-\sum_{j=1}^{m}\left(\left|\|_{\sim s k}^{U}(j \cdot)\right|-1\right)\left|{\underset{\sim}{2}}^{(j}(j)\right|
$$

$$
\begin{equation*}
=\sum_{j=1}^{m}\left(\left|\underset{\sim}{U}{ }_{\sim k}(j \cdot)\right|-1\right)\left(B(\mid \underset{\sim}{V}(j \cdot) 1)-\left|\underset{\sim}{Z}{ }_{2}(j)\right|\right) \tag{5.60}
\end{equation*}
$$

Since

$$
\begin{equation*}
\left|{\underset{\sim}{z}}_{1}(j)\right|=B(|\underset{\sim}{V}(j \cdot)|) \tag{5.61}
\end{equation*}
$$

and

$$
\begin{equation*}
|\underset{\sim}{z}(j)| \geq|\underset{\sim}{z}(j)| \tag{5.62}
\end{equation*}
$$

so we have

$$
\begin{equation*}
|\underset{\sim}{z}(j)| \geqslant B(|\underset{\sim}{V}(j \cdot)|) \tag{5.63}
\end{equation*}
$$

Substituting Eq. (5.63) into Eq. (5.60), we obtain

$$
\begin{equation*}
C\left(S_{1}^{*}\right)-c\left(S_{2}^{*}\right) \leqslant 0 \tag{5.64}
\end{equation*}
$$

From Lemma 5.7, we know that $S_{2}$ and $S_{2}^{* *}$ are not as efficient as the other procedures, so they are eliminated from the list of possible substitution procedures. Now we are left with $S_{1}, S_{1}^{*}$ and $S_{1}^{* *}$. The possible combinations of factorization methods and substitution methods are $F_{1}+S_{1}, F_{1}+S_{1}^{*}, F_{1}+S_{1}^{* *}, \bar{F}_{2}+S_{1}^{*}$, and $\bar{F}_{2}+S_{1}^{* *}$. Theoretically we can not eliminate any of these five combinations, because we can always come up With a special subcircuit structure for which a particular combination gives the best result. However, after conducting a large number of studies, we found experimentally that $F_{1}+S_{9}$ gives the best results for all the practical circuits we used; moreover, $F_{1}+S_{1}$ is well compatible
with the sparse matrix techniques and is the easiest one to implement, so $F_{1}+S_{1}$ was chosen to be used in the program SLATE.

In the following we would like to present a small selection of the examples which we have analyzed by using Lemma 5.2 , Lemma 5.5 and Lemma 5.6.

Example 5.1. The subcircuit used is a TTL two-input NAND gate with parasitic resistors included (Fig. 5.6). The admittance matrix for this subcircuit is shown in Fig. 5.7.

```
DNF = 20-23-39=-42
DNS1 = 9-13=-4
DNS2 = -8 - 10=-18
```

so the best combination for this subcircuit is $F_{1}+S_{1}$.

Example 5.2. The subcircuit used is an ECL two-input NOR gate with parasitic resistors included (Fig. 5.8). The admittance matrix for this subcircuit is shown in Fig. 5.9.

```
DNF = 17-15-27=-25
DNS1 = 6-8=-2
DNS2 = -4-6 = -10
```

so the best combination for this subcircuit is $F_{1}+S_{1}$.

Example 5.3. The subcircuit used is an MOS two-input NAND gate with


Fig. 5.6 TTL Two-Input NAND Gate.
$-1$
 N0000000000000000000000×的的×世



 No0000000000000000000xxxx000

 go $0000000000000000 \times \times \times 000 \times 000$ $=00000000000000 \times \times \times 000040 \times 00$ 山 N00000000000000 $\times \times \times 0000$ 以 $0000 \times 1$ $00000000000000 \times \times \times \times 0000150000$ m $\times 000000000000 \times \times 000000 \times 00000$
 $\simeq 0000 \times 00000 \times \times \times 000001100000000$上 $00 x 0000000 \times x \times 100000000000000$ $\pm 0000000 \times \times \times 00000000010000 \times 00$ $\rightarrow 000 \times 000 \times \times \times 000000000$ 上10000000 n $0000000 \times \times \times 00000000000000000$ a $00000 \times x 00000$ 以 $000000000000 \times 00$ $\infty 00000 \times \times 00000 \times 00000000000000$ $0 \times 000 \times 000000 \times 000000 \times 00000000$ in 000×0000×0000000000×0000000 $R \times 0 \times 0000000 \times 00000000000000$ $N \times \times \times 0 \times 00000000 \times 0000000000000$ ～～～$\times 00000000000000000000000000$
 తత్ర్త


|  |  | 23 | 22 | 5 | 6 | 3 | 8 | 7 | 9 | 10 | 11 | 12 | 4 | 14 | 15 | 16 | 17 | 18 | 13 | 2 | 19 | 20 | 21 |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| (5) | 23 | 1 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (6) | 22 | 0 | 1 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ |
| (23) | 5 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 3 | 0 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 8 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 7 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 |  |
| 9 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $F$ | 0 | 0 | $X$ |  |
| 10 | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 11 | 0 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 4 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | 0 | $X$ | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 |  |
| 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | 0 | 0 | 0 | $X$ | $X$ | 0 | 0 | 0 |  |
| 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | 0 | $F$ | $X$ | $F$ | 0 | 0 | 0 |  |
| 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | 0 | $X$ | 0 | 0 |  |
| 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | $X$ | 0 | $X$ | $F$ | 0 | 0 |  |
| 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | 0 | $F$ | $X$ | $X$ | $X$ | $F$ | $F$ | $F$ | 0 | 0 |  |
| 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $X$ | 0 | 0 | $F$ | $X$ | $F$ | $F$ | $X$ | 0 |  |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $F$ | 0 | 0 | 0 | 0 | $X$ | $F$ | 0 | $X$ | $F$ | $F$ | $X$ | $F$ | $F$ | $F$ |  |
| 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $F$ | $F$ | $F$ | $F$ | $X$ | $F$ | $F$ |  |
| 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | $F$ | $F$ | $X$ | $F$ |  |
| 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $X$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $F$ | $F$ | $F$ | $X$ |  |

Fig. 5.9 Admittance Matrix for the Subcircuit in Fig. 5.8.
parasitic resistors included (Fig. 5.10). The admittance matrix for this subcircuit is shown in Fig. 5.11.

DNF $=18-9-30=-21$

DNS $1=3-5=-2$

DNS2 $=-4-7=-11$
so the best combination for this subcircuit is $F_{1}+S_{1}$.

These three examples show that at the subnetwork level the best combination is $\mathrm{F}_{1}+\mathrm{S}_{1}$.

For the interconnection matrix, because this matrix has the same property as that of the matrix formulated by the MNA for any circuit, so the new reordering strategy of the MNA and the Markowitz sparse matrix scheme are used to exploit the sparsity at this level.

### 5.6. Implementation of the Node Tearing Method

As concluded in the previous section the best combination at the subcircuit level is in most cases $F_{1}+S_{1}$, now we would like to describe the implementation of this approach. First, at the subcircuit level, the source vector is appended to the matrix to form

Secondly, use the new reordering strategy of the MNA and the Markowitz sparse matrix scheme to find the LU factorization of Lask. The LU


FP-7015

Fig. 5.10 MOS Two-Input NAND Gate.

|  |  | 13 | $\begin{aligned} & 2 \\ & \mathrm{X} \end{aligned}$ | 5 | 4 | 8 <br> $\times$ | 9 | 3 | 7 | 6 | 10 | 1 | 2 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| (2) | 13 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 5 | 0 | 0 | X | X | 0 | 0 | 0 | 0 | 0 | 0 | X | 0 |
|  | 4 | 0 | 0 | X | X | 0 | 0 | X | 0 | 0 | 0 | X | 0 |
|  | 8 | 0 | X | 0 | 0 | X | X | 0 | 0 | 0 | 0 | 0 | X |
|  | 9 | 0 | 0 | 0 | 0 | X | X | 0 | 0 | 0 | 0 | 0 | X |
|  | 3 | 0 | 0 | 0 | X | 0 | 0 | X | X | 0 | 0 | F | 0 |
|  | 7 | 0 | 0 | 0 | 0 | 0 | 0 | X | X | X | X | F | 0 |
|  | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | X | X | X | F | X |
|  | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | X | X | X | F | F |
|  | 11 | 0 | 0 | X | X | 0 | 0 | F | F | F | $F$ | X | F |
|  | 12 | 0 | 0 | 0 | 0 | X | X | 0 | 0 | X | F | F | X |

Fig. 5.11 Admittance Matrix for the Subcircuit in Fig. 5.10.
factorization operates on the whole matrix but terminates after the $L U$ factorization of ${\underset{\sim}{s k}}$ is obtained. Now the original matrix is transformed into

Thirdly, formulate the interconnection matrix equation (5.40). Fourthly, use the new reordering strategy of the MNA and the Markowitz sparse matrix scheme to find the LU factorization of Eq. (5.40), and use forward and backward substitution to find the solutions for $\underset{\sim}{y}$ and ${\underset{\sim}{r}}^{v}$. Fifthly, use backward substitution to solve the upper part of Eq. (5.46) to obtain the solutions $\underset{\sim}{V}$ sk .

Remark: The reordering of the subnetwork and network matrix equations is done in the preprocessing phase. The subnetwork matrix equations are reordered first. So when the network matrix equations are reordered, the
 and the circuit description of the network, we can reorder the network matrix equations by the new reordering strategy of the MNA and the Markowitz sparse matrix scheme.
5.7. Circuit Interpretation of the Tearing Methods

Although the derivations of the tearing methods are quite mathematical, there is a very simple circuit interpretation. The node tearing method is just a generalized Norton equivalent circuit approach. The branch tearing method is just a generalized Thevenin eqivalent circuit approach.

Let us consider node tearing first. Consider the special case when there is only one tearing node. The partial interconnection matrix equations (Eq. (5.42)) that we obtain for each subnetwork are just the Norton equivalent circuit matrix equations for that subnetwork. This is illustrated in Fig. 5.12(a). So for the case when there are more than one tearing node, Eq. (5.42) is just the generalized Norton equivalent circuit matrix equations for each subnec. See Fig. 5.12(b) for an illustration of the case of two tearing nodes.

Now let us consider branch tearing for the special case when there is only one tearing branch. The partial interconnection matrix equations that we obtain for each subnetwork are just the Thevenin equivalent circuit matrix equations for that subnetwork. This is illustrated in Fig. 5. 13 (a). So for the case when there is more than one tearing branche, the partial interconnection matrix equations that we obtain for each subnetwork are just the generalized Thevenin eqivalent circuit matrix equations for that subnetwork Fig. 5.13 (b) illustrates the case of two tearing branches.

(a) One Tearing Node.



Fig. 5. 12 Norton Equivalent Circuit Interpretation of the Node Tearing Method.


### 5.8. Discussion and Conclusion

The reason why we considered many possible LU factorization and substitution procedures at the subnetwork level is that at the subnetwork level we only want to perform Gaussian elimination for the variables ${\underset{\sim}{s k}}$ and the elimination procedure terminates after the LU factorization of ${\underset{\sim}{s k}}^{Y_{k}}$ is obtained. At the interconnection level, because we perform the Gaussian elimination for all the variables, only one $L U$ factorization and substitution procedure is used.

As discussed in Section 5.5 , it is possible to generate special subcircuit structures for which some other combinations give better results, however, our experimental results show that even for these specially constructed circuits the difference of the number of operations is only 1 or 2 most of the time, so this very small savings does not justify the extra difficulties of implementation associated with these other combinations; moreover, for all the practical circuits we tested, $F_{1}+S_{1}$ showed considerable savings over all the other combinations.

## VI. LATENCY EXFLOITATION

In conventional circuit simulation programs [1,2], all of the node voltages or branch voltages and currents are calculated at each iteration and each timepoint. Even with sparse matrix techniques the simulation of modern large-scale integrated (LSI) circuits is not possible in many situations due tc the excessive computation time and high storage requirements. The latency approach is a circuit analysis version of the selective trace approach used in logic simulation. This approach takes advantage of the fact that in some circuits only a small portion of the circuit is active at any given time and at any iteration, and $t$ us provides savings in CPU time.

In the program SLATE this approach is applied at three levels: device level; (2) subnetwork level; (3) network level. At the device level, it is also called the bypass scheme. This scheme is done by monitoring the operating point $\therefore$ each nonlinear device. If the operating point does not change significantly between timepoints or Newton-Raphson iterations, then the device models are not reevaluated, and the matrix entries computed at the previous timepoint or the previous iteration are used again. This scheme is used in SPICE2 [2] and SPLICE [39]. Latency at the subnetwork and network levels can be well exploited when tearing methods are used to analyze the network. The tearing method used in program SLATE is node tearing, so in the following discussion about latency at the subnetwork and network levels, node tearing is assumed. Latency exploitation at the subnetwork level is presented in Section 6.1. Latency exploitation at the network level is presented in Section 6.2. In Section 6.3 a discussion is given.
6.1. Latency Exploitation at the Subnetwork Level

As discussed in Chapter $V$, the node tearing method is just a generalized Norton equivalent circuit aporoach. After all the internal circuit variables of a subnetwork are eliminated, a generalized Vorton equivalent circuit of the subnetwork is obtained. Combining the equivalent circuits of all the subnetworks with the rest of the network (Fig. 5.1), we obtain the interconnection circuit. So after apolying the node tearing method to tear the network apart into several subnetworks, the preprocessing of each subnetwork to obtain the contribution to the interconnection matrix is equivalent to constructing an equivalent circuit for each subnetwork. If the solutions of the circuit variables of a subnetwork do not change significantly between timepoints or Newton-Raphson iterations, then there is no need to reconstruct an equivalent circuit for that subnetwork. The equivalent circuit constructed at the orevious timepoint or the previous iteration is used again and the subnetwork is declared as latent. The subnetwork remains latent until the solations of the circuit variables of the subnetwork change significantly between when it is declared latent and the present time or the present iteration. This is the basic concept of latency at subnetwork level.

There are two tyoes of latency at the subnetwork level: one is latency in the Newton-Raphson iteration, the other is latency in time.

Latency in the Newton-Raphson iteration is not natural. It is related to the convergence property of each subnetwork and the initial guess of the operating point for each subnetwork. Let us consider the example in Fig. 6.2. It is assumed that the input signal is constant and that a ic analysis is required. Different subnetworks may require different number


Fig. 6.1(a) A Network Partitioned into Three Subnetworks by the Node Tearing Method.
(b) Equivalent Interconnection Circuit obtained from the Node Tearing Method.

of iterations to converge, for example, because the initial guess of the operating points may be good for some subnetworks and bad for the others. After these subnetworks converge, they are declared as latent. The Newton-Raphson iterations continue until all the subnetworks converge. This phenomenon is called latency in the Newton-Raphson iteration. Taking advantage of this latency results in savings in the execution time. This latency may not exist in some circuits. For example, a linear circuit.

Latency in time is a natural ohenomenon. All ohysical devices have intrinsic delay time between excitation and response. For a large network, When the input signal changes, it takes time for this change to propagate to the rest of the network. Let us consider the example in Fiz. 6.2 again. Let us assume that the input changes from $0 V$ to 5 V at time $\mathrm{t}_{0}$. Initially, probabably only the first few subnetworks are not latent, the rest of the subnetworks are latent. As time oasses, the change in the response propagates to the intermediate subnetworks. At this time, only the intermediate subnetworks are not latent, the rest of the subnetworks are latent. Finally, the change propagates to the last few subnetworks and the rest of the subnetworks are latent. This ohenomenon is called latency in time, and it always exists in real circuits. Taking gdvantage of this iatency reaults in savings in the execution time.

In order to exploit these two tyjes of latency, some sort of latency criteria are required to determine if a subnetwork is latent. The latency criteria proposed here are developed for the node tearing method, if the branch tearing method is used, these criteria shoula be modified accordingly.

Let us consider one subnetwork $N_{k}$. Let the tearing node voltages of $N_{k}$ be denoted by $v_{t k}$, the node voltages of $N_{k}$ be denoted by $v_{s k}$, the node voltages of all the nonlinear devices be denoted by ${ }_{\sim}{ }_{n l k}$.

First, let us consider the latency criterion in the Newton-Raphson iteration. In principle, the solution of all the circuit variables of a subnetwork should be checked to determine if the subnetwork is latent. However, to check for latency in the Newton-Raphson iteration only the node voltages of all the nonlinear devices need be checked. The latency criterion in the Newton-Raphson iteration used in SLATE is as follows: A subnetwork $N_{k}$ is declared as latent at the ith iteration if the following two conditions are satisfied.
(1)
(2)

$$
\begin{align*}
& \left|v_{n 1 k_{m}}(i-1)-v_{n l k_{m}}(i-2)\right| \leqslant \varepsilon_{a}+\varepsilon_{r} \max \left(\left|v_{n 1 k_{m}(i-1)}\right|,\right.  \tag{6.1}\\
& \left.\left|v_{n 1 k_{m}}(i-2)\right|\right) \quad m=1,2, \ldots \\
& \left|v_{t k_{m}}(i)-v_{t k_{m}}(i-1)\right| \leqslant \varepsilon_{a}+\varepsilon_{r} \max \left(\left|v_{t k_{m}}(i)\right|,\left|v_{t k_{m}}(i-1)\right|\right)  \tag{6.2}\\
& m=1,2, \ldots
\end{align*}
$$

where $\varepsilon_{a}$ and $\varepsilon_{r}$ are absolute and relative error criteria.

The subnetwork $N_{k}$ will remain latent as long as

$$
\begin{align*}
& \text { (3) }\left|v_{t k_{m}}(i+j)-v_{t k_{m}}(i-1)\right| \leqslant \varepsilon_{a}+\varepsilon_{r} \max \left(\left.\right|_{t k_{m}}(i+j) \mid\right.  \tag{3}\\
& \left.\left|v_{t k_{m}}(i-1)\right|\right) \\
& \text { Once a subnetwork is declared as latent in the Newton-Raphson }
\end{align*}
$$ iteration, no linearization of the nonlinear devices of $N_{k}$ is required, no preprocessing of the subnetwork to obtain the partial contribution to the interconnection matrix is required, no backward substitution to obtain the solutions of the internal circuit variables is required, and no convergence

tests are required. One only needs to monitor the tearing node voltages and to bring the previous partial contribution to the interconnection matrix.

The simulation data from SLATE show that considerable savings in execution time is obtained and the output results are essentially the same as those from YSPICE. Table 6.1 gives the simulation data from SLATE for the circuit shown in Fig. 6.3. A dc analysis was performed. For this circuit, a $42.42 \%$ latency exploitation was achieved and a $22.65 \%$ savings in CPU time was obtained.

Remark: Because the CPU time shown in Table 6.1 is the total CPU time, which includes the time spent in the $I / O$ and other utility subroutines, the savings in CPU time is not the same as the latency exploitation.

For the latency in time criteria, four schemes are proposed here. The first three schemes, scheme 0 , scheme 1 and scheme 2 , have been implemented and tested in program SLATE. Scheme 3 is still under investigation.

Scheme 0 is the easiest and the crudest scheme that could be implemented. A subnetwork $N_{k}$ is considered latent at time $t_{n}$ if
(1) $\left|v_{t k_{m}}\left(t_{n}\right)-v_{t k_{m}}\left(t_{n-1}\right)\right| \leqslant \varepsilon_{a}+\varepsilon_{r} \max \left(\left|v_{t k_{m}}\left(t_{n}\right)\right|\right.$,

$$
\left.\left|v_{t k_{m}}\left(t_{n-1}\right)\right|\right) \quad m=1,2, \ldots
$$

The subnetwork ${ }^{m} N_{k}$ will remain latent as long as
(2) $\left|v_{t k_{m}}\left(t_{n+j}\right)-v_{t k_{m}}\left(t_{n-1}\right)\right| \leqslant \varepsilon_{a}+\varepsilon_{c} \max \left(\mid v_{t k_{m}}\left(t_{n+j}\right)\right.$,

$$
\left.\left|v_{t k_{m}}\left(t_{n-1}\right)\right|\right) \quad{\underset{j}{j}}=\frac{1}{1}, 2, \ldots .
$$

The advantages of Scheme 0 are:

(b)

Table 6.1 Simulation Data of a DC Analysis for the MOS Circuit in Fig. 6.3.

| DC analysis | SLATE | SPICE2 |
| :--- | :--- | :--- |
| \# of subnetworks <br> times \# of <br> iterations | 231 | 231 |
| \# of nonlatent <br> subnetworks times <br> \# of iterations | 133 |  |
| Latency exploitation | $42.42 \%$ |  |
| Total CPU time <br> (sec.) | 3.927 | 5.077 |
| Savings in CPU <br> time | $22.65 \%$ |  |

(1) It is very easy to implement and there is no overhead.
(2) It is faster than Scheme 1

The disadvantages of Scheme 0 are:
(1) It is not reliable and it is not accurate. If the network is a stiff system and some of the node voltages are slowly varying, then it is possible to declare a slowly varying subnetwork as latent and consequently wrong answers are obtained.
(2) The tearing node voltages at time $t_{n-1}$ must be stored, that is, more memory is required.

Scheme 1 is the most accurate scheme, and it only takes advantage of latency in the Newton-Raphson iteration. It is based on the idea that if a subnetwork is latent in time, then it is also latent in the Newton-Raphson iteration. Even if we do not take advantage of latency in time, all the subnetworks are treated as nonlatent in time and are solved at least once at every timepoint. Those subnetworks which are latent in time at any timepoint will be declared as latent in the Newton-Raphson iteration after one iteration at that timepoint. So at most one iteration for each latent subnetwork is wasted at one timepoint. Scheme 1 is as follows:
(1) Solve the entire network including all the subnetworks at least once at every timepoint.
(2) If any subnetwork is latent in time, then it is latent in all the subsequent Newton-Raphson iterations at that timepoint. So only one iteration for that subnetwork is performed.
(3) For the nonlatent subnetworks, the Newton-Raphson iterations are continued until convergence is obtained at that timepoint.

The advantages of Scheme 1 are:
(1) It is very easy to implement and there is no overhead.
(2) The tearing node voltages at time $t_{n-1}$ need not be stored, that is, less memory is required.
(3) It is accurate and reliable.

The disadvantage of Scheme 1 is that at every timepoint one iteration is wasted for each subnetwork latent in time.

Scheme 0 is efficient but is not reliable. Scheme 1 is reliable but is not efficient. If the network being solved is not stiff, then Scheme 0 is preferred. However, if the network is stiff and efficiency is important, then both Scheme 0 and Scheme 1 are not suitable. Scheme 2 was developed to accommodate this situation. It is similar to Scheme 0 in efficiency and it is similar to Scheme 1 in reliability. It differs from Scheme 0 in that some extra checks are made to make sure that slowly varying subnetworks will not be declared as latent. All the slowly varying subnetworks are declared as nonlatent.

Let the charges of capacitors and the fluxes of inductors of subnetwork $N_{k}$ be denoted by $\underline{Q}_{k}=\left(Q_{k 1}, Q_{k 2}, \ldots ., Q_{k b}\right)$. Let the currents of capacitors and the voltages of inductors of subnetwork $N_{k}$ be denoted by $I_{k}=\left(I_{k 1}, I_{k 2}, \ldots \ldots, I_{k b}\right)$. Scheme 2 is as follows: A subnetwork $N_{k}$ is considered as latent if the following three conditions are satisfied.
(1) $\left|v_{t k_{m}}\left(t_{n}\right)-v_{t k_{m}}\left(t_{n-1}\right)\right| \leqslant \varepsilon_{a}+\varepsilon_{r} \max \left(\left|v_{t k_{m}}\left(t_{n}\right)\right|\right.$,

$$
\begin{equation*}
\mid v_{t k_{m}}\left(t_{n-1}\right) \quad m=1,2, \ldots \tag{6.6}
\end{equation*}
$$

This condition is the same as that of Scheme 0 .

$$
\begin{align*}
& \text { (2) }\left|I_{k_{m}}\left(t_{n}\right)-I_{k_{m}}\left(t_{n-1}\right)\right| \leqslant \varepsilon_{c}+\varepsilon_{r} \max \left(\left|I_{k_{m}}\left(t_{n}\right)\right|,\right.  \tag{6.7}\\
& \left.\left|I_{k_{m}}\left(t_{n-1}\right)\right|\right) \quad m=1,2 \ldots, b
\end{align*}
$$

where $\varepsilon_{c}$ is the absolute error criterion for current. This condition is used to check if the changes of the energy-storage elements of subnetwork Nk are small.

$$
\begin{equation*}
\text { (3) } h_{n-1}\left|\frac{I_{k_{m}}\left(t_{n}\right)-I_{k_{m}}\left(t_{n-1}\right)}{Q_{k_{m}}\left(t_{n}\right)-Q_{k_{m}}\left(t_{n-1}\right)}\right| \quad 21 \quad m=1,2, \ldots, b \tag{6.8}
\end{equation*}
$$

This condition is used to check if there are slowly varying nodes within subnetwork $N_{k}$. In order to avoid division by zero, if $\left|I_{k_{m}}\left(t_{n}\right)-I_{k_{m}}\left(t_{n-1}\right)\right| \leqslant \varepsilon \quad$ ( $\varepsilon$ is a very small quantity, it is $10^{-12}$ in SLATE), then condition (3) is skipped.

The subnetwork $N_{k}$ will remain latent as long as
(4) $\left|v_{t k_{m}}\left(t_{n+j}\right)-v_{t k_{m}}\left(t_{n-1}\right)\right| \leqslant \varepsilon_{a}+\varepsilon_{r} \max \left(\left|v_{t k_{m}}\left(t_{n+j}\right)\right|\right.$,

$$
\left.\left|v_{t k_{m}}\left(t_{n-1}\right)\right|\right) \quad m=1,2, \ldots
$$

Eq. (6.8) is derived based on the following reasoning. Let us assume that we are dealing with a linear capacitor and an exponential waveform, then

$$
\begin{equation*}
Q=C^{*} V \tag{6.10}
\end{equation*}
$$

$$
\begin{equation*}
\frac{d Q}{d t}=I \tag{6.11}
\end{equation*}
$$

$$
\begin{align*}
& V=V_{D D}\left(1-e^{-t / \tau}\right)  \tag{6.12}\\
& Q\left(t_{n}\right)=C V_{D D}\left(1-e^{t_{n} / \tau}\right)  \tag{6.13}\\
& Q\left(t_{n-1}\right)=C * V_{D D}\left(1-e^{t_{n-1} / \tau}\right)  \tag{6.14}\\
& I\left(t_{n}\right)=-\frac{C * V_{D D}}{\tau} e^{-t_{n} / \tau}  \tag{6.15}\\
& I\left(t_{n-1}\right)=-\frac{C * V_{D D}}{\tau} e^{-t_{n-1} / \tau} \tag{6.16}
\end{align*}
$$

From Eqs. (6.13), (6.14), (6.15), and (6.16), we obtain

$$
\begin{equation*}
h_{n-1} \frac{I\left(t_{n}\right)-I\left(t_{n-1}\right)}{Q\left(t_{n}\right)-Q\left(t_{n-1}\right)}=\frac{h_{n-1}}{\tau} \tag{6.17}
\end{equation*}
$$

So Eq. (6.8) means that although the change in the response of a capacitor is very small, if $h_{n-1}$ is smaller than $\tau$, then the capacitor is not latent, it is just slowly varying.

The advantages of Scheme 2 are:
(1) It is faster than Scheme 1.
(2) It is accurate and reliable. Slowly varying subnetworks are detected and are treated as nonlatent subnetworks.

The disadvantages of Scheme 2 are:
(1) More checking is required.
(2) The tearing node voltages at time $t_{n-1}$ must be stored, that is, more memory is required.
(3) slowly varying subnetworks are detected and are treated as nonlatent subnetworks, even though the changes in the response may be negligible.

Remark: The detection of slowly varying subnetworks increases the accuracy and reliability of the program, that is why it is an advantage. However, since the changes in these slowly varying subnetworks are small, treating them as nonlatent subnetworks is not efficient, that is why it is also a disadvantage.

In order to overcome this problem, Scheme 3 was proposed. Scheme 3 takes full advantage of latency in time. Scheme 3 is as follows:
(1) The truncation error criteria are used to determine the timestep for each subnetwork.
(2) Each subnetwork is analyzed with its own timestep.

The advantages of Scheme 3 are:
(1) It should be faster than all the other schemes.
(2) It is accurate and reliable. Since every subnetwork has its own timestep, there will not be the problem of slowly varying subnetworks.

The disadvantages of Scheme 3 are:
(1) It is not compatible with the present version of SLATE. An extra event scheduler is required and the data structures of SLATE has to be revised.
(2) It is more complicated to implement.

Because Scheme 3 is not compatible with the present version of SLATE, therefore it has not been tested and implemented in SLATE.

In the following, a small selection of examples is presented to give a comparison among the first three schemes of SLATE: Scheme 0 , Scheme 1 , and Scheme 2, and YSPICE.

Example 6.1: The MOS circuit shown in Fig 6.3 was analyzed by SLATE and YSPICE. The output results of the three schemes of SLATE and YSPICE are essentially the same (within four significant figures). The simulation data of a transient analysis are given in Table 6.2. The simulation data show that both Scheme 0 and Scheme 2 are more efficient than Scheme 1. For this circuit, Scheme 2 is the most efficient and is about 2.5 times faster than YSPICE.

This example shows that the latency in time approach is useful for the analysis of MOS circuits. The next example shows that the latency in time approach is also useful for bipolar circuits.

Example 6.2: The TTL circuit shown in Fig 6.4 was analyzed by SLATE and YSPICE. The output results of the three schemes of SLATE and YSPICE are essentially the same(within four significant figures). The simulation data of a transient analysis are given in Table 6.3. For this example, the

Table 6.2 Simulation Data of a Transient Analysis for the MOS Circuit in Fig. 6.3.

| Transient <br> analysis | Scheme 0 | Scheme 1 | Scheme 2 | YSPICE |
| :--- | :--- | :--- | :--- | :--- |
| \# of subnetworks <br> times 非 of <br> iterations | 2849 | 2475 | 2585 |  |
| \# of nonlatent <br> subnetworks times <br> \# iterations | 790 | 1277 | 706 |  |
| Latency <br> exploitation | $72.27 \%$ | $48.40 \%$ | $72.69 \%$ |  |
| Total cPU time <br> (sec.) | 18.358 | 22.073 | 17.022 | 42.877 |
| Savings in <br> CPU time | $57.12 \%$ | $48.52 \%$ | $60.30 \%$ |  |



Table 6.3 Simulation Data of a Transient Analysis for the TTL Circuit in Fig. 6.4.

| Transient <br> analysis | Scheme 0 | Scheme 1 | Scheme 2 | YSPICE |
| :--- | :--- | :--- | :--- | :--- |
| \# of subnetworks <br> times 非 of <br> iterations | 2850 | 2945 | 2770 |  |
| F of nonlatent <br> subnetworks times <br> \#f iterations | 1516 | 2128 | 1945 |  |
| Latency <br> exploitation | $46.81 \%$ | $27.74 \%$ | $29.78 \%$ |  |
| Total CPU time <br> (sec.) | 68.996 | 97.164 | 86.033 | 132.338 |
| Savings in <br> CPU time | $47.86 \%$ | $26.58 \%$ | $34.99 \%$ |  |

simulation data show that Scheme 0 is the most efficient, Scheme 1 is still the least efficient. The reason why Scheme 2 is not as efficient as Scheme 0 can be traced to the close-coupling of bipolar circuits. So the conclusion obtained from this example is that the error criteria should be loosened for bipolar circuits. Scheme 0 is about 2 times faster than YSPICE.

Although the above two examples show that Scheme 0 is very efficient, however, as mentioned before, Scheme 0 has a reliability problem. The following example shows that Scheme 0 may give inaccurate output results for stiff systems.

Example 6.3: The RC circuit shows in Fig. 6.5 was analyzed by the three schemes of SLATE. This circuit is a stiff system. the output results are given in Table 6.4. For scheme 0 , because the changes of the tearing node voltages of subnetwork 1 and subnetwork 2 are very small after $t=13 \mathrm{~ns}$, both subnetworks are declared as latent. Since the input is constant too after $t=13 \mathrm{~ns}$, all the calculated output voltages will remain unchanged afterwards, while the true output voltages should increase slowly. This phenomenon can be observed from the unchanged output voltages after $t=13 \mathrm{~ns}$ in Table $6.4(\mathrm{a})$. This example shows that Scheme 0 may not give accurate results when the network is stiff and that both Scheme 1 and Scheme 2 give accurate results even when the network is stiff.


Fig. 6.5 An Example of a Stiff System,

Table 6.4(a) Scheme 0.

| TIME | $V(2)$ | $V(3)$ | $V(4)$ |
| :---: | :--- | :--- | :--- |
| $0.000 D+00$ | $0.000 D+00$ | $0.000 D+00$ | $0.000 D+00$ |
| $1.000 D-09$ | $2.936 D+00$ | $7.567 D-01$ | $4.445 D-13$ |
| $2.000 D-09$ | $3.604 D+00$ | $1.146 D+00$ | $3.726 D-12$ |
| $3.000 D-09$ | $3.733 D+00$ | $1.237 D+00$ | $1.144 D-11$ |
| $4.000 D-09$ | $3.749 D+00$ | $1.249 D+00$ | $2.403 D-11$ |
| $5.000 D-09$ | $3.751 D+00$ | $1.251 D+00$ | $4.161 D-11$ |
| $6.000 D-09$ | $3.749 D+00$ | $1.249 D+00$ | $6.187 D-11$ |
| $7.000 D-09$ | $3.750 D+00$ | $1.250 D+00$ | $8.020 D-11$ |
| $8.000 D-09$ | $3.750 D+00$ | $1.250 D+00$ | $9.850 D-11$ |
| $9.000 D-09$ | $3.749 D+00$ | $1.249 D+00$ | $1.161 D-10$ |
| $1.000 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.260 D-10$ |
| $1.100 D-08$ | $3.748 D+00$ | $1.248 D+00$ | $1.307 D-10$ |
| $1.200 D-08$ | $3.748 D+00$ | $1.248 D+00$ | $1.302 D-10$ |
| $1.300 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.246 D-10$ |
| $1.400 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $1.500 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $1.600 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $1.700 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $1.800 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $1.900 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $2.000 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $2.100 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $2.200 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $2.300 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $2.400 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |
| $2.500 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.145 D-10$ |

## Table 6.4(b) Scheme 1.

| TIME | $V(2)$ | $V(3)$ | $V(4)$ |
| :---: | :--- | :--- | :--- |
| $0.000 D+00$ | $0.000 D+00$ | $0.000 D+00$ | $0.000 D+00$ |
| $1.000 D-09$ | $2.936 D+00$ | $7.567 D-01$ | $4.444 D-13$ |
| $2.000 D-09$ | $3.604 D+00$ | $1.146 D+00$ | $3.726 D-12$ |
| $3.000 D-09$ | $3.733 D+00$ | $1.237 D+00$ | $1.144 D-11$ |
| $4.000 D-09$ | $3.749 D+00$ | $1.249 D+00$ | $2.403 D-11$ |
| $5.000 D-09$ | $3.751 D+00$ | $1.251 D+00$ | $4.161 D-11$ |
| $6.000 D-09$ | $3.749 D+00$ | $1.249 D+00$ | $6.187 D-11$ |
| $7.000 D-09$ | $3.750 D+00$ | $1.250 D+00$ | $8.020 D-11$ |
| $8.000 D-09$ | $3.750 D+00$ | $1.250 D+00$ | $9.850 D-11$ |
| $9.000 D-09$ | $3.749 D+00$ | $1.249 D+00$ | $1.172 D-10$ |
| $1.000 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.404 D-10$ |
| $1.100 D-08$ | $3.749 D+00$ | $1.249 D+00$ | $1.666 D-10$ |
| $1.200 D-08$ | $3.74 y D+00$ | $1.249 D+00$ | $1.958 D-10$ |
| $1.300 D-08$ | $3.750 D+00$ | $1.250 D+00$ | $2.281 D-10$ |
| $1.400 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $2.629 D-10$ |
| $1.500 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $2.919 D-10$ |
| $1.600 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $3.208 D-10$ |
| $1.700 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $3.498 D-10$ |
| $1.800 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $3.788 D-10$ |
| $1.900 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $4.077 D-10$ |
| $2.000 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $4.367 D-10$ |
| $2.100 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $4.656 D-10$ |
| $2.200 D-08$ | $3.751 D+00$ | $1.251 D+00$ | $4.946 D-10$ |
| $2.300 D-08$ | $3.750 D+00$ | $1.250 D+00$ | $5.236 D-10$ |
| $2.400 D-08$ | $3.750 D+00$ | $1.250 D+00$ | $5.525 D-10$ |
| $2.500 D-08$ | $3.750 D+00$ | $1.250 D+00$ | $5.815 D-10$ |

Table 6.4(c) Scheme 2.

| TIME | V (2) | V (3) | $V(4)$ |
| :---: | :---: | :---: | :---: |
| 000D+00 | $0.000 \mathrm{D}+00$ | $0.000 \mathrm{D}+00$ | $0.000 \mathrm{D}+00$ |
| .000D-09 | 2.936D+00 | 7.567D-01 | 4.444D-13 |
| .000D-09 | $3.604 \mathrm{D}+00$ | 1.146D+00 | 3.726D-12 |
| .000D-09 | $3.733 \mathrm{D}+00$ | 1.237D +00 | 1.144D-11 |
| . 000D-09 | $3.749 \mathrm{D}+00$ | 1. $249 \mathrm{D}+00$ | 2.403D-11 |
| .000D-09 | $3.751 \mathrm{D}+00$ | $1.251 D+00$ | 4. 161D-11 |
| .000D-09 | $3.749 \mathrm{D}+00$ | $1.249 D+00$ | 6.187D-11 |
| . 000D-09 | $3.7500+00$ | $1.2500+00$ | 8.0200-11 |
| . 000D-09 | 3.750D+00 | 1. $250 \mathrm{D}+00$ | 9.850D-11 |
| . 000D-09 | $3.749 \mathrm{D}+00$ | $1.249 \mathrm{D}+00$ | 1. 172D-10 |
| . 000D-08 | $3.749 \mathrm{D}+00$ | 1. $249 \mathrm{D}+00$ | 1.404D-10 |
| . 100D-08 | $3.749 \mathrm{D}+00$ | 1. $249 \mathrm{D}+00$ | $1.666 \mathrm{D}-10$ |
| . 200う-08 | $3.749 \mathrm{D}+00$ | $1.249 \mathrm{D}+00$ | 1.958D-10 |
| . 300D-08 | $3.750 \mathrm{D}+00$ | $1.2500+00$ | 2.281D-10 |
| . 400D-08 | $3.751 \mathrm{D}+00$ | 1. $251 \mathrm{D}+00$ | 2.629D-10 |
| . 500D-08 | $3.751 \mathrm{D}+00$ | $1.251 D+00$ | 2.919D-10 |
| . 6000-08 | $3.751 \mathrm{D}+00$ | 1. $251 \mathrm{~L}+00$ | 3.208D-10 |
| . 700D-08 | $3.751 \mathrm{D}+00$ | $1.251 D+00$ | 3.498D-10 |
| . 8000-08 | $3.751 \mathrm{t}+00$ | $1.2510+00$ | 3.788D-10 |
| . 900D-08 | $3.751 \mathrm{D}+00$ | $1.251 D+00$ | 4.077D-10 |
| . $000 \mathrm{D}-08$ | $3.751 \mathrm{D}+00$ | $1.251 \mathrm{t}+00$ | 4.367D-10 |
| . $100 \mathrm{D}-08$ | $3.751 \mathrm{D}+00$ | $1.251 D+00$ | 4.656D-10 |
| . 2000-08 | $3.751 \mathrm{t}+00$ | $1.251 D+00$ | 4.946D-10 |
| . 300D-08 | $3.7500+00$ | 1. $2500+00$ | 5.236D-10 |
| . $400 \mathrm{D}-08$ | $3.750 \mathrm{D}+00$ | 1. $2500+00$ | 5.525D-10 |
| . 500D-08 | $3.7500+00$ | 1. $250 \mathrm{D}+00$ | 5.815D-10 |

### 5.2. Latency Exploitation at the Vetwork Level

When the submatrices are large and the interconnection matrix is small and sparse, then latency exploitation at the subnetwork level provides most of the savings in CPU time. When the reverse is trua, that is, the submatrices are small and the interconnection matrix is large and relatively dense, then the latency exploitation at network level becomes important. Usually, the latter situation is true for MOS circuits.

Latency exploitation at the network level is equivalent to solving a swaller interconnection matrix by using voltage source substitution. From the substitution theorem we know that the same results will be obtained. This approach can be explained by the following example as shown in Fig. 6.5(a). Let us assume that at a particular time or a particular iteration, only subnetworks 5 and 6 are nonlatent, all the other subnetworks are latent. By using voltage source substitution, the network can be replaced by the equivalent network as shown in Fig. 6.6(b). The equivalent network is solved to obtain the solutions of all the nonlatent nodes (nodes which belong to nonlatent subnetworks). This equivalent network is obtained as follows. First, all the nonlatent subnetworks and all the latent subnetworks which are adjacent to the nonlatent subnetworks are included in the equivalent network; secondly, all the tearing nodes which only belong to latent subnetworks are replaced by voltage sources, the resulting network is the equivalent network. For this example, in the equivalent network, the nonlatent subnetworks are subnetworks 5 gnd 5 , the latent subnetworks are subnetworks 4 and 7 , the tearing nodes which are replaced by voltage sources are nodes 5 and 9. After the solution for all the nonlatent nodes is obtained, subnetworks 4 and 7 are checked to see if they remain latent. If the answer is yes, then the same equivalent circuit is
$\underbrace{2}$
(a) Fig. 6.6(a) A Chain of Inverters: Subnetworks 5 and 6 are nonlatent.
(b) Equivalent Network for (a).
used again. If the answer is no, then a new equivalent network is generated.

The above is the conceptual idea. In the implementation, because sparse matrix techniques are used, we do not want to really generate the equivalent network and we do not want to reorder the interconnection matrix and reconstruct the sparse matrix pointer systems everytime a new equivalent circuit is generated. So the following algorithm is implemented in SLATE.
(1) The ordering of the interconnection matrix is detemined in the preprocessing phase assuming all the subnetworks are nonlatent, and this ordering is used in the whole analysis.
(2) At every timepoint or iteration, all the subnetworks are checked to determine their latent status. All the tearing nodes which only belong to latent subnetworks are labelled as latent nodes.
(3) All the rows corresponding to the latent nodes are replaced by the branch constraint relations of grounded voltage sources. This is done by skipping these rows and columns during the $I U$ factorization and forward and backward substitutions.

Example 6.4: The MOS circuit shown in Fig. 3.18 was analyzed by SLATE with and without the latency exploitation at network level. Scheme 2 sas used. The output results are essentially the same for both approaches(whithin four significant figures). The simulation data for both approaches are given in Table 5.5. This example shows that the latency exploitation at network level also provides savings in CPU time.

Table 6.5 Simulation Data of a Transient Analysis for the MOS Circuit in Fig. 3.18.

| Transient <br> analysis | Scheme 2 with <br> latency exploitation <br> at network level | Scheme 2 without <br> latency exploitation <br> at network level |
| :--- | :--- | :--- |
| Total CPU time <br> (sec.) | 80.102 | 102.365 |
| Savings in <br> CPU time | $21.75 \%$ |  |

### 6.3. Discussion

Four schemes for latency exploitation at subnetwork level are proposed in this chapter. Scheme 0 , Scheme 1 and Scheme 2 were implemented and tested in the program SLATE. Scheme 3 is not compatible with the $F$ ant version of SLATE, so it is still at the development stage. From the simulation data obtained from the first three schemes, our conclusion is that Scheme 2 is the best of these three schemes. However, for bipolar circuits, Scheme 2 is not the most efficient one. Conceptually, Scheme 3 should be the optimal one, so more work will be devoted to study this scheme .

In order to illustrate the ideas and to estimate the inherent latency easily, chains of inverters are used as example circuits in this chapter. More complicated circuits are used in the next chapter to evaluate the latency approaches used in SLATE.

## VII. CONCLUSIONS

The examples used in Chapter 6 are chains of inverters. One example has eleven levels of inverters, the other has five levels of inverters. From the simulation data we can see that the latency exploitation increases with the number of levels of logic gates. Since the number of levels of logic gates for those circuits is large and those circuits have very simple interconnection networks, so significant latency exploitation was obtained. In this chapter, the simulation data for some circuits, which have a complicated interconnection network and for which the number of levels of logic gates is small, are presented to see if the latency approach can provide significant savings in CPU time for these circuits. The simulation data are compared with those obtained from our DEC-10 version of SPICE2.

Example 7.1: The TTL circuit shown in Fig. 3.17 was analyzed by SLATE and SPICE2. Scheme 2 was used in SLATE. The output results of SLATE and SPICE2 are essentially the same (within four significant figures). The simulation data of a transient analysis are given in Table 7.1. For this bipolar circuit example, a $32.66 \%$ latency exploitation was achieved and a 40.15\% savings in CPU time was obtained.

Example 7.2: The MOS circuit shown in Fig. 3.18 was analyzed by SLATE and SPICE2. Scheme 2 was used in SLATE. The output results of SLATE and SPICE2 are essentially the same (within four significant figures). The simulation data of a transient analysis are given in Table 7.2. For this MOS circuit example, a $22.53 \%$ latency exploitation was achieved and a 46.70\% savings in CPU time was obtained.

Table 7.1 Simulation Data of a Transient Analysis for the TTL Circuit in Fig 3.17

| Transient <br> analysis | SLATE | SPICE2 |
| :--- | :--- | :--- |
| 非 of subnetworks <br> times 非 of <br> iterations | 6426 |  |
| \# of nonlatent <br> subnetworks times <br> of iterations | 4327 |  |
| Latency exploitation | $32.66 \%$ |  |
| Total CPU time <br> (sec.) | 189.56 | 316.74 |
| Savings in <br> CPU time | $40.15 \%$ |  |

Table 7.2 Simulation Data of a Transient Analysis for the MOS Circuit in Fig. 3.18.

| Transient <br> analysis | SLATE | SPICE2 |
| :--- | :--- | :--- |
| \# of subnetworks <br> times 非 of <br> iterations | 3600 |  |
| \# of nonlatent <br> subnetworks times <br> 非 iterations | 2789 |  |
| Latency exploitation | $22.53 \%$ |  |
| Total CPU time <br> (sec.) | 72.23 | 135.52 |
| Savings in <br> CPU time | 46.70 |  |

Also these simulation data show that not only the latency approach but also other new approaches implemented in SLATE provide savings in CPU time. This observation is obtained by noting that the latency exploitation is smaller than the savings in CPU time. These other new approaches which also provide savings in CPU time are the new reordering scheme for the modified nodal approach presented in Chapter 2 and the piecewise nonlinear approach presented in Chapter 3. The new reordering scheme for the modified nodal approach avoids the problem of pivoting on zero diagonal elements and decreases the number of operations at the same time. However, the efficiency provided by the new reordering scheme is problem-dependent. For example, if the circuit does not have voltage sources or inductors, then certainly no efficiency can be obtained by using our approach. The piecewise nonlinear approach is still at the experimental stage. All the examples we have simulated show that the use of the piecewise nonlinear approach hastens the convergence and improves the global convergence property of the Newton-Raphson method for bipolar and MOS circuits. However, the proof of global convergence or the conditions for global convergence for the piecewise nonlinear approach has not been obtained. Further research is needed to prove the global convergence, or to modify the approach we proposed to ensure global convergence. Also more work is needed to study if the strict piecewise nonlinear approach is efficient, and if it is not efficient, then the problem of now to use the ideas of the piecewise nonlinear approach to hasten the convergence and to improve the global convergence property of the Newton-Raphson method should be studied. The solution of the two problems of numerical integration makes the program more reliable and more accurate. This is described in Chapter 4. An equation was presented to compute the upperbound on the local truncation
error (LTE) from the maximum global error ( $G E_{\text {max }}$ ) and the solution time $?$ The inaccuract in the estimation of the local truncation error caused by different timesteps was resolved by introducing a new fomula for the estimation. The inaccuracy in the estimation of the local truncation error caused by using the node voltages at timepoints of the previous switch interval was resolved by recognizing this situation and restarting the numerical integration from the breakpoint.

In Chapter 5, the ideas of tearing methods were detailed, the most efficient way of implementing the node tearing methoi was deternined theoretically and experimentally, and a cirouit interpretation of tearing methods was given.

In Chapter 6, four latency criteria schemes were proposed. The first three schemes: Scheme 0 , Scheme 1 and Scheme 2, were implemented and tested. From the simulation data we conclude that Scheme 2 is the best out of these three. Scheme 3 is still under investigation and we think it should be the best scheme to exploit latency. More work is needed to study how to implement this scheme efficiently and reliably, and to find out if it is really the best scheme.

The nested subnetwork approach [-41,42,43] is the approach which allows several levels of subnetworks and in which the latency approach is used at every level of the subnetworks. This approach may provide savings in the time spent in checking the latent status of subnetworks. Oniy latency exploitation at the network level is implemented in program SLATE and we believe that this checking time may be small, thus the savings in CPJ time provided by the nested subnetworks approach probably is not significant. However, further investigation is needed to yield conclusive results.
Device characteristic latency and function latency are two concepts which
may provide some more savings in CPU time. More investigations need to be
done to expioit these two latencies.
In the present version of SLATE, a lot of information which is not
needed is still stored because SLATE evolved from YSPICE. Due to this
reason, although tearing methods should provide savings in memory, no
comparison of memory usage was presented in this thesis.

## REFERENCES

[1] L. W. Nagel, "SPICE2: A Computer Program to Simulate Semiconductor Circuits," ERL Memo., No. ERL-M520, University of California, Berkeley, May 1975.
[2] L. W. Nagel and R. A. Rohrer, "Computer Analysis of Nonlinear Circuits, Excluding Radiation (CANCER)," IEEE J. Solid-State Circuits, Vol. SC-6, August i971, pp. 166-182.
[3] R. A. Rohrer, L. W. Nagel, R. G. Meyer, and L. Weber, "CANCER: Computer Analysis of Nonlinear Circuits, Excluding Radiation," Proc. 1971 IEEE International Solid-Etate Circuits Conference, Philadelphia, February 18, 1971, pp. 124-125.
[4] E. S. Jenkins and S. P. Fan, "TIME---A Nonlinear DC and Time-Domain Circuit-Simulation Program," IEEE J. Solid-State Circuits, Vol. SC-6, August 1971, pp. 188-192.
[5] "ASTAP General Information Manual (GH20-1271-0)," IBM Corp., Mechanicsburg, Pa.
[6] H. Qassemzadeh, and T. R. Scott, "Algorithm for ASTAP---A Network Analysis Program," IEEE Trans. on Circuit Theory, Vol. CT-20, November 1973, pp. 628-634.
[7] "ECAP II Application Description Manual (GH20-0983)," IBM Corp., Mechanicsburg, Pa.
[8] F. H. Branin, G. R. Hogsett, R. L. Lunde, and L. E. Kugel, "ECAP II--A New Electronic Circuit Analysis Program," IEEE L. Solid-State Circuits, Vol. SC-6, August 1971, pp. 146-165.
[9] B. Dembart and L. Milliman, "Circus-2, A Digital Computer Program for Transient Analysis of Electronic Circuits," Harry Diamond Laboratories, Washington, D. C., July 1971.

〔10] L. D. Milliman, W. A. Massena, and R. H. Dickhaut, "CIRC'JS--A Dizital Vomputer Program for Transient Anslysis of Electronic Circuits, "Harry Diamond Laboratory, Washington, D. C., Rep. AD-346-1, January 1967.

「11] P. R. Bryant, I. N. Hajj, S. Sxelboe and M. Vlach, "NATAND Primer," University of Naterl00, Report 73-4, June 1973.
[12] G. Kron, "A Set of Principles to Interconnect the Solation of Physical Systems," Journal of Applied Physics, Vol. 24, No. 3., August 1953, pp. 965-980.
[13] G. Kron, Diakoptics---Piecewise Solution of Large-Scale Systems, Macdonald, Lond on, 1963.
[14] H. H. Happ, Diakoptics And Networks, Academic Press, New York, 1971.
[15] F. i. Branin, "The relation between Kron's Method and the Olassical methods of Network Analysis," The Matrix and Tensor Quarterly, Vol. 12, No. 3, March 1962, pp. 69-1 15.
[16] L. O. Chua and L. K. Chen, "Diakoptics and Generalized Hybrid Analysis," IEEE Trans. on Circuits and Systems, Vol. CAS-23, No. 12, December 1976, pp. 694-705.
[17] F. F. Wh, "Solution of Large Scale Networks by Tearing," IEEE Trans. on Circuits and Systems, Vol. CAS-23, No. 12, December 1975, pp. 706-713.
[18] L. O. Chua and L. K. Chen, "Nonlinear Diakoptics," Proc. IEEE Int. Symp. on Circuits and Systems, April 1975, 0p. 373-375.
[19] K. U. Nang and T. Chao, "Diakoptics for Large Scale Nonlinear TimeVarving Networks," Proc. IEEE Int. Symp. on Circuits and Systems, April 1975, pp. 277-278.

〔20] A. Sangiovanni-Vincentelli, L. K. Chen, and L. O. Chua, "A New Tearing Approach---Node Tearing Nodal Analysis," Proc. IEEE Int. Symp. on Circuits and Systems, April 1977, pp. 143-147.
[21] P. Linardis, K. G. Nichols, and E. J. Zaluska, "Network Partitioning and Latency Exploitation in Time Domain Analysis of Nonlinear Electronic Circuits," Proc. IEEE Int. SFmp. on Circuits and Systems, May 1978, pp. 510-513.
[22] N. B. Rabbat and H. Y. Hsieh, "A Latent Macromodular Approach to Large-Scale Soarse Networks," IEEE Trans. On Circuits and Systems, Vol. CAS-22, No. 13, December 1976, pp. 745-752.
[23] H. Y. Hsieh and N. B. Rabbat, "Computer-Aided Design of Large Networks by Macromodular and Latent Techniques," IEEE Int. Symp. on Circuits and Systems, April 1977, pp. 688-592.
[24] N. B. Rabbat and H. Y. Hsieh, "Concepts of Latency in the Time-Domain Solution of Nonlinear Differential Equations," Proc. IEEE Int. Symp. on Circuits and Systems, May 1978, pp. 813-825.
[25] C. W. Ho, A. E. Muehli and P. A. Brennan, "The Modified Nodal Approach to Network Analysis," IEEE Trans. on Circuits and Systems, Vol. CAS-22, pp. 504-509, June 1975.
[26] A. R. Newton and D. O. Pederson, "A Simulation Program with Large Scale Integrated Circuit Emphasis," IEEE Int. Symp. on Circuits and Systems, New York, May 1978.
[27] G. Arnout and H. J. Deman, "The use of Threshold Functions and Boolean-Controlled Network Elements for Macromodelling of LSI Circuits," IEEE J. Solid-State Circuits, vol. SC-1 3, pp. 326-332, June 1978.
[28] C. A. Desoer and E. S. Kuh, Basic Circuit Theory, McGraw-Hill Book Co., New York, 1969.
[29] H. M. Markowitz, "The Elimination Form of the Inverse and its Application to Linear Programming," Management Sci., Vol. 3, pp. 255-269, 1957.
[30] I. N. Hajj, P. Yang and T. N. Trick, "Avoiding Zero Pivots in the Modified Nodal Approach," IEEE Trans. on Circuits and Systems, to appesr.
[31] W. J. McCalla and D. O. Pederson, "Elements of Computer-Aided Circuit Analysis," IEEE Trans, on Circuit Theory, Vol. CT-18, January 1971, pp. 14-26.
[32] R. D. Berry, "An Ootimal Ordering of Electronic Circuit Equations for a Sparse Matrix Solution," IEEE Trans. on Circuit Theory, Vol. CT-13, January 1971, pp. 40-50.
[33] W. H. Kao, "Comparison of Quasi-Newton Methods for the $D C$ analysis of Electronic Circuits," M. S. Thesis, University of Illinois at Urbana-Champaign, 1972.
[34] H. Gupta and J. Sharma, "An Algorithm for DC Solution of Electronic Circuits," IEEE Int. Fymp. on Circuits and Systems, July 1979.
[35] M. E. Daniel, "Development of Mathematical Models of Semiconductor Devices for Computer-Aided Circuit Analysis," Proceeding of the ITEE, Vol. 55, No. 11, Nov. 1967.
[36] L. O. Chua and P. M. Lin, Computer-Aided Analysis of Electronic Circuits: Algorithms and Computational Techniques. Englewood Cliff, New Jersey, Prentice-Hall, 1975.
[37] F. H. Branin, "A Sparse Matrix Modification of Kron's Method of Piecewise Analysis," Proc. IEEE Int. Symp. on Circuits and Systems, April 1975, pp. 383-386.
[38] I. N. Hajj, "Sparsity Considerations in Network Solution by Tearing," IEEP Trans. on Circuits and Systems, May 1980, pp. 357-366.
[39] A. R. Newton and D. O. Pederson, "A Simulation Program with Large Scale Integrated Circuit Emphasis," IEEE Int. Symp. on Circuits and Systems, New York, May 1978.
[40] J. Katzenelson, "An Algorithm for Solving Nonlinear Resistive Network", Bell Syst. Tech. J., Vol. 44, pp. 1605-1520, Oct. 1965.
[41] A. E. Ruehli, A. L. Sangiovanni-Vincentelli and N. B. Rabbat," Time Analysis of Large-Scale Circuits Containing One-iay Macromodels," Proc. IEEE Int. Symp. on Circuits and Systems, April 1980, pp. 766-770.
[42] N. B. Rabbat, A. L. Sangiovanni-Vincentelli and H. Y. Hsieh, "A Maltilevel Newton Algorithm with Macromodeling and Latency for the Analysis of Large-Scale Nonlinear Circuits in the Time Domain", IEEE Trans. on Circuits and Systems, Vol. CAS-26, po. 733-741, Sept. 1979.
[43] H. Y. Hsieh and N. B. Rabbat, "Maltilevel Newton Algorithm for Nested Macromodel Analysis of Bipolar Networks", Proc. IEEE Int. Symp. on Circuits and Systems, April 1980, pp. 762-765.

## VITA

Ping Yang was born in Ping-Tung, Taiwan, China on July 15, 1952. He attended National Taiwan University in Taipei and Graduated in June 1974 With a Bachelor of Science in Physics. He was an Ensign in the Ohinese Navy from 1974 to 1976. In August 1976 he entered the University of Illinois. From August 1975 to August 1980 he held a teaching assistantship in the Electrical Engineering department and a research assistantshio with the Coordinated Science Laboratory. His research interests are in the areas of computer-aided design, circuits and systems, large-scale integrated circuits, and solid-state electronics.


