# Design of a 2.5 GHz Continuous-Time Equalizer and Design of a 10 GHz Eye Opening Monitor

A Project Report

submitted by

# JITHIN JANARDHAN

in partial fulfilment of the requirements for the award of the dual degree of

BACHELOR OF TECHNOLOGY and MASTER OF TECHNOLOGY



# DEPARTMENT OF ELECTRICAL ENGINEERING, INDIAN INSTITUTE OF TECHNOLOGY, MADRAS

July 15, 2009

# THESIS CERTIFICATE

This is to certify that the thesis titled **Design of a 2.5 GHz Continuous-Time Equalizer and Design of a 10 GHz Eye Opening Monitor**, submitted by **Jithin Janardhan**, to the Indian Institute of Technology, Madras, for the award of the degree of **Bachelor of Technology** and **Master of Technology**, is a bona fide record of the research work done by him under our supervision. The contents of this thesis, in full or in parts, have not been submitted to any other Institute or University for the award of any degree or diploma.

#### Dr. Shanthi Pavan

Project Advisor, Assistant Professor, Dept. of Electrical Engineering, IIT-Madras, 600 036

Place: Chennai

Date: June 2, 2009

# ACKNOWLEDGEMENTS

I would like to thank Dr. Shanthi Pavan, for guiding me not only through this project but also through the various Analog design courses over the past few years which led me to pursue my interests in the field of Analog design. I also thank Dr. Nagendra Krishnapura for his courses and for guidance during my project work.

I thank my lab mates who have enriched my time here with stimulating discussions and have provided much needed company and support. I also wish to thank T.Prabu Sankar who has been very helpful with various aspects of my project work.

Finally, I would like to thank my parents who have been a moral support all through my student life.

# Abbreviations

| Abbreviation | Expansion                   |
|--------------|-----------------------------|
| CTE          | Continuous Time Equalizer   |
| SNR          | Signal to Noise Ratio       |
| MMSE         | Minimum Mean Square Error   |
| DAC          | Digital to Analog Converter |
| ADC          | Analog to Digital Converter |

# ABSTRACT

This report describes two projects which were undertaken as part of my Dual Degree Project. The first involves the design of a Continuous Time Equalizer operating at a data rate of 2.5 Gbps. The equalizer has been designed in UMC CMOS $0.18\,\mu\text{m}$ technology with a supply voltage of $2.5\,\text{V}$. The equalizer occupies the entire area of a 1 mm × 1 mm chip. The complete design and simulation results are described in this report.

The second project involves the design of an Eye Opening Monitor system to reconstruct the output eye of high speed equalizers. It was designed to work at a data rate of 10 GHz. UMC CMOS $0.13\,\mu\text{m}$ technology with a supply voltage of $1.2\,\text{V}$ was used for the design. Schematic level simulations of high speed blocks like the multiphase clock generator and clocked comparator have been successfully carried out. Some system level simulations were also performed and are described in this report.

# TABLE OF CONTENTS

- Acknowledgements
- Abstract
- List of Figures
- 1 The Continuous Time Equalizer - Introduction
  - 1.1 Continuous Time Equalizer
    - 1.1.1 Advantages
    - 1.1.2 Implementation
  - 1.2 Specifications
- 2 The MMSE algorithm and MATLAB simulation
  - 2.1 MMSE algorithm
  - 2.2 MATLAB simulation
- 3 Design of the Equalizer
  - 3.1 Tunable weights
    - 3.1.1 Number of tuning bits
    - 3.1.2 Design of the tunable weights
  - 3.2 Filter design
    - 3.2.1 Design considerations
    - 3.2.2 Area efficient differential inductors
    - 3.2.3 Inductor parasitics and compensation of series loss
    - 3.2.4 Input transconductors
- 4 Miscellaneous Circuit Blocks
  - 4.1 Bias generation
  - 4.2 Output amplifier
- 5 CTE Simulation and Results
  - 5.1 MMSE algorithm including the responses of the tunable weights
  - 5.2 Post layout extraction simulation results
- 6 Equalizer response measurements
  - 6.1 Measuring the response of the Equalizer
    - 6.1.1 Test buffer design
    - 6.1.2 Simulation results
  - 6.2 Measuring the response of the channel
    - 6.2.1 Computing the Z parameters of a 2-port network from $z_{in}$ measurements
    - 6.2.2 Simulation results
  - 6.3 Serial input of weight tuning bits
  - 6.4 On-chip PRBS for testing
- 7 Eye Opening Monitor - Introduction
  - 7.1 The Eye Opening Monitor System
  - 7.2 System parameters and MATLAB simulation
    - 7.2.1 System parameters
    - 7.2.2 System simulation in MATLAB
- 8 Design of the Clocked Comparator
  - 8.1 Circuit implementation
  - 8.2 Schematic simulation results
- 9 Design of the Multiphase clock generator
  - 9.1 Generation of coarsely spaced clock phases
    - 9.1.1 Bandwidth enhancement
    - 9.1.2 Delay cell chain
  - 9.2 Phase Interpolator design
    - 9.2.1 Design 1 - eliminated
    - 9.2.2 Design 2 - chosen for this work
    - 9.2.3 Final clock phase interpolator circuit
  - 9.3 8:1 multiplexer design
    - 9.3.1 Compensating the effects of feed-through
    - 9.3.2 Generation of two coarse phases from the multiplexer output
  - 9.4 Simulation results
    - 9.4.1 Schematic
    - 9.4.2 Layout extracted
- 10 System Simulations
  - 10.1 Control block
  - 10.2 Ideal Eye Opening Monitor in Cadence
    - 10.2.1 Results
  - 10.3 Ideal Eye Opening Monitor in Cadence with real multiphase clock generator and comparator
    - 10.3.1 Results
- 11 Future work
  - 11.1 Eye opening monitor system - remaining work
- A Pin Details of the Continuous Time Equalizer Chip

# LIST OF FIGURES

- 1.1 Equalizer implemented using a ladder filter and its dual
- 2.1 Output eye diagram (with an ideal model of a loss compensated filter): $\tau = \frac{12}{16}T_b$ and $t_0 = 0.156T_b$
- 3.1 Simple differential pair based tunable weight - with 3bit + 1 sign bit tuning
- 3.2 Cascoded differential pair based tunable weight - with 5bit + 1 sign bit tuning
- 3.3 2.5 V supply, cascoded differential pair based tunable weight - with 5bit + 1 sign bit tuning
- 3.4 0/1.8 V to 0.9/2.5 V level shifter
- 3.5 An area efficient fully differential inductor
- 3.6 Model of an on-chip inductor
- 3.7 The input transconductor circuit
- 3.8 Filters after compensation
- 4.1 The bias generation circuit block
- 4.2 The output amplifier
- 6.1 Test setup in literature
- 6.2 New test setup
- 6.3 Test buffer circuit diagram
- 6.4 Measured and actual equalizer response magnitude
- 6.5 Measured and actual equalizer response phase
- 6.6 Setup for measuring channel response shown with a model channel
- 6.7 Measured and actual channel response magnitude
- 6.8 Measured and actual channel response phase
- 7.1 Block diagram of the eye opening monitor
- 7.2 The actual eye diagram at the channel output
- 7.3 The 32x32 eye diagram image obtained directly from the eye opening monitor output (replicated to show 2 cycles)
- 7.4 The 125x125 eye diagram image obtained after bi-linear interpolation (replicated to show 2 cycles)
- 7.5 The 125x125 eye diagram image obtained after bi-cubic interpolation (replicated to show 2 cycles)
- 8.1 The CML D-latch
- 8.2 The first stage of the clocked comparator
- 8.3 The clocked comparator
- 9.1 The delay cell
- 9.2 The delay cell chain - generates 8 coarse clock phases
- 9.3 Clock phase interpolator - Design 1
- 9.4 Results for design 1 with ideal input coarse phase waveforms
- 9.5 Clock phase interpolator - Design 2
- 9.6 Results for design 2 with ideal input coarse phase waveforms
- 9.7 The final clock phase interpolator circuit
- 9.8 2:1 multiplexer cell
- 9.9 Final 2:1 multiplexer cell with feed-through compensation
- 9.10 Multiphase clock generation - results for schematic block simulation: 2 sets of adjacent interpolated phases
- 10.1 Ideal setup: The actual eye diagram at the channel output
- 10.2 Ideal setup: The 125x125 eye diagram image obtained using the better method of numerical differentiation and bi-cubic interpolation (length = Tclk x 1.12)
- 10.3 Ideal multiphase clock generator and real comparator: The 221x125 eye diagram image obtained after bi-cubic interpolation (length = Tclk x 1.12)
- 10.4 Real multiphase clock generator and comparator: The 221x125 eye diagram image obtained after bi-cubic interpolation (length approx. = Tclk x 0.8)
- A.1 Pin diagram of the Continuous Time Equalizer chip

# CHAPTER 1

# The Continuous Time Equalizer - Introduction

High speed data links operating at multi-gigahertz frequencies pose a challenge in the form of equalization. Traditional analog implementations use transmission line delays to generate a set of taps which can be weighted appropriately to achieve equalization. These implementations have several problems, such as inefficient power usage due to transmission line nonidealities and double ended termination.

Reference [1] proposes a Continuous Time Equalizer which uses a more suitable basis set of tap impulse responses: the state impulse responses of a singly terminated LC ladder. To understand the idea behind such a setup, consider how the net equalizer response is constructed from the tap impulse responses. In a delay line based implementation, the "delay cells" have sharp impulse responses with bandwidths greater than the data rate, and these responses are weighted appropriately to get the equalizer response. In principle, however, any set of impulse responses can be used, provided their linear combinations give the required range of equalizer responses without a significant length of ringing. The following section describes the basic working of such a setup and its advantages over a transmission line based equalizer.

# 1.1 Continuous Time Equalizer

The Continuous Time Equalizer makes use of the state impulse responses of a singly terminated LC ladder as basis functions for building the equalizer response. The first five states are used to build a 5 tap CTE.

#### 1.1.1 Advantages

The CTE has several advantages over transmission line filter based equalizers.

1. The bandwidth of the filter is much lower than the bandwidths of the transmission lines required in a delay line based structure, as the impulse responses span many bit intervals.
2. No explicit anti-aliasing filter is required, as the filter itself has a low bandwidth.
3. Since the ladder is singly terminated, there is no voltage reduction.
4. The input capacitances of the weighting transconductors, which degrade the responses of transmission line filters, are not a problem here as they can be absorbed into the ladder capacitances.

#### 1.1.2 Implementation

The main blocks in the CTE are the filter and the tunable tap weights. A weighted summation of the state variables of the filter is required. Tapping the capacitor voltages, weighting them and summing them is easy and can be carried out in the current domain by using tunable transconductors. Tapping the inductor currents is not straightforward. A dual ladder is proposed as a solution to this problem in [1]. A dual ladder has the roles of current and voltage interchanged, so the inductor currents on the main ladder correspond to capacitor voltages on the dual. These can be easily tapped off and weighted. Figure 1.1 shows the equalizer structure implemented using the ladder and its dual. The capacitor voltages which are tapped off correspond to the required state variables. These are weighted using tunable transconductors, and their currents are summed and converted to a voltage across the load resistor.

Practical aspects of the design have been detailed in chapter -.

# **1.2** Specifications

A CTE was designed to meet the specifications listed in table 1.1.



Figure 1.1: Equalizer implemented using a ladder filter and its dual

Table 1.1: Design specifications

| Parameter                     | Value                   |
|-------------------------------|-------------------------|
| Data rate                     | 2.5 Gbps                |
| Number of taps                | 5                       |
| Input swing                   | 600 mV p-p differential |
| SNR (including thermal noise) | 20 dB                   |
| SNR (w/o thermal noise)       | 27 dB                   |

# CHAPTER 2

# The MMSE algorithm and MATLAB simulation

The CTE sets tap weights such that the mean square error between the equalizer output and the transmitted symbol is minimized. The optimal tap weights and the minimum mean square error can be estimated via the expressions given in [1] and shown in the section below.

# 2.1 MMSE algorithm

The output of the equalizer is

$$y(t) = \sum_{i=1}^{N} \sum_{k=-\infty}^{\infty} w_i a(k) c_i(t - kT_b) + \sum_{i=1}^{N} w_i(n(t) * x_i(t)) \tag{2.1}$$

where  $c_i(t) = p(t) * x_i(t)$ . If this output is sampled at  $nT_b + t_0$  ( $t_0$  is the sampling instant with the input data rising edge as a reference), the samples are given by

$$y(n) = \sum_{i=1}^{N} \sum_{k=-\infty}^{n} w_i a(k) c_i (nT_b + t_0 - kT_b) + \sum_{i=1}^{N} w_i (n(t) * x_i(t))|_{t=nT_b + t_0} \tag{2.2}$$

This can be cast into matrix form as follows

$$y(n) = a^T(n)Cw + \eta^T(n)w \tag{2.3}$$

Let $\delta_{opt}$ be the position of the minimum diagonal element of $(I - CA^{-1}C^T)$, where

$$A = C^T C + \frac{M}{\sigma^2}$$

and let $h_{\delta}$ be a vector of zeros with a 1 at position $\delta_{opt}$.

The optimal weight vector which results in minimum mean square error is given by

$$w_{opt} = A^{-1}C^T h_\delta \tag{2.4}$$
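The computation in equation (2.4) can be sketched numerically as follows. This is only an illustrative sketch: the function name `mmse_weights` and the example matrices are hypothetical, and the code assumes that $M$ is the noise correlation matrix of the tap outputs and $\sigma^2$ the symbol variance, which the text above does not state explicitly.

```python
import numpy as np

def mmse_weights(C, M, sigma2):
    """Return (w_opt, delta_opt, normalized_mmse) for y(n) = a^T C w + eta^T w.

    Assumption: M is the noise correlation matrix and sigma2 the symbol variance.
    """
    C = np.asarray(C, dtype=float)
    A = C.T @ C + np.asarray(M, dtype=float) / sigma2
    # Each diagonal element of (I - C A^{-1} C^T) is the normalized error
    # obtained for one choice of the target position delta.
    P = np.eye(C.shape[0]) - C @ np.linalg.solve(A, C.T)
    delta_opt = int(np.argmin(np.diag(P)))
    h = np.zeros(C.shape[0])
    h[delta_opt] = 1.0
    w_opt = np.linalg.solve(A, C.T @ h)  # equation (2.4)
    return w_opt, delta_opt, float(P[delta_opt, delta_opt])

# Hypothetical data: 12 symbol-spaced samples of 5 tap responses.
rng = np.random.default_rng(0)
C = rng.standard_normal((12, 5))
M = 0.01 * np.eye(5)
w_opt, delta_opt, mmse = mmse_weights(C, M, sigma2=1.0)
```

Solving the linear system with `np.linalg.solve` avoids forming $A^{-1}$ explicitly, which is usually the better conditioned choice.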

# 2.2 MATLAB simulation

The CTE was simulated in MATLAB. To model the filter, an ABCD matrix corresponding to the compensated non-ideal filter was used. The channel is a PMD channel with an impulse response of the form $\delta(t) + \delta(t - \frac{k}{16}T_b)$, where $k$ is an integer from 1 to 15 and $T_b$ is the bit period. The optimal weight vector was found using the MMSE algorithm. The output eye diagram for a particular case is shown in figure 2.1.



Figure 2.1: Output eye diagram (with an ideal model of a loss compensated filter) :  $\tau = \frac{12}{16}T_b$  and  $t_0 = 0.156T_b$ 
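The two-path PMD channel used in the simulation can be modelled in discrete time by oversampling each bit 16 times, so that a delay of $\frac{k}{16}T_b$ becomes a shift of $k$ samples. The sketch below is written in Python rather than MATLAB, and the function name and the NRZ ±1 signalling are assumptions made for illustration only.

```python
import numpy as np

def pmd_channel_output(bits, k, samples_per_bit=16):
    """Pass an NRZ (+1/-1) waveform through h(t) = delta(t) + delta(t - k/16 Tb)."""
    if not 1 <= k < samples_per_bit:
        raise ValueError("k must lie between 1 and samples_per_bit - 1")
    symbols = 2 * np.asarray(bits) - 1               # map {0, 1} -> {-1, +1}
    x = np.repeat(symbols, samples_per_bit).astype(float)
    h = np.zeros(k + 1)
    h[0] = 1.0                                       # direct path
    h[k] = 1.0                                       # delayed path, k samples later
    return np.convolve(x, h)[: x.size]

# Example: tau = 12/16 Tb, as in figure 2.1
y = pmd_channel_output([1, 0], k=12)
```

In this example the output takes intermediate values during the 12 samples after each transition, which is the inter-symbol interference the equalizer has to remove.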

# CHAPTER 3

# Design of the Equalizer

# 3.1 Tunable weights

Five tunable weights scale the five outputs of the filter and provide output currents corresponding to these scaled values. The output currents are summed up by feeding them into a resistive load. Therefore, the weight blocks are transconductors whose transconductance can be changed using control bits.

### 3.1.1 Number of tuning bits

The weights required to equalize channels, as found by the MMSE algorithm, span a continuous range. In practice, however, only a discrete set of weight values can be implemented. The number of levels necessary can be found in MATLAB by quantizing the weights given by the algorithm and finding the minimum number of levels that achieves the required SNR. The target SNR was 27 dB. MATLAB simulations revealed that 5 bits of quantization would suffice. This number can be understood loosely in the following manner. The "quantization noise" due to the quantization of the tap weights is $(\text{fullscale}/2^n)/12$. For 5 bits, this corresponds to around 26 dB SNR. All five taps add quantization noise and scale the signals by different amounts, so an exact analysis is quite complicated; the estimate only gives a feel for the minimum number of bits required. The actual SNR obtained from MATLAB simulations with infinite bandwidth weights and 5 bits of quantization was around 31 dB.
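The search for the minimum number of tuning bits can be sketched as below. This is a hedged illustration, not the MATLAB code used for the design: the sign-magnitude quantizer, the function names and the `snr_db_of` callback (which stands in for a full equalizer simulation) are all assumptions.

```python
import numpy as np

def quantize_weights(w, n_bits, full_scale):
    """Sign-magnitude quantization: n_bits of magnitude plus a sign bit."""
    w = np.asarray(w, dtype=float)
    levels = 2 ** n_bits - 1                  # largest magnitude code
    step = full_scale / levels
    codes = np.clip(np.round(np.abs(w) / step), 0, levels)
    return np.sign(w) * codes * step

def min_bits_for_snr(w, snr_db_of, target_db, full_scale, max_bits=10):
    """Smallest n_bits whose quantized weights reach target_db, else None.

    snr_db_of is a user-supplied function mapping a weight vector to an
    output SNR in dB (hypothetical stand-in for the system simulation).
    """
    for n_bits in range(1, max_bits + 1):
        if snr_db_of(quantize_weights(w, n_bits, full_scale)) >= target_db:
            return n_bits
    return None
```

A caller would pass the ideal MMSE weights as `w` and a function that simulates the equalizer output SNR for a given weight vector.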

### 3.1.2 Design of the tunable weights

The tunable weights were designed as a set of binary weighted differential pair transconductors. The tunable weights were designed keeping in mind these important factors.

- Proper variation of gain value with bit setting
- Linearity of the transconductance for a  $\pm 150 \,\mathrm{mV}$  differential input
- Bandwidth around 1.25 GHz

The initial design used simple differential pairs with PMOS loads, biased using a common mode biasing circuit as shown in figure 3.1. Only 3 bits plus 1 sign bit of tuning were implemented in this design. The problem with this design was that, as the weight setting was varied, the total transconductance changed as required, but the output impedance changed too, because a different number of differential pairs was turned on. Thus, the effective weight values achieved did not vary linearly with the bit setting. They fell significantly below their expected values as the weight increased, since the output impedance decreased.
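This gain compression can be illustrated with a simple small-signal model in which every enabled unit pair adds a transconductance and a parallel output resistance. The function name and all numeric values below are hypothetical and chosen only to show the trend; they are not taken from the design.

```python
def weight_gain(code, gm_unit, r_load, ro_unit):
    """Voltage gain of a binary weighted transconductor at a given code.

    Assumes `code` identical unit pairs are enabled, each contributing
    gm_unit and an output resistance ro_unit in parallel with r_load.
    """
    if code == 0:
        return 0.0
    r_out = ro_unit / code                        # enabled pairs in parallel
    r_eff = r_load * r_out / (r_load + r_out)     # r_load || r_out
    return code * gm_unit * r_eff

# Hypothetical values: 1 mS per unit, 500 ohm load, 5 kohm per unit pair
gains = [weight_gain(c, 1e-3, 500.0, 5e3) for c in range(8)]
```

With these numbers the gain at code 7 is noticeably less than seven times the gain at code 1, whereas a much larger `ro_unit` restores the linear relationship.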

The transconductance remained linear over the full $\pm 150\,\mathrm{mV}$ differential input swing. The overdrives of the input transconductors were designed to be 400 mV at the tt, 70 °C process corner. With a simple differential pair, keeping the transistors in saturation with this overdrive was not difficult.

Figure 3.1: Simple differential pair based tunable weight - with 3bit + 1 sign bit tuning

In order to overcome the problem posed by the varying output impedance, cascode NMOS and PMOS devices were added to the differential pairs. These raise the output impedance to a value significantly greater than the load resistance, so that the variation in the output impedance does not affect the gain. The circuit diagram is shown in figure 3.2. In order to maintain linearity, a high overdrive was required in the input pair. However, the cascode transistors could not be sized up to reduce their overdrives, as this would lead to a loss of bandwidth. With a supply voltage of 1.8 V, careful design and biasing were necessary in order to enable the functioning of this tunable weight while meeting the requirements mentioned earlier. The tail current transistors and the PMOS transistors were scaled up significantly to accommodate the overdrives of all the devices with a 1.8 V supply (figure 3.2). The bias generation circuit is discussed in -. This tunable weight design achieved a linear variation in gain with increasing weight setting.

The transconductance corresponding to the two lower bits was initially implemented using a single differential pair. It was found that the impulse response due to this pair did not match that of the other binary weighted pairs. The same impulse response is required for all bit settings for the MMSE algorithm. So, two differential pairs feeding into a common cascode NMOS transistor pair were used (figure 3.2).

The sign bit was implemented using differential NMOS switches (figure 3.2), as the input common mode voltage was changed to 0.9 V.

The design details are given in table 3.1.

Figure 3.2: Cascoded differential pair based tunable weight - with 5bit + 1 sign bit tuning

Table 3.1: Design details

| Parameter       | Value                                |
|-----------------|--------------------------------------|
| Tuning          | 5 bit + 1 sign bit                   |
| Current         | $45\,\mu\text{A} \times 0\text{-}7$  |
| Load resistance | $448\,\Omega$                        |

**Post layout problems:** After the complete layout of the five tunable weight blocks, it was found that the bandwidth at the output dropped by a factor of more than two. So, in order to increase the bandwidth, the load resistance was scaled down and an output amplifier was added to make up for the lost gain. This amplifier is discussed in -. Even so, the bandwidth of the weights was only around 1.1 GHz. Although this was not low enough to cause any drop in SNR, it was not satisfactory. Moreover, a full recovery of the amplitude to 600 mV p-p differential at the output of the amplifier was not possible, again due to bandwidth limitations.

**Shift to a 2.5 V supply:** In order to make a better design bandwidth-wise, it was decided to change the supply voltage to 2.5 V. This enabled the scaling up of the overdrives in the differential pairs, thus increasing the bandwidth. A small increase in output amplitude was also achieved. The differential pairs were scaled in the following manner:

- Overdrive - scaled up by 4/3 times
- Tail current - scaled up by 16/9 times
- Thus, transconductance increased by 4/3 times
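The last item follows from the other two. As a rough check, assuming long-channel square-law devices (an idealization not stated in the text) with bias current $I$ and overdrive $V_{ov}$:

$$g_m = \frac{2I}{V_{ov}} \quad\Rightarrow\quad \frac{g_m'}{g_m} = \frac{I'/I}{V_{ov}'/V_{ov}} = \frac{16/9}{4/3} = \frac{4}{3}$$

Under the same assumption, $I \propto V_{ov}^2$ at fixed device size, so scaling the overdrive by 4/3 is consistent with a current scaling of $(4/3)^2 = 16/9$.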

This increase in transconductance was distributed between increasing gain and increasing bandwidth. A final bandwidth of 1.5 Ghz was achieved. The circuit diagram is shown in figure 3.3. The sign bit switches were changed to NMOS versions. In order to operate the transconductors in the differential pairs working as switches, a 1.8 V level was necessary in order to bias at a safe level from break-down. The PMOS switches however, needs a high voltage level of 2.5 V. Therefore, voltage levels of 0.9V and 2.5 V were used for the PMOS switches. Level shifters were necessary in order to convert the 0/1.8 V levels to 0.9/2.5 V levels. The level shifter circuit diagram is shown in figure 3.4.

The final design details are given in table 3.2.



Figure 3.3: 2.5 V supply, cascoded differential pair based tunable weight - with 5bit + 1 sign bit tuning

Table 3.2: Design details of the 2.5 V supply based tunable weight

| Tuning          | 5  bit + 1  sign bit   |
|-----------------|------------------------|
| Current         | $80\mu A \times 0 - 7$ |
| Load resistance | $448\Omega$            |



Figure 3.4: 0/1.8 V to 0.9/2.5 V level shifter

# 3.2 Filter design

A seventh order Butterworth filter with a 3 dB bandwidth of 1.25 GHz is used in this design. The first five states of this filter are used as taps for the CTE. The last two states have larger bandwidths and are thus less suitable for use. The component values for both ladders are listed in table 3.3.

Table 3.3: Component values for the ladders: (capacitance in pF and inductance in nH)

|        | C1   | L1    | C2   | L2   | C3    | L3   | C4    | L4  |
|--------|------|-------|------|------|-------|------|-------|-----|
| Ladder | 3.96 | 11.44 | 4.24 | 8.88 | 2.688 | 4.16 | 0.568 | NA  |
| Dual   | 4.6  | 9.92  | 3.56 | 10.5 | 1.672 | 6.68 | NA    | 1.4 |
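The "Ladder" row can be cross-checked with a short calculation. The sketch below assumes that the ladder is a singly terminated Butterworth prototype scaled to a 50 Ω single-ended impedance and a 1.25 GHz 3 dB frequency. This is an assumption and is not stated in the text, although the values it produces agree with the "Ladder" row to within about one percent. The function names are hypothetical, and the "Dual" row is not reproduced.

```python
import math

def singly_terminated_butterworth(n):
    """Normalized element values (1 rad/s, 1 ohm) of an n-th order singly
    terminated Butterworth ladder, in the order produced by the recursion
    (smallest element first)."""
    a = [math.sin((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]
    c = [math.cos(k * math.pi / (2 * n)) ** 2 for k in range(1, n + 1)]
    g = [a[0]]
    for k in range(1, n):
        g.append(a[k] * a[k - 1] / (c[k - 1] * g[k - 1]))
    return g

def ladder_values(n=7, f3db=1.25e9, r=50.0):
    """Scale the prototype to f3db and r; returns (name, value) pairs with
    capacitances in pF and inductances in nH, ordered C1, L1, C2, ..."""
    w = 2 * math.pi * f3db
    g = singly_terminated_butterworth(n)[::-1]  # largest element first, as in table 3.3
    values = []
    for i, gi in enumerate(g):
        if i % 2 == 0:
            values.append(("C%d" % (i // 2 + 1), gi / (r * w) * 1e12))
        else:
            values.append(("L%d" % (i // 2 + 1), gi * r / w * 1e9))
    return values

for name, value in ladder_values():
    print(name, round(value, 3))
```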

#### 3.2.1 Design considerations

The first design issue that needs to be taken care of in the design of the filter is whether to use an active filter or a passive one. Power and area consumption were used as criteria to decide this matter.

The filter is implemented as a fully differential setup. It is driven with a 600 mV p-p differential input. The terminating resistance is  $100\Omega$  differentially. The power consumption in the resistors is  $((0.3)^2/100)$ W. Transconductors would be required at the input of each ladder in order to provide a high input impedance.

These would have differential transconductances of 10 mS. The power consumed in these would be greater than that in the resistors as linearity is required. The inductor values are all of the order of a few nanohenries. An active filter implementation with the same impedance level would not require a separate dual ladder as the inductor currents are already available as capacitor voltages. However, thirteen transconductors, each having a differential transconductance of 10 mS would be required considering four transconductors per biquad section. The capacitors corresponding to the inductors would have capacitances of a few picofarads. If the impedance level is scaled up to reduce power consumption, the capacitance values would be too small. Therefore, a passive filter implementation was decided upon.

### 3.2.2 Area efficient differential inductors

The area of the inductors can be reduced significantly by making use of the mutual inductance between the corresponding inductors on each differential arm. At any instant, the currents flowing through the corresponding inductors are equal in magnitude and opposite in direction. A structure which effectively couples the fluxes of these inductors constructively would enable reduction of area. Figure 3.5 shows the schematic and layout of a fully differential inductor as proposed in [1]. This layout ensures maximum coupling and is thus very area efficient.

#### 3.2.3 Inductor parasitics and compensation of series loss

The spiral inductors implemented on chip have many parasitics. A sufficiently accurate model of an inductor is shown in figure 3.6. The total capacitance can be split and lumped at both ends. The capacitance is not a problem as it can be absorbed into the capacitance at that particular node.

The series resistance is a problem as it degrades the responses of the filter states. The main observed effect is an increase in ringing in the state impulse responses of the filter beyond the equalizer span, which leads to a reduction in



Figure 3.5: An area efficient fully differential inductor



Figure 3.6: Model of an on-chip inductor

SNR. The series resistance can be minimized by increasing the widths of the metal strips constituting the turns of the inductor. This, however, leads to an increase in the area of the inductor and an increase in the parasitic capacitance. The quality factor of an inductor is a measure of its lack of loss. It is defined as  $Q = 2\pi f L/r_{ser}$  where L is the inductance and  $r_{ser}$  is the series resistance. The inductors required for this filter were designed to have Q = 5 at f = 1.25 GHz, the bandwidth of the filter. The optimal dimensions were found using ASITIC<sup>®</sup>. These are listed in table 3.4.

Table 3.4: Inductor layout dimensions

|                                            | Ladder L1 | Ladder L2 | Ladder L3 | Dual L1 | Dual L2 | Dual L3 | Dual L4 |
|--------------------------------------------|-----------|-----------|-----------|---------|---------|---------|---------|
| Inductance (nH)                            | 11.44     | 8.88      | 4.16      | 9.92    | 10.5    | 6.68    | 1.4     |
| Series resistance ( $\Omega$ ) for $Q = 5$ | 17.97     | 13.95     | 6.53      | 15.6    | 16.5    | 10.5    | 2.2     |
| Side width $(\mu m)$                       | 176       | 165.1     | 169       | 166     | 170     | 162.2   | 148.3   |
| Line width $(\mu m)$                       | 4         | 4         | 6         | 4       | 4       | 5       | 11      |
| Spacing $(\mu m)$                          | 1         | 1         | 1.6       | 1       | 1       | 1       | 1.5     |
| No. of turns                               | 6.5       | 5.5       | 3.5       | 6.5     | 6.5     | 5.5     | 2.5     |
| Achieved inductance (nH)                   | 11.441    | 8.882     | 4.157     | 9.937   | 10.531  | 6.685   | 1.400   |
| Achieved series resistance $(\Omega)$      | 16.4      | 13.8      | 6.462     | 15.1    | 15.62   | 10.04   | 2.197   |

The effect of inductive loss can be sufficiently compensated by adding shunt capacitive loss [1]. The quality factor of a capacitor is defined as  $Q = 2\pi f C/G_{shunt}$ where C is the capacitance and  $G_{shunt}$  is the conductance in shunt with the capacitor. The value of shunt resistance that will achieve compensation is  $R_{shunt} = \frac{L}{C \times r_{ser}}$. The final filter after introducing shunt capacitive loss and absorbing the parasitic capacitance into the node capacitance values is shown in figure 3.8. The parasitic capacitance of the input transconductor described below is also absorbed.
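The two quality factor definitions can be tied together numerically. The sketch below computes the series resistance that gives Q = 5 at 1.25 GHz and the shunt resistance from the compensation formula, which amounts to giving the capacitor the same Q as the inductor. The function names are hypothetical, and pairing the 11.44 nH inductor with the 3.96 pF capacitor is purely illustrative.

```python
import math

def series_resistance(inductance, freq, q):
    """Series resistance for which Q = 2*pi*f*L / r_ser equals q."""
    return 2 * math.pi * freq * inductance / q

def compensating_shunt_resistance(inductance, capacitance, r_ser):
    """Shunt resistance R_shunt = L / (C * r_ser) from the text."""
    return inductance / (capacitance * r_ser)

f = 1.25e9     # filter bandwidth (Hz)
L = 11.44e-9   # ladder L1 from table 3.3 (H)
C = 3.96e-12   # illustrative pairing with ladder C1 (F)

r_ser = series_resistance(L, f, q=5)
r_shunt = compensating_shunt_resistance(L, C, r_ser)
q_cap = 2 * math.pi * f * C * r_shunt  # Q = 2*pi*f*C / G_shunt

print(round(r_ser, 2), round(r_shunt, 1), round(q_cap, 3))
```

The capacitor Q evaluates to the same value as the inductor Q, which is what the compensation formula enforces.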

### 3.2.4 Input transconductors

The filter inputs are converted to Norton equivalents by adding a transconductor at the input and changing the connection of the terminating resistor. These transconductor blocks provide a very high input impedance. The design of these transconductors is described here.

The transconductors were designed to have a transconductance of 20 mS or a differential transconductance of 10 mS. A simple NMOS differential pair with PMOS loads was used. A CMFB circuit provides CMFB bias to the PMOS gates. The circuit diagram is shown in figure 3.7. The output resistance and the output capacitance of this circuit block will change the transfer function of the filters. The output conductance can be absorbed into the Norton resistance. The output capacitance can be absorbed into the capacitance at the same node in the ladder. In the dual filter, this capacitance will remain as an extra parasitic capacitance at the output of the transconductor along with the parasitic capacitance due to the inductor connected to the same node. The single ended output resistance of the transconductor is  $362.5\Omega$  at the tt,  $70^{\circ}$C corner. Thus, the resistor at its output was increased to  $58\Omega$  so that the net single ended Norton resistance is  $50\Omega$. This resistor can vary with process and therefore, a provision for tuning the resistor load using two bits is incorporated in the design.
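The resistor value can be checked directly, since the net Norton resistance is the parallel combination of the transconductor output resistance and the load resistor:

$$362.5\,\Omega \parallel 58\,\Omega = \frac{362.5 \times 58}{362.5 + 58} = \frac{21025}{420.5} = 50\,\Omega$$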



Figure 3.7: The input transconductor circuit

The final filter is shown in figure 3.8.



Figure 3.8: Filters after compensation

# CHAPTER 4

# **Miscellaneous Circuit Blocks**

This chapter details the design of some of the blocks that are involved in the full design of the CTE.

# 4.1 Bias generation

The bias generation block generates the bias voltages used to bias the tail currents of various blocks like the tunable weights, the CMFB op-amp, the final amplifier, the input transconductors etc. It also generates the 0.9 V voltage level which is the low logic level for the PMOS switches in this design. The circuit diagram of this block is shown in figure 4.1.

The cascode bias voltages for the tunable weights are generated such that all transistors are biased with a  $|V_{ds}|$  close to their overdrive voltages with just enough room for the nodes to swing. Transistors Mvt1, Mvt2 and Mvt5 are large transistors drawing small currents such that they have a  $|V_{gs}|$  almost equal to the threshold voltage  $|V_T|$ . The bulk nodes of these transistors are connected to their sources so that their  $V_T$  is the same as that of the transistors whose  $V_T$  they are trying to mimic. These other transistors are thus biased with a  $|V_{ds}|$  almost equal to their overdrive voltages.

# 4.2 Output amplifier

The output amplifier serves to increase the output amplitude of the equalizer. It is a simple differential pair circuit with resistor loads. The tail current and resistor value set the output common mode voltage to 1.25 V. The circuit diagram of this block is shown in figure 4.2.

The design of this amplifier posed challenges in the form of a trade-off between gain and bandwidth. Since the input capacitance of the test buffers loading the amplifier was made small, the amplifier does not drive a heavy load. In a complete receiver, the equalizer output would drive a sampler. Another buffer or amplifier could be added in such a case.

As mentioned earlier, the tail current and resistor value set the output common mode voltage. This should not vary with process as we need the output common mode to be equal to the input common mode so that both test buffers have the same input common mode voltage. Therefore, a provision for tuning the resistor load using two bits is incorporated in the design. This allows the resistance to be set to the nominal value despite process variations and ensures that the output common mode voltage does not vary significantly from its nominal value.



Figure 4.1: The bias generation circuit block



Figure 4.2: The output amplifier

# CHAPTER 5

# **CTE** Simulation and Results

# 5.1 MMSE algorithm including the responses of the tunable weights

The response of the tunable weights has to be included along with the impulse responses of the filter taps while working with the simulation of real blocks. The impulse response of the weight blocks was found in Cadence by finding the response of the weights to a step and then differentiating this response. This response was then exported to MATLAB and included in the tap impulse responses. The MMSE algorithm was then used as usual to find the optimal weights.
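The step-to-impulse conversion and its inclusion in a tap response can be sketched as follows. This is only a minimal illustration and not the Cadence/MATLAB flow used in this work: the first-order weight and tap responses, the time step and the function names are all hypothetical, and the MMSE solution itself is not shown.

```python
import math

def impulse_from_step(step, dt):
    """Differentiate a sampled step response (central differences in the
    interior, one-sided differences at the two ends)."""
    n = len(step)
    h = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        h.append((step[hi] - step[lo]) / ((hi - lo) * dt))
    return h

def convolve(a, b, dt):
    """Discrete approximation of the continuous-time convolution of a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj * dt
    return out

dt = 2e-12                                   # sample spacing (s), hypothetical
t = [k * dt for k in range(600)]
tau_w = 1.0 / (2 * math.pi * 1.5e9)          # hypothetical first-order weight block
tau_t = 1.0 / (2 * math.pi * 1.25e9)         # hypothetical first-order tap response

step_w = [1.0 - math.exp(-tk / tau_w) for tk in t]   # stands in for the simulated step response
h_weight = impulse_from_step(step_w, dt)
h_tap = [math.exp(-tk / tau_t) / tau_t for tk in t]
h_path = convolve(h_tap, h_weight, dt)       # tap response including the weight block
```

The combined responses `h_path` of all the taps would then take the place of the ideal tap impulse responses in the MMSE computation.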

# 5.2 Post layout extraction simulation results

The equalizer was simulated for 1000 clock cycles using an ideal PRBS-15 input for different process corners. The results are shown in table 5.1. These simulations are for a channel with  $\tau = \frac{12}{16}T_b$ .

Table 5.1: Post layout extraction simulation results

| Corner                                   | tt, $70^{\circ}$C | ss, $70^{\circ}$C | ss, $0^{\circ}$C | ff, $70^{\circ}$C | ff, $0^{\circ}$C |
|------------------------------------------|-------------------|-------------------|------------------|-------------------|------------------|
| SNR (dB)                                 | 29.97             | 29.0              | 28.4             | 33.1              | 32.85            |
| Output amplitude (mV, p-p differential)  | 367               | 270               | 377              | 361               | 527              |

# CHAPTER 6

# Equalizer response measurements

# 6.1 Measuring the response of the Equalizer

In order to find the correct set of tap weights for a given channel, it is necessary to know the impulse responses of the five paths from the input of the equalizer to the output via the five taps. During design, we have an a priori knowledge of the tap impulse responses since we have chosen a certain filter and used ideal passive components. However, after chip fabrication, the responses of the filter nodes would vary by some amount from the expected responses. Therefore, it is necessary to characterize these tap responses accurately.

On chip filters are usually characterized by employing the technique shown in figure 6.1. We can use the same technique for the equalizer. Two identical test buffers are used to drive the IO pads of the chip and the external loads. Balun TF1 converts the single ended input to a differential form which can be used by the equalizer. Baluns TF2 and TF3 convert the differential outputs of the two test buffers into single ended signals. There are now two paths between the input and the output. The path TF1-IOin-B1-IOo1 is called the direct path and TF1-IOin-equalizer-B2-IOo2 is called the equalizer path.

The frequency response H(f) of the equalizer can be found in the following manner.  $H_{in}$  is the response of the path from the input source to the input of the equalizer,  $H_b$  is the response of each of the identical test buffers and  $H_{out}$  is the response of the path from each buffer output to the measuring equipment at the output. Then, the equalizer response is given by

$$H(f) = \frac{H_{in}H(f)H_bH_{out}}{H_{in}H_bH_{out}} = \frac{V_{eq}(f)}{V_{dir}(f)} \tag{6.1}$$



Figure 6.1: Test setup in literature

This method, although simple and widely used, has been reported to have poor accuracy in the measurement of filter response in the stop-band [2]. Moreover, coupling between the various nodes on the direct and equalizer paths leads to inaccurate measurements.

A technique for accurate frequency response measurement of integrated continuous-time filters was proposed in [2]. This method effectively cancels the spurious feedthrough. Each test buffer is given a switch to switch the sign of the signal. Turning on the sign select would multiply the transfer function of the path involving that buffer by -1. Making measurements using both signs for each buffer, we can remove a large amount of coupling interference. The interference terms cancel out when the following method is used to find H(f).

$$H(f) = \frac{V_{eq}(f) - V_{eq}b(f)}{V_{dir}(f) - V_{dir}b(f)} \tag{6.2}$$

where  $V_{eq}b(f)$  and  $V_{dir}b(f)$  are the outputs of the equalizer path and direct path buffers with the sign bit on. The test setup is shown in figure 6.2. The direct path and equalizer path outputs can be tapped out from the same pins if an on/off feature is added to each test buffer. This is the case shown in figure 6.2.
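The cancellation can be illustrated at a single frequency with complex numbers. The sketch below assumes a simple additive model in which the coupling terms do not change when the sign bit is toggled. The model, the numerical values and the names are assumptions made only for illustration.

```python
def equalizer_response(v_eq, v_eqb, v_dir, v_dirb):
    """Equation (6.2): sign-switched measurements remove additive coupling."""
    return (v_eq - v_eqb) / (v_dir - v_dirb)

h_true = 0.8 - 0.3j      # hypothetical equalizer response at one frequency
common = 1.2 + 0.4j      # H_in * H_b * H_out, common to both paths
leak_eq = 0.05 + 0.02j   # assumed coupling into the equalizer path output
leak_dir = -0.03 + 0.01j # assumed coupling into the direct path output

v_eq = common * h_true + leak_eq     # sign bit off
v_eqb = -common * h_true + leak_eq   # sign bit on: signal term flips, coupling does not
v_dir = common + leak_dir
v_dirb = -common + leak_dir

h_simple = v_eq / v_dir                                       # equation (6.1)
h_cancel = equalizer_response(v_eq, v_eqb, v_dir, v_dirb)     # equation (6.2)

print(abs(h_simple - h_true), abs(h_cancel - h_true))
```

Under this model, the simple ratio is corrupted by the coupling terms while the sign-switched ratio recovers the assumed response up to rounding error.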



Figure 6.2: New test setup

### 6.1.1 Test buffer design

The design of the test buffer needed to take care of the following requirements. It would have to drive a  $50\Omega$  external impedance. It would have to block signal transmission effectively when turned off.

The input and output common mode voltages of the equalizer were initially different. While the input common mode voltage was 1.25 V, the output common mode was 0.9 V. When simulations were carried out using these different input common mode voltage levels for the two buffers, results were unsatisfactory. Therefore, the output common mode voltage was shifted to 1.25 V.

An NMOS source follower followed by a PMOS source follower was finally used to shift the voltage levels to those which could be used by the main differential pair. These increasingly sized source followers also served as a layer of buffers enabling the driving of the main differential pairs at the high speeds required. The switches S1-4 serve to switch the sign of the buffer gain based on the bit 'SGN'. Since the controls of any of these switches are on only when the bit 'on' is high, these switches also serve as a signal block when the buffer needs to be turned off. The main differential pair can be turned on or off using the 'on' bit. Thus, there are two stages which block the passage of the input signal when the buffer is turned off. The cascode transistors in this differential pair serve to isolate the output by increasing the output impedance and reducing the feed-through. The cascode gates are biased at Vdd. They are connected to Vdd via a resistance in order to prevent oscillations due to the gate capacitance and bond wire inductance.

The circuit diagram is shown in figure 6.3.



Figure 6.3: Test buffer circuit diagram

### 6.1.2 Simulation results

Simulation of the equalizer response measurement technique was carried out using the setup described above. The frequency response of the equalizer was measured for a set of optimal weights corresponding to a channel with  $\tau = \frac{12}{16}T_b$  and the equalizer in the ss, 70°C corner. Results showing magnitude and phase are shown in figures 6.4 and 6.5.

# 6.2 Measuring the response of the channel

To test the equalizer chip after fabrication, a channel will be connected to the input. A PRBS signal will be given to the chip via the channel. In order to estimate the correct weights for equalizing the channel using the MMSE algorithm, we will need to know the response of the channel.



Figure 6.4: Measured and actual equalizer response magnitude

The method detailed here will enable the accurate estimation of the channel response. The channel is essentially a reciprocal 2-port system. Thus, it can be completely specified using three parameters. The Z parameters specifying the channel are shown below out of which  $z_{12} = z_{21}$  as the channel is reciprocal.

$$Z = \left(\begin{array}{cc} z_{11} & z_{12} \\ z_{21} & z_{22} \end{array}\right)$$

A variable resistor load is placed on chip at the input to the equalizer. Input impedance measurements are made for 3 different values of this resistance  $(R_1, R_2$ and  $R_3)$ . The setup is shown in figure 6.6.



Figure 6.5: Measured and actual equalizer response phase

### 6.2.1 Computing the Z parameters of a 2-port network from $z_{in}$ measurements

The looking in impedance of a reciprocal 2-port network loaded by a resistor R is given by

$$z_{in} = z_{11} - \frac{(z_{12})^2}{z_{22} + R} \tag{6.3}$$

Thus, with  $z_{in}$  (zin1, zin2, zin3) for three different values of R ( $R_1, R_2, R_3$ ), we can compute the Z parameters:

$$z_{22} = \frac{R_2 k - R_3}{1 - k} \tag{6.4}$$



Figure 6.6: Setup for measuring channel response shown with a model channel

where

$$k = \frac{zin1 - zin2}{zin1 - zin3} \frac{R_1 - R_3}{R_1 - R_2}$$

$$(z_{12})^2 = \frac{(zin1 - zin2)(z_{22} + R_2)(z_{22} + R_1)}{R_1 - R_2} \tag{6.5}$$

In order to find the correct sign of  $z_{12}$, we can enforce continuity of phase from the lower frequency limit to the higher frequency limit.

$$z_{11} = zin1 + \frac{(z_{12})^2}{z_{22} + R_1} \tag{6.6}$$
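The extraction can be checked at a single frequency with a round trip: input impedances are generated from a known set of Z parameters using equation (6.3), and equations (6.4) to (6.6) are then applied to recover them. The parameter values and function names below are hypothetical. The sketch returns $(z_{12})^2$ and leaves the choice of sign to the phase continuity argument given above.

```python
def z_in(z11, z12, z22, r):
    """Equation (6.3): input impedance with a resistive load r on port 2."""
    return z11 - z12 ** 2 / (z22 + r)

def z_params_from_zin(zin1, zin2, zin3, r1, r2, r3):
    """Equations (6.4)-(6.6): recover z11, (z12)^2 and z22 from three
    input impedance measurements made with loads r1, r2 and r3."""
    k = (zin1 - zin2) / (zin1 - zin3) * (r1 - r3) / (r1 - r2)
    z22 = (r2 * k - r3) / (1 - k)
    z12_sq = (zin1 - zin2) * (z22 + r2) * (z22 + r1) / (r1 - r2)
    z11 = zin1 + z12_sq / (z22 + r1)
    return z11, z12_sq, z22

# Hypothetical channel parameters at one frequency (ohms)
z11_true, z12_true, z22_true = 60 + 5j, 40 - 3j, 55 + 8j
loads = (25.0, 50.0, 100.0)
measured = [z_in(z11_true, z12_true, z22_true, r) for r in loads]

z11, z12_sq, z22 = z_params_from_zin(*measured, *loads)
print(z11, z12_sq, z22)
```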

### 6.2.2 Simulation results

Simulation of the channel response measurement technique was carried out using the setup described above. The frequency response of the channel was measured. Results showing magnitude and phase are shown in figures 6.7 and 6.8.

# 6.3 Serial input of weight tuning bits

Since 30 pins would otherwise be necessary to input the weight tuning bits, a serial to parallel converter was designed using Verilog and Encounter. This takes in the bits serially and stores them in 30 registers which hold the bits for use by the equalizer.
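A behavioural sketch of such a converter is given below. It is not the Verilog used in this work: the shift direction and the grouping into five taps of six bits each (5 bits + 1 sign bit) are assumptions made for illustration, and the function names are hypothetical.

```python
def shift_in(bits, width=30):
    """Shift bits into a register of the given width, one bit per clock.
    The newest bit enters at position 0 and older bits move up."""
    reg = [0] * width
    for b in bits:
        reg = [b] + reg[:-1]
    return reg

def split_taps(reg, taps=5, bits_per_tap=6):
    """Group the parallel outputs into per-tap control words (assumed grouping)."""
    return [reg[i * bits_per_tap:(i + 1) * bits_per_tap] for i in range(taps)]

serial_stream = [1, 0, 1, 1, 0, 0] * 5   # hypothetical 30-bit tuning word
register = shift_in(serial_stream)
tap_words = split_taps(register)
print(tap_words)
```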



Figure 6.7: Measured and actual channel response magnitude

# 6.4 On-chip PRBS for testing

A 2.5 GHz,  $2^{15}$  PRBS was designed to provide an on-chip input source which could be fed via an external channel to the equalizer input.
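A software model of a 15-bit PRBS source is sketched below. The generator polynomial used in this work is not stated here, so the sketch assumes the commonly used $x^{15} + x^{14} + 1$. The function names are hypothetical.

```python
def lfsr_step(state):
    """Advance a 15-bit Fibonacci LFSR (assumed polynomial x^15 + x^14 + 1).
    Returns the next state and the bit that was shifted in."""
    newbit = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | newbit) & 0x7FFF, newbit

def prbs15_bits(count, seed=0x7FFF):
    """Return the first count output bits for a non-zero seed."""
    if seed & 0x7FFF == 0:
        raise ValueError("seed must be non-zero")
    state, bits = seed & 0x7FFF, []
    for _ in range(count):
        state, bit = lfsr_step(state)
        bits.append(bit)
    return bits

def lfsr_period(seed=1):
    """Number of steps before the register returns to its starting state."""
    state, _ = lfsr_step(seed)
    n = 1
    while state != seed:
        state, _ = lfsr_step(state)
        n += 1
    return n

print(lfsr_period(), prbs15_bits(16))
```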



Figure 6.8: Measured and actual channel response phase

# CHAPTER 7

# Eye Opening Monitor - Introduction

The continuous-time equalizers designed in our lab work at high speeds with clock rates of 2.5 GHz and 10 GHz. To view the output eye diagrams of high speed equalizers, we cannot simply bring out the output nodes. The channel consisting of the bond pad inductances and pad capacitances filters the output voltages, so the signals observed outside the chip do not give a good representation of the equalizer output eye.

An eye opening monitor circuit is used to convert the high speed analog output of an equalizer to a low speed digital output which gives an effective estimate of the signal level for various phases within a clock cycle. From this digital data, it is possible to construct the equalizer output eye diagram in gray scale with reasonable accuracy.

# 7.1 The Eye Opening Monitor System

The eye diagram of the equalizer output consists of various bit periods of the output signal plotted in an overlapping manner in one frame. Each phase point in this diagram consists of all the values taken by the equalizer output at that particular phase over all the bit periods measured. Therefore, the information needed to reconstruct the eye diagram is the set of these values at all phase points. This section describes how this information, or in reality, a sample of this information can be brought out of the chip using the eye opening monitor system and how this can be used to reconstruct the eye diagram.

The eye opening monitor system takes the equalizer output as its input and generates the information required to reconstruct the eye diagram. The first step is to sample this input at various clock phases. We have to limit the number of phases for which the signal levels need to be estimated. Therefore, the input is sampled at a fixed number of phases which are close enough to reconstruct the eye diagram accurately.

In order to estimate the signal levels at different phases within a cycle, an eye opening monitor samples the output of the equalizer at various phases and compares it to a set of reference levels. The number of reference levels is limited to the minimum number required for accurate reconstruction of the eye diagram. The next step in the system is thus the comparison of the sampled value to a fixed set of reference levels. The sampling and comparison operations can be combined into a single operation by using a clocked comparator.

The main blocks of the eye opening monitor system are thus a clocked comparator and a multiphase clock generator. The clock phase generator provides a series of clocks whose phases lie between the phases of two successive cycles of the data rate clock. For each of these phases, the equalizer output is sampled for many cycles and compared with all the reference levels in order to find out how often the signal is greater than each reference level at each phase.

The multiphase clock generator gives clocks at various phases out of which one can be selected at a time using control logic. This phase is maintained for a large number of cycles of the data rate clock. The period during which a phase is maintained is divided into many smaller periods. Each of these smaller periods also spans many cycles of the data rate clock. During each of these smaller periods, a certain reference level is fed to the clocked comparator. Thus, the comparator compares the sampled signal and a certain reference voltage for this sub-period.

An averaging circuit computes the average of the comparator output during each sub-period and this average value is converted to digital form using an ADC. Averaging is done over a large number of cycles. The output data is low speed and can therefore be taken out of the chip. During each of these divisions, a different reference voltage is used and thus in a given period during which the clock phase is held constant, all references are swept through. This is then repeated for all clock phases until we have digital data corresponding to the number of times the signal is greater than each reference level for each clock phase.

The digital data brought out of the chip can be processed easily to reconstruct the eye diagram. The digital data corresponds to the cumulative probability of the occurrence of the output signal above each reference level at each of the phases. This information can be used to generate the probability of the signal occurring between two reference levels at each phase. This probability function is then plotted as an intensity diagram using a gray-scale mapping. This diagram is the reconstructed eye diagram.
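The processing for a single clock phase can be sketched as below. The sample values and reference levels are hypothetical, the averaging is idealized as an exact fraction of comparisons, and the output quantization is not modelled. The function names are also hypothetical.

```python
def fraction_above(samples, refs):
    """Idealized comparator plus averaging: fraction of samples that exceed
    each reference level (refs in ascending order)."""
    return [sum(1 for s in samples if s > r) / len(samples) for r in refs]

def bin_probabilities(frac_above):
    """Probability that the signal lies between adjacent reference levels,
    obtained by differencing the cumulative data."""
    return [frac_above[i] - frac_above[i + 1] for i in range(len(frac_above) - 1)]

# Hypothetical samples at one phase near the centre of an open eye (V)
samples = [-0.30, -0.28, 0.29, 0.31, 0.30, -0.31]
refs = [-0.4, -0.2, 0.0, 0.2, 0.4]

cumulative = fraction_above(samples, refs)
column = bin_probabilities(cumulative)   # one column of the gray-scale image
print(cumulative, column)
```

Repeating this for every phase gives the columns of the intensity image. In this example the probability mass falls in the outermost bins, which corresponds to an open eye at that phase.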

The block diagram for the eye recovery system is shown in figure 7.1. The blocks implement the flow described in the paragraphs above which can be summarized by

Sampling at multiple phases  $\Rightarrow$  Comparison with reference levels  $\Rightarrow$  Averaging  $\Rightarrow$  A/D Conversion



Figure 7.1: Block diagram of the eye opening monitor

# 7.2 System parameters and MATLAB simulation

#### 7.2.1 System parameters

The performance of the eye opening monitor system depends on a number of system parameters. These parameters are given below with descriptions.

**Phase resolution -** The number of phases at which the input signal is sampled decides the horizontal resolution with which the eye diagram can be reconstructed. The minimum phase resolution required has to be found and incorporated in the design.

**Resolution of the comparator reference levels -** The number of reference levels against which the input signal is compared at each phase decides the vertical resolution with which the eye diagram can be reconstructed.

**Number of averaging cycles -** One averaging cycle should include as many samples as possible to average out the irregularities in the occurrence of the signal at a certain level at a particular phase. If a PRBS source is used as the input to the equalizer, then the period of the PRBS could ideally be used as the averaging period as this would take into account all the possible signal values at each phase. However, due to the presence of noise, more cycles are necessary.

**Number of quantization bits for the output -** This parameter affects the clarity and accuracy of the reconstructed eye diagram.

#### 7.2.2 System simulation in MATLAB

MATLAB simulation of an ideal eye opening monitor system was carried out to demonstrate its working and to estimate the minimum values of the above mentioned system parameters. [3] was helpful in giving an estimate of these parameters to test and work on. An ideal eye opening monitor was simulated using the following setup. A  $2^7$  PRBS signal was fed via a channel into an eye opening monitor which reconstructed the eye diagram at the output of the channel. The actual eye diagram at the channel output is shown in figure 7.2. The minimum system parameter values were estimated and are given below.

- No. of reference levels - 32
- No. of phases - 32
- No. of cycles of averaging per data point - $2^7$
- No. of quantization bits for the output - 8

Channel: 1st order Butterworth filter with 3dB bandwidth = 3 GHz

The eye diagram reconstructed directly is shown in figure 7.3. Eye diagrams were constructed after applying bi-linear and bi-cubic interpolation on the 32x32 image to obtain 125x125 images (by adding 3 extra points between every 2 data points). These are shown in figures 7.4 and 7.5.
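The size of the interpolated image follows from the point count: 32 data points with 3 extra points in each of the 31 gaps give $32 + 31 \times 3 = 125$ points per axis. A minimal sketch of the bi-linear case is given below, written as linear interpolation along rows and then along columns. It is not the MATLAB routine used in this work, and the test image is a hypothetical linear ramp.

```python
def upsample_1d(row, extra=3):
    """Insert `extra` linearly interpolated points between adjacent samples."""
    step = extra + 1
    out = []
    for a, b in zip(row, row[1:]):
        for m in range(step):
            out.append(a + (b - a) * m / step)
    out.append(row[-1])
    return out

def upsample_bilinear(image, extra=3):
    """Bi-linear interpolation done separably: rows first, then columns."""
    rows = [upsample_1d(r, extra) for r in image]
    cols = [upsample_1d(list(c), extra) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Hypothetical 32x32 image (a linear ramp, for which the result is exact)
image = [[i + 2 * j for j in range(32)] for i in range(32)]
fine = upsample_bilinear(image)
print(len(fine), len(fine[0]))
```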



Figure 7.2: The actual eye diagram at the channel output



Figure 7.3: The 32x32 eye diagram image obtained directly from the eye opening monitor output (replicated to show 2 cycles)



Figure 7.4: The 125x125 eye diagram image obtained after bi-linear interpolation (replicated to show 2 cycles)



Figure 7.5: The 125x125 eye diagram image obtained after bi-cubic interpolation (replicated to show 2 cycles)

# CHAPTER 8

# Design of the Clocked Comparator

The clocked comparator is required to compare the input signal with reference voltage levels. This block needs to work at 10 GHz. The following sections describe the design of the clocked comparator.

# 8.1 Circuit implementation

The clocked comparator is required to perform the functions of sampling the input, comparing it to a reference and latching it for a bit period of 100 ps.

The basic building block of the comparator is the D-latch. The latch was implemented using CML latches since the speed of operation is 10 GHz and only a small swing is possible. The schematic of the latch is shown in figure 8.1.

The comparison function can be achieved via different methods. The initial design attempted was based on a latch with the input signal AC coupled about the reference level which is almost a DC value as described in [3]. However, this would require the RC time constant of the AC coupler to be at least equal to the averaging period in which case each reference level would take the length of one averaging cycle to establish its value. If the RC time constant is made significantly smaller than the averaging period, then the low frequency variations in the input signal would get filtered out.

A capacitor could be used at the input to charge up to the reference voltage during a charging phase and hold the reference voltage for the working phase. However, such a setup with switches would have a bandwidth significantly lower than 10 GHz and would thus alter the input signal. Therefore, it was decided to use a double differential pair sampling structure to sample and compare the input signal and the reference voltage. The circuit diagram of the first stage of the clocked comparator designed using this idea is shown in figure 8.2. This was connected together with a latch sampling at the other clock edge to create a clocked comparator. Many scaled designs were tried with increasing tail currents and transistor sizes until the comparator functioned at 10 GHz. The tail current in each of the latches is 480 µA. The input and output swings are 600 mV p-p differential.



Figure 8.1: The CML D-latch

At this stage, it was found that the bandwidth of the first stage was still low which resulted in a low sensitivity. Therefore, it was necessary to add inductive peaking to this stage using an inductance of 3 nH in order to increase its sensitivity and ensure clean working at 10 GHz. It was also necessary to add two more latches to regenerate the output signal completely for small input differences. The circuit diagram of the clocked comparator is shown in figure 8.3.



Figure 8.2: The first stage of the clocked comparator

# 8.2 Schematic simulation results

Simulations were carried out on the schematic. This clocked comparator functions well at  $10 \,\text{GHz}$  with a sensitivity of around  $10 \,\text{mV}$  input differential difference.



Figure 8.3: The clocked comparator

# CHAPTER 9

# Design of the Multiphase clock generator

As described earlier, it was found that 32 clock phases were required and the minimum clock phase spacing required was 3.125 ps. These phases have to be generated from the input clock. It was decided to have 8 coarse phases which could be generated using delay cells. 32 phases could be generated by interpolating between these coarse phases. However, the first and last phases of each interpolated set would be the same as the bordering phases of the neighboring interpolated sets, so more phases need to be generated. Therefore, 56 phases were generated using 8 coarse phases and 8 interpolated phases per pair of adjacent coarse phases, after taking the overlapping phases into account. The design of the various blocks involved in the generation of the multiple phases is described in this chapter.
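The count of 56 can be reproduced by enumerating the phases. The sketch below assumes that the 8 coarse phases are equally spaced over one 100 ps clock period, that the interpolation is ideal and uniform, and that the last coarse phase is paired with the first one of the next cycle. These are assumptions made for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def interpolated_phases(period_ps=100, coarse=8, per_pair=8):
    """Distinct phase positions (ps) when every pair of adjacent coarse phases
    yields per_pair phases, end points included."""
    coarse_step = Fraction(period_ps, coarse)
    fine_step = coarse_step / (per_pair - 1)
    phases = set()
    for c in range(coarse):
        for m in range(per_pair):
            phases.add((c * coarse_step + m * fine_step) % period_ps)
    return sorted(phases)

phases = interpolated_phases()
spacing = phases[1] - phases[0]
print(len(phases), float(spacing))
```

Under these assumptions, the resulting spacing of 100/56 ps (about 1.79 ps) is finer than the 3.125 ps mentioned above.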

# 9.1 Generation of coarsely spaced clock phases

The generation of coarse phases was done in the following manner. A line of delay cells was designed to generate equal delays. More than 8 delay cells were required in order to ensure matching input and output conditions for each delay cell. A few delay cells alter the input clock such that the signal shapes at the outputs of the 8 output producing cells are the same.

### 9.1.1 Bandwidth enhancement

The initial design failed to have sufficient bandwidth. Therefore, 1.2 nH inductance based inductive peaking was introduced in each delay cell. Such a design takes up a lot of area with each differential inductor being a few times larger than the delay cells themselves. Therefore, a different method was adopted. The inductors were removed and the load resistor values were scaled up so that even if a full swing of voltage was not achieved, each delay cell output would cross the 600 mV p-p differential swing limit necessary to completely switch the following delay cell. The delay cell is shown in figure 9.1.



Figure 9.1: The delay cell

### 9.1.2 Delay cell chain

The delay cell chain used to generate the 8 coarse phases is shown in figure 9.2. Identical buffers were used at the output of each delay cell to ensure equal loading. An 8:1 multiplexer block follows this setup, and the multiplexer offers a different input capacitance at each of its inputs. Therefore, without these buffers, the delay cells would be unequally loaded and equal delays would not be achieved along the chain.



Figure 9.2: The delay cell chain - generates 8 coarse clock phases

# 9.2 Phase Interpolator design

The phase interpolator is used to generate 8 phases between and including every 2 adjacent coarse phases. Two kinds of designs were tested for the phase interpolator.

These two designs work with 600mV p-p differential inputs and give the same output swing.

### 9.2.1 Design 1 - eliminated

This design uses two differential pairs which have the two coarse phases as inputs. The tail currents of these two differential pairs are adjustable to 8 values. The tail currents of the two pairs are varied linearly with bit setting, while the sum of the two tail currents is kept constant. The outputs of the two pairs are connected together to a differential resistor load. The circuit is shown in figure 9.3.

Cascode transistors - A simple differential pair will have instant feed-through via the Cgd - Cdd capacitive divider. The cascode transistors were added to reduce the effect of the feed-through. The results for this design with ideal coarse phase waveforms are shown in figure 9.4. This interpolation is not linear.

#### Features



Figure 9.3: Clock phase interpolator - Design 1

(i) This design cannot offer linearity in phase interpolation.

### 9.2.2 Design 2 - chosen for this work

This design tries to interpolate the phase by linearly varying the value of the transconductance working on each coarse clock. If we are interested in the area around the zero crossings, transconductance is the parameter that should be varied linearly for linear interpolation. This design does so by using three sets of differential pairs - whose tail currents and sizes are both in geometric progression. Each differential pair set has current running through only one of the differential pairs. In this design, the sizes and currents through the effective total input transistors both scale linearly with bit setting. Thus, transconductance scales linearly with bit setting.

The current corresponding to each of the two clocks also varies linearly with bit setting. Thus, the interpolation in the region where the cells are slewing should be sufficiently linear too. As stated in the previous paragraph, if we are interested in the area around the zero crossings, transconductance is the parameter that should be varied linearly for linear interpolation. However, if the current to voltage conversion at the output is slow due to a large capacitance, then slewing might occur before the zero crossing occurs. Thus, this design takes care of both cases.

Figure 9.4: Results for design 1 with ideal input coarse phase waveforms

The select switches are added above the input transistors. This ensures that there is zero feed-through from the differential pairs which are off. This alleviates the problem of the phases spilling out beyond their range within two coarse phases. This is of importance at the edges of the interpolation space where Cgd feedthrough will cause the fine phase to jump outside the coarse phase it is supposed to match exactly with. Select switches are also present below the tail currents for extra isolation in the off state. The circuit is shown in figure 9.5.

The problem with this design is that the phase interpolator now offers different input capacitances to the coarse clock phase generator. Thus, the input waveforms to the phase interpolator vary according to the bit setting.

The results for this design with ideal input coarse phase waveforms are shown in figure 9.6. This interpolation is linear.
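The effect of weighting two coarse clocks linearly can be illustrated with a simple behavioral model. The sketch below is not the circuit of figure 9.5: it assumes ideal sinusoidal clocks, a hypothetical coarse spacing of 16 ps, and weights that follow the 3-bit setting linearly (code/7 for one clock and the remainder for the other). It computes the zero-crossing delay of the weighted sum and compares it with a perfectly linear interpolation.

```python
import math

CLOCK_PERIOD_PS = 100.0    # one period of the 10 GHz clock
COARSE_SPACING_PS = 16.0   # assumed spacing between two adjacent coarse phases
N_CODES = 8                # settings available with 3 bits


def interpolated_delay_ps(code):
    """Zero-crossing delay of w_a*sin(x) + w_b*sin(x - phi), relative to clock A."""
    w_b = code / (N_CODES - 1)   # weight on the later coarse clock
    w_a = 1.0 - w_b              # weight on the earlier coarse clock
    phi = 2.0 * math.pi * COARSE_SPACING_PS / CLOCK_PERIOD_PS
    # Phase lag of the phasor sum w_a + w_b*exp(-j*phi).
    lag = math.atan2(w_b * math.sin(phi), w_a + w_b * math.cos(phi))
    return lag * CLOCK_PERIOD_PS / (2.0 * math.pi)


delays = [interpolated_delay_ps(code) for code in range(N_CODES)]
ideal = [code * COARSE_SPACING_PS / (N_CODES - 1) for code in range(N_CODES)]
worst_error_ps = max(abs(d - i) for d, i in zip(delays, ideal))

for code, (d, i) in enumerate(zip(delays, ideal)):
    print(f"code {code}: delay {d:.2f} ps, linear target {i:.2f} ps")
print(f"worst deviation from linear: {worst_error_ps:.2f} ps")
```

In this idealized model the end codes reproduce the two coarse phases exactly and the intermediate codes stay close to the linear target. Real clock waveforms are not sinusoidal, so the numbers are only indicative.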

#### Features



Figure 9.5: Clock phase interpolator - Design 2

(i) This design ensures linearity with ideal input waveforms.

(ii) With real coarse delay cells, the linearity would be highly disturbed.

(iii) This phase interpolator offers different input capacitances to the coarse clock phase generator. Thus, the input waveforms to the phase interpolator vary according to the bit setting.

**Replica interpolator:** In order to ensure similar loading for both coarse clocks for all interpolator bit settings, the coarse phase generator is loaded with a replica interpolator circuit whose bits are the negated versions of those controlling the actual interpolator. This leads to a direct waste of 700uA of current.

Therefore, design 2 with a replica interpolator was used as it could give linear phase variation.
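The role of the replica interpolator can be sketched with a small counting model. It assumes that the three differential pair sets are binary weighted (1, 2, 4 units, i.e. a geometric progression with ratio 2, which is not stated explicitly in the text) and that the capacitance seen by a coarse clock is proportional to the number of active input transistor units connected to it. Function names are illustrative.

```python
WEIGHTS = (1, 2, 4)   # assumed unit sizes of the three differential pair sets
N_BITS = len(WEIGHTS)
FULL_SCALE = (1 << N_BITS) - 1


def clock_weights(code):
    """Return (units active on clock A, units active on clock B) for a bit setting."""
    bits = [(code >> i) & 1 for i in range(N_BITS)]
    to_b = sum(w for w, b in zip(WEIGHTS, bits) if b)
    to_a = sum(w for w, b in zip(WEIGHTS, bits) if not b)
    return to_a, to_b


def coarse_clock_loading(code):
    """Active units loading each coarse clock: main interpolator plus replica."""
    main_a, main_b = clock_weights(code)
    # The replica is driven by the negated bits of the main interpolator.
    replica_a, replica_b = clock_weights(~code & FULL_SCALE)
    return main_a + replica_a, main_b + replica_b


for code in range(FULL_SCALE + 1):
    print(code, clock_weights(code), coarse_clock_loading(code))
```

In this model the weight on each clock scales linearly with the bit setting, while the combined loading of the main and replica interpolators is the same for every setting.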

### 9.2.3 Final clock phase interpolator circuit

Design 2 described above was used to implement the clock phase interpolator. A replica interpolator was used to provide equal input capacitance to both coarse clock inputs.



Figure 9.6: Results for design 2 with ideal input coarse phase waveforms

In order to achieve sufficient bandwidth, inductive peaking was used. The load resistors were reduced by a factor of 2 from the values required for a 600 mV p-p differential swing. The circuit diagram is shown in figure 9.7. A chain of 4 buffers of varying sizes was added at the output to regenerate the clock signals to a 600 mV p-p differential output amplitude.

# 9.3 8:1 multiplexer design

A PMOS switch tree based multiplexer cannot support 10 GHz signals, and so we need to use a different design for the multiplexer. An 8:1 multiplexer tree was designed using the 2:1 multiplexer cells shown in figure 9.8. The multiplexer was designed to consist of 3 levels corresponding to the 3 select bits. The gradual increase in size allows the signals to drive each stage and the final buffer at 10 GHz.

In order to achieve the required functioning at 10 GHz, the resistor loads of the 2:1 multiplexer cells were increased beyond the value required for a 600 mV p-p differential output swing at DC. This enables them to achieve a 600 mV p-p differential output despite having insufficient bandwidth.



Figure 9.7: The final clock phase interpolator circuit

### 9.3.1 Compensating the effects of feed-through

Simulations of the multiplexer and the coarse phase generator showed that the spacing between the coarse clock phases was not uniform and varied according to the number of coarse clock phase selection bits that changed between each set of adjacent coarse clock phases. This was found to be due to  $C_{gd}$  feed-through in the 2:1 multiplexer differential pairs. Therefore, replica input pairs with inputs fed in the reverse manner were added to each 2:1 multiplexer cell. The final 2:1 multiplexer cell is shown in figure 9.9.

The details of the multiplexer stages are given in table 9.1.



Figure 9.8: 2:1 multiplexer cell



Figure 9.9: Final 2:1 multiplexer cell with feed-through compensation

### 9.3.2 Generation of two coarse phases from the multiplexer output

The multiplexer output is fed via a buffer into a chain of delay cells identical to the main delay cell chain. Two of the outputs are tapped to serve as the two coarse clock outputs. A series of buffers of varying sizes was placed at the output to enable this structure to drive the clock interpolator.

Table 9.1: Multiplexer stages

| Stage | Tail current | Input transistor size (W/L) | Load resistance ($\Omega$) |
|-------|--------------|-----------------------------|----------------------------|
| 1     | 960 uA       | 15.04 um/0.12 um            | 312.5                      |
| 2     | 1.280 mA     | 20 um/0.12 um               | 234                        |
| 3     | 1.6 mA       | 24.96 um/0.12 um            | 187                        |
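The scaling of the three multiplexer stages in table 9.1 can be checked numerically. The sketch below assumes a fully switched differential pair, so that each output swings by tail current times load resistance and the differential peak-to-peak swing is twice that. The text does not say whether the tabulated resistances already include the increase described in section 9.3, so the result should be read only as a consistency check on the table.

```python
# (stage, tail current in A, input transistor width in um, load resistance in ohm)
STAGES = [
    (1, 960e-6, 15.04, 312.5),
    (2, 1.280e-3, 20.0, 234.0),
    (3, 1.6e-3, 24.96, 187.0),
]

results = []
for stage, i_tail, width_um, r_load in STAGES:
    # Assumption: the pair switches fully, steering all of i_tail to one side.
    single_ended_swing_mv = i_tail * r_load * 1e3
    diff_pp_swing_mv = 2.0 * single_ended_swing_mv
    # Width per unit current indicates whether current density is kept constant.
    width_per_ma = width_um / (i_tail * 1e3)
    results.append((stage, diff_pp_swing_mv, width_per_ma))
    print(f"stage {stage}: {diff_pp_swing_mv:.1f} mV p-p differential, "
          f"{width_per_ma:.2f} um per mA")
```

All three stages come out close to 600 mV p-p differential with nearly the same width per unit current, which suggests that the stages were scaled at constant current density and constant DC swing.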

# 9.4 Simulation results

### 9.4.1 Schematic

Simulation was carried out using the schematic of the multiphase clock generation circuit block which included the coarse phase generator, 8:1 multiplexer with dual coarse clock output and the clock phase interpolators. 100fF capacitance was added to the output nodes to account for the loading of the comparator and the layout parasitics. Results are shown in figure 9.10. Here, 2 sets of coarse clock periods are shown with interpolated phases. The boundary clock phases in adjacent sets nearly overlap (figure 9.10). A linear variation in phase is observed. A full swing of 600mV p-p differential is also achieved.

### 9.4.2 Layout extracted

Simulation was carried out using the extracted layout of the multiphase clock generation circuit block. There was a significant drop in amplitude along the multiplexer path, leading to incomplete switching in the multiplexer and the following cells. The replica delay line following the multiplexer functions badly and does not generate the same delays as the main delay chain. The clock interpolation therefore suffers badly, and the different phases show varying amplitude and poor linearity.

**Future solution:** By adding buffers in the multiplexer to regenerate the signal amplitude wherever necessary, the problem could be tackled.



Figure 9.10: Multiphase clock generation - results for schematic block simulation: 2 sets of adjacent interpolated phases

# CHAPTER 10

# System Simulations

This chapter describes the simulation of the eye opening monitor system for various settings. It starts from the simulation of an ideal system and then goes on to describe the simulation with the schematics of the clocked comparator and the multiphase clock generator.

Ideal DAC, PRBS generator and channel blocks designed by a fellow student were used in these simulations. The properties of a successive approximation ADC that had already been designed were also used to design the control block, although the ADC itself was not included in these simulations.

# 10.1 Control block

The control block performs the function of providing control signals to the DAC, the multiphase clock generator, the averaging circuit and the ADC.

The control block was implemented in Verilog. The inputs and outputs of the Verilog control block are as shown below. (noc is the number of averaging cycles, i.e. the number of cycles for which a single clock phase and DAC level are used)

Inputs:

1. clk (frequency  $f_{clk} = 12 \times 10\,\text{GHz}/(32 \times 56 \times \text{noc})$ ). This clock frequency is 12 times higher than the averaging frequency. This provides a faster clock to the ADC, which has a latency of 12 cycles.

2. reset - a short pulse which resets all counters and outputs to their required initial values

Outputs:

1. dac\_ctrl[4:0]

2. phase\_ctrl[5:0]

3. enable\_avg - Out of the 12 cycles of the input clock that make up an averaging cycle, enable\_avg is turned off during the first cycle to allow time for the DAC output to settle and for the averaging capacitor to discharge. This is not relevant when simulating using ideal blocks. The averaging circuit used is described in the next section.

4. eoc - a high signal lasting one period of the input clock lets us know that a single reference level has been passed through

5. done - a high signal would let us know that the eye opening monitor has characterized the eye completely
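The widths of the control outputs and the length of a complete scan can be checked with a short calculation. The sketch assumes 32 DAC reference levels and 56 clock phases (the numbers appearing in the clock frequency expression above) and that noc counts cycles of the 10 GHz clock. The helper name `scan_time_s` is illustrative and is not part of the Verilog control block.

```python
F_CLOCK_HZ = 10e9   # 10 GHz clock
N_LEVELS = 32       # DAC reference levels, selected by dac_ctrl[4:0]
N_PHASES = 56       # clock phases, selected by phase_ctrl[5:0]
DAC_BITS = 5
PHASE_BITS = 6

# The control words must be wide enough to address every level and phase.
dac_bits_ok = 2 ** DAC_BITS >= N_LEVELS
phase_bits_ok = 2 ** PHASE_BITS >= N_PHASES


def scan_time_s(noc):
    """Time to visit every (phase, level) point, averaging noc cycles at each."""
    # Assumption: noc is measured in cycles of the 10 GHz clock.
    return N_LEVELS * N_PHASES * noc / F_CLOCK_HZ


print(dac_bits_ok, phase_bits_ok)
print(f"scan time for noc = 128: {scan_time_s(128) * 1e6:.2f} us")
```

Settling time allowed through enable\_avg and the ADC latency are not included in this estimate.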

# 10.2 Ideal Eye Opening Monitor in Cadence

The simulation of the eye opening monitor system was done using the following ideal blocks:

1. PRBS  $2^7$  generator (schematic)
2. Channel (VerilogA)
3. Clocked comparator (VerilogA)
4. Multi phase clock generator (ideal schematic)
5. Averaging circuit: Consists of a leaky integrator made using an RC filter with  $1/(2\pi RC) = f_{averaging}/10$ . Thus,  $RC = 10\,T_{averaging}/(2\pi)$ , which gives an approximately linear response to a step of any length within the averaging period. In other words, it functions as an integrator.
6. A/D conversion - in MATLAB
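How closely the RC filter of the averaging circuit behaves like an integrator can be estimated from its step response. The sketch below takes the corner-frequency condition $1/(2\pi RC) = f_{averaging}/10$ as the starting point, which gives $RC = 10\,T_{averaging}/(2\pi)$, and compares the first-order step response with the straight line of an ideal integrator. Time is normalized to the averaging period, and the function names are illustrative.

```python
import math

# RC expressed in units of the averaging period T_averaging.
RC_OVER_T = 10.0 / (2.0 * math.pi)


def leaky_response(t_over_T):
    """Normalized step response of a first-order RC filter."""
    return 1.0 - math.exp(-t_over_T / RC_OVER_T)


def ideal_integrator_response(t_over_T):
    """Straight line that the RC response follows for t much smaller than RC."""
    return t_over_T / RC_OVER_T


for t in (0.1, 0.25, 0.5, 1.0):
    leaky = leaky_response(t)
    ideal = ideal_integrator_response(t)
    shortfall = 100.0 * (ideal - leaky) / ideal
    print(f"t = {t:.2f} T: leaky {leaky:.3f}, ideal {ideal:.3f}, "
          f"shortfall {shortfall:.1f} %")
```

The shortfall grows with the length of the step, so the approximation is best for steps that are short compared with the averaging period.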

Specifications of the eye opening monitor:

No. of reference levels - 32

No. of phases - 56 with 2 ps spacing spanning 112 ps ( $T_{clk} \times 1.12$ )

No. of cycles of averaging per data point -  $2^7$ 

### 10.2.1 Results

The actual eye diagram at the channel output is shown in figure 10.1.

Eye diagrams were constructed after applying bi-cubic interpolation (which showed better results when compared to bi-linear interpolation during the simulation of the ideal eye opening monitor in MATLAB) on the 32x32 image to obtain 125x125 images (by adding 3 extra points between every 2 data points).
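The image sizes follow directly from the number of inserted points. As a small check, with an illustrative helper name:

```python
def upsampled_size(n_points, extra_between=3):
    """Number of points after inserting extra_between points in every gap."""
    # n_points samples have n_points - 1 gaps; each gap gains extra_between points.
    return (n_points - 1) * (extra_between + 1) + 1


print(upsampled_size(32))   # 32 samples along one axis
print(upsampled_size(56))   # 56 samples along one axis
```

The same relation gives 125 points for an axis with 32 samples and 221 points for an axis with 56 samples.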

To obtain the probability density matrix from the cumulative density matrix, numerical differentiation can be performed in two ways, where $h$ is the spacing between adjacent sample points:

$$pdf(x) = \frac{cdf(x) - cdf(x-h)}{h}$$
$$pdf(x) = \frac{cdf(x+h) - cdf(x-h)}{2h}$$

The first (backward difference) method has an error of  $O(h)$ , whereas the second (central difference) method has an error of  $O(h^2)$  and is therefore more suitable for our purposes. The eye diagram constructed using this method and bi-cubic interpolation is shown in figure 10.2.
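The difference between the two differentiation methods can be seen on a function whose derivative is known exactly. The sketch below uses a unit Gaussian cumulative distribution as a stand-in for one column of the cumulative density matrix; this choice, the evaluation point and the step sizes are assumptions made purely for illustration.

```python
import math


def gaussian_cdf(x):
    """CDF of a zero-mean, unit-variance Gaussian (stand-in for measured data)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_pdf(x):
    """Exact derivative of gaussian_cdf, used as the reference."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def backward_difference(f, x, h):
    return (f(x) - f(x - h)) / h


def central_difference(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)


X = 0.5
errors = {}
for h in (0.1, 0.05):
    exact = gaussian_pdf(X)
    err_backward = abs(backward_difference(gaussian_cdf, X, h) - exact)
    err_central = abs(central_difference(gaussian_cdf, X, h) - exact)
    errors[h] = (err_backward, err_central)
    print(f"h = {h}: backward error {err_backward:.2e}, central error {err_central:.2e}")
```

Halving the step roughly halves the backward-difference error but reduces the central-difference error by about a factor of four, in line with the error orders of the two methods.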

# 10.3 Ideal Eye Opening Monitor in Cadence with real multiphase clock generator and comparator

Simulations were carried out on the eye opening monitor using the following setups:

- (i) Ideal multiphase clock generator and real (schematic) clocked comparator.
- (ii) Real (schematic) multiphase clock generator and real (schematic) clocked comparator.

Figure 10.1: Ideal setup: The actual eye diagram at the channel output

### 10.3.1 Results

Eye diagrams were constructed after applying bi-cubic interpolation on the 56x32 image to obtain 221x125 images (by adding 3 extra points between every 2 data points). The eye diagram constructed using an ideal multiphase clock generator and real comparator is shown in figure 10.3. The eye diagram constructed using a real multiphase clock generator and real comparator is shown in figure 10.4. Since the clock generator phases, before layout, span only a little over 80ps, only about 80% of the eye is recovered in this case.



Figure 10.2: Ideal setup: The 125x125 eye diagram image obtained using the better method of numerical differentiation and bi-cubic interpolation (length = Tclk x 1.12)



Figure 10.3: Ideal multiphase clock generator and real comparator: The  $221 \times 125$ eye diagram image obtained after bi-cubic interpolation (length = Tclk x 1.12)



Figure 10.4: Real multiphase clock generator and comparator: The 221x125 eye diagram image obtained after bi-cubic interpolation (length approx. = Tclk x 0.8)

# CHAPTER 11

# Future work

# 11.1 Eye opening monitor system - remaining work

1. Multiphase clock generator

This block currently does not work post layout extraction. The low bandwidth in the multiplexer lines reduces the amplitude of the clock signal, and therefore the replica delay line following the multiplexer does not produce equal delays along its path, as each delay cell is switched incompletely by a different amount. Adding buffers to regenerate the amplitude would get the block working.

2. Clocked comparator

This block has reduced sensitivity post layout. Bandwidth improvement can be achieved by reducing the interline capacitance in the layout in all differential line pairs by spacing them further apart.

3. Averaging

The averaging circuit needs an op-amp based differential to single ended converter to drive the ADC block.

4. System simulation

The entire eye opening monitor system needs to be simulated with the ADC block and its timing signals.

# APPENDIX A

# Pin Details of the Continuous Time Equalizer Chip



Figure A.1: Pin diagram of the Continuous Time Equalizer chip

| Pin               | Name                     | Functionality                                                                       |
|-------------------|--------------------------|-------------------------------------------------------------------------------------|
| 17,20,24,31,40,45 | gnd                      | common ground for analog and digital blocks                                         |
| 7,16,21,23,41,44  | vdd_2.5V                 | 2.5 V supply voltage                                                                |
| 47                | vdd_1.8V                 | 1.8 V supply voltage - digital level for equalizer weight bits                      |
| 29,33             | vdd_1.8V_prbs            | 1.8 V supply voltage for the PRBS                                                   |
| 2                 | 0.9V                     | variable input current to vary the on-chip generated 0.9 V level                    |
| 46                | IO                       | Reference Current source (40uA)                                                     |
| 18,19             | inp, inn                 | Differential equalizer input                                                        |
| 43,42             | buff_outp, buff_outn     | Differential output of the equalizer test buffers                                   |
| 14,15             | R_dual_b1, R_dual_b2     | Dual filter resistor tuning bits                                                    |
| 27,28             | R_ladd_b1, R_ladd_b2     | Ladder filter resistor tuning bits                                                  |
| 25,26             | R_IO_b1, R_IO_b2         | Input and output $50\Omega$ resistor tuning bits                                    |
| 39,38             | Rl_b1, Rl_b2             | Equalizer output stage resistor tuning bits                                         |
| 22                | vcm                      | Common mode reference voltage of 1.25 V                                             |
| 3,4               | buff_dp_SGN, buff_fp_SGN | Sign bits for direct path and equalizer path test buffers                           |
| 5,6               | buff_dp_on, buff_fp_on   | On/off bits for direct path and equalizer path test buffers                         |
| 8                 | RTW                      | Request to write input signal for serial input of equalizer tap weight tuning bits  |
| 9                 | ready                    | Ready output signal for serial input of equalizer tap weight tuning bits            |
| 10                | shift                    | Shift input signal for serial input of equalizer tap weight tuning bits             |
| 11                | shift_clk                | Clock input for serial input of equalizer tap weight tuning bits                    |
| 12                | ser_in                   | Serial input for equalizer tap weight tuning bits                                   |
| 37                | PRBS_ON                  | On/off bit for PRBS                                                                 |
| 35,34             | PRBS_clk, PRBS_clkb      | Differential input clock for the PRBS                                               |
| 30,32             | PRBS_outp, PRBS_outn     | PRBS output                                                                         |

# REFERENCES

- [1] Shanthi Pavan, "Power and Area-Efficient Adaptive Equalization at Microwave Frequencies," in *IEEE Trans. on Circuits and Systems I: Regular Papers*, pp. 1412–1420, July 2008.
- [2] S. Pavan and T. Laxminidhi, "A Technique for Accurate Frequency Response Measurement of Integrated Continuous-Time Filters," in *Custom Integrated Circuits Conference*, *IEEE*, pp. 77–80, 2006.
- [3] Mrinmay Vyankatesh Talegaonkar, "Electronic Eye Diagram Reconstruction System for 10-Gbps Data Transmission Systems," Master's thesis, IIT Madras, 2007.