### Fully-Integrated 10.5 to 13.5 Gbps Transceiver in 0.13μm CMOS

Guoqing Miao, Peicheng Ju, Devin Ng, John Khoury, and Kadaba Lakshmikumar

Vitesse Semiconductor Corporation, New Jersey Design Center (formerly Multilink Technology Corporation), Somerset, NJ, USA

# Outline

- System overview
- Transmitter
  - VCO
  - Duo-binary coding and the output buffer
- Receiver
  - Quadrature VCO
  - Half-rate binary phase/DC offset detector
  - High gain input buffer
  - DC offset cancellation circuit
  - Data retiming clock phase tuning
- Experimental results
- Conclusions

#### **System Overview**



**Intended applications:** 

- SONET OC-192 with 7 to 25% FEC overhead
- 10G Ethernet with added FEC function and 64/66 code rate
- SOC integration
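The target data-rate range can be checked against the FEC overhead with simple arithmetic. The sketch below assumes the standard OC-192 line rate of 9.95328 Gbps and overheads of 7% and 25%; the function name is illustrative only.

```python
OC192_GBPS = 9.95328  # standard SONET OC-192 line rate

def rate_with_overhead(base_gbps: float, overhead_pct: float) -> float:
    """Line rate after adding a percentage FEC overhead to the base rate."""
    return base_gbps * (1.0 + overhead_pct / 100.0)

for pct in (7, 25):
    rate = rate_with_overhead(OC192_GBPS, pct)
    print(f"OC-192 + {pct}% FEC: {rate:.2f} Gbps")
# prints 10.65 Gbps and 12.44 Gbps
```

Both rates fall inside the 10.5 to 13.5 Gbps operating range of the transceiver.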

# **Transmitter Block Diagram**



- Half-rate architecture
- Multiplexer uses CML logic for speed > 2.5GHz

# **Tx VCO Circuit**



- LC oscillator for superior jitter performance
- VCO runs at half the data rate
- Multiple VCO tuning curves to achieve low  $K_{VCO}$  and wide range
- Selectable capacitors for coarse tuning, MOS varactors for fine tuning
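The benefit of splitting the tuning range into several curves can be illustrated numerically. This is a rough sketch, not the actual design: the inductance, capacitor values, 3-bit coarse code, and 1 V tuning range are all hypothetical numbers chosen so that the bands land near half of 10.5 to 13.5 Gbps.

```python
import math

def lc_freq_ghz(l_nh: float, c_pf: float) -> float:
    """Resonant frequency f = 1 / (2*pi*sqrt(L*C)), returned in GHz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_nh * 1e-9 * c_pf * 1e-12)) / 1e9

def band_edges(code, l_nh=0.5, c_fixed_pf=1.0, c_unit_pf=0.1, c_var_pf=(0.10, 0.25)):
    """(f_min, f_max) in GHz for one coarse-tuning code. All values are hypothetical."""
    c_bank = code * c_unit_pf                      # selectable capacitors (coarse)
    f_min = lc_freq_ghz(l_nh, c_fixed_pf + c_bank + c_var_pf[1])  # varactor at max C
    f_max = lc_freq_ghz(l_nh, c_fixed_pf + c_bank + c_var_pf[0])  # varactor at min C
    return f_min, f_max

def kvco_ghz_per_v(f_min, f_max, v_tune_range=1.0):
    """Average VCO gain over an assumed control-voltage range."""
    return (f_max - f_min) / v_tune_range

bands = [band_edges(code) for code in range(8)]
for code, (lo, hi) in enumerate(bands):
    print(f"code {code}: {lo:.2f} to {hi:.2f} GHz, K_VCO ~ {kvco_ghz_per_v(lo, hi):.2f} GHz/V")

# A single curve covering the same total range would need a much larger gain.
print(f"single curve: K_VCO ~ {kvco_ghz_per_v(bands[-1][0], bands[0][1]):.2f} GHz/V")
```

Because the varactor span exceeds one capacitor step in this sketch, adjacent bands overlap and the full range is covered without gaps.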



#### Duo-binary coding:

- Pre-coding: $p(k) = p(k-1) \oplus x(k)$
- Filtering: $y(k) = p(k) + p(k-1)$

#### Implementation:

- Pre-coding before the last 4:1 MUX
- Filtering in the output buffer
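The pre-coding and filtering equations can be exercised in a few lines. This is a behavioral sketch only; the initial state $p(-1) = 0$ and the modulo-2 decoder are assumptions that the slides do not state.

```python
def precode(x):
    """p(k) = p(k-1) XOR x(k), with p(-1) = 0 assumed."""
    p, prev = [], 0
    for bit in x:
        prev ^= bit
        p.append(prev)
    return p

def duobinary_filter(p):
    """y(k) = p(k) + p(k-1), with p(-1) = 0 assumed; y takes values 0, 1, 2."""
    return [cur + prev for prev, cur in zip([0] + p[:-1], p)]

def decode(y):
    """With pre-coding, the receiver recovers x(k) as y(k) mod 2, with no memory."""
    return [v % 2 for v in y]

x = [1, 0, 1, 1, 0, 0, 1]
p = precode(x)            # [1, 1, 0, 1, 1, 1, 0]
y = duobinary_filter(p)   # [1, 2, 1, 1, 2, 2, 1]
assert decode(y) == x
```

The pre-coder is what lets the three-level signal be decoded symbol by symbol, so a single error does not propagate.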

### **Transmitter Output Buffer (I)**



Output buffer uses four parallel paths:

- For normal NRZ pattern: $b_0 = b_1 = b_2 = b_3 = x(k)$
- For NRZ with pre-emphasis: $b_0 = b_1 = b_2 = x(k)$, $b_3 = -x(k-1)$
- For output duo-binary coding: $b_0 = b_1 = p(k)$, $b_2 = b_3 = p(k-1)$
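The three path assignments can be compared by summing the four paths. The sketch below assumes bipolar symbols (+1/-1) and four equally weighted paths; the mode names and function names are illustrative.

```python
def path_inputs(mode, cur, prev):
    """Return (b0, b1, b2, b3) for the current and previous symbol (+1 or -1)."""
    if mode == "nrz":
        return (cur, cur, cur, cur)
    if mode == "pre_emphasis":
        return (cur, cur, cur, -prev)
    if mode == "duobinary":          # cur, prev are pre-coded symbols p(k), p(k-1)
        return (cur, cur, prev, prev)
    raise ValueError(f"unknown mode: {mode}")

def output_level(mode, cur, prev):
    """Summed output of four equally weighted paths (arbitrary units)."""
    return sum(path_inputs(mode, cur, prev))

# NRZ: always full swing.
assert output_level("nrz", +1, -1) == 4
# Pre-emphasis: full swing on a transition, reduced swing on a repeated bit.
assert output_level("pre_emphasis", +1, -1) == 4
assert output_level("pre_emphasis", +1, +1) == 2
# Duo-binary: three output levels.
assert {output_level("duobinary", c, p) for c in (1, -1) for p in (1, -1)} == {-4, 0, 4}
```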

# **Transmitter Output Buffer (II)**



- Four matched parallel paths
- Inductive peaking in each pre-buffer to improve output rise/fall times
- 50  $\Omega$  on chip termination to reduce reflections

### **Receiver Block Diagram**



- Half-rate dual-loop architecture
  - **Loop-I** for VCO training
  - **Loop-II** for data phase locking and retiming

- High gain input buffer with DC offset compensation
- Phase tuning of the data sampling clock (Q)

## **Rx Quadrature LC Oscillator**



- Two cross-coupled LC tanks to generate quadrature phases
- Multiple VCO tuning curves to achieve low  $K_{VCO}$  and wide range
- Selectable capacitors for coarse tuning, MOS varactors for fine tuning

## **Binary Phase/DC Offset Detection**



| A T B | Phase   | DC offset                    |
|-------|---------|------------------------------|
| 0 0 0 | -       | decrement if 512 occurrences |
| 0 0 1 | retard  | decrement                    |
| 0 1 0 | -       | -                            |
| 0 1 1 | advance | increment                    |
| 1 0 0 | advance | decrement                    |
| 1 0 1 | -       | -                            |
| 1 1 0 | retard  | increment                    |
| 1 1 1 | -       | increment if 512 occurrences |
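The decision table maps directly onto a lookup plus a run counter. The sketch below is a behavioral model under one stated assumption: "512 occurrences" is taken to mean 512 consecutive all-zero or all-one sample triples, which the slides do not spell out. Class and attribute names are hypothetical.

```python
PHASE = {(0, 0, 1): "retard", (0, 1, 1): "advance",
         (1, 0, 0): "advance", (1, 1, 0): "retard"}
OFFSET = {(0, 0, 1): -1, (0, 1, 1): +1, (1, 0, 0): -1, (1, 1, 0): +1}

class PhaseOffsetDetector:
    def __init__(self, run_threshold=512):
        self.run_threshold = run_threshold
        self.run_value = None   # value of the current run of identical triples
        self.run_count = 0

    def step(self, a, t, b):
        """Return (phase, offset): phase is 'advance', 'retard' or None;
        offset is +1 (increment), -1 (decrement) or 0 (no action)."""
        key = (a, t, b)
        phase = PHASE.get(key)
        offset = OFFSET.get(key, 0)
        if a == t == b:
            # No transition: count the run (assumed consecutive).
            if self.run_value == a:
                self.run_count += 1
            else:
                self.run_value, self.run_count = a, 1
            if self.run_count >= self.run_threshold:
                offset = +1 if a else -1
                self.run_count = 0
        else:
            self.run_value, self.run_count = None, 0
        return phase, offset

det = PhaseOffsetDetector()
print(det.step(0, 0, 1))  # ('retard', -1)
print(det.step(0, 1, 0))  # (None, 0)
```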

#### **Half-Rate Phase Detector Implementation**



- Half-rate Alexander phase detector
- Uses both edges of the in-phase clock to sample the data transitions
- Uses both edges of the quadrature clock to sample the data centers
- Automatic 1-to-2 de-multiplexing
- All in CML logic
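A small behavioral sketch of the sampling scheme: center samples come from the quadrature-clock edges, transition samples from the in-phase-clock edges, and alternating edges naturally split the data into two half-rate streams. Which edge captures the even bits is an assumption here, and the function name is hypothetical.

```python
def half_rate_sample(centers, transitions):
    """centers[k]: data sampled at bit centers (Q-clock edges).
    transitions[k]: data sampled between centers[k] and centers[k+1] (I-clock edges).
    Returns the (A, T, B) triples and the two de-multiplexed half-rate streams."""
    triples = [(centers[k], transitions[k], centers[k + 1])
               for k in range(len(centers) - 1)]
    even = centers[0::2]   # assumed: rising edge of the quadrature clock
    odd = centers[1::2]    # assumed: falling edge of the quadrature clock
    return triples, even, odd

triples, even, odd = half_rate_sample([0, 1, 1, 0], [0, 1, 1])
print(triples)  # [(0, 0, 1), (1, 1, 1), (1, 1, 0)]
print(even, odd)  # [0, 1] [1, 0]
```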

## **Data Sampling Clock Phase Tuning**



When *Di(p,n)* has different rise and fall times, the optimal data sampling point (2) lies away from the data center (1).



- $\alpha > 0$: extra delay to *Ckqo(p,n)*
- $\alpha < 0$: less delay to *Ckqo(p,n)*
- Digitally controlled $\alpha$ provides a ±45° tuning range with a 6.4° step size
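To put the tuning range in time units, the phase step can be converted to picoseconds. This sketch assumes the degrees refer to the period of the half-rate clock (5.5 GHz at 11.0 Gbps); the slides do not state the reference explicitly.

```python
def phase_to_ps(phase_deg: float, clock_ghz: float) -> float:
    """Convert a clock phase shift in degrees to a time shift in picoseconds."""
    period_ps = 1000.0 / clock_ghz
    return phase_deg / 360.0 * period_ps

def settings_per_side(range_deg: float = 45.0, step_deg: float = 6.4) -> int:
    """Number of whole steps that fit on each side of the nominal phase."""
    return int(range_deg // step_deg)

clock_ghz = 5.5  # half-rate clock at 11.0 Gbps
print(f"step : {phase_to_ps(6.4, clock_ghz):.2f} ps")   # 3.23 ps
print(f"range: +/-{phase_to_ps(45.0, clock_ghz):.2f} ps")  # 22.73 ps
print(f"steps per side: {settings_per_side()}")  # 7
```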



## **DC Offset Cancellation Circuit**

- Digital accumulator "integrates" the T bit to measure the input DC offset
- To save power, the digital accumulator runs at a lower frequency
- The 6-bit current DAC input is thermometer-coded to guarantee a monotonic output
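The accumulator and DAC coding can be modeled behaviorally. In this sketch the vote threshold, mid-scale starting code, and class names are hypothetical; only the 6-bit width and the thermometer coding come from the slides.

```python
def thermometer_encode(code: int, bits: int = 6):
    """Binary code -> list of 2**bits - 1 unit-element enables (thermometer code)."""
    n = (1 << bits) - 1
    if not 0 <= code <= n:
        raise ValueError("code out of range")
    return [1] * code + [0] * (n - code)

class OffsetAccumulator:
    """Integrates +1/-1 offset votes and steps the DAC code when a threshold is hit."""
    def __init__(self, bits: int = 6, threshold: int = 16):
        self.max_code = (1 << bits) - 1
        self.code = self.max_code // 2   # assumed mid-scale start
        self.acc = 0
        self.threshold = threshold       # hypothetical value

    def update(self, vote: int) -> int:
        self.acc += vote
        if self.acc >= self.threshold:
            self.code = min(self.code + 1, self.max_code)
            self.acc = 0
        elif self.acc <= -self.threshold:
            self.code = max(self.code - 1, 0)
            self.acc = 0
        return self.code

# Adjacent codes differ by exactly one unit element, so the summed output
# current can only move in one direction per step (monotonic by construction).
for c in range(63):
    lo, hi = thermometer_encode(c), thermometer_encode(c + 1)
    assert sum(hi) - sum(lo) == 1
```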

### High Gain Input Buffer (I)



- *Ios(p,n)*: offset cancellation current
- Two modes of operation:
  - with all 4 stages => high gain for high sensitivity
  - with 1<sup>st</sup> and 4<sup>th</sup> stages => low gain for low power
- Active shunt peaking for bandwidth extension

# **High Gain Input Buffer (II)**



- Input-referred offset canceled with current-mode DAC via *I*<sub>osp</sub>, *I*<sub>osn</sub>
- NMOS load with active shunt peaking

DC gain ≈



• Equivalent shunt peaking inductance

$$L \approx \frac{R_g}{\omega_{T1}} = \frac{R_g \times C_{gs1}}{g_{m1}}$$
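A quick numeric check of the equivalent-inductance expression. The component values below are hypothetical, picked only to give a plausible order of magnitude; they are not taken from the design.

```python
import math

def shunt_peaking_inductance(r_g_ohm: float, c_gs_f: float, g_m_s: float) -> float:
    """L ~ R_g / omega_T1 = R_g * C_gs1 / g_m1, in henries."""
    omega_t = g_m_s / c_gs_f          # transit frequency of the load device, rad/s
    return r_g_ohm / omega_t

# Hypothetical values: R_g = 1 kOhm, C_gs1 = 20 fF, g_m1 = 5 mS
r_g, c_gs, g_m = 1e3, 20e-15, 5e-3
l_eq = shunt_peaking_inductance(r_g, c_gs, g_m)
f_t_ghz = g_m / c_gs / (2 * math.pi) / 1e9
print(f"f_T ~ {f_t_ghz:.1f} GHz, equivalent L ~ {l_eq * 1e9:.1f} nH")
```

The expression shows that the gate resistor sets the inductance: a larger $R_g$ gives more peaking without any on-chip spiral inductor.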



- 0.13 $\mu$m, 1.2 V, 8M standard CMOS
- Flip-chip layout, wire-bonding package for test chip
- On chip 1.5mm T-lines for high speed I/Os
- 2.0 x 4.0 mm<sup>2</sup> macro size
- 118-pin PBGA package

### **Transmitter Test Results(I)**



NRZ output @ 10.68Gbps, 2<sup>31</sup>-1 PRBS pattern




Duo-binary coding @12.5 Gbps, 2<sup>31</sup>-1 PRBS pattern

 Slow edges caused by the non-ideal on-chip T-lines and the wire-bond package

## **Transmitter Test Results(II)**



NRZ output @ 10.3Gbps, 2<sup>7</sup>-1 PRBS pattern

Duo-binary coding @10.3 Gbps, 2<sup>7</sup>-1 PRBS pattern

Improved output data eyes when the 10G macro is integrated into a XAUI-to-10G transceiver in a 400-pin Flip Chip Plastic BGA (FC-PBGA) package



### **Transmitter Test Results(III)**





Tx half-rate clock jitter: 1.2 ps rms, 8.4 ps p-p @ 5.5 GHz

Tx clock spectrum: phase noise -103 dBc/Hz @ 1 MHz offset

#### **Receiver Test Results(I)**





Recovered divide-by-4 clock: input 40 mV, 2<sup>31</sup>-1 PRBS pattern, 11.0 Gbps; 2.46 ps rms, 12 ps p-p jitter

Recovered divide-by-4 clock spectrum: phase noise -122 dBc/Hz @ 1 MHz offset

#### **Receiver Test Results(II)**



Rx jitter tolerance: input 40 mV, 2<sup>31</sup>-1 PRBS pattern, 11.0 Gbps, 10<sup>-12</sup> error threshold

Rx input sensitivity: with DC offset cancellation, 2<sup>31</sup>-1 PRBS pattern, 11.0Gbps

# **Summary of Experimental Results**

| Parameter                            | Experimental results    |
|--------------------------------------|-------------------------|
| Technology                           | 0.13μm standard CMOS    |
| Speed                                | 10.5 ~ 13.5 Gbps        |
| Rx input sensitivity                 | < 15 mV single-ended    |
| Tx jitter generation (50kHz – 80MHz) | 6.7 mUI rms             |
| Rx jitter generation (50kHz – 80MHz) | 8.9 mUI rms             |
| Tx output swing                      | 700 mV p-p differential |
| Macro size                           | 8.0 mm <sup>2</sup>     |
| Power supply                         | 1.2V, 1.8V              |
| Power consumption                    | Tx = 450mW, Rx = 550mW  |

# **Conclusions**

- A fully integrated 10.5 to 13.5 Gbps transceiver in 0.13 $\mu$m standard CMOS has been demonstrated
- Half-rate architecture has been demonstrated for both Tx and Rx
- Tx output duo-binary coding integrated
- A new DC offset cancellation technique is implemented
- Rx achieves < 15mV input sensitivity
- SONET OC-192 compliant performance achieved