## A 10-Gb/s Adaptive Decision Feedback Equalizer with On-Chip Eye Opening Monitoring

Chang-Kyung Seong

The Graduate School

Yonsei University

Department of Electrical and Electronic Engineering

# A 10-Gb/s Adaptive Decision Feedback Equalizer with On-Chip Eye Opening Monitoring

A Dissertation

Submitted to the Department of Electrical and Electronic Engineering and the Graduate School of Yonsei University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

**Chang-Kyung Seong** 

January 2011

This certifies that the dissertation of Chang-Kyung Seong is approved.

Thesis Supervisor: Woo-Young Choi

Gun-Hee Han

Tae-Wook Kim

Sung-Min Park

Pyung-Su Han

**The Graduate School** 

Yonsei University

January 2011

## Contents

| Section | Title |
|---------|-------|
| | List of Figures |
| | List of Tables |
| | Abstract |
| 1 | Introduction |
| 1.1 | High-Speed Adaptive Equalizers |
| 1.2 | On-Chip Testability and Cost Reduction |
| 1.3 | Outline of Dissertation |
| 2 | Backgrounds and Motivations |
| 2.1 | Channel Model |
| 2.2 | Review of Equalizing Filters |
| 2.2.1 | Category of equalizing filters |
| 2.2.2 | Decision Feedback Equalizers |
| 2.3 | Review of Adaptation Methods |
| 2.4 | Aim of Dissertation |
| 3 | Proposed System |
| 3.1 | Structure of Proposed Decision Feedback Equalizer |
| 3.2 | Look-ahead Eye-Opening Monitoring |
| 3.3 | Adaptation of Decision Feedback Equalizer Using ISI Monitor |
| 3.4 | Decision of design parameters |
| 3.4.1 | Resolution of DAC |
| 3.4.2 | Number of samples |
| 4 | Circuit-Level Design and Post-Layout Simulation Results |
| 4.1 | Configuration of System |
| 4.1.1 | Overall structure |
| 4.1.2 | Signal range and bias scheme |
| 4.2 | Decision Feedback Equalizer and Its Building Blocks |
| 4.2.1 | Unit sampler module |
| 4.2.2 | Offset amplifier |
| 4.2.3 | Track-and-hold switch and clocked-sense amplifier |
| 4.3 | Clock Generator and Its Building Blocks |
| 4.3.1 | Four-phase clock generator |
| 4.3.2 | Phase interpolator and differential-to-single-ended converter |
| 4.3.3 | Clock trees |
| 4.4 | Miscellaneous Blocks |
| 4.4.1 | Code deserializer |
| 4.5 | Post-Layout Simulation of Modules |
| 5 | Experimental Results |
| 6 | Conclusion |
| | Appendix |
| A | Design and revision of printed circuit board |

## **List of Figures**

| Figure | Caption |
|--------|---------|
| 1.1 | Measured $S_{21}$ parameters of 10cm, 20cm, 30cm and 40cm long PCB channels |
| 1.2 | Measured symbol responses of 10cm, 20cm, 30cm and 40cm long PCB channels |
| 1.3 | Concept of equalizer for band-limited channel (a) bandwidth extension using equalizer (b) under-, optimum and over-equalization depending on boosting gain of equalizer |
| 1.4 | Block diagram of general adaptive equalizer |
| 1.5 | Cost of silicon manufacturing and test |
| 1.6 | Concept of on-chip eye-opening monitoring |
| 1.7 | BER estimation by extrapolation |
| 2.1 | Band-limited symbol response and notation |
| 2.2 | Category of widely used equalizers |
| 2.3 | Structures of widely used equalizing filters (a) continuous-time analog filter (b) FIR filter with analog tap-delay (c) FIR filter with discrete-time tap-delay (d) decision-feedback equalizer |
| 2.4 | Block diagram of full-rate 1-tap look-ahead DFE |
| 2.5 | Simplified eye diagrams of (a) received signal (b) candidate signal $y_1$ (c) candidate signal $y_0$ |
| 2.6 | Block diagram of half-rate 1-tap look-ahead DFE |
| 2.7 | Timing diagram of half-rate 1-tap look-ahead DFE (a) entire timing diagram (b) detailed timing diagram for *even*-path with circuit delays |
| 2.8 | Implementation of subtracters (a) offset amplifier (b) resettable current-integrating (c) switched-capacitor |
| 3.1 | Block diagram of quarter-rate 2-tap look-ahead decision feedback equalizer |
| 3.2 | Eye diagrams of received signal and four candidate signals with two post-cursors |
| 3.3 | Timing diagram of quarter-rate 2-tap look-ahead decision feedback equalizer (a) sampling stage (only USM11 signals shown) (b) multiplexing stage |
| 3.4 | Example waveforms of quarter-rate 2-tap look-ahead decision feedback equalizer |
| 3.5 | Block diagram of 2-tap quarter-rate look-ahead decision feedback equalizer with eye-opening monitoring |
| 3.6 | Example of look-ahead eye monitoring with recent bit pattern of "001" |
| 3.7 | Timing diagram for eye monitoring in multiplexing stage |
| 3.8 | Process of drawing eye diagram (a) assumption on grid (b) flow chart, where $N_S$ is a target duration, $n_s$ is a clock count, $n_h$ is the number of "1"s in $D_{o,EOM}$, and $H$ is a histogram matrix |
| 3.9 | Example of processes to obtain cumulative and distribution histograms |
| 3.10 | Example of cumulative histogram and distribution histogram in 3-D plot |
| 3.11 | Simplified flow chart of sequential pattern-filtered eye-opening monitoring (a) total flow chart (b) definition of pattern-filtered sample count function |
| 3.12 | Estimated gate count and adaptation time for sequential and parallel pattern-filtered EOM |
| 4.1 | Overall configuration of prototype system |
| 4.2 | Monitoring range |
| 4.3 | Schematics of 5-bit digital-to-analog converter and replica bias generator to ensure the output voltage range of the digital-to-analog converter (a) digital-to-analog converter (b) replica bias generator |
| 4.4 | Post-layout simulation results of DAC with replica bias generator when $V_{SW}=0.6$ (a) output voltages versus DAC codes (b) integral non-linearity |
| 4.5 | Block diagram of quarter-rate 2-tap look-ahead decision feedback equalizer |
| 4.6 | Schematic of unit sampler module |
| 4.7 | Schematic of offset amplifier |
| 4.8 | Post-layout simulation results of offset amplifier at $D_{in,com} = V_{th,com} = 0.9$V (a) DC simulation (b) DC simulation, difference-mode plot (c) AC and transient simulation |
| 4.9 | Schematic of clocked sense amplifier |
| 4.10 | Post-layout simulation result of sampling path in unit sampler module |
| 4.11 | Block diagram of clock generator |
| 4.12 | Schematic of four-phase clock generator |
| 4.13 | Delay versus $V_{DCON}$ for three process corners in post-layout simulations |
| 4.14 | Schematic of phase interpolator |
| 4.15 | Schematic of differential-to-single-ended converter |
| 4.16 | Monotonic phase control using MUXs and PI |
| 4.17 | Post-layout simulation results of clock generation using MUXs, PI and D2S (a) output phase versus control code (b) INL (c) I/Q mismatch |
| 4.18 | Schematic of clock trees |
| 4.19 | Code deserializer (a) schematic (b) timing diagram |
| 4.20 | Layout of designed chip (a) entire chip (b) core |
| 4.21 | Post-layout simulation process for eye monitoring by sweeping $V_{EOM}$ |
| 4.22 | Eye monitoring result at the center phase of eye (a) cumulative histogram for "1" (b) normalized cumulative histogram for "1" (c) distribution histogram for "1" (d) cumulative histogram for "0" (e) normalized cumulative histogram for "0" (f) distribution histogram for "0" |
| 4.23 | Distribution histograms for each pattern using pattern-filtered eye monitoring for patterns of (a) "111" (b) "101" (c) "011" (d) "000" (e) "010" (f) "100" |
| 4.24 | MATLAB process of drawing effective eye diagram using post-layout simulation results |
| 4.25 | Post-layout simulation result of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 2.2GHz (a) before DFE (b) after DFE (effective eye diagram after offset amplifier) |
| 4.26 | Post-layout simulation result of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 1.3GHz (a) before DFE (b) after DFE (effective eye diagram after offset amplifier) |
| 4.27 | Post-layout simulation result of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 1.1GHz (a) before DFE (b) after DFE (effective eye diagram after offset amplifier) |
| 4.28 | Four-channel de-serialized output of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 1.1GHz |
| 5.1 | Photographs of (a) fabricated chip (b) chip-on-board with bonding wires (c) printed-circuit board under test |
| 5.2 | Experimental setup to measure DC characteristic of DAC |
| 5.3 | Measured characteristic of DAC (a) output voltages versus DAC codes (b) integral non-linearity |
| 5.4 | Experimental setup to measure eye diagram |
| 5.5 | Obtained pattern-filtered eye diagrams (a) 111 and 000 (b) 011 and 100 (c) 101 and 010 (d) (a)+(b)+(c) |
| 5.6 | Measured $S_{21}$ parameters of PCB channels 10cm, 20cm, 30cm and 40cm long |
| 5.7 | Measured eye diagrams after PCB channels (using oscilloscope) (a) 10cm (b) 20cm (c) 30cm (d) 40cm |
| 5.8 | Experimental setup for DFE test |
| 5.9 | Drawn eye diagrams with EOM |
| 5.10 | Comparison of vertical eye openings versus DFE coefficients (a) 10cm (b) 20cm (c) 30cm (d) 40cm |
| 5.11 | Bathtub curves before and after equalization (a) 10cm (b) 20cm (c) 30cm (d) 40cm |
| A.1 | (a) Photograph of data-input part of PCB ver. 1 with inverted color (b) drawn eye diagram of 10-Gb/s PRBS-7 data |
| A.2 | Illustration of TDR measurement setups of PCB ver. 1 (white area is the top copper) (a) TDR setup #1 (trace cut after $R_{term,in}$ and no chip on board) (b) TDR setup #2 (whole PCB but no chip on board) (c) TDR setup #3 (whole PCB and chip on board with bonding wires) |
| A.3 | Measured TDR waveforms for three kinds of setups of PCB ver. 1 |
| A.4 | Transmission line model of PCB ver. 1 (a) segmentation of tapered line (b) ADS model with segmented CPWG |
| A.5 | Simulated TDR waveform |
| A.6 | Photograph (left side) and mask pattern (right side) of data input part of PCB ver. 2 |
| A.7 | Model of transmission line on PCB ver. 2 with segmented CPWG and DCPWG |
| A.8 | Illustration of TDR measurement setups of PCB ver. 2 (white area is the top copper) (a) TDR setup #1 (trace cut after $R_{term,in}$ and no chip on board) (b) TDR setup #2 (whole PCB but no chip on board) (c) TDR setup #3 (whole PCB and chip on board with bonding wires) |
| A.9 | Measured TDR waveforms for three kinds of setups of PCB ver. 2 |
| A.10 | Modified model of transmission line on PCB ver. 2 with additional parasitic components |
| A.11 | Parametric sweep results for each parasitic element (a) $C_{PCB}$ (b) $C_B$ (c) $L_B$ |
| A.12 | Simulated versus measured TDR waveforms (a) PCB ver. 1 (b) PCB ver. 2 |
| A.13 | Simulated eye diagrams of 10-Gb/s PRBS-7 data (a) PCB ver. 1 (b) PCB ver. 2 |

## **List of Tables**

| Table | Caption |
|-------|---------|
| 5.1 | Chip summary |
| 5.2 | Performance comparison with reported DFEs |

### ABSTRACT

### A 10-Gb/s Adaptive Decision Feedback Equalizer with On-Chip Eye Opening Monitoring

Chang-Kyung Seong

Dept. of Electrical and Electronic Eng.

The Graduate School

Yonsei University

As semiconductor technology advances, the manufacturing cost of chips keeps falling while the relative share of test cost has grown. To reduce the test cost of high-speed receiver chips, on-chip eye-opening monitoring (EOM) circuits can be integrated into the chips. By using these circuits to monitor the eye diagrams of signals of interest inside the chip itself, self-test can be achieved without a complicated and lengthy test process.

In this dissertation, a new 10-Gb/s adaptive decision feedback equalizer (DFE) employing an EOM scheme is proposed. The EOM scheme is used both to adapt the DFE to printed-circuit board (PCB) channels of various lengths and to provide self-test capability. The DFE compensates two post-cursors to improve eye opening and bit-error rate performance without boosting high-frequency noise. By fully unrolling the feedback loops, 10-Gb/s operation is achieved without stringent timing constraints. Moreover, four-way interleaving using quarter-rate clocks relieves the speed burden on the sampling and decision circuits.

For adaptation, additional samplers are embedded in the DFE to obtain information on the equalized eye diagram. Since a loop-unrolled DFE inherently has no equalized waveform on any single physical node, a new structure named *look-ahead EOM* is proposed to obtain an *effective* eye diagram from the DFE. Whereas traditional EOM circuits use two reference voltages and clock signals, the proposed EOM employs only one digitally controlled reference voltage and a single clock signal. Instead of adding more analog circuits, digital post-processing is introduced to draw the two-dimensional eye diagram.

The inter-symbol interference (ISI) of the channel can be measured by the proposed adaptation algorithm using part of the information gathered for EOM. While an existing ISI monitor is realized as an analog circuit using a switched-capacitor correlator, the proposed algorithm calculates the magnitude of the ISI components from the measured eye diagram using all-digital circuits. From the measured values, the equalizer coefficients are calculated and applied to the DFE.

A prototype including the DFE with EOM samplers, a clock generator, and miscellaneous circuits was designed in 90-nm CMOS technology. In post-layout simulation, the functionality and performance of the merged circuits and of each building block were verified. Although the digital controller was implemented in a field-programmable gate array (FPGA) in this prototype, it can be integrated on the CMOS chip together with the analog circuits. Fabricated chips are directly mounted and wire-bonded to a PCB for test. Experiments verify that eye diagrams of equalized signals are successfully obtained, and that the prototype system achieves successful adaptation and equalization for channel lengths up to 40cm with 10-Gb/s data. The DFE core occupies $110 \times 95\ \mu\mathrm{m}^2$ and consumes 11mW at a 1.2V supply voltage.

Key words : High-speed data transmission, adaptive equalizers, decision feedback equalizer, eye-opening monitoring, ISI monitor

# Chapter 1 Introduction

### 1.1 High-Speed Adaptive Equalizers

Growing computing power and demand for huge amounts of data have driven transmission systems to support data rates of several Gb/s and beyond. As the required data rate continues to increase, limited channel bandwidth, caused by several physical effects, has become a problem. In a transmission link, the transmitted signal is delivered to the receiver without distortion if the frequency response of the channel is perfectly flat. In general, however, channels respond differently to different frequency components; that is, they suffer from frequency-dependent loss. In the time domain, this appears as a non-zero residual tail on the impulse response.

Focusing on wire-line channels, the frequency-dependent loss generally has a low-pass characteristic due to the skin effect and dielectric loss [1]. Figures 1.1 and 1.2 show the measured $S_{21}$ parameters and symbol responses, respectively, of 10cm, 20cm, 30cm and 40cm long printed-circuit board (PCB) channels. As the channel length increases, the loss of high-frequency components increases and the symbol response spreads along the time axis. When the spread response smears into the neighboring symbol periods, it acts as noise called inter-symbol interference (ISI).

Figure 1.1: Measured $S_{21}$ parameters of 10cm, 20cm, 30cm and 40cm long PCB channels

Figure 1.2: Measured symbol responses of 10cm, 20cm, 30cm and 40cm long PCB channels

Equalizing filters that selectively boost high-frequency components have been widely used to resolve this problem and extend the usable channel length and loss. Figure 1.3(a) shows the basic concept of using an equalizer to extend bandwidth. With a properly designed equalizer, the transfer function of the overall system comprising the channel and the equalizer extends to the target bandwidth. Non-optimal equalization, however, results in an uneven transfer function. Figure 1.3(b) shows three cases of equalization results: under-, optimum, and over-equalization. With under-equalization, the overall bandwidth is not extended enough to open the eye completely. On the other hand, when the boosting gain of the equalizer is excessive, peaking appears in the transfer function and the eye opening is also degraded.
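To make the three cases of figure 1.3(b) concrete, the sketch below models the channel as a single pole and the equalizer as one zero plus one pole with unity DC gain. This first-order model and all pole/zero frequencies are illustrative assumptions, not the dissertation's channel or circuit. Placing the equalizer zero on the channel pole gives the optimum case; a higher zero under-equalizes, and a lower zero over-equalizes and produces peaking.

```python
import math

def overall_gain_db(f, f_ch_pole, f_zero, f_eq_pole):
    """|H_channel * H_equalizer| in dB at frequency f (hypothetical first-order model)."""
    channel = 1 / complex(1, f / f_ch_pole)                         # one-pole low-pass channel
    equalizer = complex(1, f / f_zero) / complex(1, f / f_eq_pole)  # zero boosts high frequencies
    return 20 * math.log10(abs(channel * equalizer))

def bandwidth_and_peak(f_ch_pole, f_zero, f_eq_pole, f_max=20e9, n=4000):
    """Return (first grid frequency below -3 dB, or None; peak gain in dB)."""
    freqs = [f_max * k / n for k in range(1, n + 1)]
    gains = [overall_gain_db(f, f_ch_pole, f_zero, f_eq_pole) for f in freqs]
    f_3db = next((f for f, g in zip(freqs, gains) if g < -3.0), None)
    return f_3db, max(gains)

# Channel pole at 2 GHz, equalizer pole at 10 GHz (both hypothetical).
for label, f_zero in (("under", 6e9), ("optimum", 2e9), ("over", 1e9)):
    f_3db, peak = bandwidth_and_peak(2e9, f_zero, 10e9)
    print(label, f_3db, round(peak, 2))
```

In the over-equalized case the bandwidth search may return None within the 20-GHz grid, because the peaking keeps the gain above -3 dB there; the peak value is the quantity that flags this case.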

The need for adaptation in equalization arises from two points of view. First, it is desirable to compensate for the effects of process, voltage and temperature (PVT) variations on the implemented equalizing filter. Second, a single equalizer should automatically cover various channel environments. Figure 1.4 shows a block diagram of a general adaptive equalizer, which consists of a variable equalizing filter and an adaptation block. The adaptation block measures the error between the desired signal and the equalized output and controls the filter so that the error is removed or reduced.
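The loop of figure 1.4 can be illustrated with a generic adaptation rule. The sketch below uses sign-sign LMS to adapt a single DFE tap; it is a widely used technique shown only for illustration, not the EOM-based adaptation proposed in this dissertation. The channel values, step size, noise level, and the assumption that the target level `alpha0` is known are all hypothetical.

```python
import random

def adapt_dfe_tap(alpha0, alpha1, n_bits=20000, mu=0.002, noise_std=0.02, seed=1):
    """Sign-sign LMS adaptation of one DFE tap w1 toward the post-cursor alpha1.

    Received sample: r = alpha0*d0 + alpha1*d_prev + Gaussian noise (hypothetical channel).
    """
    rng = random.Random(seed)
    w1 = 0.0
    prev_tx = 1    # previously transmitted symbol
    prev_dec = 1   # previously decided symbol
    for _ in range(n_bits):
        d0 = rng.choice((-1, 1))
        r = alpha0 * d0 + alpha1 * prev_tx + rng.gauss(0.0, noise_std)
        y = r - w1 * prev_dec              # equalizing filter: subtract estimated ISI
        dec = 1 if y >= 0 else -1          # slicer
        err = y - alpha0 * dec             # error against the (assumed known) desired level
        sign_err = 1 if err >= 0 else -1
        w1 += mu * sign_err * prev_dec     # adaptation block: sign-sign LMS update
        prev_tx, prev_dec = d0, dec
    return w1

print(adapt_dfe_tap(0.5, 0.2))  # settles near 0.2
```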



Figure 1.3: Concept of equalizer for band-limited channel (a) bandwidth extension using equalizer (b) under-, optimum and over-equalization depending on boosting gain of equalizer



Figure 1.4: Block diagram of general adaptive equalizer

### 1.2 On-Chip Testability and Cost Reduction

Figure 1.5 is a roadmap of silicon manufacturing and test costs [2]. While the manufacturing cost tends to decrease rapidly, the test cost has remained flat or increased slightly. Although high-speed transceivers have been widely studied and reported, efforts to reduce the test cost in this application have been insufficient.

A fundamental approach to checking the pass/fail status of a transmission system is to measure the bit-error rate (BER) of the receiver output data. However, this takes a long time and requires expensive equipment such as a BER tester, increasing the test cost. At 10 Gb/s, for example, checking a BER of $10^{-12}$ takes at least 100 seconds, so one piece of such high-cost equipment can check only 864 chips a day. Moreover, this approach gives debuggers no qualitative information on why the system under test achieved that BER.
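The 100-second and 864-chip figures follow from simple arithmetic: at least $1/\text{BER}$ bits must be observed to expect a single error. The sketch below reproduces them; the function name is illustrative, and the sequential one-chip-per-tester model is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def min_ber_test_time_s(bit_rate_bps: int, ber_exponent: int) -> float:
    """Time to transmit 1/BER bits, i.e. to expect one error at BER = 10**-ber_exponent.

    This is a lower bound: claiming the BER with about 95% confidence and zero
    observed errors needs roughly three times as many bits.
    """
    bits_needed = 10 ** ber_exponent
    return bits_needed / bit_rate_bps

t = min_ber_test_time_s(10 * 10**9, 12)   # 10 Gb/s, BER of 1e-12
chips = int(SECONDS_PER_DAY // t)          # one tester, one chip at a time
print(t, chips)
```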

Figure 1.5: Cost of silicon manufacturing and test

On the other hand, a self-test method using the on-chip eye-opening monitoring (EOM) technique, which plots the eye diagram of a signal waveform inside the chip as shown in figure 1.6, can be a powerful solution to reduce the cost. As reported in [3], BER can be tested with a controllable rectangular mask. In this case, test time can be saved by extrapolating where the lower-BER points will occur, as shown in figure 1.7 [4]. Moreover, the eye diagram obtained with EOM can give debuggers intuition about the operating status of the chip.
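A minimal sketch of the extrapolation idea in figure 1.7, assuming Gaussian-tailed noise so that $Q^{-1}(\text{BER})$ is a linear function of the sampling position. This assumption, the synthetic data, and the function names are illustrative and not taken from [3] or [4]. Only the fast-to-measure high-BER points are "measured"; the position for $10^{-12}$ is then solved from a least-squares line.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit standard deviation

def q_inv(ber: float) -> float:
    """Inverse Gaussian tail function: Q^{-1}(ber) = -Phi^{-1}(ber)."""
    return -_STD_NORMAL.inv_cdf(ber)

def extrapolate_position(xs, bers, target_ber):
    """Fit Q^{-1}(BER) = slope * x + offset by least squares, then solve at target_ber.

    Assumes Gaussian noise tails, so Q^{-1}(BER) is linear in the position x.
    """
    qs = [q_inv(b) for b in bers]
    n = len(xs)
    mean_x, mean_q = sum(xs) / n, sum(qs) / n
    slope = (sum((x - mean_x) * (q - mean_q) for x, q in zip(xs, qs))
             / sum((x - mean_x) ** 2 for x in xs))
    offset = mean_q - slope * mean_x
    return (q_inv(target_ber) - offset) / slope

# Synthetic measurements (hypothetical): a signal level with mean 0.1 and noise sigma 0.01;
# "BER" is the probability that a sample falls below the threshold x.
mu, sigma = 0.1, 0.01
xs = [0.07, 0.065, 0.06, 0.055]
bers = [_STD_NORMAL.cdf((x - mu) / sigma) for x in xs]   # roughly 1e-3 down to 1e-6
x_at_1e12 = extrapolate_position(xs, bers, 1e-12)
print(x_at_1e12)  # close to mu - sigma * Q^{-1}(1e-12)
```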



Figure 1.6: Concept of on-chip eye-opening monitoring



Figure 1.7: BER estimation by extrapolation

### 1.3 Outline of Dissertation

The goal of this work is to design a high-speed adaptive decision feedback equalizer (DFE) with an efficient adaptation algorithm based on the EOM technique. To this end, a 10-Gb/s adaptive DFE is proposed and its prototype is implemented in CMOS technology. In chapter 2, reported high-speed equalizers and adaptation techniques are reviewed, focusing on DFE and EOM techniques as the starting point of the design. The proposed DFE and adaptation algorithm are introduced in chapter 3. Circuit design details and post-layout simulation results are presented in chapter 4. Finally, experimental results and the conclusion are given in chapters 5 and 6, respectively. The revision process of the PCB is additionally described in the appendix.

# Chapter 2 Backgrounds and Motivations

### 2.1 Channel Model

Before reviewing existing equalization and adaptation methods, the channel model and related notation are introduced. Figure 2.1 shows the output waveform of a band-limited channel when a unit non-return-to-zero (NRZ) symbol is applied, i.e., the *symbol response* of the channel. The peak of the symbol response is called the *cursor*, and its magnitude is denoted as $c_0$. Treating the channel as a linear time-invariant (LTI) system, the received signal is formed by superposition of delayed versions of the symbol response:

$$r(t) = \sum_{i=-\infty}^{\infty} d_{-i}h(t - iT), \qquad (2.1)$$

where *r* is the received signal, $d_i$ is the *i*-th symbol with $d_i \in \{-1, 1\}$, *h* is the symbol response of the channel, and *T* is the symbol duration. The received signal reaches its maximum at the cursor, so the symbol decision should be made there. Letting the phase of the cursor be $t_0$, Eq. 2.1 can be rewritten as

$$r(t_0) = \sum_{i=-\infty}^{\infty} d_{-i}h(t_0 - iT) = \sum_{i=-\infty}^{\infty} d_{-i}\alpha_i, \qquad (2.2)$$

where $\alpha_i = h(t_0 - iT)$. According to its position relative to the cursor, $\alpha_i$ is called a post-cursor for positive *i* and a pre-cursor for negative *i*. If only a finite number of pre- and post-cursors are considered, the symbol response is truncated:

$$r = \sum_{i=-a}^{b} d_{-i}\alpha_i, \qquad (2.3)$$

where r is the received sample, and a and b are the numbers of pre- and post-cursors, respectively.
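Eq. 2.3 can be evaluated directly on a symbol sequence. In the sketch below, $d_{-i}$ is read as the symbol *i* periods before the current one, so post-cursors weight past symbols; the list layout of `alphas` and all function names are illustrative assumptions. It also enumerates the split eye levels (four of them for a single post-cursor, matching the situation of figure 2.5(a)) and the worst-case inner half-opening $\alpha_0 - \sum_{i \neq 0} |\alpha_i|$.

```python
from itertools import product

def received_samples(symbols, alphas, a):
    """Eq. (2.3): r[n] = sum_{i=-a}^{b} alpha_i * d[n-i].

    alphas[i + a] holds alpha_i for i = -a..b (so b = len(alphas) - a - 1);
    symbols outside the sequence are treated as 0.
    """
    b = len(alphas) - a - 1
    out = []
    for n in range(len(symbols)):
        r = 0.0
        for i in range(-a, b + 1):
            if 0 <= n - i < len(symbols):
                r += alphas[i + a] * symbols[n - i]
        out.append(r)
    return out

def eye_levels(alphas):
    """All possible noise-free sample values for d in {-1, 1}: the split levels of the eye."""
    return sorted({sum(al * d for al, d in zip(alphas, ds))
                   for ds in product((-1, 1), repeat=len(alphas))})

def worst_case_half_opening(alphas, a):
    """Inner eye half-height under worst-case ISI: alpha_0 - sum_{i != 0} |alpha_i|."""
    return alphas[a] - sum(abs(v) for i, v in enumerate(alphas) if i != a)

# Cursor 0.5 and one post-cursor 0.2 (hypothetical values), no pre-cursors (a = 0).
print(received_samples([1, 1, -1, 1], [0.5, 0.2], a=0))  # about [0.5, 0.7, -0.3, 0.3]
print(eye_levels([0.5, 0.2]))                             # about [-0.7, -0.3, 0.3, 0.7]
```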



Figure 2.1: Band-limited symbol response and notation

### 2.2 Review of Equalizing Filters

#### 2.2.1 Category of equalizing filters

Figure 2.2 shows a categorization of equalizing filters. Equalization can be achieved in either the time or the frequency domain. Since frequency-domain equalizers require analog-to-digital converters (ADCs) followed by a Fourier transform block and an inverse Fourier transform block, their hardware complexity and power consumption are generally high. Given the simple low-pass characteristic of wire-line channels, frequency-domain equalizers are unnecessarily costly. Therefore, in most high-speed wire-line applications, equalization is performed in the time domain.

Even in time-domain equalization, receivers employing ADC-based equalizers have been introduced as high-speed ADCs have been reported [5, 6]. Despite their robustness and good performance backed by theory, such receivers remain impractical because of the large power consumption and circuit area of ADCs, which require extensive pipelining and interleaving to realize a high aggregate sampling rate [7].

Time-domain equalizers without ADCs can be further divided into two types: continuous- and discrete-time filters. Continuous-time (CT) analog filters selectively boost high-frequency components by placing zeros, as shown in figure 2.3(a). With bandwidth-extension design techniques, this type of equalizer achieves the fastest operation among unit filters [1, 8, 9, 10, 11, 12]. Its disadvantages, however, are sensitivity to PVT variations of the analog devices and the large area occupied by passive components such as capacitors and/or inductors. Another type of CT filter is the finite-impulse-response (FIR) filter with analog tap-delay elements, shown in figure 2.3(b). Although these have been widely used due to their simplicity and ease of implementation [13, 14, 15], errors in the delay and gain of the analog tap-delay elements due to PVT variations cause an inaccurate transfer function.

On the other hand, discrete-time (DT) filters use delay elements such as track-and-hold switches (T/H), latches, and D flip-flops (DFF). Since these elements are triggered by clock signals of fixed frequency, robustness and exact delays are achievable. The most widely used types of DT filters are FIR filters [16, 17] and decision feedback equalizers (DFE). In DT FIR filters, shown in figure 2.3(c), the trade-off between the speed and accuracy of the T/H is an obstacle to high-speed implementation. When a DT FIR filter is placed in the transmitter, latches or DFFs can be used as delay elements instead of T/H [18]. Equalization in the transmitter, called *pre-emphasis*, is easy to implement but suffers from reduced voltage swing due to the supply voltage limitation.

The above-mentioned CT filters and DT FIR filters are linear systems that equalize the frequency response either by boosting high-frequency components or by attenuating low-frequency components. In the former case, high-frequency noise power is boosted as well; in the latter case, low-frequency signal power is reduced while high-frequency noise power is unchanged. Either way, the signal-to-noise ratio (SNR) is degraded.
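The noise penalty of a linear equalizer can be seen in a small discrete-time example. For a hypothetical channel with one post-cursor, normalized as $h = [1, a]$, a truncated zero-forcing FIR inverse has taps $c_k = (-a)^k$. It cancels the ISI, but white noise at its input is amplified by $\sum_k c_k^2$. The channel model, $a = 0.5$, and the tap count are illustrative assumptions.

```python
def zf_fir_taps(a, n_taps):
    """Truncated zero-forcing inverse of the channel h = [1, a]: c_k = (-a)**k."""
    return [(-a) ** k for k in range(n_taps)]

def convolve(x, y):
    """Plain linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def white_noise_power_gain(taps):
    """Power gain of an FIR filter for white input noise: sum of squared taps."""
    return sum(c * c for c in taps)

a = 0.5                                    # hypothetical post-cursor relative to the cursor
taps = zf_fir_taps(a, 8)
response = convolve([1.0, a], taps)        # [1, 0, ..., 0, -a**8]: ISI removed except a small tail
gain = white_noise_power_gain(taps)        # about 1.33, i.e. noise power rises by about 1.25 dB
print(response)
print(gain)
```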



Figure 2.2: Category of widely used equalizers



Figure 2.3: Structures of widely-used equalizing filters (a) continuous-time analog filter (b) FIR filter with analog tap-delay (c) FIR filter with discrete-time tap-delay (d) decision-feedback equalizer

#### 2.2.2 Decision Feedback Equalizers

As shown in figure 2.3(d), a DFE uses the decided data to eliminate ISI terms. If the gain of the decision circuit is sufficient, the decided data settle to one of two voltage levels and any incident noise is removed. Therefore, noise does not propagate through the feedback path of the equalizer. Because of this good noise performance, DFEs attract much interest and are widely employed, despite their inherent inability to cancel pre-cursors.
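A minimal behavioral sketch of figure 2.3(d), assuming noise-free samples and hypothetical channel and tap values: past decisions weighted by the tap coefficients are subtracted from the received sample before slicing at zero. The channel follows Eq. 2.3 with no pre-cursors. With the chosen cursors the eye is closed for a plain slicer, while the DFE recovers every bit.

```python
import random

def channel(symbols, alphas):
    """Eq. (2.3) with no pre-cursors: r[n] = sum_i alphas[i] * d[n-i] (d before n=0 is 0)."""
    return [sum(a * symbols[n - i] for i, a in enumerate(alphas) if n - i >= 0)
            for n in range(len(symbols))]

def slicer(v):
    return 1 if v >= 0 else -1

def dfe_direct(received, taps):
    """Direct-form DFE: y[n] = r[n] - sum_k taps[k-1] * dhat[n-k]; dhat[n] = sign(y[n])."""
    decisions = []
    for r in received:
        isi = sum(w * decisions[-k] for k, w in enumerate(taps, start=1) if k <= len(decisions))
        decisions.append(slicer(r - isi))
    return decisions

rng = random.Random(7)
data = [rng.choice((-1, 1)) for _ in range(200)]
rx = channel(data, [0.5, 0.35, 0.2])       # cursor 0.5 and two post-cursors (hypothetical)
print(dfe_direct(rx, [0.35, 0.2]) == data)  # DFE decisions match the transmitted data
print([slicer(r) for r in rx] == data)      # plain slicing fails: 0.5 - 0.35 - 0.2 < 0
```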

As is well known, the most challenging aspect of DFE design is the stringent timing constraint of the first feedback path [19]. That is,

$$T/2 > t_{C2Q} + t_{comb}, \tag{2.4}$$

where $T$ is the symbol duration, $t_{C2Q}$ is the clock-to-Q delay of the latch, and $t_{comb}$ is the settling time of the combiner. To mitigate the lack of timing margin, look-ahead DFEs, also referred to as speculative, loop-unrolled, or partial-response DFEs, have been widely used []. Figure 2.4 shows a block diagram of a 1-tap look-ahead DFE using a full-rate clock. For a channel with only one post-cursor $c_1$, the eye diagram of the received signal, denoted $r$, has four split levels, as drawn in figure 2.5(a); the magnitude of each level at the center of the eye follows from Eq. 2.3. By adding $\alpha_1$ or $-\alpha_1$ to $r$, two *candidate signals*, denoted $y_0$ and $y_1$ respectively, are prepared as shown in the block diagram. Since $y_1 = r - \alpha_1$, every level at node $y_1$ is shifted down by $\alpha_1$, so that the two traces whose previous bit was "1" become symmetric about zero, as shown in figure 2.5(b). The converse holds at node $y_0$, as shown in figure 2.5(c). Data decisions are then made separately on the two candidate signals, and one of the two candidate data, $S_0$ and $S_1$, is finally selected by a 2-to-1 multiplexer (MUX). Since the selection criterion is the previous bit, the 1-bit delayed decision is fed back to the MUX. Assuming that the three latches have equal $t_{C2Q}$, their three outputs arrive at the MUX simultaneously. Therefore, the timing constraint is relaxed as follows [19]:

$$T > t_{C2Q} + t_{MUX} + t_{SU}, \tag{2.5}$$

where $t_{MUX}$ is the gate delay of the MUX and $t_{SU}$ is the setup time of the latch.
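To see why Eq. 2.5 is easier to satisfy than Eq. 2.4, both inequalities can be evaluated at 10 Gb/s ($T$ = 100 ps). The delay values in this Python sketch are hypothetical placeholders chosen for illustration, not measured data from this work.

```python
def meets_direct_dfe(T, t_c2q, t_comb):
    """Eq. (2.4): first feedback path of a direct DFE, T/2 > t_C2Q + t_comb."""
    return T / 2 > t_c2q + t_comb


def meets_lookahead_dfe(T, t_c2q, t_mux, t_su):
    """Eq. (2.5): 1-tap look-ahead DFE, T > t_C2Q + t_MUX + t_SU."""
    return T > t_c2q + t_mux + t_su


if __name__ == "__main__":
    T = 100.0                     # ps, symbol duration at 10 Gb/s
    t_c2q, t_comb = 30.0, 30.0    # hypothetical latch and combiner delays (ps)
    t_mux, t_su = 25.0, 15.0      # hypothetical MUX delay and latch setup time (ps)
    print("direct DFE meets Eq. 2.4:    ", meets_direct_dfe(T, t_c2q, t_comb))
    print("look-ahead DFE meets Eq. 2.5:", meets_lookahead_dfe(T, t_c2q, t_mux, t_su))
```

With these numbers the direct DFE fails (50 ps available versus 60 ps required), while the look-ahead DFE keeps 30 ps of slack.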



Figure 2.4: Block diagram of full-rate 1-tap look-ahead DFE



Figure 2.5: Simplified eye-diagrams of (a) received signal (b) candidate signal  $y_1$  (c)  $y_0$ 
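The speculative decision of figure 2.4 can be checked behaviorally. The Python sketch below assumes a noiseless channel $r[n] = \alpha_0 s[n] + \alpha_1 s[n-1]$ with $s = \pm 1$ and an initial previous bit of "0". It forms $y_1 = r - \alpha_1$ and $y_0 = r + \alpha_1$, slices both, and lets the previous decision drive the 2-to-1 MUX. Function names and parameter values are illustrative.

```python
import random


def lookahead_dfe_1tap(r, alpha1, prev=0):
    """Full-rate 1-tap look-ahead DFE. prev is the assumed decision before r[0]."""
    decided = []
    for rn in r:
        s1 = 1 if rn - alpha1 > 0 else 0   # candidate S_1 from y_1 = r - alpha1 (previous bit "1")
        s0 = 1 if rn + alpha1 > 0 else 0   # candidate S_0 from y_0 = r + alpha1 (previous bit "0")
        prev = s1 if prev else s0          # 2-to-1 MUX selected by the 1-bit delayed decision
        decided.append(prev)
    return decided


def channel_1tap(bits, alpha0, alpha1, prev_bit=0):
    """r[n] = alpha0*s[n] + alpha1*s[n-1], s = +/-1; prev_bit is the bit sent before bits[0]."""
    s = [1 if b else -1 for b in [prev_bit] + list(bits)]
    return [alpha0 * s[n] + alpha1 * s[n - 1] for n in range(1, len(s))]


if __name__ == "__main__":
    rng = random.Random(0)
    bits = [rng.randint(0, 1) for _ in range(1000)]
    r = channel_1tap(bits, alpha0=1.0, alpha1=0.8)
    print("bit errors:", sum(d != b for d, b in zip(lookahead_dfe_1tap(r, 0.8), bits)))
```

Because both candidates are sliced before the previous decision is known, only the MUX remains inside the feedback loop, which is where the relaxed bound of Eq. 2.5 comes from.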

Along with loop unrolling, interleaving is widely employed to relax the burden of high-speed clocking. Figure 2.6 is a block diagram of a 1-tap look-ahead DFE using half-rate clocks. After the candidate signals are generated in the same way as in the full-rate case, they are sampled by the rising and falling edges of the half-rate clock. The two candidate data sampled at the same phase are applied to the MUX of that phase. Finally, the decided data from the opposite phase are fed back to each MUX.

The timing diagram of the DFE is shown in figure 2.7. The timing constraint of the half-rate DFE is unchanged from the full-rate case given in Eq. 2.5. Although two-way interleaving roughly doubles the hardware complexity, it halves the operating speed required of the sampling elements and MUXs. As data rates increase beyond several Gb/s, most DFEs adopt both look-ahead and interleaving structures.



Figure 2.6: Block diagram of half-rate 1-tap look-ahead DFE



Figure 2.7: Timing diagram of half-rate 1-tap look-ahead DFE (a) entire timing diagram (b) detailed timing diagram for *even*-path with circuit delays

In addition, several circuit styles have been used to implement the adder (or subtracter) at the front end of a DFE. First, offset amplifiers have been used as analog adders [20, 21, 22]. Figure 2.8(a) is a schematic of a DFE using offset amplifiers without loop unrolling. Since the settling time of the amplifier is limited by its RC time constant, it is difficult to meet the timing requirement of Eq. 2.4. To reduce the RC time constant, the resettable-current-integrating DFE uses clocked PMOS loads instead of resistors, as shown in figure 2.8(b) [23, 26]. Because this structure imposes no settling-time requirement, it relaxes the timing margin compared with the conventional structure. Switched capacitors can also serve as a clocked subtracter with low power consumption; figure 2.8(c) shows one switched-capacitor section used in a quarter-rate 1-tap look-ahead DFE [25]. Note that the latter two types perform sampling and subtraction at the same time.



Figure 2.8: Implementation of subtracters (a) offset amplifier (b) resettable-current-integrating (c) switched-capacitor

## 2.3 Review of Adaptation Methods

Adaptation algorithms comprise two functions: error measurement and control. The first consideration is how to define the error, which can be done in several ways: the difference between low- and high-frequency powers or amplitudes, the mean-squared error, or the eye opening can each serve as the error. The adaptation algorithm then controls the equalizer, recursively or instantly, in the direction that reduces or eliminates the measured error.

Adaptation can be implemented in either the analog or the digital domain. The representative analog method compares the powers of the low- and high-frequency components [1, 11]. This method also suffers from the inherent disadvantage of analog circuits, namely sensitivity to PVT variations: incorrect cut-off frequencies of the low- and high-pass filters may result in a non-optimal error measure.

Without an ADC, it is difficult to measure the magnitude of the error digitally. Instead, digital adaptation frequently uses only the sign of the error; the sign-sign least-mean-square (SSLMS) algorithm is representative. The sign of the input data is multiplied by the sign of the error, using an exclusive-OR gate, and the product is accumulated into the equalizer code [27, 28, 29]. The drawback of SSLMS is that the equalizer code keeps bouncing even after adaptation is complete, so the equalization state is never constant. In addition, if multiple coefficients are controlled, SSLMS may converge to a wrong point. Other methods also rely on the sign of the error alone. In [12], the magnitudes of high- and low-frequency sinusoidal waveforms after the channel are measured, and the gain of the equalizer is tuned one step at a time until they are balanced. In this scheme, however, pilot sinusoidal signals are required during initial operation. Moreover, the initialization process must be repeated whenever the channel changes.
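The sign-sign update and its residual bounce can be illustrated with a behavioral model of a single DFE tap. The Python sketch below is a hypothetical example, not the adaptation used in this work: it assumes a channel with one post-cursor, Gaussian noise, a tap weight quantized to an integer code with step `lsb`, and an error slicer whose reference equals the main cursor (taken as known). Each bit, the code changes by the product of the error sign and the previous-decision sign, which is the exclusive-OR operation described above.

```python
import random


def sslms_dfe_tap(n_bits, alpha0, alpha1, lsb=0.02, noise=0.01, seed=0):
    """Decision-directed sign-sign LMS adaptation of one DFE tap (behavioral sketch).

    The tap weight is code * lsb. The error e = z - alpha0 * d is measured against
    the main-cursor level, which is assumed known here (a simplification).
    Update: code += sign(e) * sign(previous decision); in hardware this product
    of two sign bits is an XOR. Returns the code after every bit.
    """
    rng = random.Random(seed)
    code, history = 0, []
    prev_tx, prev_dec = -1, -1                   # assumed symbol/decision before the first bit
    for _ in range(n_bits):
        tx = rng.choice((-1, 1))
        r = alpha0 * tx + alpha1 * prev_tx + rng.gauss(0.0, noise)  # 1-post-cursor channel
        z = r - code * lsb * prev_dec            # DFE-equalized sample
        dec = 1 if z > 0 else -1                 # data slicer
        sign_e = 1 if z - alpha0 * dec > 0 else -1   # error slicer
        code += sign_e * prev_dec                # sign-sign update
        history.append(code)
        prev_tx, prev_dec = tx, dec
    return history


if __name__ == "__main__":
    h = sslms_dfe_tap(3000, alpha0=1.0, alpha1=0.33)
    print("ideal code ~", 0.33 / 0.02, "| last 20 codes:", h[-20:])
```

The code climbs toward $\alpha_1/\text{lsb}$ and then keeps toggling between neighboring values instead of settling, which is the bounce noted above.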

On-chip eye-opening monitoring (EOM) has been widely used to observe the quality of signals inside chips [30]. In optical communications in particular, this technique has been used to search for the optimal decision point [31] or to measure the eye opening for dispersion compensation [32, 33, 34]. In [33] and [34], analog voltages and a manually tuned clock phase are used to measure the eye opening. In [32] and [3], a rectangular mask bounded by two digitally controlled voltages and sampling clocks is set to check whether the target signal violates the mask, in order to measure the mask error rate. On-chip EOM has further been used for manual adaptation of an equalizer by monitoring ISI [35].

On-chip EOM is the most straightforward way to observe the current state of a receiver. It enables systematic management of receivers as well as on-chip testing. For equalizer adaptation in particular, on-chip EOM can help achieve robust convergence without initial pilots. In a severe channel, for example, a DFE using traditional LMS or SSLMS may fail to converge without an initial pilot because of the DFE's error-propagation property. Since on-chip EOM can detect whether the eye is open, the DFE can be tuned until its eye opens, without a pilot. In this way, EOM provides flexibility to the receiver.

## 2.4 Aim of Dissertation

In this dissertation, a 10-Gb/s 2-tap DFE with automatic adaptation is designed and implemented in 90-nm CMOS technology. Both feedback loops of the DFE are unrolled, and interleaving with quarter-rate clocks is used to achieve 10-Gb/s operation; namely, a quarter-rate 2-tap look-ahead DFE is proposed. For robust adaptation and testability, an on-chip EOM circuit is employed. Since existing EOM schemes are not suitable for look-ahead DFEs, a new EOM circuit applicable to this kind of DFE is proposed. Finally, an efficient adaptation algorithm for the proposed hardware is considered.

# Chapter 3 Proposed System

## 3.1 Structure of Proposed Decision Feedback Equalizer

Figure 3.1 shows the block diagram of the quarter-rate 2-tap look-ahead DFE. For convenience, the DFE is divided into two stages: the sampling stage and the multiplexing stage. The sampling stage consists of four unit sampler modules (USMs): $USM_{11}$, $USM_{01}$, $USM_{10}$ and $USM_{00}$. These share the received signal $r$ and the quarter-rate four-phase clock signals $CK_{I+}$, $CK_{Q+}$, $CK_{I-}$ and $CK_{Q-}$, but each accepts a different equalizer coefficient: $V_{eq11}$, $V_{eq01}$, $V_{eq10}$ and $V_{eq00}$, respectively.

Each USM consists of one subtracter, four samplers and four comparators. In each USM, the subtracter produces the difference between $r$ and the USM's equalizer coefficient $V_{eq}$. The resulting candidate signal $y$ is sampled at four phases by the quarter-rate clocks, and each sample is digitized to either "high" or "low". As a result, each USM produces four candidate data for one $V_{eq}$, and sixteen candidate data in total are generated in the sampling stage. Figure 3.2 shows simplified eye diagrams of $r$ and the four candidate signals $y_{11}$, $y_{01}$, $y_{10}$ and $y_{00}$, assuming two post-cursors.



Figure 3.1: Block diagram of quarter-rate 2-tap look-ahead decision feedback equalizer



Figure 3.2: Eye diagrams of received signal and four candidate signals with two post-cursors

By Eq. 2.3, the received samples are split into eight levels:

$$\begin{pmatrix} L_{000} \\ L_{001} \\ L_{010} \\ L_{011} \\ L_{100} \\ L_{101} \\ L_{110} \\ L_{111} \end{pmatrix} = \begin{pmatrix} -1 & -1 & -1 \\ -1 & -1 & +1 \\ -1 & +1 & -1 \\ -1 & +1 & +1 \\ +1 & -1 & -1 \\ +1 & -1 & +1 \\ +1 & +1 & -1 \\ +1 & +1 & +1 \end{pmatrix} \begin{pmatrix} \alpha_2 \\ \alpha_1 \\ \alpha_0 \end{pmatrix}, \tag{3.1}$$

where $L_{d_{-2}d_{-1}d_0}$ is the magnitude of the received sample with bit pattern "$d_{-2}d_{-1}d_0$". If $V_{eq11}$ is properly set, for example, the two thick traces whose previous two bits were "11" are centered on the zero threshold in $y_{11}$. In the same manner, each of the other candidate signals centers the traces with one particular past bit pattern on zero. After decisions are made on all candidate signals, only the results from the thick traces should be selected in the multiplexing stage for the best BER.
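Eq. (3.1) and the ideal coefficients can be checked numerically. In the Python sketch below (cursor values are hypothetical), each $V_{eq}$ is set to the center line $\alpha_2 s(d_{-2}) + \alpha_1 s(d_{-1})$ of its past pattern, with $s(0) = -1$ and $s(1) = +1$. Subtracting it from the two matching levels leaves $\pm\alpha_0$, i.e., the two thick traces centered on zero.

```python
from itertools import product


def sgn(bit):
    """Map a bit character to a symbol: '0' -> -1, '1' -> +1."""
    return 1 if bit == "1" else -1


def received_levels(a2, a1, a0):
    """Eq. (3.1): L_{d-2 d-1 d0} for all eight bit patterns."""
    return {p + q + b: a2 * sgn(p) + a1 * sgn(q) + a0 * sgn(b)
            for p, q, b in product("01", repeat=3)}


def ideal_veq(a2, a1):
    """Center line for each past pattern d-2 d-1, used as V_eq of the matching USM."""
    return {p + q: a2 * sgn(p) + a1 * sgn(q) for p, q in product("01", repeat=2)}


if __name__ == "__main__":
    a2, a1, a0 = 0.25, 0.5, 1.0            # hypothetical post-cursors and main cursor
    levels, veq = received_levels(a2, a1, a0), ideal_veq(a2, a1)
    for past, v in sorted(veq.items(), reverse=True):
        print(f"V_eq{past} = {v:+.2f}: y levels {levels[past + '0'] - v:+.2f}, "
              f"{levels[past + '1'] - v:+.2f}")
```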

Figure 3.3(a) is a timing diagram of the sampling stage. Only the signals of $USM_{11}$ are shown; the other USMs operate with the same timing. Symbol indices are denoted by bracketed numbers. In the front end of the USM, the candidate signal $y_{11[n]}$ is obtained by subtracting $V_{eq11}$ from $r_{[n]}$. The resulting $y_{11[n]}$ is sampled by four clocks with equally spaced phases, and each sample is digitized by a comparator.

As shown in figure 3.1, the sixteen samples from the sampling stage are tied into four groups, each associated with one equalizer coefficient. In the multiplexing stage, they are rearranged into four other groups, each containing the samples taken at the same phase. Figure 3.3(b) presents the timing relationship of the input and output signals of the four MUXs. For example, one of $S_{I-,11[n]}$, $S_{I-,01[n]}$, $S_{I-,10[n]}$ and $S_{I-,00[n]}$ is selected as $D_{O,I-[n]}$ by $MUX_{I-}$. The two previous decisions come from $D_{O,I+[n-2]}$ and $D_{O,Q+[n-1]}$, respectively. Since the input data of the MUX must be phase-aligned, $D''_{O,I+[n-2]}$ and $D'_{O,Q+[n-1]}$, which are versions of $D_{O,I+[n-2]}$ and $D_{O,Q+[n-1]}$ delayed by 2 UI and 1 UI, respectively, are used for selection. The remaining three channels of equalized data are generated in the same manner.
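The phase bookkeeping of the two stages can be sketched behaviorally. The Python model below is a hypothetical illustration under idealized assumptions (noiseless two-post-cursor channel, ideal $V_{eq}$ values, decisions before the first symbol taken as "00"). The sampling stage slices $r_{[n]}$ against all four $V_{eq}$ and stores each result in the stream of the clock phase that samples symbol $n$; each MUX then picks the candidate indexed by the two previous decisions, which come from the MUXs of the two preceding phases.

```python
import random
from itertools import product

PHASES = ("I+", "Q+", "I-", "Q-")   # symbol n is sampled by the clock PHASES[n % 4]


def sym(bit):
    return 1 if bit else -1


def channel(bits, a0, a1, a2):
    """Noiseless r[n] = a0*s[n] + a1*s[n-1] + a2*s[n-2]; bits before n = 0 are taken as 0."""
    d = [0, 0] + list(bits)
    return [a0 * sym(d[n]) + a1 * sym(d[n - 1]) + a2 * sym(d[n - 2]) for n in range(2, len(d))]


def sampling_stage(r, veq):
    """Sixteen quarter-rate streams S[(phase, pattern)], one per clock phase and USM."""
    S = {(ph, p): [] for ph in PHASES for p in veq}
    for n, rn in enumerate(r):
        for p, v in veq.items():                         # USM_p: y_p = r - V_eq,p
            S[(PHASES[n % 4], p)].append(1 if rn - v > 0 else 0)
    return S


def multiplexing_stage(S, n_bits, init=(0, 0)):
    """MUX of phase n % 4 selects S[(phase, d[n-2] d[n-1])]; e.g. MUX_I- uses the
    decisions of MUX_I+ (2 bits back) and MUX_Q+ (1 bit back)."""
    D = list(init)
    for n in range(n_bits):
        pattern = f"{D[-2]}{D[-1]}"
        D.append(S[(PHASES[n % 4], pattern)][n // 4])   # n // 4: index within the stream
    return D[len(init):]


if __name__ == "__main__":
    a0, a1, a2 = 1.0, 0.5, 0.25                          # hypothetical cursors
    veq = {f"{p}{q}": a2 * sym(p) + a1 * sym(q) for p, q in product((0, 1), repeat=2)}
    rng = random.Random(0)
    bits = [rng.randint(0, 1) for _ in range(1000)]
    S = sampling_stage(channel(bits, a0, a1, a2), veq)
    D = multiplexing_stage(S, len(bits))
    print("bit errors:", sum(x != b for x, b in zip(D, bits)))
```

In hardware the four MUXs run concurrently and the delayed copies $D'$ and $D''$ align their select inputs; this sequential loop only reproduces which candidate each MUX picks.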

Simplified waveforms are illustrated in figure 3.4. The received signal $r$ is simply represented by a bold solid line without any delay or transition effects. The expected magnitude of the counterpart symbol (that of "0" when "1" is transmitted, and vice versa) is drawn as a thin solid line within each symbol duration. The center lines between $r$ and its counterparts are drawn as dotted lines. For optimum equalization, the equalizer coefficients should coincide with these threshold lines. Focusing on the samples taken by $CK_{I-}$, the two previous data from the other groups are, in sequence, 11, 10, 01 and 00. Therefore, the samples selected by the MUX come, in sequence, from $S_{I-,11}$, $S_{I-,10}$, $S_{I-,01}$ and $S_{I-,00}$, which are shaded gray.

The gray boxes in the candidate signals mark the regions that are selected under optimal conditions. An *effective* equalized waveform can be pictured by picking out and aligning the waveforms in the boxed regions, even though it does not exist on any physical node. Circuits to measure this effective equalized waveform are introduced in the next section.

Timing diagram. (a) Sampling stage ($USM_{11}$ only): sampled received signal $r_{[n]}$, candidate signal $y_{11[n]}$, clocks $CK_{I+}$, $CK_{Q+}$, $CK_{I-}$, $CK_{Q-}$ and the candidate data sampled by each clock. (b) Multiplexing stage: for each of $MUX_{I+}$, $MUX_{Q+}$, $MUX_{I-}$ and $MUX_{Q-}$, the four candidate data, the 1-bit and 2-bit previous data, and the decided data (e.g., $D_{O,I+[4]}$ is selected using $D'_{O,Q-[3]}$ and $D''_{O,I-[2]}$).

Figure 3.3: Timing diagram of quarter-rate 2-tap look-ahead decision feedback equalizer (a) sampling stage (only *USM11* signals shown) (b) multiplexing stage



Figure 3.4: Example waveforms of quarter-rate 2-tap look-ahead decision feedback equalizer

## 3.2 Look-ahead Eye-Opening Monitoring

As previously mentioned, a look-ahead DFE inherently has no *equalized* waveform on any single node. A DFE that unrolls $N$ taps generally prepares $2^N$ candidate signals on different nodes, and one of them is selected according to the past bit pattern. This is why conventional on-chip EOM cannot be applied to look-ahead DFEs. Since the unselected candidate signals are discarded in the multiplexing process, the past bit pattern must also be considered in eye monitoring. Therefore, a new method named *look-ahead EOM* is proposed.

First, a two-dimensional scanning function is required to obtain the effective eye diagram of a look-ahead DFE. Since the vertical and horizontal axes are voltage and time, respectively, an adjustable reference voltage and an adjustable clock phase are required. Figure 3.5 is a block diagram of the DFE with EOM, in which the additional elements are drawn in bold fonts and thick lines. A variable reference voltage $V_{EOM}$ is used for the vertical scan, and a new clock signal $CK_{EOM}$ with an adjustable phase range of 1 UI is used for the horizontal scan. The phase of $CK_{EOM}$, denoted $\theta_{EOM}$, leads or lags $\theta_{I-}$ by at most 0.5 UI.

Each modified USM adds one more subtracter, sampler and comparator. The additional subtracter produces the difference between $y$ and $V_{EOM}$, which is sampled and digitized at the target EOM phase. The four digital values from the four unit sampler modules are applied to an additional MUX. Since monitoring is performed around the phase of $CK_{I-}$, $D_{O,I+}$ and $D_{O,Q+}$ are used as the selection data. Figure 3.6 shows an example in which the last two bits were both "0" and the present bit is "1", with arbitrary $V_{EOM}$ and $\theta_{EOM}$. Although the four comparator outputs disagree, the "High" value from $y_{00}$ is selected because of the "00" pattern.
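The selection step and a one-dimensional (vertical) scan can be sketched in Python. This is a hypothetical behavioral model, not the circuit of figure 3.5: it assumes noiseless center samples, ideal $V_{eq}$ values and past decisions, and it defines the upper eye edge as the largest $V_{EOM}$ at which every symbol decided "1" still reads "High" after selection, which is only one possible criterion. Replacing the four coefficients with a single one (a pattern-unaware monitor) makes the eye appear closed, illustrating why the past bit pattern must be considered.

```python
from itertools import product


def sym(bit):
    return 1 if bit else -1


def ideal_veq(a2, a1):
    """V_eq for each past pattern "d[n-2]d[n-1]"."""
    return {f"{p}{q}": a2 * sym(p) + a1 * sym(q) for p, q in product((0, 1), repeat=2)}


def eom_select(eom_bits, prev2, prev1):
    """Additional MUX: keep the EOM comparator output of the USM matching the past pattern."""
    return eom_bits[f"{prev2}{prev1}"]


def make_samples(bits, a0, a1, a2):
    """Noiseless center samples (r, d[n-2], d[n-1], d[n]); bits before n = 0 are 0."""
    d = [0, 0] + list(bits)
    return [(a0 * sym(d[n]) + a1 * sym(d[n - 1]) + a2 * sym(d[n - 2]), d[n - 2], d[n - 1], d[n])
            for n in range(2, len(d))]


def upper_eye_edge(samples, veq, v_grid):
    """Largest V_EOM on v_grid for which every symbol decided "1" still reads High.
    Returns None if even the lowest V_EOM fails (eye appears closed from above)."""
    edge = None
    for v in sorted(v_grid):
        for r, p2, p1, d in samples:
            if d != 1:
                continue
            eom_bits = {p: int(r - veq[p] - v > 0) for p in veq}   # one EOM comparator per USM
            if eom_select(eom_bits, p2, p1) != 1:
                return edge
        edge = v
    return edge


if __name__ == "__main__":
    samples = make_samples([1, 0, 1, 1, 1, 0, 0, 1, 0, 1], 1.0, 0.5, 0.25)  # hypothetical
    veq = ideal_veq(0.25, 0.5)
    grid = [k * 0.03 for k in range(40)]
    print("look-ahead EOM edge:", upper_eye_edge(samples, veq, grid))
    print("pattern-unaware edge:", upper_eye_edge(samples, {p: veq["11"] for p in veq}, grid))
```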

Timing must also be considered in eye monitoring. While $\theta_{EOM}$ is variable, the phases of the other clocks are fixed; therefore, the phase skew between the candidate data varies from -0.5 UI to 0.5 UI. Figure 3.7 shows a timing diagram for eye monitoring in the multiplexing stage, focusing on $CK_{I-}$. Viewed from $CK_{I-}$, the samples taken by $CK_{EOM}$ have a timing variation (jitter) of 1 UI and a timing margin of 3 UI. When these are multiplexed by the two previous data, the resulting data also have a jitter of 1 UI. In the circuit implementation, a $CK_{EOM}$ of half frequency is used to resolve this problem; the details are described in Chapter 4.









Figure 3.6: Example of look-ahead eye monitoring with recent bit pattern of "001"

Sampled received signal: $r[2]$ through $r[14]$, one sample per UI. Clock rows: $CK_{I-}$ and $CK_{EOM}$. Annotations: phase variation of 1 UI and timing margin of 3 UI.

| Signal | $k = 2$ | $k = 6$ | $k = 10$ |
|---|---|---|---|
| Candidate data by $CK_{I-}$ | $S_{I-,11}[2]$ | $S_{I-,11}[6]$ | $S_{I-,11}[10]$ |
| Candidate data by $CK_{EOM}$ | $S_{EOM,11}[2]$, $S_{EOM,01}[2]$, $S_{EOM,10}[2]$, $S_{EOM,00}[2]$ | $S_{EOM,11}[6]$, $S_{EOM,01}[6]$, $S_{EOM,10}[6]$, $S_{EOM,00}[6]$ | $S_{EOM,11}[10]$, $S_{EOM,01}[10]$, $S_{EOM,10}[10]$, $S_{EOM,00}[10]$ |
| 1-bit previous data | $D'_{O,Q+}[1]$ | $D'_{O,Q+}[5]$ | $D'_{O,Q+}[9]$ |
| 2-bit previous data | $D''_{O,I+}[0]$ | $D''_{O,I+}[4]$ | $D''_{O,I+}[8]$ |
| Decided data | $D_{O,EOM}[2]$ | $D_{O,EOM}[6]$ | $D_{O,EOM}[10]$ |

Figure 3.7: Timing diagram for eye monitoring in multiplexing stage

The flow for drawing the eye diagram is shown in figure 3.8. Suppose that the eye diagram of the received data over one UI is drawn on the grid of figure 3.8(a). The grid step is $\Delta V_{EOM}$ vertically and $\Delta \theta_{EOM}$ horizontally, and the normalized ranges are from -1 to 1 vertically and from 0 to 1 horizontally.

Two-dimensional scanning is performed by the triple-loop process shown in the flow chart of figure 3.8(b). When the process begins, both $V_{EOM}$ and $\theta_{EOM}$ are set to their minimum values. The innermost loop counts up $n_h$ over a given number of clock cycles, $N_S$. In the lower half of the eye diagram ($V_{EOM} < 0$), $n_h$ is incremented whenever the present DFE output $D_{o,I-}$ is 0 and $D_{o,EOM}$ is 1; in the upper half, it is incremented whenever $D_{o,I-}$ is 1 and $D_{o,EOM}$ is 0. $D_{o,EOM}$ goes high when the received signal is larger than $V_{EOM}$ at the sampling instant. Therefore, in the lower half $n_h$ is the number of times during $N_S$ that the received signal lies above $V_{EOM}$ at $\theta_{EOM}$ while a "0" is decided, and in the upper half it is the number of times the signal lies below $V_{EOM}$ while a "1" is decided. With this separated operation, a continuous CH can be obtained, as shown in figure 3.9. At the end of this loop, $n_h$ is stored in the corresponding position of a histogram matrix H.

The middle loop performs the vertical sweep by increasing $V_{EOM}$ one step at a time, so the innermost loop is repeated for every possible $V_{EOM}$. Each time this loop is completed, the column of H corresponding to the present $\theta_{EOM}$ is filled with the measured histogram. At the end of this loop, $V_{EOM}$, $n_h$, and $n_s$ are all reset.

The outermost loop performs the horizontal sweep by increasing $\theta_{EOM}$ one step at a time, repeating the middle loop for every possible $\theta_{EOM}$. When the whole flow chart has been completed, a full set of $n_h$ values is obtained. We call it the *cumulative histogram* (CH), since it is analogous to a cumulative distribution function (CDF) in probability theory. By differentiating the CH along the digital-to-analog converter (DAC) code axis and taking the absolute values, the *distribution histogram* (DH) is obtained, as shown in figure 3.9.
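The triple loop of figure 3.8(b) and the CH-to-DH step can be sketched in Python as below. This is a behavioral model only: the received waveform (NRZ with raised-cosine edges plus Gaussian noise), the assumption that the DFE decision $D_{o,I-}$ always equals the transmitted bit, and all function and parameter names are hypothetical. The grid defaults (32 voltage levels, 16 phases, $N_S$ = 255) are chosen merely to resemble values mentioned in this chapter.

```python
import math
import random


def received_sample(prev_b, cur_b, next_b, theta, rng, amp=0.8, w=0.4, noise=0.03):
    """Hypothetical NRZ waveform value at phase theta (0 to 1) inside one UI.

    Transitions are raised-cosine edges of width w (in UI) centred on the UI
    boundaries, so the modelled eye is widest at theta = 0.5.
    """
    s_prev, s_cur, s_next = (2 * b - 1 for b in (prev_b, cur_b, next_b))

    def edge(x):  # weight of the neighbouring symbol at distance x from a boundary
        return 0.25 * (1 + math.cos(math.pi * x / w)) if x < w else 0.0

    v = s_cur + (s_prev - s_cur) * edge(theta) + (s_next - s_cur) * edge(1 - theta)
    return amp * v + rng.gauss(0.0, noise)


def scan_eye(n_v=32, n_t=16, n_s=255, seed=1, noise=0.03):
    """Triple-loop scan of figure 3.8(b); returns the V_EOM grid and the CH matrix H[v][t]."""
    rng = random.Random(seed)
    v_levels = [-1 + 2 * i / (n_v - 1) for i in range(n_v)]
    H = [[0] * n_t for _ in range(n_v)]
    for t in range(n_t):                       # outer loop: theta_EOM sweep
        theta = t / n_t
        for i, v_eom in enumerate(v_levels):   # middle loop: V_EOM sweep
            n_h = 0
            for _ in range(n_s):               # inner loop: N_S clock counts
                prev_b, cur_b, next_b = (rng.randint(0, 1) for _ in range(3))
                d_i = cur_b                    # assumed error-free DFE decision D_o,I-
                y = received_sample(prev_b, cur_b, next_b, theta, rng, noise=noise)
                d_eom = int(y > v_eom)         # D_o,EOM
                if v_eom < 0 and d_i == 0 and d_eom == 1:
                    n_h += 1                   # lower half: a "0" lies above V_EOM
                elif v_eom >= 0 and d_i == 1 and d_eom == 0:
                    n_h += 1                   # upper half: a "1" lies below V_EOM
            H[i][t] = n_h
    return v_levels, H


def distribution_histogram(H):
    """DH: absolute difference of the CH along the DAC-code (voltage) axis."""
    return [[abs(H[i + 1][t] - H[i][t]) for t in range(len(H[0]))]
            for i in range(len(H) - 1)]


v_levels, H = scan_eye()
DH = distribution_histogram(H)
```

In this model the CH entries adjacent to $V_{EOM} = 0$ stay at zero at $\theta = 0.5$, where the modeled eye is open, but not at $\theta = 0$, where transitions cross zero.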

The three-dimensional plots in figure 3.10 are examples of a CH and a DH, respectively, measured from a CMOS chip fabricated in previous work. A single UI is divided into 16 phases and 32 voltage levels, and $N_S$ = 8191. The basin in the DH corresponds to the eye opening of the received signal. Note that the non-monotonicity of the CH causes non-zero values in the eye-opening region even when the eye diagram is perfectly open. This problem arises because only one voltage and one clock are used for scanning, unlike the two voltages and two clocks that form a variable mask in [3, 32]. In this work, the problem is resolved by post-processing in the adaptation algorithm, without additional analog circuitry, as introduced in the next section.



Figure 3.8: Process of drawing eye diagram (a) assumption on grid (b) flow chart, where  $N_S$  is a target duration,  $n_s$  is a clock count,  $n_h$  is the number of "1" in  $D_{o,EOM}$ , and H is a histogram matrix



Figure 3.9: Example of processes to obtain cumulative and distribution histogram



Figure 3.10: Example of cumulative histogram and distribution histogram in 3-D plot

# 3.3 Adaptation of Decision Feedback Equalizer Using ISI Monitor

The magnitudes of the post-cursors are given by Eq. 3.1, so the ISI of the channel can be found by solving these equations. Although a number of combinations are available, three equations are chosen arbitrarily in this work:

$$L_{011} = -\alpha_2 + \alpha_1 + \alpha_0, L_{101} = +\alpha_2 - \alpha_1 + \alpha_0, L_{111} = +\alpha_2 + \alpha_1 + \alpha_0.$$
(3.2)

By solving these,

$$\alpha_2 = (L_{111} - L_{011})/2, \alpha_1 = (L_{111} - L_{101})/2,$$
(3.3)

That is, once the signal levels for these particular patterns are measured, the post-cursors of the symbol response are known.

Then the DFE coefficient $V_{eq11}$ should be set to the midpoint of the two received signal levels $L_{111}$ and $L_{110}$:

$$V_{eq11} = \frac{L_{111} + L_{110}}{2} = \alpha_2 + \alpha_1.$$
(3.4)

In the same manner, the remaining coefficients are obtained:

$$\left. \begin{array}{l} V_{eq01} = +\alpha_2 - \alpha_1 \\ V_{eq10} = -\alpha_2 + \alpha_1 \\ V_{eq00} = -\alpha_2 - \alpha_1. \end{array} \right\}$$
(3.5)
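As a numerical check of Eqs. 3.2 to 3.5, the Python sketch below synthesizes the pattern levels from an assumed symbol response ($\alpha_0 = 0.5$, $\alpha_1 = 0.2$, $\alpha_2 = 0.05$, arbitrary values) and recovers the post-cursors and the DFE coefficients. $L_{110}$ is built with the same sign convention as Eq. 3.2, and the function names are illustrative.

```python
def post_cursors(l011, l101, l111):
    """Eq. 3.3: alpha_2 and alpha_1 from three measured pattern levels."""
    return (l111 - l011) / 2, (l111 - l101) / 2


def dfe_coefficients(a2, a1):
    """Eqs. 3.4 and 3.5, with the subscripts used exactly as in the text."""
    return {"11": a2 + a1, "01": a2 - a1, "10": -a2 + a1, "00": -a2 - a1}


# Assumed symbol response (arbitrary values) used to synthesize Eq. 3.2 levels.
a0, a1, a2 = 0.5, 0.2, 0.05
levels = {
    "011": -a2 + a1 + a0,
    "101": +a2 - a1 + a0,
    "111": +a2 + a1 + a0,
    "110": +a2 + a1 - a0,   # same sign convention, used in Eq. 3.4
}
est_a2, est_a1 = post_cursors(levels["011"], levels["101"], levels["111"])
veq = dfe_coefficients(est_a2, est_a1)
print(round(est_a2, 6), round(est_a1, 6))                                  # 0.05 0.2
print(round(veq["11"], 6), round((levels["111"] + levels["110"]) / 2, 6))  # 0.25 0.25
```

The second line confirms that $V_{eq11} = \alpha_2 + \alpha_1$ is indeed the midpoint of $L_{111}$ and $L_{110}$.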

In [35], ISI is monitored using a switched-capacitor correlator. Because the ISI is measured as an analog voltage, the outputs are difficult to post-process and use for automatic equalizer adaptation; for this reason, the outputs of that ISI monitor were used only to tune the equalizer manually. The proposed scheme, on the other hand, measures the ISI digitally, so the output codes can be applied directly to the DFE.

Assume that $L_{011}$, $L_{101}$, and $L_{111}$ are measured digitally and mapped to the digital codes $C_{011}$, $C_{101}$, and $C_{111}$, respectively. Then Eq. 3.3 can be rewritten as

$$c_{2} = (C_{111} - C_{011})/2, c_{1} = (C_{111} - C_{101})/2.$$
(3.6)

where  $c_1$  and  $c_2$  are digital codes mapped to post-cursors,  $\alpha_1$  and  $\alpha_2$ , respectively.

Denoting the digital codes mapped to $V_{eq11}$, $V_{eq10}$, $V_{eq01}$, and $V_{eq00}$ by $C_{eq11}$, $C_{eq10}$, $C_{eq01}$, and $C_{eq00}$, the adaptation process can be carried out in the digital domain. That is,

$$\begin{array}{c}
C_{eq11} = +c_2 + c_1, \\
C_{eq01} = +c_2 - c_1, \\
C_{eq10} = -c_2 + c_1, \\
C_{eq00} = -c_2 - c_1.
\end{array}$$
(3.7)

To measure $C_{011}$, $C_{101}$, and $C_{111}$, the DH for each symbol pattern must be obtained separately. We call this method *pattern-filtered EOM*. Figure 3.11(a) shows the flow chart of the sequential pattern-filtered EOM process. At the start of the process, $C_{EOM}$, the digital code mapped to $V_{EOM}$, is initialized to its minimum and $\theta_{EOM}$ is set to the sampling phase, because pattern-filtered EOM is performed only at the sampling phase ($\theta_{I-}$). A function named *pattern-filtered sample count (PFSC)* is defined in figure 3.11(b). This function counts up $D_{o,EOM}$ over $N_S$ occurrences in which the recent three-bit pattern matches the target pattern. For the lower half of the $C_{EOM}$ range, it counts up when $D_{o,EOM}$ is high and the recent three-bit pattern is the complement of the target pattern; the opposite conditions apply to the upper half.

In practice, the DHs may be dispersed by unwanted effects such as thermal noise, residual ISI, and jitter on the sampling clock, so it is appropriate to use the mean of the DH as the signal level. At the end of the function, instead of storing $n_h$ in a memory array, the product of $C_{EOM}$ and the difference between $n_h$ and $n'_h$ (a buffered copy of $n_h$) is accumulated into m. That is,

$$C_{\times \times 0} = \frac{m}{N_S} = \frac{\sum_{C_{EOM}=min.}^{half} C_{EOM} \cdot |n_h - n'_h|}{N_S}$$
(3.8)

for the lower half of the EOM range, and

$$C_{\times \times 1} = \frac{m}{N_S} = \frac{\sum_{C_{EOM} = half + 1}^{max.} C_{EOM} \cdot |n_h - n'_h|}{N_S}$$
(3.9)

for the upper half.
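The accumulation in Eqs. 3.8 and 3.9 can be sketched as follows. This Python model assumes that the pattern-filtered CH is available as one count per DAC code, swept in increasing order over one half of the range, and that the buffer $n'_h$ starts at 0 for the upper half (and at $N_S$ for the lower half). That initial value is not specified in the text, and the names are hypothetical. The example uses integer sample levels so that the mean is exact.

```python
def pattern_filtered_mean(codes, counts, n_s, initial_count):
    """Mean DAC code of one pattern-filtered DH, following Eqs. 3.8 and 3.9.

    codes:         DAC codes C_EOM, swept in increasing order over one half range.
    counts:        PFSC result n_h at each code (the pattern-filtered CH).
    initial_count: assumed starting value of the buffer n'_h.
    """
    m = 0
    n_h_prev = initial_count
    for c_eom, n_h in zip(codes, counts):
        m += c_eom * abs(n_h - n_h_prev)  # accumulate instead of storing n_h
        n_h_prev = n_h                    # n'_h: buffered copy of n_h
    return m / n_s


# Upper half (codes 16..31): n_h counts samples that do not exceed V_EOM.
levels = [23] * 50 + [24] * 155 + [25] * 50   # hypothetical sample levels, N_S = 255
upper = list(range(16, 32))
ch = [sum(1 for lv in levels if lv <= c) for c in upper]
print(pattern_filtered_mean(upper, ch, n_s=len(levels), initial_count=0))  # 24.0
```

Only one running sum m is kept, which is why the hardware in figure 3.12 needs a multiplier and an accumulator rather than a histogram memory.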

Pattern-filtered EOM can also be performed in parallel for the three bit patterns. In this case, the hardware complexity roughly triples, but the adaptation time is reduced to one third of that of the sequential case.

The estimated gate count and adaptation time are compared in figure 3.12, assuming 255 samples per point and 5-bit DACs. These parameters follow from the discussions in section 3.4 and chapter 5.



Figure 3.11: Simplified flow chart of sequential pattern-filtered eye opening monitoring (a) total flow chart (b) definition of pattern-filtered sample count function

| Category | Item | Sequential: spec (bit) | Sequential: count | Parallel: spec (bit) | Parallel: count |
|---|---|---|---|---|---|
| Registers | $N_S$ | 8 | 1 | 8 | 3 |
| Registers | $n_s$ | 8 | 1 | 8 | 3 |
| Registers | $n_h$ | 8 | 1 | 8 | 3 |
| Registers | $n'_h$ | 8 | 1 | 8 | 3 |
| Registers | m | 14 | 1 | 14 | 3 |
| Multiplier | | 8×5 | 1 | 8×5 | 1 |
| Adder | | 14×8 | 1 | 14×8 | 1 |
| Subtracter | | 8×8 | 1 | 8×8 | 3 |
| Total hardware | | 500 gates + multiplier | | 1332 gates + multiplier | |
| Adaptation time (at 312.5 MHz) | | ~626 µs | | ~208 µs | |

\* 1 flip-flop = 8 NANDs; *N*-bit adder = *N* full adders = 6*N* NANDs.

Figure 3.12: Estimated gate count and adaptation time for sequential and parallel pattern-filtered EOM
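The totals in figure 3.12 can be reproduced with the footnote's conversion factors, as in the Python sketch below. The gate count excludes the multiplier, as the figure does, and treats each 8-bit subtracter like an 8-bit adder (an assumption). For the adaptation time, we assume random data, so a particular 3-bit pattern appears on average once every $2^3 = 8$ clocks; multiplying by 32 DAC levels and 255 samples per level then matches the figure's numbers. This timing model is our reading of the figure, not a formula stated in the text, and the function names are hypothetical.

```python
NAND_PER_FF = 8          # footnote of figure 3.12: 1 flip-flop = 8 NANDs
NAND_PER_ADDER_BIT = 6   # footnote: N-bit adder = N full adders = 6N NANDs


def gate_count(copies):
    """NAND-equivalent gates excluding the multiplier (copies: 1 sequential, 3 parallel)."""
    register_bits = 8 + 8 + 8 + 8 + 14              # N_S, n_s, n_h, n'_h, m
    registers = copies * register_bits * NAND_PER_FF
    adder = 14 * NAND_PER_ADDER_BIT                 # one shared 14-bit accumulator adder
    subtracters = copies * 8 * NAND_PER_ADDER_BIT   # assumed: 8-bit subtracter costs like an adder
    return registers + adder + subtracters


def adaptation_time_us(patterns_in_series, f_clk_hz=312.5e6, dac_levels=32,
                       n_s=255, pattern_bits=3):
    """Assumed model: with random data a 3-bit pattern occurs once per 2**3 clocks."""
    clocks = patterns_in_series * dac_levels * n_s * 2 ** pattern_bits
    return clocks / f_clk_hz * 1e6


print(gate_count(1), gate_count(3))                            # 500 1332
print(int(adaptation_time_us(3)), int(adaptation_time_us(1)))  # 626 208
```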

# 3.4 Decision of design parameters

#### 3.4.1 Resolution of DAC

DACs are used to generate the DFE coefficients and $V_{EOM}$. As the DAC resolution increases, the accuracy of both EOM and equalization improves. However, since high-resolution DACs occupy a large area, an appropriate number of bits must be chosen.

In the EOM process, a comparator compares one of the candidate signals with $V_{EOM}$ from the DAC, and the minimum voltage difference that the comparator can distinguish within a specified time is limited. If the DAC resolution is finer than that of the comparator, the comparator cannot sense every change in the DAC output.

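In this work, clocked-sense amplifiers with a resolution of 14 mV are used as comparators, and the DACs are designed with an output range of 600 mV; the circuit structures and simulation results are shown in chapter 4. A 5-bit DAC has an LSB of about 600 mV/31 ≈ 19 mV, which the comparator can resolve, whereas a 6-bit DAC would have an LSB of about 9.5 mV, below the comparator resolution. Therefore, the DAC resolution is set to 5 bits, or 32 levels.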
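The bit-count choice can be written as a small calculation: pick the largest resolution whose LSB is still no finer than what the comparator can resolve. The Python helper below is a hypothetical illustration using the 600-mV range and 14-mV comparator resolution from the text; it assumes $2^n - 1$ steps across the range, consistent with the 31 current paths of the DAC described in chapter 4.

```python
def max_useful_dac_bits(output_range_v, comparator_resolution_v, max_bits=10):
    """Largest bit count whose LSB (range / (2**bits - 1)) the comparator can still resolve."""
    best = 0
    for bits in range(1, max_bits + 1):
        lsb = output_range_v / (2 ** bits - 1)
        if lsb >= comparator_resolution_v:
            best = bits
    return best


print(max_useful_dac_bits(0.6, 0.014))  # 5
```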

#### 3.4.2 Number of samples

A larger number of samples yields a smaller variance of the sample mean, but at the cost of more hardware and a longer acquisition time. The following calculation can serve as a guide for choosing the number of samples.

For simplicity, a pattern-filtered DH is assumed to have a Gaussian distribution. With a 99% confidence interval,

$$\mu_0 - 2.58 \frac{\sigma_0}{\sqrt{N_S}} \le \mu \le \mu_0 + 2.58 \frac{\sigma_0}{\sqrt{N_S}},\tag{3.10}$$

where $\mu_0$ and $\sigma_0$ are the mean and standard deviation of the population, respectively, $\mu$ is the sample mean, and $N_S$ is the number of samples. Therefore, the allowed spread of the sample mean, $\Delta\mu$, must satisfy

$$\Delta \mu \ge 2 \cdot 2.58 \frac{\sigma}{\sqrt{N_S}}.\tag{3.11}$$

Here, assuming that the number of samples is large enough, the sample standard deviation $\sigma$ can generally be used in place of $\sigma_0$. The minimum $N_S$ is then calculated as follows.

$$N_S \ge 26.63 (\frac{\sigma}{\Delta \mu})^2. \tag{3.12}$$

If $\Delta \mu$ is required not to exceed the minimum step of the DAC output voltage, or 1 LSB, then with $\sigma$ expressed in LSB units as $\sigma_{LSB}$,

$$N_S \ge 26.63\sigma_{LSB}^2,\tag{3.13}$$

Therefore, the minimum number of samples can be determined by the degree of dispersion.
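Eqs. 3.12 and 3.13 can be evaluated directly. The Python sketch below (function names hypothetical) returns the minimum $N_S$ for a given DH spread and, conversely, the largest spread a given $N_S$ supports. Under this Gaussian approximation, the 255 samples per point assumed in figure 3.12 tolerate a $\sigma_{LSB}$ of about 3.09.

```python
import math

Z_99 = 2.58  # two-sided 99% point of the standard normal, as in Eq. 3.10


def min_samples(sigma_lsb, delta_mu_lsb=1.0, z=Z_99):
    """Eq. 3.12: smallest integer N_S with 2 * z * sigma / sqrt(N_S) <= delta_mu."""
    return math.ceil((2 * z * sigma_lsb / delta_mu_lsb) ** 2)


def max_sigma_lsb(n_s, delta_mu_lsb=1.0, z=Z_99):
    """Eq. 3.12 solved for sigma: the largest DH spread that N_S samples tolerate."""
    return delta_mu_lsb * math.sqrt(n_s) / (2 * z)


print(min_samples(3.0))               # 240
print(round(max_sigma_lsb(255), 2))   # 3.09
```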

# **Chapter 4**

# **Circuit-Level Design and Post-Layout Simulation Results**

# 4.1 Configuration of System

#### 4.1.1 Overall structure

The configuration of the prototype system is shown in figure 4.1. A quarter-rate 2-tap DFE with EOM, a clock generator, DACs, and a bias generator are designed in 90-nm CMOS technology. In this prototype, however, the digital controller is implemented in a field-programmable gate array (FPGA) for flexibility in design and testing. To interface the CMOS chip with the FPGA, miscellaneous circuits such as a code deserializer, a decimator, and single-ended input/output buffers are also designed.

The DFE equalizes the 10-Gb/s input signal using quarter-rate four-phase clocks from the clock generator and four equalizer coefficients from 5-bit DACs. In addition, its eye diagram is scanned using $CK_{EOM}$ and $V_{EOM}$. The phase of each of the two kinds of clock signals is controlled by an 18-bit phase code from the digital controller in the FPGA, and the output voltage of each of the five DACs is likewise set by a 5-bit DAC code. Because of the limited pin count, it is difficult to apply tens of control bits in parallel from the controller to the CMOS chip. Instead, a code deserializer parallelizes serially transmitted codes using only four pins.

The DFE generates four channels of 2.5-Gb/s parallel data and a 1.25-Gb/s sample stream for eye monitoring. Only one DFE output channel is selected by a 4-to-1 multiplexer and passed through a current-mode logic (CML) output buffer for observation; the channel is selected by a 2-bit code named *OUTSEL*. A decimator is used because of the speed limit of the link from the chip to the FPGA: the DFE output data are down-sampled to 78.125 Mb/s by $CK_D$, which is generated in the clock generator. The resulting data are transmitted to the FPGA through single-ended CMOS output buffers that convert the swing level from 1.2 V to 2.5 V. In the opposite direction, the serial code and related signals enter the CMOS chip from the FPGA through single-ended CMOS input buffers that convert from 2.5 V to 1.2 V. Between the CMOS chip and the FPGA, LVDS-to-CMOS and CMOS-to-LVDS buffers are inserted to support high-speed interconnection.



Figure 4.1: Overall configuration of prototype system

#### 4.1.2 Signal range and bias scheme

Before describing the building blocks in detail, the target signal range and bias scheme should be considered. As shown in figure 4.2, the eye-monitoring range has to cover the range of the target signal. The monitored signal comes from the offset amplifiers, which are introduced in section 4.2.2, while the scanning voltage $V_{EOM}$ comes from the DAC. Therefore, the output ranges of the two circuits are closely related.



Figure 4.2: Monitoring range

In this work, a replica-bias scheme is used to control the vertical monitoring range. Figure 4.3 shows schematics of the 5-bit DAC and the replica bias generator. The DAC comprises 31 current paths with cascoded current sources and switches, and a load resistor. A 5-bit binary-weighted DAC code linearly controls the number of current sources that are turned on, so the total current and the voltage drop across the resistor change linearly as well. The minimum output voltage occurs when all NMOS switches are turned on and the maximum current flows.

The replica bias generator is built by replicating the DAC with all current paths activated. A negative feedback loop generates a bias voltage $V_{bias}$ that makes $V_{OUT,min}$ equal to a target swing voltage $V_{SW}$ applied from outside the chip. That is, the following equation is always satisfied:

$$V_{OUT,min} = V_{SW} = V_{DD} - R_0 I_0, (4.1)$$

where  $R_0$  and  $I_0$  are the load resistance and the maximum bias current of DAC, respectively. By applying the generated  $V_{bias}$  to the DAC, the overall output range of the DAC becomes from  $V_{SW}$  to  $V_{DD}$ .

Figure 4.4(a) plots the output voltage versus DAC code from a post-layout simulation with a nominal $V_{SW}$ of 0.6 V. The output voltage is proportional to the DAC code, and the minimum output voltage is 0.6017 V. Figure 4.4(b) shows the simulated integral non-linearity (INL). Since the absolute voltage level matters in the eye-monitoring application, the INL error is more important than the differential non-linearity (DNL) error. The simulated INL error lies between -0.09 and +0.155 LSB.
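For reference, the ideal transfer curve implied by Eq. 4.1 and an endpoint-fit INL calculation can be sketched in Python as follows. This is an illustration under assumptions: $V_{DD}$ = 1.2 V, an output rising linearly from $V_{SW}$ at code 0 to $V_{DD}$ at code 31, and INL measured against the line through the two end codes (the text does not state which reference line was used). The "measured" data below are synthetic, not the simulation results of figure 4.4.

```python
import math


def ideal_dac_output(code, v_sw=0.6, v_dd=1.2, bits=5):
    """Assumed ideal transfer: code 0 gives V_SW and the full-scale code gives V_DD."""
    full_scale = 2 ** bits - 1
    return v_sw + (v_dd - v_sw) * code / full_scale


def endpoint_inl_lsb(measured):
    """INL in LSB relative to the straight line through the first and last codes."""
    n = len(measured) - 1
    lsb = (measured[-1] - measured[0]) / n
    return [(v - (measured[0] + k * lsb)) / lsb for k, v in enumerate(measured)]


# Synthetic "measured" curve: ideal outputs plus a 2-mV bow (not figure 4.4 data).
measured = [ideal_dac_output(k) + 0.002 * math.sin(math.pi * k / 31) for k in range(32)]
inl = endpoint_inl_lsb(measured)
print(f"max |INL| = {max(abs(x) for x in inl):.3f} LSB")
```

Because the eye-monitoring grid relies on absolute $V_{EOM}$ levels, a bow like this one shifts the vertical axis of the drawn eye even though every step is still monotonic.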



Figure 4.3: Schematics of 5-bit digital-to-analog converter and replica bias generator to ensure the range of output voltage of digital-to-analog converter (a) digital-to-analog converter (b) replica bias generator



Figure 4.4: Post-layout simulation results of DAC with replica bias generator when  $V_{SW} = 0.6$  (a) Output voltages versus DAC codes (b) integral non-linearity

# 4.2 Decision Feedback Equalizer and Its Building Blocks

Figure 4.5 shows the schematic of the quarter-rate 2-tap look-ahead DFE. Compared with the block diagram in figure 3.5, four details are added. First, the received signal r is differential. Second, the four equalizer coefficients have a pseudo-differential relationship: $V_{eq11}$ and $V_{eq00}$ are controlled in opposite directions around a common-mode level, and so are $V_{eq01}$ and $V_{eq10}$. Third, three kinds of clocks are shown for each phase. Finally, the delay elements in the multiplexing stage are eliminated by using the inherent delay of the 4-to-1 MUX.



Figure 4.5: Block diagram of quarter-rate 2-tap look-ahead decision feedback equalizer

#### 4.2.1 Unit sampler module

The schematic of the unit sampler module is shown in figure 4.6. The offset amplifier differentially generates the difference between the input data $D_{in+/-}$ and a pair of equalizer coefficients $V_{eq+/-}$, denoted as $y_{+/-}$. Although the resettable current-integrating combiner and the switched-capacitor circuits in figure 2.8 are power efficient, they are not suitable for an EOM-compatible DFE, because these types perform sampling and subtraction at the same time. To monitor the two-dimensional eye of the equalized waveform, the candidate signals must exist as analog waveforms, so the subtraction has to precede the sampling; otherwise, only the vertical eye opening at the decision phase could be monitored. Therefore, the offset amplifier is chosen as the subtracter in the proposed DFE. Since a fully loop-unrolled structure is employed, the circuit delay of the offset amplifier is not critical.

To sample the difference and make decisions, five branches, each consisting of a T/H, a clocked-sense amplifier (CSA), and a CMOS DFF, are attached to the output of the offset amplifier. After $y_+$ and $y_-$ are held by the T/H at each sampling time, the CSAs decide either "1" or "0". Note that the sampling time is the moment the hold phase begins in the T/H, not the moment sensing begins in the CSA. Therefore, the loading on all five T/H clock signals must be matched. The four-phase T/H clocks have identical loads of two NMOS and two PMOS gate nodes each. Since $CK_{EOM+,TH}$ and $CK_{EOM-,TH}$ drive only two NMOS gate nodes and two PMOS gate nodes, respectively, dummy switches consisting of two NMOS and two PMOS transistors are added to equalize the loading.



Figure 4.6: Schematic of unit sampler module

#### 4.2.2 Offset amplifier

Figure 4.7 shows the schematic of the offset amplifier. An ideal offset amplifier performs differential subtraction of two input signals. That is,

$$V_{out+} - V_{out-} = (D_{in+} - D_{in-}) - (V_{eq+} - V_{eq-}).$$
(4.2)

The offset amplifier is designed to have the same maximum output swing as the DAC by using the replica bias scheme (refer to Eq. 4.1):

$$R_{amp}I_{amp} = R_0 I_0, \tag{4.3}$$

where $R_{amp}$ and $I_{amp}$ are the load resistance and total bias current of the offset amplifier, respectively. Figures 4.8(a) and (b) show the post-layout DC sweep results for various equalizer coefficients. The common-mode voltage of both input signals is assumed to be 900 mV. In the difference-mode plot of figure 4.8(b), although the transfer curves become non-linear, especially for large input swings, the zero-crossing points are offset from the origin by exactly the differences of the equalizer coefficients.

Figure 4.8(c) shows the AC and transient simulation results with the actual loads of figure 4.6. The simulated 3-dB bandwidth is about 4 GHz, and a clean eye diagram is observed.



Figure 4.7: Schematic of offset amplifier



Figure 4.8: Post-layout simulation results of offset amplifier at  $D_{in,com} = V_{th,com} = 0.9V$  (a) DC simulation (b) DC simulation, difference-mode plot (c) AC and transient simulation

#### 4.2.3 Track-and-hold switch and clocked-sense amplifier

Figure 4.9 shows the schematic of the CSA. After the T/H switch holds the output of the offset amplifier, the CSA amplifies it with a large voltage gain. When the clock goes low, the two current paths begin to conduct different amounts of current according to the sign of the input differential voltage. This current difference drives the outputs of the cascoded latch in opposite directions, and each output is buffered by a single-ended inverter. When the clock goes high, both outputs are pulled up to *VDD* by the pull-up transistors, which resets (equalizes) the outputs.

A post-layout simulation of the sampling path in the USM is performed using the entire USM topology of figure 4.6. As shown in figure 4.10, the CSA correctly amplifies an offset-amplifier output swing of 14 mV with a 2.5-GHz clock.



Figure 4.9: Schematic of clocked sense amplifier



Figure 4.10: Post-layout simulation result of sampling path in unit sampler module

# 4.3 Clock Generator and Its Building Blocks

The block diagram of the clock generator is shown in figure 4.11. The clock generator receives a 2.5-GHz differential reference clock and generates clock signals with the target phases. It consists of a four-phase clock generator, 2-to-1 MUXs, phase interpolators (PI), differential-to-single-ended converters (D2S), and clock trees.



Figure 4.11: Block diagram of clock generator

First, the four-phase clock generator produces quadrature-phase clock signals from the reference clock. The buffered and distributed four-phase clocks are used to synthesize a new target phase: in each clock path, two phases are selected by two MUXs.

The following PI generates a new phase by mixing them with a certain ratio. Finally, the D2S converts the PI output to a single-ended signal with a CMOS swing level.

For the MUXs in the paths of $CK_{Q+}$ and $CK_{Q-}$, the order of the applied phases is shifted by 90° compared with the paths of $CK_{I+}$ and $CK_{I-}$. Therefore, the phases of $CK_{Q+}$ and $CK_{Q-}$ automatically lag those of $CK_{I+}$ and $CK_{I-}$, respectively, by 90° even with the same control code $C_{PH,D}$. $CK_{EOM+}$ and $CK_{EOM-}$ are generated with an independent control code $C_{PH,EOM}$.

The clock trees divide each clock into several versions such as  $CK_{TH}$ ,  $CK_{CSA}$  and  $CK_{DFF}$  and distribute them to the DFE.

#### 4.3.1 Four-phase clock generator

The four-phase clocks are generated by sequentially delaying the reference clock. In figure 4.12, the amount of delay is controlled by the capacitance of NMOS transistors. Varying $V_{DCON}$ changes the voltage across the gate and the active region of the transistor, which changes its capacitance and thus the delay. Note that thick-gate transistors are used to reduce leakage current. Figure 4.13 shows the post-layout simulated delay versus $V_{DCON}$ for three process corners. For all corners, a delay range from 88.8 ps to 126 ps is achieved, which covers the 100-ps quarter period of the 2.5-GHz reference clock.



Figure 4.12: Schematic of four-phase clock generator



Figure 4.13: Delay versus  $V_{DCON}$  for three process corners in post-layout simulations

#### 4.3.2 Phase interpolator and differential-to-single-ended converter

The PI is an analog adder for two input signals with variable weight:

$$V_{PI,out} = \alpha \cdot V_{PI,in,I} + (1 - \alpha) \cdot V_{PI,in,Q}, \tag{4.4}$$

where $V_{PI,out}$ is the PI output, $V_{PI,in,I}$ and $V_{PI,in,Q}$ are sinusoidal PI inputs taken from the four-phase clocks, and $\alpha$ is a PI coefficient between 0 and 1. For ease of control, $\alpha$ is generally set by a digital code instead of an analog voltage. Figure 4.14 shows the schematic of a 16-level PI that is digitally controlled by a thermometer code. It has sixteen current-switch cells, each comprising one current source and two switches. Only one of the two current paths in each cell is enabled, depending on whether its control bit *C* is high or low, so the currents in the two load resistors are complementary. If *N* bits are high,

$$V_{PI,out} = \frac{N}{16} \cdot V_{PI,in,I} + (1 - \frac{N}{16}) \cdot V_{PI,in,Q}.$$
(4.5)

The phase of the PI output is then controlled monotonically by N as follows [36]:

$$\theta_{PI,out} = tan^{-1} \frac{16 - N}{N}.$$
(4.6)
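Eq. 4.6 can be tabulated to see both the monotonic control and the inherent deviation from a linear phase step. The Python sketch below assumes ideal sinusoidal quadrature inputs and uses `math.atan2` so that N = 0 is handled; the phase is the angle from the I input toward the Q input, in degrees, and the function names are illustrative.

```python
import math


def pi_phase_deg(n, levels=16):
    """Eq. 4.6 for ideal sinusoidal inputs, in degrees (atan2 also covers N = 0)."""
    return math.degrees(math.atan2(levels - n, n))


def pi_phase_error_deg(n, levels=16):
    """Deviation from a linear step of 90/levels degrees per code."""
    return pi_phase_deg(n, levels) - 90.0 * (levels - n) / levels


for n in range(0, 17, 4):
    print(n, round(pi_phase_deg(n), 2), round(pi_phase_error_deg(n), 2))
```

With ideal sinusoids the deviation is zero at N = 0, 8, and 16 and nonzero in between, which is one contributor to the PI INL reported for the post-layout simulation.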

Figure 4.15 is the schematic of the D2S. Since the rising and falling transitions of the PI output are not sufficiently steep, a CML buffer is placed at the front end of the D2S. The differential outputs of the buffer are applied to two identical pseudo-differential amplifiers in opposite polarity to amplify the signal swing.

Figure 4.16 shows how the phase is produced monotonically according to the phase control code. When the phases $0^{\circ}$ and $90^{\circ}$, or $180^{\circ}$ and $270^{\circ}$, are selected by the MUXs, the PI code should be increased to advance the phase, and *vice versa*.

The chain of MUXs, PI, and D2S in figure 4.11 is simulated after layout. Figures 4.17(a), (b), and (c) show the simulated output phase, INL, and I/Q mismatch versus control code, respectively. With the code generation described above, the output phase of the PI is generated monotonically, as shown in figure 4.17(a). Compared with the solid line, which is the ideal reference phase, the dotted line dithers slightly. The error between them, i.e., the INL of the PI, is within about 6 ps in all cases, as shown in figure 4.17(b). Therefore, the time axis of the eye image drawn by this system is distorted by at most about 6%, i.e., 6 ps of the 100-ps UI. The I/Q mismatch is defined as the unexpected phase error between the in-phase and quadrature clock signals. Since the two clock signals nominally differ in phase by 100 ps, the I/Q mismatch is calculated by subtracting this nominal value from the simulated phase difference. The resulting mismatch ranges from -0.4 ps to 0.4 ps, as shown in figure 4.17(c).



Figure 4.14: Schematic of phase interpolator



Figure 4.15: Schematic of differential-to-single-ended converter



Figure 4.16: Monotonic phase control using MUXs and PI



Figure 4.17: Post-layout simulation results of clock generation using MUXs, PI and D2S (a) Output phase versus control code (b) INL (c) I/Q mismatch

#### 4.3.3 Clock trees

The structure of the clock trees is shown in figure 4.18. Three kinds of clocks per phase, with subscripts TH, CSA, and DFF, are provided to the DFE. Since the moment the T/H circuit begins to hold is the sampling time, the T/H clocks are the most critical and must be generated carefully. The design ensures that all T/H clocks have identical loading and layout parasitics to preserve the original phase differences. In addition, a number of dummy buffers are inserted to match the fanouts of the other clocks. Both $CK_{EOM,CSA}$ and $CK_{EOM,DFF}$ are generated by a frequency divider so that they run at half frequency. The other clocks are connected to the decimator.



Figure 4.18: Schematic of clock trees

# 4.4 Miscellaneous Blocks

#### 4.4.1 Code deserializer

Serial digital codes received from the digital controller are parallelized by the code deserializer and provided to the circuits on the chip. Figure 4.19 shows its schematic and timing diagram. During the *writing* phase, the shift register sequentially stores the *serial code input* on each *clock* edge. After the codes have been received, the *load* signal is activated, and the codes stored in the shift register are transferred to the register array, whose outputs are connected to the circuits. The codes in the register array are held until the next set of codes is loaded.

To check that the stored codes are correct, a *reading-back* phase can follow. If the *clock* has one more rising edge while *load* is high, the stored codes are transferred to the multiplexed shift register. The multiplexed shift register accepts codes from the register array when *load* is high, and serially shifts its contents out to *serial code output* when *load* is low. With this function, the digital controller can confirm that no error occurred during writing.
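The write and read-back sequence can be modeled behaviorally as follows. This is a sketch under assumptions: the register array is assumed to capture the shift register on the rising edge of *load*, the code width is arbitrary, and `CodeDeserializer` is a hypothetical name rather than the actual circuit.

```python
class CodeDeserializer:
    """Behavioral sketch of the code deserializer (hypothetical model)."""

    def __init__(self, width):
        self.shift_reg = [0] * width  # writing-phase shift register
        self.reg_array = [0] * width  # holds the codes driving on-chip circuits
        self.readback = [0] * width   # multiplexed shift register
        self.load = 0

    def set_load(self, level):
        # Assumption: register array captures the shift register when load rises
        if level and not self.load:
            self.reg_array = list(self.shift_reg)
        self.load = level

    def clock(self, serial_in=0):
        """One rising clock edge; returns the serial code output bit."""
        if self.load:
            # Extra edge while load is high: copy register array for read-back
            self.readback = list(self.reg_array)
            return None
        self.shift_reg = self.shift_reg[1:] + [serial_in]  # write path
        out = self.readback[0]                             # read-back path
        self.readback = self.readback[1:] + [0]
        return out


code = [1, 0, 1, 1, 0, 0, 1, 0]
dut = CodeDeserializer(len(code))
for bit in code:          # writing phase
    dut.clock(bit)
dut.set_load(1)           # transfer to register array
dut.clock()               # one more edge while load is high
dut.set_load(0)
echo = [dut.clock() for _ in code]  # reading-back phase
```

The controller would compare `echo` with the transmitted code to confirm an error-free write.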



Figure 4.19: Code deserializer (a) Schematic (b) Timing diagram

# 4.5 Post-Layout Simulation of Modules

Figure 4.20(a) shows the layout of the entire chip, and the core area is zoomed in figure 4.20(b). The DFE including five DACs occupies $110 \times 95 \mu m^2$. The simulation results in this section are obtained from this layout after parasitic extraction.



Figure 4.20: Layout of designed chip (a) entire chip (b) core

First, the eye monitoring operation is verified by sweeping $V_{EOM}$, as shown in figure 4.21. No channel is applied, and the input data swing is 300mV. The first row of the plot shows the waveforms of $V_{EOM}$ and of one of the offset amplifier outputs, y. Note that all offset amplifier outputs are identical because all DFE codes are set to 16, which disables equalization. Due to the gain of the offset amplifier, the peak-to-peak swing of y is about 196mV, corresponding to about 10 LSBs. When $V_{EOM}$ is near VDD, $D_{o,EOM}$ stays low, and when it is near its minimum, $D_{o,EOM}$ stays high, apart from initial garbage values. As $V_{EOM}$ passes through the swing range of y, $D_{o,EOM}$ begins to dither between high and low.



Figure 4.21: Post-layout simulation process for eye monitoring by sweeping  $V_{EOM}$ 

To examine the dithering of $D_{o,EOM}$, the waveforms of $D_{o,EOM}$ and related signals are exported to MATLAB for post-processing. The number of "high" occurrences of $D_{o,EOM}$, i.e. the cumulative histogram CH, is counted for each DAC code value, separately for present bits decided as "1" and as "0", as shown in figure 4.22(a) and (d). Since no digital controller is included in the simulation, a fixed number of samples $N_S$ cannot be obtained. Therefore, the CHs are normalized by $n_s$, the number of samples counted by MATLAB for each case. The normalized CHs are plotted in figures (b) and (e). Differentiating them yields the distribution histograms (DHs) shown in figures (c) and (f). The peak locations, calculated by interpolating the bar magnitudes, are 20.5 and 11, respectively. As a result, the swing of the received signal is measured as 9.5 LSB.
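A minimal Python sketch of this post-processing, assuming the cumulative histogram is available as one count of "high" decisions per DAC code. The centroid of the largest bar and its two neighbours used in `peak_location` is an assumed interpolation rule, since the exact method is not specified; the function names and step-shaped test data are hypothetical.

```python
import numpy as np


def distribution_histogram(ch_counts, n_samples):
    """Normalize a cumulative histogram by the sample count and differentiate it.

    Returns (bin_centers, dh); bar k sits between DAC codes k and k+1.
    The absolute value makes the result independent of the sweep direction.
    """
    nch = np.asarray(ch_counts, float) / np.asarray(n_samples, float)
    dh = np.abs(np.diff(nch))
    centers = np.arange(dh.size) + 0.5
    return centers, dh


def peak_location(centers, dh):
    """Interpolate the peak as the centroid of the largest bar and its neighbours."""
    i = int(np.argmax(dh))
    lo, hi = max(i - 1, 0), min(i + 2, dh.size)
    w = dh[lo:hi]
    return float(np.sum(w * centers[lo:hi]) / np.sum(w))


codes = np.arange(32)
ch_one = np.where(codes <= 20, 100, 0)   # idealized CH for bits decided "1"
ch_zero = np.where(codes <= 10, 100, 0)  # idealized CH for bits decided "0"
p1 = peak_location(*distribution_histogram(ch_one, 100))
p0 = peak_location(*distribution_histogram(ch_zero, 100))
swing_lsb = p1 - p0
```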

The next simulation verifies the pattern-filtered EOM operation. Assuming a channel with a pole at 2.2GHz, 10-Gb/s PRBS7 input data with a 600-mV swing and a 900-mV common-mode level are applied to the DFE, with all DFE coefficients set to 0. Using the same counting and normalizing process as in the previous simulation, the DHs for the patterns of interest are obtained, as shown in figure 4.23. The peak locations, calculated with the same interpolation method, are annotated on each panel. The equalization coefficients are then obtained from Eq. 3.5.
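Eq. 3.5 is not reproduced in this section, so the following is only a hedged illustration of how pattern-filtered levels can isolate post-cursors, not a restatement of that equation. It assumes a linear channel with main cursor $h_0$ and two post-cursors $h_1, h_2$, antipodal symbols, and pattern labels written as $b[n-2]\,b[n-1]\,b[n]$, so each measured peak is $C = m + h_0 b[n] + h_1 b[n-1] + h_2 b[n-2]$ with a common offset $m$. Differences of pattern levels then cancel $m$ and isolate each cursor; `post_cursors` is a hypothetical name.

```python
def post_cursors(levels):
    """Estimate (h0, h1, h2) from pattern-filtered peak levels (e.g. in LSB).

    levels: dict mapping patterns "111", "101", "011", "000", "010", "100"
    (written as b[n-2] b[n-1] b[n]) to measured peak locations.
    """
    # 111 vs 101 and 010 vs 000 differ only in b[n-1]: each difference is 2*h1
    h1 = ((levels["111"] - levels["101"]) + (levels["010"] - levels["000"])) / 4
    # 111 vs 011 and 100 vs 000 differ only in b[n-2]: each difference is 2*h2
    h2 = ((levels["111"] - levels["011"]) + (levels["100"] - levels["000"])) / 4
    # 111 vs 000 differ in all three bits: difference is 2*(h0 + h1 + h2)
    h0 = (levels["111"] - levels["000"]) / 2 - h1 - h2
    return h0, h1, h2


# Synthetic levels for m = 16, h0 = 6, h1 = 3, h2 = 1 (illustration only)
levels = {"111": 26, "101": 20, "011": 24, "000": 6, "010": 12, "100": 8}
h0, h1, h2 = post_cursors(levels)
```

Under this model the two DFE taps would be set to cancel $h_1$ and $h_2$; the scaling to DAC codes depends on the offset-amplifier gain, which is not modeled here.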

As the next step, the DFE operation is verified with these coefficients for the same channel. Before discussing the result, the MATLAB procedure for drawing the effective eye diagram from the simulation results is described in figure 4.24. The first four rows show the candidate signals from the four offset amplifiers. In each symbol duration, one of the four segments, drawn with a bold line, is the candidate that will be selected in the multiplexing stage. Superposing these selected segments gives the eye diagram of the effective equalized signal.

Figure 4.22: Eye monitoring result at the center phase of eye (a) cumulative histogram for "1" (b) normalized cumulative histogram for "1" (c) distribution histogram for "1" (d) cumulative histogram for "0" (e) normalized cumulative histogram for "0" (f) distribution histogram for "0"

Figure 4.23: Distribution histograms for each pattern using pattern-filtered eye monitoring for patterns of (a) "111" (b) "101" (c) "011" (d) "000" (e) "010" (f) "100"



Figure 4.24: MATLAB process of drawing effective eye diagram using post-layout simulation results
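The selection step can be sketched for sampled data. This assumes a 2-tap look-ahead DFE in which the four offset amplifiers subtract $c_1 d_1 + c_2 d_2$ for every combination of the two previous decisions $d_1, d_2 \in \{-1, +1\}$, and the multiplexer keeps the candidate matching the actual previous decisions; the function name and initial decisions are hypothetical.

```python
import numpy as np


def look_ahead_effective(x, c1, c2, first_bits=(1, 1)):
    """Effective equalized samples of a 2-tap look-ahead DFE (behavioral sketch).

    x: received samples at the sampling phase; first_bits: assumed decisions
    (d[n-1], d[n-2]) before the first sample.
    """
    d1, d2 = first_bits
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        # Four offset-amplifier outputs, one per combination of past decisions
        candidates = {(a, b): xn - c1 * a - c2 * b for a in (-1, 1) for b in (-1, 1)}
        y[n] = candidates[(d1, d2)]               # multiplexer selection
        d2, d1 = d1, (1 if y[n] > 0 else -1)      # update decision history
    return y


bits = [1, -1, -1, 1, 1, -1, 1, -1, -1, -1]
pad = [1, 1] + bits  # previous bits match first_bits
# Channel with h0 = 1.0, h1 = 0.5, h2 = 0.25 (illustration only)
x = [1.0 * pad[n + 2] + 0.5 * pad[n + 1] + 0.25 * pad[n] for n in range(len(bits))]
y = look_ahead_effective(x, 0.5, 0.25)
```

With coefficients equal to the post-cursors, every selected sample reduces to $h_0 b[n]$, which is the open effective eye that superposing the bold segments produces.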

Figure 4.25 shows the effect of the proposed adaptation on the DFE. Applying the calculated coefficients reduces the measured peak-to-peak jitter from 47% to 5%. The same procedure is repeated for channel bandwidths of 1.3GHz and 1.1GHz (figures 4.26 and 4.27). Even for the 1.1-GHz channel, whose received eye in figure 4.27 is completely closed, an effectively opened eye diagram with 32% jitter is achieved.

A bit-error test is performed for this case by comparing the four-channel decisions with the transmitted sequence. As shown in figure 4.28, each group of four serial symbols should match the four-bit parallel data after some delay. Using the post-layout simulation results for the 1.1-GHz channel bandwidth, this comparison is performed automatically by MATLAB over 8,000 bit durations, or 800ns. No bit errors occur at any of the three process corners: SS, TT and FF.
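The comparison can be sketched as re-serializing the four quarter-rate lanes and searching for the latency with the fewest mismatches. The PRBS7 generator uses the common $x^7 + x^6 + 1$ polynomial; that the simulation used this polynomial and seed, and the delay search range, are assumptions. Function names are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """PRBS7 bits from a 7-bit Fibonacci LFSR with taps for x^7 + x^6 + 1."""
    state = seed & 0x7F
    bits = []
    for _ in range(n_bits):
        new = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new)
        state = ((state << 1) | new) & 0x7F
    return bits


def count_errors(tx, lanes, max_delay=16):
    """Re-serialize four quarter-rate lanes and compare with tx at the best delay.

    Returns (delay, errors) for the delay with the fewest mismatches.
    """
    n = 4 * min(len(lane) for lane in lanes)
    rx = [lanes[i % 4][i // 4] for i in range(n)]  # lane k holds bits k, k+4, ...
    results = []
    for d in range(max_delay + 1):
        errors = sum(rx[i] != tx[i - d] for i in range(d, min(n, len(tx) + d)))
        results.append((errors, d))
    errors, delay = min(results)
    return delay, errors


tx = prbs7(8000)                      # 8,000 bit durations at 10 Gb/s = 800 ns
rx_serial = [0, 0, 0] + tx[:-3]       # receiver output delayed by three bits
lanes = [rx_serial[j::4] for j in range(4)]
result = count_errors(tx, lanes)
```

Searching over the delay also absorbs any rotation of the lane order, since a lane offset appears as a uniform shift of the re-serialized stream.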



Figure 4.25: Post-layout simulation result of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 2.2-GHz (a) before DFE (b) after DFE (effective eye diagram after offset amplifier)



Figure 4.26: Post-layout simulation result of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 1.3-GHz (a) before DFE (b) after DFE (effective eye diagram after offset amplifier)



Figure 4.27: Post-layout simulation result of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 1.1-GHz (a) before DFE (b) after DFE (effective eye diagram after offset amplifier)



Figure 4.28: Four-channel de-serialized output of DFE for 10-Gb/s PRBS7 input with channel bandwidth of 1.1-GHz

# Chapter 5 Experimental Results

Figure 5.1 (a) is a micro-photograph of the fabricated chip. The chip is mounted on a PCB with bonding wires, as shown in figure (b), and a photograph of the PCB with the mounted chip (device under test, or DUT) is shown in figure (c).



Figure 5.1: Photographs of (a) fabricated chip (b) chip-on-board with bonding-wires (c) printed-circuit board under test

First, the DAC characteristic is measured using the simple experimental setup shown in figure 5.2. A controller implemented in the FPGA generates a code sequence that contains the DAC code under test, and the DAC output voltage is measured manually off-chip as the DAC code is increased. Figures 5.3 (a) and (b) show the measured DAC output voltage and INL versus input code. The measured INL is worse than the simulated one. Since the DAC is controlled by a binary-weighted code, deterministic INL errors appear to degrade its performance; controlling the DAC with a thermometer code would avoid this.
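INL can be computed from such a manual sweep as below. The end-point definition is an assumption, since the fit used for figure 5.3 is not stated; the 5-bit sweep with a 5-mV error at the MSB transition is synthetic and only illustrates how a binary-weighted mismatch shows up.

```python
import numpy as np


def dac_inl_lsb(codes, volts):
    """End-point INL in LSB: deviation from the line through the first and last codes."""
    codes = np.asarray(codes, float)
    v = np.asarray(volts, float)
    lsb = (v[-1] - v[0]) / (codes[-1] - codes[0])
    ideal = v[0] + (codes - codes[0]) * lsb
    return (v - ideal) / lsb


# Synthetic 5-bit sweep: 10-mV steps plus a 5-mV error when the MSB turns on
codes = np.arange(32)
volts = 0.3 + 0.01 * codes + np.where(codes >= 16, 0.005, 0.0)
inl = dac_inl_lsb(codes, volts)
worst_code = int(np.argmax(np.abs(inl)))  # lands at the 15 -> 16 MSB transition
```

In a thermometer-coded DAC each code step switches in one more unit element, so a single element mismatch cannot produce this mid-scale jump.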



Figure 5.2: Experimental setup to measure DC characteristic of DAC

Figure 5.4 shows the experimental setup for measuring the eye diagram of 10-Gb/s random input data. The pattern generator transmits 10-Gb/s PRBS-7 data to the DUT and a frequency-synchronized 2.5-GHz (quarter-rate) clock to the clock generator. The clock generator converts the received clock into a pair of differential clocks with a given time delay. To check the functionality, a controller that draws pattern-filtered eye diagrams is configured in the FPGA. When a start signal is applied to the programmed FPGA, it performs a sequential process to acquire the diagrams. Finally, the measured data are exported to a computer through Xilinx ChipScope<sup>TM</sup>, and the pattern-filtered eye diagrams are plotted using MATLAB.

Figure 5.3: Measured characteristic of DAC (a) Output voltages versus DAC codes (b) Integral non-linearity



Figure 5.4: Experimental setup to measure eye diagram

Pattern-filtered eye diagrams and their sum are shown in figure 5.5. Since figure (a) is the eye diagram for the patterns 111 and 000, there is no transition from the previous bit, and thus the left side of the eye is open. On the other hand, the eye diagram for the patterns 101 and 010 in figure (c) shows only crossings because there is always a transition from 0 to 1 or from 1 to 0.

Note that these pattern-filtered eye diagrams are not always required for EOM-based testing and DFE adaptation. Obtaining the full information needed to draw pattern-filtered eye diagrams requires several times as many registers.

Figure 5.5: Obtained pattern-filtered eye-diagrams (a) 111 and 000 (b) 011 and 100 (c) 101 and 010 (d) (a)+(b)+(c)

The performance and adaptivity of the implemented DFE are verified with PCB channels 10cm, 20cm, 30cm and 40cm long. Figure 5.6 shows the measured $S_{21}$ parameters of the channels. The loss at half the baud rate (5GHz), expressed in dB, increases linearly with channel length at about 2.2dB per 10cm.

Measured eye diagrams after the four channels are shown in figure 5.7. Since these eye diagrams are captured with an oscilloscope, the eye diagrams obtained by the on-chip EOM are expected to be worse because of the parasitics of the additional PCB trace, bonding wire and on-chip pad.



Figure 5.6: Measured  $S_{21}$  parameters of PCB channels 10cm, 20cm, 30cm and 40cm long





Figure 5.7: Measured eye diagrams after PCB channels (using oscilloscope) (a) 10cm (b) 20cm (c) 30cm (d) 40cm

Figure 5.8 shows the experimental setup for the DFE test. One of the PCB channels is inserted between the pattern generator and the device under test. One of the four DFE output channels is selected and applied to the BER tester. The measured signal levels, such as $C_{111}$, $C_{000}$, $C_{101}$, $C_{010}$, $C_{011}$ and $C_{100}$, and the calculated coefficients are checked on an LCD module connected to the FPGA.



Figure 5.8: Experimental setup for DFE test

Eye diagrams obtained before and after equalization for the four PCB channels are shown in figure 5.9. The DFE is adapted by the proposed algorithm. The number of samples for pattern-filtered EOM, $N_S$, is chosen as 255 because the standard deviation of the samples does not exceed 3 LSBs. To verify the adequacy of the calculated coefficients, the vertical eye opening at the sampling phase is measured for four DFE coefficient values around those found by the automatic adaptation process. The vertical eye opening is determined by extrapolating two adjacent sample values of the opening area. The four plots in figure 5.10 show the results for each channel length, where $\Delta C_1$ and $\Delta C_2$ denote deviations from the calculated $C_1$ and $C_2$, respectively. For the three shorter channels, the coefficients obtained by the proposed algorithm give the largest vertical eye opening. For the 40-cm channel, the third post-cursor is presumed to disturb accurate adaptation.
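The intent of the sweep in figure 5.10 can be illustrated with an idealized linear model: for antipodal data, the worst-case peak-to-peak vertical opening of a 2-tap DFE output is $2\,(h_0 - |h_1 - c_1| - |h_2 - c_2| - \sum_{k \ge 3} |h_k|)$. This is a sketch under that assumption, with hypothetical cursor values; it shows that the opening peaks at $\Delta C_1 = \Delta C_2 = 0$ and that an uncancelled third post-cursor lowers the opening at every setting.

```python
def vertical_opening(h, c1, c2):
    """Worst-case peak-to-peak vertical eye opening of a 2-tap DFE output.

    h = (h0, h1, h2, h3, ...) are pulse-response cursors (e.g. in LSB);
    residual ISI is what the two taps fail to cancel plus all later cursors.
    """
    h0, h1, h2, *rest = h
    residual = abs(h1 - c1) + abs(h2 - c2) + sum(abs(x) for x in rest)
    return 2 * (h0 - residual)


def sweep_openings(h, c1, c2, span=2):
    """Opening for every (dC1, dC2) offset around the calculated coefficients."""
    return {
        (d1, d2): vertical_opening(h, c1 + d1, c2 + d2)
        for d1 in range(-span, span + 1)
        for d2 in range(-span, span + 1)
    }


# Hypothetical cursors with a third post-cursor of 1 LSB
openings = sweep_openings((10, 4, 2, 1), c1=4, c2=2)
best = max(openings, key=openings.get)
```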

The bathtub curve, a plot of BER versus sampling phase, is measured to verify the BER performance of the system. As shown in figure 5.11, the horizontal eye openings are improved after equalization with automatic adaptation. In all cases, the error-free range (BER below $10^{-12}$) is wider with equalization than without.

Table 5.1 summarizes the performance of the DFE. The DFE consumes 11mW from a 1.2V supply.

Table 5.2 compares the performance with previously reported DFEs.


Figure 5.9: Drawn eye diagrams with EOM



Figure 5.10: Comparison of vertical eye openings versus DFE coefficients (a) 10cm (b) 20cm (c) 30cm (d) 40cm



Figure 5.11: Bathtub curves before and after equalization (a) 10cm (b) 20cm (c) 30cm (d) 40cm

| Parameter         | Value                                      |
|-------------------|--------------------------------------------|
| Process           | TSMC 90nm CMOS                             |
| Supply voltage    | 1.2V                                       |
| Data rate         | 10Gb/s                                     |
| Power consumption | 11mW (DFE core including circuits for EOM) |
| DFE core area     | $110 \times 95\ \mu m^2$                   |
| Bit error rate    | $< 10^{-12}$                               |

Table 5.1: Chip summary

Table 5.2: Performance comparison with reported DFEs

|                   | [24]               | [25]               | [26]                | This work        |
|-------------------|--------------------|--------------------|---------------------|------------------|
| Process           | 90nm CMOS          | 90nm CMOS          | 45nm SOI CMOS       | 90nm CMOS        |
| Data rate         | 6Gb/s              | 10Gb/s             | 12Gb/s              | 10Gb/s           |
| Circuit style     | Soft decision      | Switched-capacitor | Current-integration | Look-ahead       |
| Interleaving      | 1/4                | 1/4                | 1/2                 | 1/4              |
| Number of taps    | 2                  | 1                  | 5                   | 2                |
| Power consumption | 5mW                | 6mW                | 11mW                | 11mW             |
| Core area         | 4,410 $\mu m^2$    | 10,500 $\mu m^2$   | 3,650 $\mu m^2$     | 10,450 $\mu m^2$ |
| Adaptation        | -                  | -                  | -                   | EOM              |

# Chapter 6 Conclusion

As test cost takes an increasing share of the cost of manufacturing high-speed transceiver chips, self-test methods using the on-chip EOM technique can be powerful solutions for reducing that cost. This dissertation shows that the on-chip EOM technique can be used to adapt a high-speed equalizer in addition to self-test.

A 10-Gb/s adaptive 2-tap DFE with on-chip EOM for PCB trace channels is proposed and implemented. Look-ahead and interleaving are widely used structures for implementing high-speed DFEs. The designed DFE has a fully look-ahead structure to relieve the stringent timing margin of the feedback loops, and operates at quarter rate to reduce the speed burden on the samplers and decision circuits. While existing on-chip EOM approaches cannot monitor the waveform equalized by a look-ahead DFE, the proposed EOM, named look-ahead EOM, can.

To adapt the DFE to unknown channels automatically, the magnitudes of the post-cursors are measured using pattern-filtered EOM and used to calculate the DFE coefficients. It is shown that the number of samples for pattern-filtered EOM can be chosen to meet a target confidence interval.

The analog circuits of the prototype system, including the DFE and clock generator, are designed in 90nm CMOS technology. The functionality and performance of the modules and building blocks were verified with post-layout simulation. The fabricated chip was mounted and wire-bonded to a PCB for testing. With the digital controller configured in an FPGA, it is verified that the prototype system can monitor eye diagrams of the equalized waveforms inside the chip. In experiments with PCB channels, the DFE with automatic adaptation successfully improves BER performance for channel lengths of 10cm, 20cm, 30cm and 40cm. The DFE core occupies $110 \times 95 \mu m^2$ and consumes 11mW from a 1.2V supply.

## Appendix

### A. Design and revision of printed circuit board

The PCB is also an important element in the design of a high-speed data transmission system. As the data rate increases, the wavelengths of the signals on PCB traces become shorter, so details of the PCB layout cannot be ignored. For example, since the wavelength of a 10-GHz signal on FR4 ($\epsilon_r$=4.6) is about 1.6cm, physical dimensions of even a few millimeters can affect signal integrity.
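The wavelength figure follows from $\lambda = c_0 / (f\sqrt{\epsilon_{eff}})$. For a surface trace part of the field lies in air, so the effective permittivity is below the bulk $\epsilon_r$. In this sketch, the bulk value 4.6 gives about 1.4cm, while an effective permittivity of about 3.5 (an assumed value, not given in the text) reproduces the 1.6-cm estimate.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_cm(freq_hz, eps_eff):
    """Guided wavelength in centimeters for a given effective permittivity."""
    return C0 / (freq_hz * math.sqrt(eps_eff)) * 100.0


lam_bulk = wavelength_cm(10e9, 4.6)  # about 1.40 cm (field fully in FR4)
lam_eff = wavelength_cm(10e9, 3.5)   # about 1.60 cm (assumed effective permittivity)
```

A rule of thumb is that features shorter than roughly a tenth of a wavelength behave as lumped elements, which at 10GHz means only a millimeter or so.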



Figure A.1: (a) Photograph of data-input part of PCB ver.1 with inverted color (b) drawn eye diagram of 10-Gb/s PRBS-7 data

Figure A.1 (a) is a photograph of the 10-Gb/s data-input part of the first PCB design, named PCB ver. 1. Figure A.1 (b) shows the eye diagram of 10-Gb/s PRBS-7 data transmitted through a transmission line (TL) on this PCB. The eye is almost closed even after a TL only a couple of centimeters long.

To find the cause of the eye closure, the PCB was investigated using time-domain reflectometry (TDR). The data-input TL is designed to have a characteristic impedance of 50$\Omega$ and is terminated at the input with a shunt 50$\Omega$ resistor, $R_{term,in}$. Since the bonding pads on the PCB (100$\mu$m wide) are much narrower than the TL (939.8$\mu$m), the TL is narrowed step by step, i.e. tapered.



Figure A.2: Illustration of TDR measurement setups of PCB ver. 1 (White area is the top copper.) (a) TDR setup #1 (trace-cut after  $R_{term,in}$  and no chip on board) (b) TDR setup #2 (whole PCB but no chip on board) (c) TDR setup #3 (whole PCB and chip on board with bonding wires)

Figures A.2 (a), (b) and (c) show the TDR measurement setups, named setups #1, #2 and #3, respectively. In setup #1, the trace after the input termination resistor is cut to exclude the effect of tapering from the measurement. In setup #2, the whole TL geometry is kept, without bonding wires or chip on board. Finally, in setup #3, all elements including the bonding wires and the chip are kept.



Figure A.3: Measured TDR waveforms for three kinds of setups of PCB ver. 1

Ripples are observed in the TDR measurement results, as shown in figure A.3. All three TDR waveforms show an undershoot right after the SubMiniature version A (SMA) connector, which suggests that the step signal meets its first discontinuity near the connector. Since the TDR waveform with setup #2 shows larger and longer ripples than that with setup #1, the TL on the PCB itself suffers from problems such as impedance mismatch or unwanted parasitic elements. The TDR waveform with setup #3 is also worse than that with setup #2, which means that the bonding wires and the chip further degrade signal integrity. First, modeling is performed to investigate the effect of the tapering between $R_{term,in}$ and the end of the TL.
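Reading TDR waveforms quantitatively relies on the reflection coefficient $\rho = V_{reflected}/V_{incident}$ and $Z = Z_0 (1+\rho)/(1-\rho)$. A minimal sketch, assuming the reflected voltage is read relative to the incident step and ignoring line loss and multiple reflections: an undershoot ($\rho < 0$) marks a locally low impedance, the usual signature of a shunt capacitive discontinuity, and the "under 10mV for a 1V step" criterion used later corresponds to a mismatch of about 1$\Omega$.

```python
def tdr_impedance(v_reflected, v_incident, z0=50.0):
    """Local impedance implied by a TDR reflection (lossless, single reflection)."""
    rho = v_reflected / v_incident
    return z0 * (1 + rho) / (1 - rho)


z_small = tdr_impedance(0.010, 1.0)  # about 51.0 ohm: a 10-mV reflection
z_dip = tdr_impedance(-0.1, 1.0)     # about 40.9 ohm: undershoot, capacitive
```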



Figure A.4: Transmission line model of PCB ver. 1 (a) segmentation of tapered line (b) ADS model with segmented CPWG

For modeling, the tapered TL is divided into five segments, as shown in figure A.4 (a), and modeled with segmented coplanar waveguides with ground (CPWG) in ADS, as shown in figure (b). In the model, the $50\Omega$ input termination resistor is placed between Seg #1 and #2, and a high-impedance $1M\Omega$ resistor is attached at the end.

Figure A.5 shows the simulated waveform, in which obvious ringing appears even with only the tapered TL.



Figure A.5: Simulated TDR waveform

The second version of the PCB was designed to have fewer discontinuities in the TL. Figure A.6 shows its photograph and mask pattern. To reduce the TL width while maintaining the characteristic impedance, the thickness of the dielectric (FR4) between the top copper and the lower ground plane is reduced from $540\mu m$ to $150 \mu m$. As a result, a narrower TL width of 280 $\mu m$ is achieved. Some effort was also made in designing the connector part of the PCB to reduce non-ideal effects. Specifically, the width of the connector input pads is set equal to that of the TL to eliminate discontinuity, and the ground pads of the SMA connectors are fully merged with the top copper plane and connected to the lower ground plane with ten vias for a tighter connection.
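The width reduction can be sanity-checked with a first-order scaling argument. For a microstrip-like trace, the characteristic impedance depends mainly on the ratio of trace width $w$ to dielectric height $h$; this sketch neglects copper thickness and the coplanar ground gaps, so it is only an approximation. Keeping $Z_0$ fixed while thinning the dielectric then calls for scaling the width in proportion:

$$
w_2 \approx w_1 \frac{h_2}{h_1} = 939.8\,\mu m \times \frac{150\,\mu m}{540\,\mu m} \approx 261\,\mu m
$$

This rough estimate is of the same order as the 280 $\mu m$ width actually used; because it ignores the effects named above, only approximate agreement should be expected.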

Even with the narrower TL, some discontinuities could not be avoided because the physically larger input termination resistor must be placed as close as possible to the chip. To simulate this discontinuity, the new TL was modeled in ADS, as shown in figure A.7. The model contains a pair of CPWG segments, three differential CPWG (DCPWG) segments, the input differential termination resistor and high-impedance resistors.

Since the simulation showed almost no reflection (under 10mV for a 1V step input), the designed PCB was fabricated and characterized with TDR measurements. Figures A.8 (a), (b) and (c) show TDR test setups #1, #2 and #3, respectively. Note that this version of the PCB was designed with a 100$\Omega$ input termination resistor across the differential TL. For single-ended TDR tests, a single 50$\Omega$ shunt resistor was soldered between one of the resistor pads and the top copper plane, as drawn in the figures.



Figure A.6: Photograph (left-side) and mask pattern (right-side) of data input part of PCB ver. 2



Figure A.7: Model of transmission line on PCB ver. 2 with segmented CPWG and DCPWG



Figure A.8: Illustration of TDR measurement setups of PCB ver. 2 (White area is the top copper.) (a) TDR setup #1 (trace-cut after  $R_{term,in}$  and no chip on board) (b) TDR setup #2 (whole PCB but no chip on board) (c) TDR setup #3 (whole PCB and chip on board with bonding wires)

The results, plotted in figure A.9, show that the undershoot near the SMA connector is reduced. The amplitude and duration of the ringing are also reduced in setups #2 and #3 compared with PCB ver. 1. Although signal integrity is improved, an undershoot is still observed right after the input termination resistor.



Figure A.9: Measured TDR waveforms for three kinds of setups of PCB ver. 2

Since this effect was not observed in the ADS simulation, parasitic elements were added to the model. Figure A.10 shows the revised ADS model including parasitic capacitances and inductance. The pads on the PCB and chip are modeled as shunt capacitances ($C_{PCB}$ and $C_{PAD}$, respectively), and the bonding wire is modeled as a series inductance ($L_B$) followed by a shunt capacitance ($C_B$).



Figure A.10: Modified model of transmission line on PCB ver. 2 with additional parasitic components

The parasitic parameters were fitted to the measured TDR waveforms step by step. First, TDR waveforms simulated with a parametric sweep of $C_{PCB}$ were compared with the waveform measured with setup #2 in figure A.9; $L_B$, $C_B$ and $C_{PAD}$ were excluded from this simulation. A $C_{PCB}$ of 260fF gives the best match to the measurement, as shown in figure A.11 (a). Second, $C_B$ was swept with $C_{PCB}$ fixed at 260fF, $L_B$ at 1nH and $C_{PAD}$ at 100fF. Since the inductance does not significantly affect the undershoot amplitude, it was reasonably assumed to be 1nH, and $C_{PAD}$ was extracted by post-layout simulation using the Graphic Data System (GDS) layout and technology parameters. As a result, a $C_B$ of 400fF gives the best-matched undershoot amplitude, as shown in figure A.11 (b). Finally, several values of $L_B$ were simulated, and 1nH gives a well-matched amplitude of the overshoot following the first undershoot, as shown in figure A.11 (c).
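The one-parameter-at-a-time sweep can be sketched as choosing the candidate with the lowest RMS error against the measured trace. To keep the example self-contained, the ADS model is replaced here by a closed-form toy: a single shunt capacitor on a matched 50-$\Omega$ line, driven by an ideal ramp step with an assumed 35-ps rise time, whose reflection decays with $\tau = Z_0 C / 2$. The "measured" waveform is synthetic, and all names are hypothetical.

```python
import numpy as np


def shunt_c_reflection(t, c, z0=50.0, v_inc=1.0, t_rise=35e-12):
    """Reflected TDR voltage from a lone shunt capacitor c on a matched line.

    Ideal ramp step of rise time t_rise; tau = z0*c/2 because the capacitor
    sees the line impedances on both sides in parallel.
    """
    t = np.asarray(t, dtype=float)
    tau = z0 * c / 2.0
    k = v_inc * tau / t_rise
    during = -k * (1.0 - np.exp(-np.clip(t, 0.0, None) / tau))
    after = -k * (1.0 - np.exp(-t_rise / tau)) * np.exp(
        -(np.maximum(t, t_rise) - t_rise) / tau)
    return np.where(t < t_rise, during, after)


def best_fit(candidates, measured, model):
    """Return the candidate whose simulated waveform has the lowest RMS error."""
    rms = [np.sqrt(np.mean((model(c) - measured) ** 2)) for c in candidates]
    return candidates[int(np.argmin(rms))]


t = np.linspace(0.0, 200e-12, 401)
candidates = [c * 1e-15 for c in range(100, 401, 20)]  # 100 fF to 400 fF sweep
measured = shunt_c_reflection(t, 260 * 1e-15)          # synthetic stand-in trace
c_pcb = best_fit(candidates, measured, lambda c: shunt_c_reflection(t, c))
```

Fixing each fitted value before sweeping the next parameter mirrors the sequential procedure above; a joint fit over all parameters would be more robust but costs many more simulations.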












Figure A.11: Parametric sweep results for each parasitic element (a)  $C_{PCB}$  (b)  $C_B$  (c)  $L_B$ 

Figures A.12 (a) and (b) compare the simulated and measured TDR waveforms of PCB ver. 1 and ver. 2, respectively. In both cases the simulated and measured waveforms agree well, except for the discontinuity near the SMA connector of PCB ver. 1, which confirms that the models are valid.

The simulated eye diagram of 10-Gb/s PRBS-7 data through the PCB ver. 1 model shows poor eye opening, as expected (figure A.13 (a)). In contrast, the same simulation with the PCB ver. 2 model shows a much improved eye opening (figure (b)).



Figure A.12: Simulated versus measured TDR waveforms (a) PCB ver. 1 (b) PCB ver. 2



Figure A.13: Simulated eye diagrams of 10-Gb/s PRBS-7 data (a) PCB ver. 1 (b) PCB ver. 2

### References

- [1] J. Lee, "A 20Gb/s Adaptive Equalizer in 0.13-μm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2058-2066, Sep. 2006
- [2] Laung-Terng Wang, Charles E. Stroud and Nur A. Touba, System-on-chip test architectures: nanometer design for testability, Elsevier, Burlington, MA, 2008
- [3] Behnam Analui, Alexander Rylyakov, Sergey Rylov, Mounir Meghelli, and Ali Hajimiri, "A 10-Gb/s Two-Dimensional Eye-Opening Monitor in 0.13-μm Standard CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2689-2699, Dec. 2005
- [4] Tektronix, "Application Note: Bridging the Gap Between BER and Eye Diagrams A BER Contour Tutorial," Sep. 2010
- [5] Oscar E. Agazzi, *et al.*, "A 90nm CMOS DSP MLSD Transceiver with Integrated AFE for Electronic Dispersion Compensation of Multimode Optical Fibers at 10Gb/s," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2939-2957, Dec. 2008
- [6] Aida Varzaghani and Chih-Kong Ken Yang, "A 4.8 GS/s 5-bit ADC-Based Receiver With Embedded DFE for Signal Equalization," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 901-915, Mar. 2009

- [7] Chih-Kong Ken Yang and E-Hung Chen, "ADC-based Serial I/O Receivers," in *Proc. IEEE Custom Integrated Circuits Conference (CICC)*, pp. 323-329, Sep. 2009
- [8] Jian-Hao Lu and Shen-Iuan Liu, "A 50-Gb/s 10-mW Analog Equalizer Using Transformer Feedback Technique in 65-nm CMOS Technology," *IEEE Trans. Circuits and Systems*, vol. 56, no. 10, pp. 783-787, Oct. 2009
- [9] Chih-Fan Liao and Shen-Iuan Liu, "A 40Gb/s CMOS Serial-Link Receiver with Adaptive Equalization and CDR," *ISSCC Dig. Tech. Papers*, pp. 100-101, Feb. 2008
- [10] Yasuo Hidaka, Weixin Gai, Takeshi Horie, Jian Hong Jiang, Yoichi Koyanagi, and Hideki Osone, "A 4-Channel 10.3Gb/s Backplane Transceiver Macro with 35dB Equalizer and Sign-Based Zero-Forcing Adaptive Control," *ISSCC Dig. Tech. Papers*, pp. 188-190, Feb. 2009
- [11] S. Gondi, J. Lee, D. Takeuchi, et al., "A 10Gb/s CMOS Adaptive Equalizer for Backplane Applications," ISSCC Dig. Tech. Papers, pp. 328-329, Feb. 2005
- [12] H. Uchiki, Y. Ota, M. Tani, Y. Hayakawa, and K. Asahina, "A 6Gb/s RX Equalizer Adapted Using Direct Measurement of the Equalizer Output Amplitude," *ISSCC Dig. Tech. Papers*, pp. 104-105, Feb. 2008

- [13] H. Wu, J. A. Tierno, P. Pepeljugoski, *et al.*, "Integrated Transversal Equalizers in High-Speed Fiber-Optic Systems," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2131-2137, Dec. 2003
- [14] David Hernandez-Garduno, Jose Silva-Martinez, "A CMOS 1Gb/s 5-Tap Transversal Equalizer Based on Inductorless 3rd-Order Delay Cells," *ISSCC Dig. Tech. Papers*, pp. 232-233, Feb. 2007
- [15] Hao Liu, Jin Liu, Robert Payne, Cy Cantrell, and Mark Morgan, "A 18mW 10Gbps Continuous-Time FIR Equalizer for Wired Line Data Communications in 0.12μm CMOS," in *Proc. IEEE Custom Integrated Circuits Conference (CICC)*, pp. 113-116, Sep. 2009
- [16] Xiaodong Wang, and Richard R. Spencer, "A Low-Power 170-MHz Discrete-Time Analog FIR Filter," *IEEE J. Solid-State Circuits*, vol. 33, no. 3, pp. 417-426, Mar. 1998
- [17] Nathaniel J. Guilar, Frank (Pak-Kim) Lau, Paul J. Hurst, and Stephen H. Lewis, "A Passive Switched-Capacitor Finite-Impulse-Response Equalizer," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 400-409, Feb. 2007
- [18] James F. Buckwalter, Mounir Meghelli, Daniel J. Friedman, and Ali Hajimiri, "Phase and Amplitude Pre-Emphasis Techniques for Low-Power Serial Links," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1391-1399, Jun. 2006
- [19] Young-Soo Sohn, Seung-Jun Bae, Hong-June Park, Chang-Hyun Kim and Soo-In Cho, "A 2.2 Gbps CMOS look-ahead DFE receiver for multidrop channel with pin-to-pin time skew compensation," in *Proc. IEEE Custom Integrated Circuits Conference (CICC)*, pp. 473-476, Sep. 2003

- [20] V. Stojanovic, A. Ho, B. Garlepp, F. Chen, J. Wei, E. Alon, C. Werner, J. Zerbe, and M. A. Horowitz, "Adaptive equalization and data recovery in a dual-mode (PAM2/4) serial link transceiver," in *Symp. VLSI Circuits 2004 Dig. Tech. Papers*, 2004, pp. 348-351.
- [21] T. Beukema, M. Sorna, K. Selander, S. Zier, B. L. Ji, P. Murfet, J. Mason, W. Rhee, H. Ainspan, B. Parker, and M. Beakes, "A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2633-2645, Dec. 2005.
- [22] R. Payne, P. Landman, B. Bhakta, S. Ramaswamy, S. Wu, J. D. Powers, M. U. Erdogan, A.-L. Yee, R. Gu, L. Wu, Y. Xie, B. Parthasarathy, K. Brouse, W. Mohammed, K. Heragu, V. Gupta, L. Dyson, and W. Lee, "A 6.25-Gb/s binary transceiver in 0.13-μm CMOS for serial data transmission across high loss legacy backplane channels," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2646-2657, Dec. 2005.
- [23] Matt Park, John F. Bulzacchelli, Michael Beakes, and Daniel J. Friedman, "A 7Gb/s 9.3mW 2-Tap Current-Integrating DFE Receiver," *ISSCC Dig. Tech. Papers*, pp. 230-231, Feb. 2007
- [24] Koon-Lun Jackie Wong, Alexander Rylyakov, and Chih-Kong Ken Yang, "A 5mW 6-Gb/s Quarter-Rate Sampling Receiver With a 2-Tap DFE Using Soft Decisions," *IEEE J. Solid-State Circuits*, vol. 42, No. 4, pp. 881-888, Apr. 2007

- [25] Azita Emami-Neyestanak, Aida Varzaghani, John F. Bulzacchelli, Alexander Rylyakov, Chih-Kong Ken Yang, and Daniel J. Friedman, "A 6.0-mW 10.0-Gb/s Receiver With Switched-Capacitor Summation DFE," *IEEE J. Solid-State Circuits*, vol. 42, No. 4, pp. 889-896, Apr. 2007
- [26] Timothy O. Dickson, John F. Bulzacchelli, and Daniel J. Friedman, "A 12-Gb/s 11-mW Half-Rate Sampled 5-Tap Decision Feedback Equalizer With Current-Integrating Summers in 45-nm SOI CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 44, No. 4, pp. 1298-1305, Apr. 2009
- [27] Tai-Cheng Lee and Behzad Razavi, "A 125-MHz mixed-signal echo canceller for gigabit ethernet on copper wire," *IEEE J. Solid-State Circuits*, vol. 36, No. 2, pp. 366-373, Feb. 2001
- [28] Jinwook Kim, Jeongsik Yang, Sangjin Byun, Hyunduk Jun, Jeongkyu Park, Cormac S. G. Conroy, and Beomsup Kim, "A Four-Channel 3.125-Gb/s/ch CMOS Serial-Link Transceiver With a Mixed-Mode Adaptive Equalizer," *IEEE J. Solid-State Circuits*, vol. 40, No. 2, pp. 462-471, Feb. 2005
- [29] Ahmed Adel Abd El-Fattah, Ahmed Mohamed Arafa, Dina Reda Abd El-Hay, Fady Atef Naguib Mohamed, Marwa Mostafa Ahmed, and Mohamed Omar Abd EL-Aziz, "Equalizer Implementation for 10 Gbps Serial Data Link in 90 nm CMOS Technology," in *Proc. International Conference on Microelectronics (ICM 2007)*, Dec. 2007
- [30] Kwang-Ting (Tim) Cheng and Hsiu-Ming (Sherman) Chang, "Test Strategies for Adaptive Equalizers," in *Proc. IEEE Custom Integrated Circuits Conference (CICC)*, pp. 597-603, Sep. 2009

- [31] M. Kawai, H. Watanabe, T. Ohtsuka, and K. Yamaguchi, "Smart Optical Receiver With Automatic Decision Threshold Setting and Retiming Phase Alignment," *J. Lightwave Technol.*, vol. 7, no. 11, Nov. 1989
- [32] Tobias Ellermeyer, Ulrich Langmann, Berthold Wedding, and Wolfgang Pöhlmann, "A 10-Gb/s Eye-Opening Monitor IC for Decision-Guided Adaptation of the Frequency Response of an Optical Receiver," *IEEE J. Solid-State Circuits*, vol. 35, no. 12, pp. 1958-1963, Dec. 2000
- [33] F. Buchali, S. Lanne, J.-P. Thiéry, W. Baumert, and H. Bülow, "Fast Eye Monitor for 10 Gbit/s and its Application for Optical PMD Compensation," *Tech. Digest OFC*, 2001, TuP5
- [34] F. Buchali, W. Baumert, H. Bülow, and J. Poirrer, "A 40Gb/s Eye Monitor and Its Application to Adaptive PMD Compensation," *Tech. Digest OFC*, 2002, WE6
- [35] Yasumoto Tomita, Masaya Kibune, Junji Ogawa, William W. Walker, Hirotaka Tamura, and Tadahiro Kuroda, "A 10-Gb/s Receiver With Series Equalizer and On-Chip ISI Monitor in 0.11-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 986-993, Apr. 2005
- [36] Chang-Kyung Seong, "A 1.25-Gb/s Digitally-Controlled Dual-Loop Clock and Data Recovery Circuit with Enhanced Phase Resolution," Master's thesis, Yonsei University, 2006, pp. 14.

#### Korean Abstract

### A 10-Gb/s Adaptive Decision Feedback Equalizer Using On-Chip Eye Opening Monitoring

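As semiconductor processes evolve, the production cost of chips keeps decreasing, yet the share taken by test cost is relatively increasing. To reduce the test cost of high-speed receiver chips, an on-chip eye opening monitoring (EOM) circuit can be integrated into the receiver chip. With this technique, the chip can observe the eye diagram of a given signal by itself, so that automatic testing can be carried out without a complex and lengthy test procedure.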

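This dissertation proposes a 10-Gb/s decision feedback equalizer (DFE) with a new structure that employs the EOM technique. The proposed EOM scheme is used not only for automatic testing but also for adapting the DFE to printed circuit board (PCB) channels of various lengths.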

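The proposed DFE cancels two post-cursors to improve eye opening and bit-error performance without amplifying high-frequency noise. By completely removing the feedback loop, the speed limitation is relaxed, allowing operation at 10Gb/s. In addition, interleaving into four paths with quarter-rate clocks reduces the speed burden on the sampling and decision circuits. To add adaptation to the DFE, additional samplers are included to obtain information about the equalized eye diagram. Because the equalized waveform does not actually exist at any single physical node in a DFE structure without feedback loops, a look-ahead EOM technique that obtains the effective equalized eye diagram is proposed to overcome this. Whereas conventional EOM circuits use two reference voltages and two clock signals, the proposed circuit uses only one reference voltage and one clock signal. In this way, a scheme based on digital processing is proposed instead of using more analog circuits.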

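The inter-symbol interference (ISI) components of the channel can be measured using the proposed EOM technique. Whereas previously reported ISI monitoring techniques were implemented as analog circuits using a switched-capacitor correlator, the proposed algorithm computes the ISI components digitally. The DFE equalization coefficients are calculated from the ISI components measured in this way and applied to the DFE.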

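A prototype chip including the DFE, the samplers for EOM, a clock generator and other circuits was designed and fabricated in a 90nm CMOS process. The operation and performance of each building block and of the entire circuit were verified by post-layout simulation. In this prototype, the digital controller is implemented in a field-programmable gate array (FPGA) that works together with the CMOS chip, but it could in practice be integrated with the analog circuits. For verification, the fabricated chip was mounted and connected directly to a PCB. The measurement results confirm that eye diagrams of the equalized signal are drawn successfully. It was also confirmed that 10-Gb/s data are successfully equalized and adapted for PCB channel lengths of up to 40cm. The DFE core occupies $110 \times 95 \mu m^2$ and consumes 11mW from a 1.2V supply.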

Keywords: high-speed data transmission, adaptive equalizer, decision feedback equalizer, eye opening monitoring, inter-symbol interference measurement