DIGITAL PREDISTORTION LINEARIZATION AND CREST FACTOR REDUCTION FOR WIDEBAND APPLICATIONS

by

Wan-Jong Kim
M.Sc. Kwangwoon University, 2001
B.Sc. Kwangwoon University, 1999

A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in the School
of
Engineering Science

© Wan-Jong Kim 2006
SIMON FRASER UNIVERSITY
Fall 2006

All rights reserved. This work may not be
reproduced in whole or in part, by photocopy
or other means, without the permission of the author.
APPROVAL

Name:  Wan-Jong Kim

Degree:  Doctor of Philosophy

Title of thesis:  Digital Predistortion Linearization and Crest Factor Reduction for Wideband Applications

Examining Committee:  Dr. Paul K. M. Ho
Chair

Dr. Shawn P. Stapleton, Senior Supervisor

Dr. James K. Cavers, Supervisor

Dr. Marek Syrzycki, Supervisor

Dr. Dong-In Kim, Internal Examiner

Dr. Michael Faulkner, External Examiner
Professor, Electrical and Computer Engineering
Victoria University of Technology, Australia

Date Approved:  December 8, 2006
DECLARATION OF PARTIAL COPYRIGHT LICENCE

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the “Institutional Repository” link of the SFU Library website <www.lib.sfu.ca> at: <http://ir.lib.sfu.ca/handle/1892/112>) and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author’s written permission.

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library
Burnaby, BC, Canada

Revised: Fall 2006
Abstract

Power amplifiers are essential components in wireless communication systems and are inherently nonlinear. This nonlinearity generates spectral regrowth beyond the signal bandwidth, which in turn interferes with adjacent channels. Wideband code division multiple access (WCDMA) and orthogonal frequency division multiplexing (OFDM) systems are particularly vulnerable to nonlinear distortion because of their high peak-to-average power ratios (PAPRs), which impose stringent linearity requirements. One way to achieve the required linearity is to back off the input signal. However, for high-PAPR signals, this makes the efficiency of the power amplifier very low.

In this dissertation, we are concerned with achieving high linearity and high efficiency. We first propose a predistorter based on piecewise pre-equalizers, for use in multi-channel wideband applications. This predistortion linearizer consists of piecewise pre-equalizers, along with a lookup table (LUT) based digital predistorter; together they compensate for nonlinearities, as well as memory effects of power amplifiers. Taking advantage of the multiple finite impulse response (FIR) filters, the complexity is significantly reduced when compared to memory polynomial methods. Furthermore, experimental results obtained when two WCDMA carriers were applied verified that our proposed method provides improvements comparable to those seen using the memory polynomial approach.

Secondly, a unique baseband derived radio frequency (RF) predistortion system is presented, which uses LUT coefficients extracted at baseband to directly RF envelope modulate a quadrature vector modulator. The primary advantage of this architecture is that it combines the narrowband benefit of envelope predistortion with the accuracy of baseband predistortion.

Finally, a novel efficient crest factor reduction technique for wideband applications is described. The technique uses peak cancellation to reduce the PAPR of the input signal. Conventional iterative peak cancellation requires several iterations to converge to the targeted PAPR, since filtering causes peak re-growth. The proposed algorithm eliminates several iterations and subsequently saves hardware resources. A direct performance comparison between a digitally predistorted and a feed-forward linearized Doherty amplifier is provided, under various crest factor reduction levels.

Keywords: wireless communication systems, digital predistortion, memory effects, crest factor reduction.
To my family and my wife, So Young
“I keep the subject of my inquiry constantly before me, and wait till the first dawning opens gradually, by little and little, into a full and clear light.”

--- Isaac Newton (1642–1727)
Acknowledgments

First, I would like to thank my advisor, Prof. Shawn P. Stapleton, for providing constant encouragement, valuable advice, and an enjoyable working environment during my Ph.D. studies.

I would also like to thank my examining committee for providing constructive advice, which improved the quality of this manuscript.

Many thanks go to students and faculty in the RF/Microwave Mobile Communications Laboratory.

I would like to thank my family for their unconditional support. Last but not least, I would like to express particular gratitude to So Young, my lovely wife, for her love, encouragement, and unending support.
Contents

Approval
Abstract
Dedication
Quotation
Acknowledgments
Contents
List of Tables
List of Figures
List of Abbreviations

1 Introduction
1.1 Motivation
1.2 History of digital predistortion
1.3 History of crest factor reduction
1.4 Contribution
  1.4.1 List of publications
1.5 Outline

2 Background
2.1 Behavioral models of power amplifiers
  2.1.1 Memoryless model
  2.1.2 Memory effects
  2.1.3 Volterra series
  2.1.4 Wiener model
  2.1.5 Hammerstein model
  2.1.6 Parallel Hammerstein model
2.2 Power amplifier linearization
  2.2.1 Feed-forward linearization
  2.2.2 Digital predistortion
2.3 Crest factor reduction
  2.3.1 Crest factor effect on power amplifiers
  2.3.2 Crest factor effect on digital-to-analog converters

3 Piecewise Pre-equalized Linearization
3.1 Introduction
3.2 The proposed piecewise pre-equalized based LUT PD
3.3 The proposed predistorter algorithm
3.4 PA behavioral modeling
3.5 Simulation results
3.6 Experimental results
3.7 Complexity evaluation
  3.7.1 The piecewise pre-equalized based LUT PD
  3.7.2 The memory polynomial PD
3.8 Conclusions

4 Digital Baseband Derived RF Predistortion
4.1 Introduction
4.2 Digital baseband derived RF predistortion architecture
4.3 Delay effects and calibration
4.4 Experimental results
4.5 Conclusions

5 Crest Factor Reduction and Linearization
5.1 Introduction
5.2 Crest factor reduction for wideband applications
  5.2.1 A scaled peak cancellation method
  5.2.2 Peak windowing technique
5.3 Simulation results
5.4 Experimental results
  5.4.1 A class AB power amplifier
  5.4.2 A Doherty amplifier with digital predistortion and feed-forward linearization
5.5 Conclusions

6 An FPGA Testbed for Digital Predistortion
6.1 Introduction
6.2 Design flow
6.3 Digital predistortion test bed, using an Altera DSP development FPGA board
6.4 Experimental results
6.5 Conclusions

7 Summary and Future Research

Appendices

A Predistortability Analysis of the Proposed Predistortion

Bibliography
List of Tables

2.1 The number of parameters in the complex-valued Volterra model of (2.16), as a function of order and memory length.

3.1 Summary of the ACLR simulations for the different PDs.
3.2 Summary of the ACLR measurements for the different PDs.
3.3 Complexity estimation of the proposed piecewise pre-equalized PD.
3.4 Complexity estimation of the memory polynomial PD.

4.1 Summary of the ACLR performance of the proposed system.
4.2 Comparison of the three digital PD architectures.

5.1 The setup for a four-carrier TM1 signal.
5.2 Performance for different carrier numbers.
5.3 Measurement results for a fixed 20-W output power.
5.4 Measurement results with a first fixed ACLR.
5.5 Measurement results with a second fixed ACLR.

6.1 Stratix device features.
List of Figures

2.1 Time domain plots of a PA with memory, for a single WCDMA carrier for: (a) AM/AM, and (b) AM/PM.
2.2 Asymmetry of the PA with memory for three WCDMA carriers in the frequency domain.
2.3 Typical location of memory effects in the FET power amplifiers [2].
2.4 Measured impedance of a MESFET amplifier at the: (a) fundamental frequency, (b) second harmonic frequency, and (c) envelope frequency (base-band) [78].
2.5 Typical locations of the memory effects in the bias network of bipolar and FET amplifiers [10].
2.6 Memory effects due to the mismatching of even harmonics [38, 78, 2].
2.7 Memory effects due to power supply modulation effects [38, 25, 2].
2.8 Electrical circuit model of heat flow from the active device [79].
2.9 Block diagram of a Wiener model.
2.10 Block diagram of a Hammerstein model.
2.11 Block diagram of a parallel Hammerstein model.
2.12 Feed-forward architecture.
2.13 Digital predistorter followed by a power amplifier.
2.14 Power amplifier $P_{out}$ versus $P_{in}$ curve and digital predistortion.
2.15 Block diagram of memoryless digital predistortion.
2.16 The indirect learning architecture for the predistorter.
2.17 Definition of the output back-off.
2.18 Power amplifier efficiency vs. output back-off.

3.1 Block diagram of the piecewise pre-equalized LUT based PD.
3.2 Block diagram of the proposed PD with the LUT replaced by a polynomial equation.
3.3 Piecewise pre-equalized LUT PD graphical expressions: (a) complex gain adjuster response, (b) piecewise equalizer response, (c) response of the cascaded complex gain adjuster and piecewise equalizers, (d) power amplifier response, and (e) desired response from (c) and (d).
3.4 The indirect learning algorithm for the predistorter of the power amplifier.
3.5 Line-up for a 300-W PEP Doherty power amplifier.
3.6 The test bench set-up for modeling of the power amplifier.
3.7 In-phase signal modeling results.
3.8 Quadrature signal modeling results.
3.9 Frequency domain modeling results.
3.10 Linearization with a memory-less LUT PD.
3.11 Linearization with the LUT Hammerstein PD.
3.12 Linearization with the proposed piecewise pre-equalized PD.
3.13 Linearization with the memory polynomial PD.
3.14 Linearization: (a) without PD, (b) LUT PD, (c) Hammerstein PD with a 5-tap FIR filter, (d) proposed piecewise pre-equalized PD with 2 taps, and (e) memory polynomial PD of 5th order with 2 memory terms.
3.15 Experimental results: (a) without PD, (b) LUT PD, (c) Hammerstein PD with a 5-tap FIR filter, (d) proposed piecewise pre-equalized PD with 2 taps, and (e) memory polynomial PD of 5th order with 2 memory terms.

4.1 Digital baseband predistortion architecture.
4.2 RF envelope digital predistortion architecture.
4.3 Digital baseband derived RF predistortion architecture.
4.4 Representation of vector summation.
4.5 IM3 cancellation performance as a function of $\tau_d$.
4.6 IM5 cancellation performance as a function of $\tau_d$.
4.7 Delay calibration procedure.
4.8 Test bench for the proposed predistortion system.
4.9 The captured spectrum before linearization, at 47 dBm of the average output power.
4.10 The captured spectrum after linearization, at 47 dBm of the average output power.
4.11 Measured spectral results for delay dependence of the proposed system: (a) without predistortion, (b) with one sample advance, (c) with one sample delay, and (d) with coarse delay match.
4.12 Measured spectral results for the proposed system, with fractional delay compensation at 44 dBm of the average output power: (a) without predistortion, (b) with predistortion and coarse delay match, and (c) with predistortion and fractional delay match.
4.13 Measured spectral results for the different fractional delays, at 44 dBm of the average output power: (a) with coarse delay, (b) with 3 fractional delay, (c) with 4 fractional delay, and (d) with 5 fractional delay.
4.14 Measured spectral results for the proposed system, with fractional delay compensation at 46 dBm of the average output power: (a) without predistortion, (b) with predistortion and coarse delay match, and (c) with predistortion and fractional delay match.
4.15 Measured spectral results for the different fractional delays, at 46 dBm of the average output power: (a) with coarse delay, (b) with 3 fractional delay, (c) with 4 fractional delay, and (d) with 5 fractional delay.

5.1 Block diagram of the scaled peak cancellation technique.
5.2 Block diagram of the noise shaper for a multi-carrier WCDMA system.
5.3 Block diagram of the peak windowing method.
5.4 The CCDF plot for four WCDMA carriers, before and after the setup in Table 5.1.
5.5 PAPR versus EVM for RPC and PW, with four WCDMA carriers.
5.6 PAPR versus EVM for SRPC with four WCDMA carriers.
5.7 ACLR versus PAPR for four WCDMA carriers.
5.8 PDF of the PW for four WCDMA carriers.
5.9 PDF of the single PC for four WCDMA carriers.
5.10 PDF of the RPC for four WCDMA carriers.
5.11 The test bench for the CFR algorithm.
5.12 Output spectrum of the PA for four WCDMA carriers with different PAPRs.
5.13 Experimental results using the 9.8 dB PAPR signal.
5.14 Experimental results using the 6.5 dB PAPR signal.
5.15 Experimental results using the 5.5 dB PAPR signal.

6.1 An FPGA test bench for digital predistortion.
6.2 Stratix EP1S80 DSP development board components and interfaces.
6.3 Measurement results at the average 43 dBm output power: (a) without PD, (b) with PD, and (c) with a linear output.
6.4 Measurement results at the average 45 dBm output power: (a) without PD, (b) with PD, and (c) with a linear output.
List of Abbreviations

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>3G</td>
<td>third generation</td>
</tr>
<tr>
<td>3GPP</td>
<td>third generation partnership project</td>
</tr>
<tr>
<td>ACLR</td>
<td>adjacent channel leakage ratio</td>
</tr>
<tr>
<td>ACP</td>
<td>adjacent channel power</td>
</tr>
<tr>
<td>ACPR</td>
<td>adjacent channel power ratio</td>
</tr>
<tr>
<td>ADC</td>
<td>analog-to-digital converter</td>
</tr>
<tr>
<td>ADS</td>
<td>advanced design system</td>
</tr>
<tr>
<td>AM/AM</td>
<td>amplitude modulation/amplitude modulation</td>
</tr>
<tr>
<td>AM/PM</td>
<td>amplitude modulation/phase modulation</td>
</tr>
<tr>
<td>ASIC</td>
<td>application specific integrated circuit</td>
</tr>
<tr>
<td>CCDF</td>
<td>complementary cumulative distribution function</td>
</tr>
<tr>
<td>CDMA</td>
<td>code division multiple access</td>
</tr>
<tr>
<td>CF</td>
<td>crest factor</td>
</tr>
<tr>
<td>CFR</td>
<td>crest factor reduction</td>
</tr>
<tr>
<td>DAC</td>
<td>digital-to-analog converter</td>
</tr>
<tr>
<td>dB</td>
<td>decibel</td>
</tr>
<tr>
<td>dBc</td>
<td>decibel relative to a carrier level</td>
</tr>
<tr>
<td>dBm</td>
<td>decibel relative to a milliwatt</td>
</tr>
<tr>
<td>DFFLPA</td>
<td>Doherty feedforward linear power amplifier</td>
</tr>
<tr>
<td>DPA</td>
<td>Doherty power amplifier</td>
</tr>
<tr>
<td>DPC</td>
<td>dedicated physical channel</td>
</tr>
<tr>
<td>DPD</td>
<td>Doherty predistortion</td>
</tr>
<tr>
<td>DSP</td>
<td>digital signal processing</td>
</tr>
<tr>
<td>DUT</td>
<td>device under test</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>EDGE</td>
<td>enhanced data rates for GSM evolution</td>
</tr>
<tr>
<td>ESG</td>
<td>electronic signal generator</td>
</tr>
<tr>
<td>EVM</td>
<td>error vector magnitude</td>
</tr>
<tr>
<td>FET</td>
<td>field effect transistor</td>
</tr>
<tr>
<td>FFLPA</td>
<td>feed-forward linear power amplifier</td>
</tr>
<tr>
<td>FIR</td>
<td>finite impulse response</td>
</tr>
<tr>
<td>FM</td>
<td>frequency modulation</td>
</tr>
<tr>
<td>FPGA</td>
<td>field programmable gate array</td>
</tr>
<tr>
<td>GMSK</td>
<td>Gaussian minimum shift keying</td>
</tr>
<tr>
<td>GSM</td>
<td>global system for mobile communications</td>
</tr>
<tr>
<td>HDL</td>
<td>hardware description language</td>
</tr>
<tr>
<td>HPA</td>
<td>high power amplifier</td>
</tr>
<tr>
<td>IIR</td>
<td>infinite impulse response</td>
</tr>
<tr>
<td>IM3</td>
<td>3rd order intermodulation distortion</td>
</tr>
<tr>
<td>IMD</td>
<td>intermodulation distortion</td>
</tr>
<tr>
<td>LAN</td>
<td>local area network</td>
</tr>
<tr>
<td>LDMOS</td>
<td>laterally diffused metal oxide semiconductor</td>
</tr>
<tr>
<td>LINC</td>
<td>linear amplification using nonlinear components</td>
</tr>
<tr>
<td>LMS</td>
<td>least mean squares</td>
</tr>
<tr>
<td>LPF</td>
<td>low pass filter</td>
</tr>
<tr>
<td>LTI</td>
<td>linear time-invariant</td>
</tr>
<tr>
<td>LUT</td>
<td>lookup table</td>
</tr>
<tr>
<td>MESFET</td>
<td>metal semiconductor FET</td>
</tr>
<tr>
<td>OBO</td>
<td>output back-off</td>
</tr>
<tr>
<td>OFDM</td>
<td>orthogonal frequency division multiplexing</td>
</tr>
<tr>
<td>PA</td>
<td>power amplifier</td>
</tr>
<tr>
<td>PAE</td>
<td>power added efficiency</td>
</tr>
<tr>
<td>PAPR</td>
<td>peak-to-average power ratio</td>
</tr>
<tr>
<td>PD</td>
<td>predistortion</td>
</tr>
<tr>
<td>PDF</td>
<td>probability density function</td>
</tr>
<tr>
<td>PEP</td>
<td>peak envelope power</td>
</tr>
<tr>
<td>PW</td>
<td>peak windowing</td>
</tr>
<tr>
<td>QPSK</td>
<td>quadrature phase-shift keying</td>
</tr>
<tr>
<td>RF</td>
<td>radio frequency</td>
</tr>
<tr>
<td>RLS</td>
<td>recursive least squares</td>
</tr>
<tr>
<td>RPC</td>
<td>repeated pulse cancellation</td>
</tr>
<tr>
<td>SQRT</td>
<td>square root</td>
</tr>
<tr>
<td>SRPC</td>
<td>scaled repeated pulse cancellation</td>
</tr>
<tr>
<td>TS</td>
<td>test specification</td>
</tr>
<tr>
<td>UMTS</td>
<td>universal mobile telecommunications system</td>
</tr>
<tr>
<td>VSA</td>
<td>vector signal analyzer</td>
</tr>
<tr>
<td>WCDMA</td>
<td>wideband code division multiple access</td>
</tr>
</tbody>
</table>
Chapter 1

Introduction

1.1 Motivation

Reliable cellular service requires clean and consistent transmission from base stations under widely and rapidly changing conditions. The base station's radio frequency (RF) power amplifiers (PAs) are key to guaranteeing this reliability. Spectral efficiency has always been important in mobile communications. Modern second- and third-generation digital systems now demand that PA linearity and efficiency also be treated as crucial performance requirements. These amplifiers are found in cellular base stations that support the code division multiple access (CDMA) family of wireless standards (e.g., cdma2000, 3rd Generation Partnership Project (3GPP), or wideband CDMA (WCDMA)), as well as improvements to existing standards (e.g., enhanced data rates for global system for mobile communications (GSM) evolution (EDGE)). Due to the use of quadrature modulation and multiple carriers, the signal power in many of these applications fluctuates significantly over time. This means that the signal has a high peak-to-average power ratio (PAPR) when compared with analog frequency modulation (FM) or Gaussian minimum shift keying (GMSK) modulation, as used in GSM.

Although the aforementioned systems maintain good spectral efficiency, their varying signal envelope, when amplified by an inherently nonlinear RF PA, generates spectral re-growth in the adjacent channels as well as in-band distortion, which degrades the error vector magnitude (EVM). A tradeoff exists between linearity and efficiency; power efficiency is very low when the amplifier operates in its linear region and increases as the amplifier is driven into its compression region. In order to enhance both linearity and efficiency at the same time, one of the various linearization techniques should be applied to an efficient RF PA. A Doherty power amplifier (DPA), for example, is able to achieve higher efficiency than traditional PAs, although at the expense of linearity.

Many systematic methods for reducing nonlinear distortion, often called linearization, have been developed. Feed-forward linearization is the best known for providing good linearity; however, the technique has poor efficiency and an undesirable reliance on analog hardware that is complex, expensive, and potentially difficult to maintain. Feed-forward linearization first samples the output of the amplifier and attenuates it to the same level as the input signal. The attenuated sample is then subtracted from the input, leaving only the distortion generated by the amplifier. The distortion signal is amplified by a separate amplifier to the same level as in the main output and is then subtracted from the original amplifier output signal. The result is a linearly amplified version of the input signal. Feed-forward linearization has been successfully employed in many communication systems. However, it is difficult to apply to existing amplifiers and provides good results only if the RF PA output back-off level is at least on the order of 6 to 7 dB.
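
The signal flow described above can be summarized in a short idealized derivation. This is a sketch under stated assumptions rather than the thesis's own formulation: the symbols $x$, $y$, $d$, $e$ and $G$ are introduced here for illustration, the main amplifier is modeled as a linear gain plus an additive distortion term, and perfect gain and delay matching is assumed in both loops. The sign of $e$ depends on the subtraction convention and is chosen here so that the final step is a subtraction.

```latex
\begin{align}
y &= G\,x + d
  && \text{main amplifier output: linear gain $G$ plus distortion $d$} \\
e &= \frac{y}{G} - x = \frac{d}{G}
  && \text{attenuated output sample compared with the input} \\
G\,e &= d
  && \text{error amplifier restores the distortion to the output level} \\
y - G\,e &= G\,x
  && \text{distortion cancelled at the output}
\end{align}
```

In practice the cancellation is limited by gain, phase and delay mismatch between the paths, and the error amplifier draws additional DC power, which is consistent with the efficiency penalty noted above.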

In addition to feed-forward, several other linearization methods have been used. These include analog predistortion, linear amplification using nonlinear components (LINC), and Cartesian feedback. As with feed-forward and its variants, these techniques involve a considerable amount of added analog hardware. In addition, they may require the use of nonlinear components, the characteristics of which are difficult to control to the degree of precision necessary to achieve the desired improvements.

The most promising and cost-effective linearization technique is adaptive baseband digital predistortion (PD), which has recently demonstrated notable success in correcting the nonlinearity of RF PAs. Because PD is implemented digitally, it benefits directly from the continuous improvements in digital signal processing (DSP) and field-programmable gate array (FPGA) circuitry. Thus, PD provides significant accuracy and flexibility, better power efficiency, and reduced implementation complexity.

Ideally, in the case of memory-less amplifiers, the nonlinearity of an RF PA should be the same for all signals, at all frequencies. Unfortunately, most RF PAs have some degree of memory, making their output dependent not only on the current input signal but also on previous input signals. In other words, the output signal will be influenced by the frequency of the signal’s envelope, the frequency of the signal itself, and the temperature [10, 33]. These memory effects substantially limit the maximum achievable cancellation performance of memory-less digital PD systems for wideband applications.

Using a DPA together with digital PD offers a way to achieve both high efficiency and linearity. However, the high PAPR of the input signal still limits efficiency, because it sets a maximum correctable point when using PD. Therefore, it is desirable to apply one of the PAPR reduction techniques, in order to drive the PA closer to its saturation power level. The use of deliberate envelope clipping to digitally distort the signal, while maintaining the signal quality at a sufficient level, is a simple and practical way to decrease PAPR. Moreover, reducing the PAPR via clipping presents the possibility of utilizing the dynamic range of the digital-to-analog converter (DAC) more efficiently. Therefore, combining a digitally predistorted DPA with a crest factor reduction (CFR) technique has the potential of maximizing system linearity and overall efficiency.
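
To make the PAPR and clipping ideas above concrete, the following is a minimal Python sketch, not the CFR algorithm developed in this thesis. It assumes NumPy is available, uses complex Gaussian samples as a stand-in for a multi-carrier signal, and applies plain hard clipping of the envelope with no filtering. The function names `papr_db` and `clip_envelope` and the 6 dB target are hypothetical choices made for illustration.

```python
import numpy as np


def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())


def clip_envelope(x, target_papr_db):
    """Hard-clip the envelope while preserving the phase of every sample.

    The clipping threshold is set relative to the RMS level of the
    unclipped input, so the PAPR after clipping ends up slightly above
    the target because clipping also lowers the average power.
    """
    rms = np.sqrt(np.mean(np.abs(x) ** 2))
    threshold = rms * 10.0 ** (target_papr_db / 20.0)
    mag = np.abs(x)
    # Scale factor is 1 below the threshold and threshold/|x| above it.
    scale = np.minimum(1.0, threshold / np.maximum(mag, 1e-12))
    return x * scale


rng = np.random.default_rng(0)
# Assumption: complex Gaussian samples approximate a multi-carrier envelope.
n = 100_000
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y = clip_envelope(x, target_papr_db=6.0)

papr_before = papr_db(x)
papr_after = papr_db(y)
print(f"PAPR before clipping: {papr_before:.2f} dB")
print(f"PAPR after clipping:  {papr_after:.2f} dB")
```

Hard clipping of this kind generates both in-band and out-of-band distortion, so a practical CFR scheme has to control that distortion, typically by filtering or by shaping the cancellation pulse.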

1.2 History of digital predistortion

The maturity of this research area has resulted in an abundance of research papers on digital PD over the last twenty years. The first practical implementation of a gain based digital predistorter was proposed by James Cavers [17] in 1990. Prior to this method, the majority of digital PDs were based on the mapping predistorter principle, in which each possible signal level was directly mapped to an output level [53].

Linear distortions (impairments) in the forward or feedback paths of the digital predistorter can affect its performance and therefore must be reduced. Major sources of these linear distortions are the quadrature modulator and demodulator and the reconstruction and anti-alias filters. The first method to analyze and correct the quadrature modulator was developed by Faulkner et al. [32] in 1991. In 1993, Cavers et al. [19] proposed the adaptive compensation method for errors in direct conversion transceivers, followed by an in-depth analysis in 1997 [18]. The effects of reconstruction filters were first analyzed by Sundstrom et al. [71], without providing a solution to the problem. More recently, digital up and down converters have been implemented, which do not exhibit the impairments of the analog modulator and demodulator; such converters are a popular choice due to advances in DSP technology.

The memory effects exhibited by the wideband transmitter (PA) significantly limit the ability of the memory-less predistorter to suppress spectral re-growth [39]. Therefore, different predistorter architectures, which are intended to compensate for the nonlinearity as well as the memory effects, have been reported in the literature. For example, a Volterra predistorter using an indirect learning algorithm was proposed in [30]. In order to reduce the number of coefficients to be estimated, a memory polynomial predistorter, a simplified version of the Volterra predistorter, was implemented to address these effects [41]. However, the polynomial based memory predistorter suffers from numerical instability when higher order polynomial terms are included, since a matrix inversion is needed to determine the polynomial coefficients [28]. Alternatively, Raich et al. [63] employed orthogonal polynomials to alleviate the numerical instability problem associated with traditional polynomials.
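
For reference, the memory polynomial model mentioned above is commonly written in the following form. This is the standard textbook expression and not necessarily the notation used later in this thesis; the symbols $K$ (nonlinearity order), $Q$ (memory depth), $a_{kq}$ (complex coefficients), $x(n)$ (input) and $z(n)$ (predistorter output) are assumptions made for illustration.

```latex
\begin{equation}
z(n) = \sum_{k=1}^{K} \sum_{q=0}^{Q} a_{kq}\, x(n-q)\, \lvert x(n-q) \rvert^{k-1}
\end{equation}
```

The model is linear in the $K(Q+1)$ coefficients $a_{kq}$, so they can be estimated by least squares. The basis terms $x(n-q)\lvert x(n-q)\rvert^{k-1}$ for different $k$ are strongly correlated, which is why the matrix to be inverted tends to become ill-conditioned as $K$ grows.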

Two-box based predistorters are another common type of PD architecture, referred to as either Hammerstein or Wiener predistorters according to the cascading order of the nonlinear and linear blocks. For example, a Hammerstein predistorter, which is a cascade of a memory-less nonlinear block followed by a linear filter, has been used to compensate for the nonlinearity as well as the memory effects of a PA [37, 29, 27]. Wang and How [80] have demonstrated the compensation performance of a Wiener predistorter, used to linearize a high power amplifier (HPA) with memory effects in an orthogonal frequency division multiplexing (OFDM) transmitter, while considering the HPA as a Hammerstein nonlinear system. In these two examples, the memory-less nonlinearity was represented by a complex high order memory-less polynomial. In addition, the memory-less nonlinear block coefficients and the linear filter taps are identified concurrently by means of complicated algorithms, which are applied in either the time domain [37, 27] or the frequency domain [80].
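
The Hammerstein structure described above (a memory-less nonlinearity followed by a linear filter) can be sketched in a few lines of Python. This is an illustrative sketch only, assuming NumPy, an odd-order complex polynomial for the nonlinear block and an FIR filter for the linear block; the function name `hammerstein_predistort` and all coefficient values are hypothetical and are not taken from the cited works.

```python
import numpy as np


def hammerstein_predistort(x, poly_coeffs, fir_taps):
    """Apply a memory-less polynomial followed by a causal FIR filter.

    poly_coeffs[k] multiplies the odd-order term x * |x|^(2k),
    so poly_coeffs[0] is the linear gain.
    """
    x = np.asarray(x, dtype=complex)
    # Nonlinear block: depends only on the current input sample.
    z = np.zeros_like(x)
    for k, c in enumerate(poly_coeffs):
        z += c * x * np.abs(x) ** (2 * k)
    # Linear block: FIR filter, truncated to the input length.
    return np.convolve(z, fir_taps)[: len(x)]


# Hypothetical example values, chosen only to exercise the structure.
x = np.array([0.1, 0.5, 1.0, 0.5, 0.1], dtype=complex)
poly_coeffs = [1.0, 0.2]   # linear term plus a third-order term
fir_taps = [1.0, -0.1]     # two-tap FIR filter
y = hammerstein_predistort(x, poly_coeffs, fir_taps)
print(np.round(y, 4))
```

Swapping the order of the two blocks, so that the FIR filter precedes the polynomial, gives the corresponding Wiener structure.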

1.3 History of crest factor reduction

The various PAPR reduction techniques can be categorized into two groups, depending on whether they use linear techniques (modulation- and coding-dependent) or nonlinear techniques (modulation- and coding-independent). Methods that use linear techniques for OFDM systems do not distort the signal in the time domain; therefore, the spectral properties are not altered [59, 13, 43]. Conversely, nonlinear techniques modify the envelope of the time domain signal and are mainly based on clipping-filtering and windowing [60, 77]. To suppress peak re-growth when filtering the out-of-band distortion of the clipped signal, iterative clipping and filtering methods for OFDM systems have been proposed in [3] and [46]. These works suggested that iterative clipping and filtering of the clipped pulses would reduce the convergence rate to the targeted PAPR. However, repeated clipping and filtering techniques that have been implemented for OFDM systems require several iterations to converge to the desired PAPR level, implying that they are not efficient for hardware implementation.

1.4 Contribution

This thesis presents several original contributions to the field of computationally efficient digital PD architecture design. The research takes into account memory effects, structurally efficient digital PD systems, and crest factor reduction. Details of the original contributions are given below.

First, a new digital predistorter for memory effect compensation was proposed, which applied multiple lookup tables (LUTs). The contributions are as follows:

- The new predistorter based on piecewise pre-equalizers was developed for use in multi-channel wideband applications, in order to compensate for envelope memory effects.
- A novel LUT-based method for envelope memory effects was implemented, which was able to compensate for not only the nonlinearity of the RF PAs but also the envelope memory effects.
- The computational complexity was analyzed and compared with the conventional memory polynomial PD approach: it was found that the proposed structure is very efficient in computation.

Second, we implemented a baseband derived RF digital PD system, which offers several advantages over conventional digital baseband PD and RF envelope digital PD.

- A digital PD architecture using a vector modulator was implemented and derived from LUT coefficients at digital baseband. This structure reduces the bandwidth requirements of the digital-to-analog converters (DACs) and reconstruction filters, when compared to the digital baseband predistorter. It also removes the inaccuracy of the RF detector and the large RF delay lines in the RF envelope digital PD system.
- A new delay calibration method using correlation was developed, in order to synchronize the two paths of the system.

Third, a novel crest factor reduction algorithm, which saves hardware resources and is easy to implement, was developed and applied to a PA, a Doherty feed-forward linear power amplifier (DFFLPA), and a digital PD.

- A novel crest factor reduction (CFR) algorithm was developed based on iterative peak cancellation. This new CFR method saves hardware resources by means of a scaling factor.
- The new CFR algorithm was applied to a class AB PA, a DFFLPA, and a digital PD, in order to maximize the efficiency of the system. In addition, the efficiency of a DFFLPA was compared to the efficiency of a digital PD using the proposed algorithm.

1.4.1 List of publications

The following list details the publications that have resulted from the work presented in this thesis:

**Journal Papers**




**Conference Papers**



1.5 Outline

The remainder of this thesis is organized as follows:

Chapter 2 introduces basic theories and gives a literature review of the modeling and predistortion of power amplifiers.

Chapter 3 describes a predistorter based on piecewise pre-equalizers for use in multi-channel wideband applications. It takes advantage of multiple finite impulse response (FIR) filters, which significantly reduce the complexity when compared to memory polynomial methods.

In Chapter 4, a unique baseband derived RF predistortion system is presented, which uses LUT coefficients extracted at baseband to directly RF envelope modulate a quadrature vector modulator. The primary advantage of this architecture is that it combines the narrowband benefit of envelope predistortion with the accuracy of baseband predistortion.

Chapter 5 presents a novel efficient crest factor reduction technique for wideband applications. The technique is based on using peak cancellation to reduce the peak-to-average power ratio (PAPR) of the input signal. This technique is applied to a class AB power amplifier, a Doherty power amplifier, a Doherty feed-forward linear power amplifier, and a digitally predistorted Doherty power amplifier, in order to illustrate the efficiency enhancement.

In Chapter 6, a digital predistortion test-bed, which uses a field programmable gate array (FPGA) board, is described.

Finally, Chapter 7 summarizes the dissertation and provides future research directions.

Chapter 2

Background

In this chapter, we review models of a power amplifier and linearization for memory-less and memory affected systems.

2.1 Behavioral models of power amplifiers

2.1.1 Memoryless model

In the passband, a memoryless power amplifier can be described as a nonlinear function. This memory-less nonlinearity can be approximated by a power series

\[ \tilde{y}(t) = \sum_{p=1}^{P} \tilde{b}_p \tilde{x}^p(t). \]  

(2.1)

where \( \tilde{b}_p \) are real-valued coefficients, \( \tilde{x}(t) \) is the passband power amplifier input, and \( \tilde{y}(t) \) is the passband power amplifier output. In the baseband, (2.1) becomes [9]

\[ y(t) = \sum_{\substack{p=1 \\ \text{odd}}}^{P} b_p x(t)|x(t)|^{p-1}. \]  

(2.2)

where

\[ b_p = 2^{1-p} \binom{p}{\frac{p-1}{2}} \tilde{b}_p. \]  

(2.3)

\( x(t) \) is the baseband power amplifier input, and \( y(t) \) is the baseband power amplifier output. Note that (2.2) only contains odd order terms, as the signals generated from the even order terms in (2.1) are far from the carrier frequency. Thus, they do not contribute to the baseband output $y(t)$. The coefficients $b_p$ are complex valued and thus introduce amplitude and phase distortion to the input signal; this gives rise to the amplitude modulation/amplitude modulation (AM/AM) and amplitude modulation/phase modulation (AM/PM) conversion of the power amplifiers. In other words, the AM/AM conversion is the nonlinear function mapping from $|x(t)|$ to $|y(t)|$; the AM/PM conversion is the nonlinear function mapping from $|x(t)|$ to the output phase deviation $\angle y(t) - \angle x(t)$. Expressing $x(t) = |x(t)|e^{j\phi_x(t)}$, we can rewrite (2.2) as

$$y(t) = e^{j\phi_x(t)}F(|x(t)|),$$

(2.4)

where

$$F(|x(t)|) = \sum_{k=1, \text{odd}}^{K} b_k |x(t)|^k.$$  

(2.5)

From (2.4) and (2.5), it follows that $|y(t)| = |F(|x(t)|)|$, $\angle y(t) - \angle x(t) = \angle F(|x(t)|)$; in other words, $|F(\bullet)|$ is the AM/AM response and $\angle F(\bullet)$ is the AM/PM response.
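As an illustration of (2.2), (2.4) and (2.5), the following minimal sketch evaluates a memoryless baseband polynomial and reads off its AM/AM and AM/PM responses. The coefficient values in `B` and the function names are hypothetical and chosen only for illustration; they are not measured PA data.

```python
import cmath

# Hypothetical complex baseband coefficients b_1, b_3, b_5 (odd orders only).
B = {1: 1.0 + 0.0j, 3: -0.05 + 0.02j, 5: -0.002 - 0.001j}

def pa_memoryless(x):
    """Evaluate y = sum over odd p of b_p * x * |x|^(p-1), as in (2.2)."""
    r = abs(x)
    return sum(b * x * r ** (p - 1) for p, b in B.items())

def am_am_am_pm(r):
    """Return (|F(r)|, angle of F(r)) with F(r) = sum over odd k of b_k * r^k, as in (2.5)."""
    F = sum(b * r ** k for k, b in B.items())
    return abs(F), cmath.phase(F)

x = cmath.rect(1.5, 0.7)            # input with magnitude 1.5 and phase 0.7 rad
y = pa_memoryless(x)
gain_out, phase_shift = am_am_am_pm(abs(x))
print(abs(y), gain_out)                              # output magnitude two ways
print(cmath.phase(y) - cmath.phase(x), phase_shift)  # phase deviation two ways
```

Both printed pairs agree, since the output magnitude and the phase deviation depend only on the input magnitude $|x(t)|$.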

### 2.1.2 Memory effects

In recent years, memory effects have been the subject of intensifying investigation. In the time domain and for a single WCDMA carrier, the memory effect phenomena can be illustrated as dynamic AM/AM and AM/PM; refer to Figure 2.1. In the frequency domain and for a multi-carrier WCDMA signal, the memory effect produces an asymmetric intermodulation distortion (IMD); refer to Figure 2.2.

In the literature, memory effects [78] have been referred to as bandwidth-dependent distortion [14] (more specifically slow or long-term memory effects [67]), low-frequency memory effects [34], dynamic system effects [61], slow dynamic effects [50], and rate-dependent effects [58]. The memory effects are generally classified into two groups [78].

- **High frequency memory effects (Short term memory effects)**

  The output response depends on the actual value of its input and on the past samples at the RF time scale. In this case, the impulse response has a short time period. Possible sources include charge storage in semiconductor devices, transit times in semiconductor devices, and mismatch at fundamental and harmonic frequencies.

- **Envelope frequency memory effects (Long term memory effects)**

  The output response depends on the actual value of its input and on the past samples at the envelope time scale. In this case, the impulse response presents a long time tail behavior. Possible sources include dynamic self-heating in semiconductor devices, bias-line coupling, and dynamic trapping effects in field effect transistors (FETs).

Figure 2.1: Time domain plots of a PA with memory, for a single WCDMA carrier: (a) AM/AM, and (b) AM/PM.

Figure 2.2: Asymmetry of the PA with memory for three WCDMA carriers in the frequency domain.

Figure 2.3: Typical location of memory effects in the FET power amplifiers [2].

In practice, there are numerous possible sources at the device level which contribute to the memory effects. These memory effects have been extensively analyzed in the literature for various transistor technologies; research indicates that the internal device components show dispersion and thermal effects at low frequencies, ranging from 10 kHz to several MHz [56, 4, 57, 69, 58, 16, 15].

Moreover, these effects are compounded by amplifier nonlinearity and additional mismatching, impedance variations and decoupling effects in the circuits external to the amplifier. This results in an output signal that is not only a function of the instantaneous input signal, but also a function of previous values.

Bias networks

Generally, bias networks are required to supply an appropriate (and constant) voltage or current to the gate or base of the active device. In addition, the bias network isolates the RF signal so that it is not fed into the bias supply, which avoids active device instability [36, 25].

Traditionally, the bias circuit is designed to provide a relatively high impedance at the RF frequency, when compared to the impedance of the input and output of the device and matching circuits. In the case of an input signal with a non-constant envelope, the envelope frequency variations cause changes in the impedance of the bias networks; in turn, these cause nonlinearity variations as a function of envelope frequency.

Vaiden and Blendon [78] measured the variation of the bias network's gate node impedance for a metal-semiconductor FET (MESFET) amplifier as a function of the envelope, fundamental, and second harmonic frequencies; refer to Figure 2.4. The center and maximum modulation frequencies are 1.8 GHz and 20 MHz, respectively; this means that the envelope frequency band is important up to 20 MHz or more. The fundamental band of interest is between 1.77 GHz and 1.83 GHz: the entire third order intermodulation (IM3) band of 60 MHz is relevant in terms of IM3 distortion. The second harmonic band lies between 3.58 GHz and 3.62 GHz. In this case, the fundamental impedance can easily be kept constant over the entire modulation frequency range, since this range is just 3.3 % of the center frequency. In addition, the second harmonic band is quite narrow, making impedance matching simple, provided that no harmonic traps are used; the use of such traps causes tremendous impedance variations and may cause significant memory effects. Since the baseband impedance changes significantly with frequency, the level of the IM3 products will also change with the envelope frequency.
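The band edges quoted above can be checked with a short worked calculation. It assumes a two-tone test signal centered at 1.8 GHz whose tone spacing equals the maximum modulation frequency of 20 MHz; the two-tone assumption is ours and is used only to reproduce the numbers.

\[
\begin{aligned}
f_1 &= 1.79\ \text{GHz}, \quad f_2 = 1.81\ \text{GHz}, \quad f_2 - f_1 = 20\ \text{MHz} \quad (\text{envelope}), \\
2f_1 - f_2 &= 1.77\ \text{GHz}, \quad 2f_2 - f_1 = 1.83\ \text{GHz} \quad (\text{IM3 band, 60 MHz wide}), \\
2f_1 &= 3.58\ \text{GHz}, \quad 2f_2 = 3.62\ \text{GHz} \quad (\text{second harmonic band}), \\
\frac{60\ \text{MHz}}{1.8\ \text{GHz}} &\approx 3.3\ \%.
\end{aligned}
\]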

Brock and Gatti [10] demonstrated that the electrical memory effects can be changed by varying the values of the energy storing elements, such as capacitances and inductances in the biasing networks of the PA. Figure 2.5 shows the locations of these elements in typical

Mismatching of even harmonics

In the amplifier, the input/output matching circuits are designed to overcome the mismatch between the active nonlinear device and the input/output termination at the fundamental frequency. Due to its nonlinearity, when the two-tone input signal is driven, the signals generated from the active device at harmonic zones 0, 2, 3, ..., N are reflected back to the input side, rather than being absorbed by the input matching circuits; indeed, this property is sometimes used as a predistortion mechanism. These reflected signals are mixed again in the active device with the two-tone input signal. This generates an additional nonlinearity in the output signal due to modification of the output spectrum, as shown in Figure 2.6 [38, 78].

Figure 2.6: Memory effects due to the mismatching of even harmonics [38, 78, 2].

It is important to note that, in the active device, the two-tone input signal is mixed with the reflected signal at even harmonic zones. This results in a modification of the odd order IMD (zone 2 in Figure 2.6), whereas mixing the two-tone input signal with the reflected signal at odd harmonic zones does not change the odd order IMD.

As shown in Figure 2.6, the resulting 3rd order IMD is a vector summation of: a) the signals at IMD produced from the 3rd order nonlinearity, and b) the signals at IMD produced from the second order nonlinearity in the active device. Here, the mixing process has been confined to the 3rd order nonlinearity, since at the higher order, the mixing mechanism becomes very complicated [78].

In order to eliminate mismatching problems at second order frequencies, the input and output matching circuits should be designed not only for the amplifier's RF frequency band, but also for the second harmonic and envelope frequency bands.

Power supply variations

The amplifier power supplies provide the necessary voltage or current to the gate and drain sides for active power device operation. Moreover, different amplifier operation classes give different output current waveforms, with different conduction angles [38, 26, 25].

Figure 2.7: Memory effects due to power supply modulation effects [38, 25, 2].

In particular, in the case of an input signal with a non-constant envelope, a current variation drawn from the power supply causes voltage variations [38, 26, 25]. In turn, these voltage variations cause an additional amplitude modulation of the RF signal, as illustrated in Figure 2.7. In addition, when the impedance of the bias network is reactive, i.e. not a short [21, 40], the extra amplitude modulation will be out of phase compared to the original RF signal. This results in envelope frequency dependent memory effects or asymmetry of the output signal.

To reduce power supply modulation effects, the bias circuit should be carefully designed in order to isolate the current variations drawn from the power supply. It has been shown that tuning the bias network in the video band can satisfactorily eliminate the asymmetrical effects [68].

Self-heating effects

Electrothermal memory effects, also called thermal memory effects, are caused by electrothermal couplings, which affect low modulation frequencies up to the megahertz range. The dissipated power in the active devices is the main source of the thermal memory effect, since the temperature changes, which depend on the dissipated power, do not occur instantaneously [79, 56]. As an example, let us consider the dissipated power in a FET

\[ P_{\text{Diss}} = V_{DS} \cdot I_D. \]  

(2.6)

where \( V_{DS} \) and \( I_D \) are the drain-source voltage and drain current, respectively. The temperature variations caused by the dissipated power are determined by the thermal impedance \( (Z_{TH}) \), which describes the ratio between temperature rise and heat flow from the device.

The heat flow from the chip, packaging, and heat sink to the surrounding environment can be modeled as shown in Figure 2.8. The effects of the package and heat sink are important when considering the thermal resistance, which determines the average temperature rise caused by self-heating. For a FET on a silicon substrate, the silicon surface reacts to the dissipated power quickly. This results in chip temperature fluctuation of up to several degrees over an input signal bandwidth of several hundred kHz. Hence, the chip temperature is envelope-frequency-dependent and takes the following simple form [79]

\[ T_j = T_{\text{amb}} + R_{TH} \cdot P_{\text{Diss}}(dc) + Z_{TH}(\omega_2 - \omega_1) \cdot P_{\text{Diss}}(\omega_2 - \omega_1), \]  

(2.7)

where \( T_{\text{amb}} \) is the ambient temperature, \( R_{TH} \) is the thermal resistance resulting in the average temperature rise caused by self-heating, \( P_{\text{Diss}}(dc) \) and \( P_{\text{Diss}}(\omega_2 - \omega_1) \) are the dissipated power at dc and at the envelope frequency, respectively, and \( Z_{TH} \) is the thermal impedance of the chip at the envelope frequency. It is interesting to note that the third term in (2.7) includes frequency, which means that the temperature variations at the surface of the chip also depend on the bandwidth of the signal. $Z_{TH}$ is the main cause of the thermal memory effects in the active power devices.

Since some of the transistor parameters, such as drain-source current, output conductance, and capacitance, are temperature dependent [78], dynamic self-heating causes distortion. The mechanism by which dynamic self-heating causes electrical distortion is known as thermal power feedback [45].

Therefore, theoretically all power transistors exhibit thermal memory effects. Variations of the dissipated power at high frequencies (> 1 MHz) are too fast to influence the instantaneous temperature of the silicon chip and packaging. This implies that the effects of the transistor's self-heating phenomenon are more important under narrow-band signals (e.g., enhanced data rates for global system for mobile communications (GSM) evolution (EDGE) or GSM) than under signals with wide modulation bandwidths (e.g., multi-carrier third generation CDMA2000 and UMTS) [79, 34, 12].

### 2.1.3 Volterra series

The Volterra model is used to describe the response of a nonlinear system and is valid for weakly nonlinear time-invariant systems with fading memory, an example of which is a power amplifier in telecommunication applications. Volterra theory describes a nonlinear system, $H$, with input, $x(t)$, and output, $\hat{y}(t)$, as a "Taylor series with memory" [8, 66]. In this section, $\hat{y}(t)$, $x(t)$, and the kernels $h_n(\cdot)$ are real valued.

\[
\hat{y}(t) = H[x(t)] = \hat{y}_1(t) + \hat{y}_2(t) + \hat{y}_3(t) + \cdots \quad \text{(2.8)}
\]

where

\[
\hat{y}_1(t) = H_1[x(t)] = \int h_1(\tau) x(t - \tau) d\tau \quad \text{(2.9)}
\]

is the linear term, with $h_1(\tau)$ being the first order time domain response function (i.e., the impulse response) or Volterra kernel. The higher order terms are given by

\[
\hat{y}_n(t) = H_n[x(t)] = \int \cdots \int h_n(\tau_1, ..., \tau_n) x(t - \tau_1) x(t - \tau_2) \cdots x(t - \tau_n) d\tau_1 \cdots d\tau_n, \quad \text{(2.10)}
\]

where $h_n(\tau_1, ..., \tau_n)$ is the $n$th order time domain response function.

A discrete-time truncated Volterra model can be described as [75]
\[
\begin{aligned}
\hat{y}(n) &= \sum_{m_1=0}^{M} h_1(m_1) x(n-m_1) + \sum_{m_1=0}^{M} \sum_{m_2=0}^{M} h_2(m_1, m_2) x(n-m_1) x(n-m_2) + \ldots + \varepsilon(n) \\
&= \sum_{k=1}^{O} \sum_{m_1=0}^{M} \ldots \sum_{m_k=0}^{M} h_k(m_1, \ldots, m_k) \prod_{j=1}^{k} x(n-m_j) + \varepsilon(n),
\end{aligned}
\]
(2.11)
where \(n\) is the discrete time index, \(h_k(m_1, \ldots, m_k)\) is the \(k\)th order Volterra kernel associated with the system's \(k\)th order nonlinearities, and \(\varepsilon(n)\) is the modeling error. \(O\) and \(M\) are the truncated model order and the "memory length", respectively [86, 65]. In order to avoid redundancy, the \(O\)th order symmetric Volterra model can be described by [74]
\[
\begin{aligned}
\hat{y}(n) &= \sum_{m_1=0}^{M} h_1(m_1) x(n-m_1) + \sum_{m_1=0}^{M} \sum_{m_2=m_1}^{M} h_2(m_1, m_2) x(n-m_1) x(n-m_2) + \ldots \\
&\quad + \sum_{m_1=0}^{M} \sum_{m_2=m_1}^{M} \ldots \sum_{m_O=m_{O-1}}^{M} h_O(m_1, \ldots, m_O) x(n-m_1) x(n-m_2) \ldots x(n-m_O) + \varepsilon(n).
\end{aligned}
\]
(2.12)
The number of parameters typically increases with the nonlinear order, \(O\), and memory length, \(M\), and is given by [74]
\[
\sum_{k=1}^{O} \binom{M + k}{k} = \binom{M + O + 1}{O} - 1.
\]
(2.13)
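The growth in the number of coefficients can be checked by direct enumeration. The sketch below lists the non-redundant index tuples $m_1 \leq m_2 \leq \ldots \leq m_k$ of the symmetric real-valued model (2.12) and compares their count with the closed-form expression $\binom{M+O+1}{O} - 1$. The function name and the chosen values of `O` and `M` are illustrative assumptions.

```python
from itertools import combinations_with_replacement
from math import comb

def symmetric_kernel_indices(order, memory_length):
    """All index tuples m_1 <= ... <= m_k for k = 1..order, with 0 <= m_j <= M."""
    delays = range(memory_length + 1)
    return [idx
            for k in range(1, order + 1)
            for idx in combinations_with_replacement(delays, k)]

O, M = 3, 2                                   # example order and memory length
indices = symmetric_kernel_indices(O, M)
closed_form = comb(M + O + 1, O) - 1          # sum over k of C(M + k, k)
print(len(indices), closed_form)
```

For this small example both numbers are 19: three first order, six second order, and ten third order coefficients.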

Only odd order terms result in an output signal in the fundamental zone, i.e. around the carrier frequency \(f\) [25]. The input and output signals can therefore be described by their complex envelopes, \(x(t)\) and \(y(t)\), where the corresponding real passband signals are \(\mathrm{Re}\{x(t)e^{j\omega t}\}\) and \(\mathrm{Re}\{y(t)e^{j\omega t}\}\). The discrete-time, odd order, finite memory, complex envelope Volterra model can be described as
\[
\begin{aligned}
y(n) &= \sum_{m_1=0}^{M} h_1(m_1) x(n-m_1) + \sum_{m_1=0}^{M} \sum_{m_2=0}^{M} \sum_{m_3=0}^{M} h_3(m_1, m_2, m_3) x(n-m_1) x(n-m_2) x^*(n-m_3) \\
&\quad + \sum_{m_1=0}^{M} \ldots \sum_{m_5=0}^{M} h_5(m_1, \ldots, m_5) x(n-m_1) x(n-m_2) x(n-m_3) x^*(n-m_4) x^*(n-m_5) + \ldots + \varepsilon(n),
\end{aligned}
\]
(2.14)
where \((\cdot)^*\) denotes the complex conjugate. In (2.14), the even-order kernels are removed, as their effects can be omitted in band-limited modulation systems.

Table 2.1: The number of parameters in the complex-valued Volterra model of (2.16), as a function of order and memory length.

<table>
<thead>
<tr>
<th>Model Order (O)</th>
<th>M=0</th>
<th>M=1</th>
<th>M=2</th>
<th>M=3</th>
<th>M=4</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>2</td>
<td>8</td>
<td>21</td>
<td>44</td>
<td>80</td>
</tr>
<tr>
<td>5</td>
<td>3</td>
<td>20</td>
<td>81</td>
<td>244</td>
<td>605</td>
</tr>
<tr>
<td>7</td>
<td>4</td>
<td>40</td>
<td>231</td>
<td>944</td>
<td>3055</td>
</tr>
<tr>
<td>9</td>
<td>5</td>
<td>70</td>
<td>546</td>
<td>2904</td>
<td>11875</td>
</tr>
</tbody>
</table>

If the redundant items due to kernel symmetry are also removed, the Volterra model becomes

\[
\begin{aligned}
y(n) &= \sum_{m_1=0}^{M} h_1(m_1) x(n-m_1) \\
&\quad + \sum_{m_1=0}^{M} \sum_{m_2=m_1}^{M} \sum_{m_3=0}^{M} h_3(m_1, m_2, m_3) x(n-m_1) x(n-m_2) x^*(n-m_3) \\
&\quad + \sum_{m_1=0}^{M} \sum_{m_2=m_1}^{M} \sum_{m_3=m_2}^{M} \sum_{m_4=0}^{M} \sum_{m_5=m_4}^{M} h_5(m_1, m_2, m_3, m_4, m_5) x(n-m_1) x(n-m_2) x(n-m_3) x^*(n-m_4) x^*(n-m_5) + \ldots \\
&\quad + \sum_{m_1=0}^{M} \ldots \sum_{m_{(O+1)/2}=m_{(O-1)/2}}^{M} \sum_{m_{(O+3)/2}=0}^{M} \ldots \sum_{m_O=m_{O-1}}^{M} h_O(m_1, \ldots, m_O) x(n-m_1) \ldots x(n-m_{(O+1)/2}) x^*(n-m_{(O+3)/2}) \ldots x^*(n-m_O) + \varepsilon(n),
\end{aligned}
\]

(2.15)

where \(O\) is the odd order of the truncated model. The number of complex-valued parameters of the Volterra model given in (2.15) can be calculated as

\[
\text{Number of parameters} = \sum_{p=0}^{(O-1)/2} \frac{\left[(M+p)!\right]^2 (M+p+1)}{(M!)^2 (p!)^2 (p+1)}.
\]

(2.16)

Note that when the memory length is equal to \(M\), the "memory span" is equal to \(M+1\).

Table 2.1 shows how the number of parameters increases with order and memory length. From Table 2.1, we can conclude that it is not realistic to use the Volterra model for higher orders of nonlinearity, together with long memory length.
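The parameter count can be tabulated with a few lines of code. The sketch below uses the equivalent binomial form $\binom{M+p+1}{p+1}\binom{M+p}{p}$ for the kernel of order $2p+1$, which follows from choosing $p+1$ unconjugated and $p$ conjugated delays, each as a multiset drawn from the $M+1$ available delays. The function name is an illustrative assumption.

```python
from math import comb

def volterra_param_count(order, memory_length):
    """Number of complex coefficients of the symmetric odd-order model.

    The kernel of order 2p+1 multiplies p+1 unconjugated and p conjugated
    delayed samples; each group is a multiset drawn from M+1 delays.
    """
    M = memory_length
    return sum(comb(M + p + 1, p + 1) * comb(M + p, p)
               for p in range((order - 1) // 2 + 1))

# Same orders and memory lengths as in Table 2.1.
table = {O: [volterra_param_count(O, M) for M in range(5)]
         for O in (1, 3, 5, 7, 9)}
for O, row in table.items():
    print(O, row)
```

Each row is the running total over all odd orders up to $O$, so the entries of one row minus those of the row above give the number of coefficients contributed by the highest-order kernel alone.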

### 2.1.4 Wiener model

In order to avoid the complexity issue associated with the Volterra series representation, other models have been proposed; one example is the Wiener model. The Wiener model is a linear time-invariant (LTI) system, \(H(z)\), followed by a memory-less nonlinearity, \(N(\bullet)\), as shown in Figure 2.9 [10, 24].

The output, \( y(n) \), is given by

\[
y(n) = N[H(z)x(n)] = N \left[ \sum_{m=0}^{M} h_m x(n-m) \right]
\]

\[
= \sum_{p=1}^{P} b_{2p-1} \left[ \sum_{m=0}^{M} h_m x(n-m) \right] \left| \sum_{m=0}^{M} h_m x(n-m) \right|^{2(p-1)}
\]  \hspace{1cm} (2.17)

where \( 2P - 1 \) is the maximum polynomial order, \( M \) is the number of previous samples considered, i.e. the memory length of the model, and \( N(\bullet) \) is defined as an odd polynomial model with order \( 2P - 1 \), i.e. \( \sum_{p=1}^{P} b_{2p-1} x(n)|x(n)|^{2(p-1)} \). Note that a finite impulse response filter is used to represent \( H(z) \). In [24], Clark et al. used a Wiener model to capture the nonlinear memory effects in the power amplifier associated with wideband signals.
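A minimal sketch of (2.17) is given below, assuming a causal FIR filter with zero initial conditions. The tap values, polynomial coefficients, and function names are hypothetical and serve only to show the order of the two blocks.

```python
def fir(h, x):
    """Causal FIR filter: v(n) = sum_m h[m] * x(n-m), with x(n) = 0 for n < 0."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]

def odd_poly(b, v):
    """Memoryless odd polynomial: sum_p b[p-1] * v * |v|^(2(p-1)), p = 1..P."""
    # enumerate starts at 0, so the exponent 2*p here equals 2(p-1) in (2.17).
    return sum(bp * v * abs(v) ** (2 * p) for p, bp in enumerate(b))

def wiener(h, b, x):
    """Wiener model: the LTI block H(z) followed by the nonlinearity N(.)."""
    return [odd_poly(b, v) for v in fir(h, x)]

h = [1.0, 0.5]            # hypothetical filter taps h_0, h_1
b = [1.0, -0.1]           # hypothetical coefficients b_1, b_3
x = [1 + 0j, 2 + 0j]
y = wiener(h, b, x)
print(y)
```

Because the filter output is raised to a power, the filter taps enter the output nonlinearly, which is what makes the identification of a Wiener model harder than that of the models in the following sections.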

### 2.1.5 Hammerstein model

The Hammerstein model is another nonlinear model with memory, which offers a complexity similar to that of the Wiener model. The Hammerstein system is a memoryless nonlinearity, \( N(\bullet) \), followed by an LTI system, \( H(z) \); refer to Figure 2.10 [54, 27].

The output, \( y(n) \), of a Hammerstein system can be written as

\[
y(n) = H(z)N[x(n)]
\]

\[
= \sum_{m=0}^{M} h_m \sum_{p=1}^{P} b_{2p-1} x(n-m)|x(n-m)|^{2(p-1)}. \hspace{1cm} (2.18)
\]
where $M$ is the number of previous samples considered, i.e. the memory length of the model, and $N(\bullet)$ is defined as an odd polynomial model with order $2P - 1$, i.e. $\sum_{p=1}^{P} b_{2p-1} x(n)|x(n)|^{2(p-1)}$. In this case, a finite impulse response filter is used to represent $H(z)$. The output of the Hammerstein model is linear with respect to the LTI parameters, unlike the output of the Wiener model.
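The linearity of (2.18) with respect to the filter taps can be illustrated numerically. In the sketch below, the coefficient values and function names are hypothetical; the check is that adding two tap vectors adds the corresponding outputs.

```python
def odd_poly(b, v):
    """Memoryless odd polynomial: sum_p b[p-1] * v * |v|^(2(p-1)), p = 1..P."""
    return sum(bp * v * abs(v) ** (2 * p) for p, bp in enumerate(b))

def fir(h, w):
    """Causal FIR filter with zero initial conditions."""
    return [sum(h[m] * w[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(w))]

def hammerstein(h, b, x):
    """Hammerstein model: the nonlinearity N(.) followed by the LTI block H(z)."""
    return fir(h, [odd_poly(b, s) for s in x])

b = [1.0, -0.1]                     # hypothetical coefficients b_1, b_3
x = [1 + 0j, 2 + 0j, 0.5j]
y_a = hammerstein([1.0, 0.5], b, x)
y_b = hammerstein([0.2, -0.3], b, x)
y_sum = hammerstein([1.2, 0.2], b, x)   # taps are the element-wise sum
print(max(abs(s - (p + q)) for s, p, q in zip(y_sum, y_a, y_b)))
```

The printed maximum deviation is at the level of floating point rounding, confirming that the output is a linear function of the taps $h_m$ for a fixed nonlinearity.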

### 2.1.6 Parallel Hammerstein model

In Figure 2.11, a block diagram of the parallel Hammerstein model is presented, which consists of multiple Hammerstein branches with a common input and an output comprised of the summation of the individual branch outputs. This system can be represented by

\[
y(n) = \sum_{p=1}^{P} H_{2p-1}(z)x(n)|x(n)|^{2(p-1)} \\
= \sum_{p=1}^{P} \sum_{m=0}^{M} b_{m,2p-1} x(n-m)|x(n-m)|^{2(p-1)}.
\]

(2.19)

which in [41, 28] is referred to as a *memory polynomial*. Unlike the single-branch Hammerstein model, whose output depends on products of the filter taps and the polynomial coefficients, the memory polynomial is linear with respect to all of its coefficients, $b_{m,2p-1}$. This property allows the use of linear techniques to identify a power amplifier using the memory polynomial model. Compared to the Wiener and Hammerstein models, this model is more general and can therefore provide a more accurate model of the power amplifier.
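The following sketch evaluates (2.19) directly and shows one sense in which the memory polynomial is more general: when the coefficients factor as $b_{m,2p-1} = h_m b_{2p-1}$, the model collapses to a single Hammerstein branch. All numerical values and names are hypothetical.

```python
def memory_polynomial(coeffs, x):
    """y(n) = sum_p sum_m coeffs[m][p-1] * x(n-m) * |x(n-m)|^(2(p-1))."""
    y = []
    for n in range(len(x)):
        acc = 0j
        for m, row in enumerate(coeffs):
            if n - m < 0:
                continue            # samples before n = 0 are taken as zero
            s = x[n - m]
            for p, c in enumerate(row):
                acc += c * s * abs(s) ** (2 * p)
        y.append(acc)
    return y

# Special case: coefficients that factor into filter taps and polynomial terms.
h = [1.0, 0.5]                      # hypothetical taps h_0, h_1
b = [1.0, -0.1]                     # hypothetical coefficients b_1, b_3
x = [1 + 0j, 2 + 0j, 0.5j]
y_mp = memory_polynomial([[hm * bp for bp in b] for hm in h], x)
print(y_mp)
```

With unconstrained coefficients, each delay $m$ may carry its own polynomial, which a single Hammerstein branch cannot represent.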

Figure 2.11: Block diagram of a parallel Hammerstein model.


2.2 Power amplifier linearization

Section 2.1 described how power amplifier models, with and without memory, can aid in selecting a PA to match a given application. To ensure linear amplification of the signal, a PA with a higher than required power rating is usually selected, such that the input signal falls within the linear region of the PA. Although the PA consumes a high DC power in this "back-off" approach, it only utilizes a small portion of its allowed input range; this can result in a significant increase in the PA power specification and in reduced efficiency. The linearization approach offers a remedy to this problem. PA linearization can be implemented using various architectures. In this research, we consider the two best known architectures: feed-forward linearization and digital predistortion.

2.2.1 Feed-forward linearization

The feed-forward linearization technique, which is currently used in the mobile communication systems' base stations, is the most well known technique and achieves high linearization performance. The drawbacks of this system are the low power efficiency, due to the high power requirement of the class A mode error amplifier, and losses due to couplers and delay lines in the system. In Figure 2.12, the feed-forward linearization architecture is presented.

2.2.2 Digital predistortion

A digital PD attempts to linearize the nonlinear response of a PA over an operating region. The PD employs digital signal processing techniques, in order to predistort a baseband signal prior to modulation, up-conversion and amplification by the PA. As a result, the cascade of the digital PD and the PA responses produces the desired linear response. Figure 2.13 shows the simplified block diagram. The gain, \( G \), of the PA is modeled as a function of the magnitude of the PA input signal, \( V_p \). In this case, the function \( G \) is memory-less and nonlinear in both amplitude and phase. The use of a memoryless model, which is dependent only on the input signal magnitude, is a simplification of a typical PA's actual response. Other variables will impact the PA response; most notably, these include the frequency and instantaneous operating temperature. Similar to \( G \), the transfer function of the PD circuit in Figure 2.13, \( F \), is designed to be a function of the digital PD input signal magnitude, \( V_i \). Thus, the cascade of the predistorter and amplifier will result in the desired linear response, when \( F(|V_i|)G(|V_p|) = k \), where \( k \) is a constant and \( V_p = V_iF(|V_i|) \). Digital PD operation is depicted in Figure 2.14, which illustrates the typical relationship between the input and output power ($P_{in}$ and $P_{out}$) of a PA. The thick curve shows that in the absence of digital PD, the PA's $P_{out}$ versus $P_{in}$ curve is highly nonlinear. Through the introduction of a digital PD, the $P_{out}$ versus $P_{in}$ curve obtains a linear response over a large range of input power levels. The desired linear response of the PA is illustrated by the Linear Output curve. The slope of the Linear Output curve indicates the desired linear gain for the PA.

When the amplifier is operating in compression, the $P_{out}$ versus $P_{in}$ curve falls below the linear Output curve; hence, the actual output power of the PA is not sufficient for linear operation. The inclusion of PD prior to the PA introduces expansion; the amplitude of the input signal is increased so that the desired output power (falling on the Linear Output curve) is achieved. The expansion effect of digital PD can be observed in Figure 2.14. The input power, $P_{in}$, resulting in $P_{out}$ before PD, is increased to $P_{in}$-PD; this effectively raises the PA output power to $P_{out}$-PD, which coincides with the Linear Output curve. Note that the region of the $P_{out}$ versus $P_{in}$ curve that can be linearized using digital PD is limited.
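The condition $F(|V_i|)G(|V_p|) = k$ with $V_p = V_iF(|V_i|)$ can be solved numerically for a given PA gain curve. The sketch below uses a hypothetical compressive gain with AM/PM conversion and a simple fixed-point iteration; both the gain model and the solver are assumptions made for illustration and do not represent the adaptation algorithm used in this work.

```python
import cmath

def pa_gain(r):
    """Hypothetical memoryless complex PA gain G(|Vp|): compression plus AM/PM."""
    return (1.0 - 0.05 * r * r) * cmath.exp(1j * 0.1 * r * r)

def pd_gain(r_in, k=1.0, iterations=100):
    """Solve F * G(r_in * |F|) = k for the complex PD gain F by fixed-point iteration."""
    F = complex(k)
    for _ in range(iterations):
        F = k / pa_gain(r_in * abs(F))
    return F

v_i = cmath.rect(1.0, 0.3)          # predistorter input sample
F = pd_gain(abs(v_i))
v_p = v_i * F                       # predistorted PA input
v_out = v_p * pa_gain(abs(v_p))     # PA output, ideally k * v_i
print(abs(F), abs(v_out - v_i))
```

The magnitude of $F$ exceeds one, which is the expansion described above. For this hypothetical gain curve the iteration only converges while the requested output lies below the saturation level of the PA, which mirrors the limited region that can be linearized.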

A block diagram of an adaptive memoryless digital PD system is shown in Figure 2.15. With the inclusion of PD, the digital complex baseband input signal samples are predistorted prior to the digital-to-analog converter (DAC). The adaptation algorithm determines the PD function by comparing the feedback signal with a delayed version of the input signal.

Memoryless digital predistortion

There are two main types of memoryless digital PDs: the LUT based predistorter [37, 72, 52] and the polynomial predistorter [50, 5, 7]. For the LUT based predistorter, the PD coefficients for all input values are stored in a LUT; the incoming signal is multiplied on a sample by sample basis with these coefficients. In the polynomial case, however, the characteristics of the PA and the predistorter are described by polynomial functions. The polynomial coefficients of the predistorter are adjusted to fit the PA and to result in a linear system. Theoretically, the LUT based predistorter can linearize a PA very precisely, whereas performance with a polynomial predistorter depends on the order of the polynomial. For example, in the case of adjacent and alternate channel emission suppression, at least a 5th order polynomial predistorter is required. Such a predistorter will be able to reduce the 3rd and 5th order IMD products at the PA output [50].
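The sample by sample operation of a LUT based predistorter can be sketched as follows. The table is addressed by the quantized input magnitude and each sample is multiplied by the complex entry found there. The gain function used to fill the table, the table size, and the uniform magnitude indexing are hypothetical choices for illustration.

```python
import cmath

def example_pd_gain(r):
    """Hypothetical PD gain: gain expansion with an opposing phase rotation."""
    return (1.0 + 0.05 * r * r) * cmath.exp(-1j * 0.1 * r * r)

def build_lut(gain_fn, max_mag, size):
    """Tabulate the complex PD gain at the bin centers of the input magnitude."""
    step = max_mag / size
    return [gain_fn((i + 0.5) * step) for i in range(size)], step

def lut_predistort(v, lut, step):
    """Multiply one sample by the LUT entry addressed by its magnitude."""
    idx = min(int(abs(v) / step), len(lut) - 1)   # clamp to the last entry
    return v * lut[idx]

lut, step = build_lut(example_pd_gain, max_mag=2.0, size=64)
v_in = 1.0 + 0.0j
v_pd = lut_predistort(v_in, lut, step)
print(v_pd, v_in * example_pd_gain(abs(v_in)))
```

The two printed values differ slightly because of the magnitude quantization; a larger table reduces this error at the cost of memory and slower adaptation.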

Predistortion with memory

For PAs with memory effects, memoryless predistortion achieves only very limited linearization performance [28, 29]. Thus, a digital predistorter is also required to have memory structures. As mentioned in Section 2.1, behavioral models of the PA with memory effects can be used for compensating the memory effects of the PA. The models that have been considered for these predistorters include the Volterra series [29, 30], the Hammerstein model [37], and the memory polynomial model [21, 41, 28]. The Wiener and Hammerstein models only measure nonlinearity at the center frequency, with linear filters capturing the memory. However, these models cannot predict the interactions between the instantaneous tones, nor can they describe the change of shape in AM/AM and AM/PM functions dependent on tone spacing [24]. The parallel Hammerstein model, i.e. the memory polynomial model, is simple compared to the general Volterra series and complex compared to the Wiener and Hammerstein models. In addition, this model compensates for the drawbacks of the Volterra, Wiener, and Hammerstein models, can quantify the memory effects in PAs, and can be applied to a linearizer design.

There are two approaches to construct digital predistorters with memory structures. The first approach, which was used in [37], identifies the PA and then finds the inverse of the PA. However, obtaining the inverse of a nonlinear system with memory is generally a difficult task. The second approach is to use an indirect learning architecture to design the predistorter directly, as adopted in [29, 21, 28]. This approach offers the advantages of eliminating the need for model assumption and parameter estimation of the power amplifier.

A block diagram of the indirect learning structure is shown in Figure 2.16. The feedback path labeled “Predistorter Training (A)” has $y(n)/G$ as its input, where $G$ is the intended power amplifier gain, and $\hat{x}(n)$ as its output. The actual predistorter is an exact copy of the feedback path (copy of A); it has $u(n)$ as its input and $x(n)$ as its output. Ideally, we would like $y(n) = Gx(n)$, which requires $x(n) = \hat{x}(n)$ and the error term $e(n) = 0$. Given
$y(n)$ and $x(n)$, this structure enables us to find the parameters of block A directly, yielding the predistorter. The algorithm converges when the error energy $||e(n)||^2$ is minimized.

In the training branch (refer to Figure 2.16), the memory polynomial can be described by

$$
\hat{x}(n) = \sum_{p=1}^{P} \sum_{m=0}^{M} a_{m,p}\, y(n-m)\, |y(n-m)|^{p-1},
$$

(2.20)

where $y(n)$ and $\hat{x}(n)$ are the input and output of the predistorter in the training branch, respectively, and $a_{m,p}$ are the coefficients of the predistorter. Since the model in (2.20) is linear with respect to its coefficients, $a_{m,p}$ can be directly obtained using the least-squares method. First we define the new sequence

$$
u_{mp}(n) = \frac{y(n-m)}{G} \left| \frac{y(n-m)}{G} \right|^{p-1}.
$$

(2.21)

In matrix form

$$\hat{x} = Ua,
$$

(2.22)

where

$$\hat{x} = [\hat{x}(0) \cdots \hat{x}(N-1)]^T,$$

$$U = [u_{01} \cdots u_{0P} \cdots u_{M1} \cdots u_{MP}],
$$

(2.23)

$$u_{mp} = [u_{mp}(0) \cdots u_{mp}(N-1)]^T,$$

and

$$a = [a_{0,1} \cdots a_{0,P} \cdots a_{M,1} \cdots a_{M,P}]^T.$$

The least-squares solution for (2.22) is

$$\hat{a} = (U^H U)^{-1} U^H \hat{x}.
$$

(2.24)

where $(\bullet)^H$ denotes the complex conjugate transpose. The accuracy and stability of the solutions $\hat{a}$ are directly related to the numerical condition of the matrix $U^H U$. A good indication is the condition number of the matrix [51], i.e.,

$$\kappa(U^H U) = \frac{\lambda_{max}}{\lambda_{min}}.
$$

(2.25)

where $\kappa(\bullet)$ is the condition number and $\lambda_{max}$ and $\lambda_{min}$ are the largest and smallest eigenvalues of $U^H U$, respectively. The matrix $U^H U$ generally has a high condition number; this means that there is a high correlation between the columns of this matrix. There are two sources for this high correlation:
1. The nonlinear polynomials, such as \( y, y|y|, y|y|^2 \), etc., are highly correlated.

2. The data samples \( y(n) \) at different time indices are correlated.

The correlation due to the first source can be greatly reduced by using the orthogonal polynomial proposed in [63]. The correlation from the second source can be alleviated by using a special training signal, whose samples at different time indices are independent. In many cases, however, dedicated training is not feasible. In such cases, the accuracy of the solution \( a \) can still be improved by using higher precision floating point numbers, such as 64-bit double precision instead of 32-bit single precision.
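The least-squares fit described by (2.20) to (2.24) can be sketched in a few lines. The following is a minimal pure-Python illustration under stated assumptions, not the implementation used in this work: the function names (`basis`, `build_U`, `solve_normal_equations`), the toy coefficients, and the use of the known predistorter output $x(n)$ as the regression target are all hypothetical choices made for the example. The normal equations are solved directly only because this mirrors (2.24); this is also the form most exposed to the conditioning problem discussed above.

```python
import random

def basis(y, n, m, p, G):
    """u_mp(n) as in (2.21); samples before n = 0 are taken as zero."""
    if n - m < 0:
        return 0j
    v = y[n - m] / G
    return v * abs(v) ** (p - 1)

def build_U(y, M, P, G):
    """Rows are time indices; columns are ordered as in (2.23)."""
    return [[basis(y, n, m, p, G)
             for m in range(M + 1) for p in range(1, P + 1)]
            for n in range(len(y))]

def solve_normal_equations(U, x):
    """Solve (U^H U) a = U^H x by Gaussian elimination with partial pivoting."""
    cols = len(U[0])
    A = [[sum(row[i].conjugate() * row[j] for row in U) for j in range(cols)]
         for i in range(cols)]
    b = [sum(row[i].conjugate() * xn for row, xn in zip(U, x))
         for i in range(cols)]
    for c in range(cols):
        piv = max(range(c, cols), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, cols):
            f = A[r][c] / A[c][c]
            for k in range(c, cols):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    a = [0j] * cols
    for c in reversed(range(cols)):
        s = sum(A[c][k] * a[k] for k in range(c + 1, cols))
        a[c] = (b[c] - s) / A[c][c]
    return a

# Toy data (assumed): x is generated from known coefficients so the fit
# can be checked against them.
random.seed(0)
G, M, P = 2.0, 1, 3
y = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
true_a = [1.0 + 0.1j, 0.05j, -0.2 + 0j, 0.1 + 0j, 0j, 0.03 - 0.02j]
U = build_U(y, M, P, G)
x = [sum(u * a for u, a in zip(row, true_a)) for row in U]
a_hat = solve_normal_equations(U, x)
```

In this noise-free toy case `a_hat` reproduces `true_a` to within rounding error; with measured data the agreement depends on the conditioning of $U^H U$.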

In general, power amplifier characteristics do not change rapidly with time; changes in the power amplifier characteristics are often due to temperature drift and aging, which have very long time constants. After gathering a block of \( y(n) \) and \( x(n) \) data samples, the training branch (block A) can process the data off-line. This lowers the processing requirements of the predistortion system. Once the predistorter identification algorithm has converged, the new set of parameters is plugged into the high speed predistorter; this can be readily implemented using application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs).

### 2.3 Crest factor reduction

Due to the nature of signal generation, WCDMA, OFDM, multi-carrier GSM, and multi-carrier EDGE signals have large peak-to-average power ratios (PAPRs). High PAPR values place high demands on the linearity of the power amplifier; since the PA must then operate in its linear region, this leads to low power efficiency. The use of deliberate envelope clipping to digitally distort the signal, while still maintaining the signal quality at a sufficient level, is a simple and practical way to decrease the PAPR. Moreover, the reduced PAPR after clipping makes it possible to utilize the dynamic range of the DAC more efficiently.

#### 2.3.1 Crest factor effect on power amplifiers

The degree of the signal's envelope fluctuations is often measured by the crest factor (CF), defined as

\[
CF = 10 \log_{10} \left( \frac{\max(x^2)}{E[x^2]} \right).
\]  

(2.26)
where $x$ is a real valued up-converted bandpass signal. The CF is the ratio of the peak power to the mean power of the signal, which is the PAPR.
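As a quick numerical illustration of (2.26), the sketch below evaluates the crest factor of a sampled sine wave, for which the peak power is twice the mean power. The function name `crest_factor_db` and the sampling choices are assumptions made only for this example.

```python
import math

def crest_factor_db(samples):
    """CF in dB as in (2.26): peak power over mean power of a real signal."""
    peak = max(s * s for s in samples)
    mean = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(peak / mean)

# One full period of a sine wave, sampled densely (assumed test signal).
n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
cf_sine = crest_factor_db(sine)   # close to 10*log10(2), about 3.01 dB
```

The WCDMA test signals used later in this work have crest factors of about 9.8 dB, which is far above this constant-envelope reference.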

In power amplifiers, the output back-off (OBO) is defined as

$$ OBO = 10 \log_{10} \left( \frac{P_{sat}}{E[G(x)^2]} \right), $$

(2.27)

where $G$ is a nonlinear function representing the PA’s nonlinear gain response and $P_{sat}$ is the PA’s maximum output power. The OBO defines the difference (in decibels) between the average output power and the saturated output power; this is illustrated in Figure 2.17. When the PA is operating in the linear region, the OBO becomes the same as the input signal’s CF. In a real system, the AM/AM characteristic is not linear near the saturation power level. For this reason, exact equality between CF and OBO does not hold; however, it can be assumed that $OBO \approx CF$ [??].

The measured efficiency of the class A amplifier as a function of OBO is presented in Figure 2.18. The efficiency is measured with a constant envelope sine wave. It can be seen that a reduction in the CF leads to reduced OBO and to enhanced efficiency.

#### 2.3.2 Crest factor effect on digital-to-analog converters

An additional advantage of the clipping is that the maximum of the clipped signal is known. This presents the possibility of scaling the clipped signal to the full dynamic range of the DACs.

Figure 2.17: Definition of the output back-off.
The relationship between the signal-to-noise ratio (SNR) and the word-length, for a quantized signal, is covered in most digital signal processing literature. For a real and quantized signal, the SNR is given by

$$SNR = 6.02(N_b - 1) + 10.8 - CF[dB],$$  \hspace{1cm} (2.28)

where $N_b$ denotes the number of bits. In (2.28), a uniformly distributed quantization error over every quantization step is presupposed. When the signal is complex, the quantization noise power will double, since the quantization will occur independently in both the real and imaginary signals. Consequently, the SNR will decrease by 3.01 dB, i.e.

$$SNR = 6.02N_b + 1.76 - CF[dB],$$  \hspace{1cm} (2.29)

assuming that the quantization noise is uniformly distributed over the whole frequency band, from zero to the Nyquist frequency ($f_s/2$). In order to find an equation for the adjacent channel power (ACP), the noise must be integrated over a frequency band with the same bandwidth as the signal ($W$). Therefore, we obtain

$$ACP = 6.02N_b + 1.76 - CF + 10 \log_{10} \left( \frac{f_s}{2W} \right).$$  \hspace{1cm} (2.30)
The CF can be reduced using various methods; generally, the CF is suppressed at the expense of the signal quality, i.e. reduced SNR and increased ACP. Therefore, it should not be assumed that (2.29) and (2.30) will give precise results for the real case when clipping is involved. Instead, they show the general relationship between the CF and signal quality. When the examination is confined to the quantization error and the CF is reduced, adequate performance (in terms of both the SNR and ACP) can be achieved with a lower number of bits [76].
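The trade-off expressed by (2.29) and (2.30) can be made concrete with a short calculation. The sketch below is only an illustration: the word length, sampling rate, and the 3 dB crest factor reduction are hypothetical numbers, and the function names are not taken from this work. It shows that, within this quantization-only model, every 6.02 dB of crest factor reduction is worth one bit of DAC resolution.

```python
import math

def snr_complex_db(n_bits, cf_db):
    """(2.29): quantization SNR of a complex signal."""
    return 6.02 * n_bits + 1.76 - cf_db

def acp_db(n_bits, cf_db, fs_hz, w_hz):
    """(2.30): the same quantity referred to a bandwidth W instead of 0..fs/2."""
    return snr_complex_db(n_bits, cf_db) + 10 * math.log10(fs_hz / (2 * w_hz))

# Hypothetical operating point, chosen only for illustration.
before = acp_db(n_bits=14, cf_db=9.8, fs_hz=92.16e6, w_hz=3.84e6)
after = acp_db(n_bits=14, cf_db=6.8, fs_hz=92.16e6, w_hz=3.84e6)
gain_in_bits = (after - before) / 6.02   # about half a bit for 3 dB of CF
```

As noted above, the clipping itself adds distortion that this model does not capture, so the figures indicate a trend rather than a measured result.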
Chapter 3

Piecewise Pre-equalized Linearization

### 3.1 Introduction

Recently, there has been increasing importance placed on spectral efficiency in mobile communications. Thus, the linearity and efficiency of radio frequency (RF) power amplifiers (PAs) have been critical design issues for non-constant envelope digital modulation schemes, which have high peak-to-average power ratios (PAPRs). RF PAs have nonlinearities, which generate amplitude modulation/amplitude modulation (AM/AM) and amplitude modulation/phase modulation (AM/PM) distortions at the output of the PA. These effects create spectral regrowth in the adjacent channels and in-band distortion, which degrades the error vector magnitude (EVM). A tradeoff exists between linearity and efficiency; power efficiency is very low when the amplifier operates in its linear region and increases as the amplifier is driven into its compression region. In order to enhance both linearity and efficiency at the same time, one of the various linearization techniques should be applied to an efficient RF PA. For example, a Doherty power amplifier (DPA) can achieve higher efficiencies than traditional PA designs, although at the expense of linearity. Various linearization techniques have been proposed in the literature, such as feedback, feed-forward and predistortion [25, 38].

The most promising linearization technique is baseband digital predistortion, which takes advantage of the recent advances in digital signal processors. Digital predistortion achieves good linearity and power efficiency, with a reduced system complexity when compared to the
widely used feed-forward linearization technique. The software implementation of the digital predistorter (PD) provides configurability suitable for the multi-standard environments. Through the combination of digital predistortion with the aforementioned efficient DPA, there is the potential to maximize system linearity and overall efficiency. However, most digital PDs presuppose that PAs have either no or weak memory [17, 48, 31]. This is impractical in wideband applications, where memory effects describe the output signal as a function of both current and past input signals. The sources of PA memory effects are self-heating of the active device (thermal memory effects) and frequency dependencies of the active device, related to the matching networks or bias circuits (electrical memory effects) [78]. As signal bandwidth increases, the PA memory effects become significant and will limit the performance of memoryless digital PDs.

Various approaches have been suggested for overcoming memory effects in digital PDs. For electrical memory effects, a Volterra filter structure was used to compensate for the memory effects, using an indirect learning algorithm. However, the number of optimization coefficients becomes increasingly large as the order increases [30]. This complexity makes the Volterra filter based PD extremely difficult to implement in real hardware. The memory polynomial structure, which is a simplified version of the Volterra filter, has been used to reduce the number of coefficients; however, a large computational load is still required [41, 28, 6, 55]. In addition, a memory polynomial based PD suffers from a numerical instability when higher order polynomial terms are included, since a matrix inversion is required for estimating the polynomial coefficients. In order to alleviate the numerical instability associated with the traditional polynomials, an alternative, yet equally complex structure based on orthogonal polynomials has been used [64]. To further reduce the complexity (at the expense of the performance), a Hammerstein predistorter has been proposed [27, 29, 37]. The Hammerstein predistorter consists of a finite impulse response (FIR) filter or a linear time invariant (LTI) system, followed by a memoryless polynomial PD. It assumes that the PA model follows a Wiener model structure, which consists of a memoryless nonlinearity followed by a FIR filter or an LTI system. This implementation restricts the compensation of the Hammerstein structure to only memory effects coming from the RF frequency response. If the RF frequency response is quite flat, the Hammerstein PD cannot correct for any other memory effects, such as bias-induced or thermal memory effects [44]. A LUT based approach which used envelope filters for the memory effects was proposed in [47]. However, no results were shown and the method was for a narrowband
pacing system: this implies that it was intended to compensate for thermal memory effects. Recently, a static LUT digital baseband PD cascaded with a sub-band filtering block was developed: the system combats gain and phase variations, which are due to PA temperature changes after an initial setting for the fixed LUT PD [35].

In this chapter, we propose a piecewise pre-equalized LUT PD, which is a cascade of a LUT PD and piecewise pre-equalizers. This approach may be considered as an extended structure of the LUT based Hammerstein PD, which has only one equalizer. Our results show that the proposed method is superior to the Hammerstein PD. Our approach has the distinct advantage of simplicity and is easy to implement in real hardware. In Section 3.2, the proposed piecewise pre-equalized LUT PD is described, and its adaptation algorithm is given in Section 3.3. The measurement based PA behavioral model is presented in Section 3.4. The simulation results comparing a memoryless PD based on a LUT and the proposed PD with memory are given in Section 3.5. In Section 3.6, experimental results using the proposed PD are compared to alternative structures, using two WCDMA carriers in the test bed. Lastly, Section 3.7 evaluates the complexity for the proposed approach and the memory polynomial.

### 3.2 The proposed piecewise pre-equalized based LUT PD

Figure 3.1 illustrates the structure of the piecewise pre-equalized LUT based PD. The $N$ by $K - 1$ filter coefficients in the LUT are used to compensate for memory effects, where $N$ is the depth of the LUT and the FIR filter has $K$ taps (including the first tap, $W^m_0(|u(n)|)$, which is equal to 1). Note that for an LUT based Hammerstein PD, $N$ is equal to one.

The piecewise pre-equalizers use a FIR filter rather than an infinite impulse response (IIR) filter, due to stability issues. The output of the pre-equalizers can be described by

$$\begin{aligned}
z(n) &= \sum_{k=0}^{K-1} W^m_k(|u(n)|)\, x(n-k) \\
&= \sum_{k=0}^{K-1} W^m_k(|u(n)|)\, u(n-k)\, F_m(|u(n-k)|),
\end{aligned}$$

(3.1)

where $W^m_k(|u(n)|)$ is the $k$th tap and $m$th indexed coefficient corresponding to the magnitude of the input signal, $|u(n)|$, and $F_m$ is the memoryless LUT structure, which is a function of $|u(n-k)|$.

Figure 3.1: Block diagram of the piecewise pre-equalized LUT based PD.

For analytical purposes, the memoryless LUT ($F_m$) structure can be replaced by the following polynomial model

\[ F_m(|u(n - k)|) = \sum_{p=1}^{P} b_{2p-1} |u(n - k)|^{2(p-1)}, \]

(3.2)

where \(2p - 1\) is the polynomial order and \(b_{2p-1}\) is the complex coefficient corresponding to that order. Moreover, note that the tap coefficients and memoryless LUT coefficients \((F_m)\) depend on \(|u(n)|\) and \(|u(n - k)|\), respectively.

Therefore, the proposed model can be expressed using a polynomial equation by

\[ z(n) = \sum_{k=0}^{K-1} W_k^m(|u(n)|) \sum_{p=1}^{P} b_{2p-1}\, u(n - k)\, |u(n - k)|^{2(p-1)}, \]

(3.3)

where \(W_k^m(|u(n)|)\) is the \(k\)th tap coefficient with the \(m\)th index being a function of \(|u(n)|\). Without loss of generality, the piecewise pre-equalizers can be similarly defined using an \(l\)th order polynomial

\[ z(n) = \sum_{k=0}^{K-1} \sum_{l=1}^{L} w_{k,2l-1} |u(n)|^{2(l-1)} \sum_{p=1}^{P} b_{2p-1}\, u(n - k)\, |u(n - k)|^{2(p-1)}, \]

(3.4)

where \(w_{k,2l-1}\) is the \(k\)th tap and \((2l-1)\)th order coefficient. Figure 3.2 illustrates the corresponding block diagram of the piecewise pre-equalized PD when polynomial equations are utilized. From Figures 3.1 and 3.2, it can easily be seen that (3.1) is equivalent to (3.4). For illustration purposes, we set \(K=2\), \(L=2\), and \(P=2\), which represents a third-order (odd-order) polynomial model of the nonlinearity. We then expand (3.4) as follows:

\[
\begin{aligned}
z(n) ={}& w_{0,1}\left[ b_1 u(n) + b_3 u(n)|u(n)|^2 \right] \\
&+ w_{0,3}\left[ b_1 u(n)|u(n)|^2 + b_3 u(n)|u(n)|^4 \right] \\
&+ w_{1,1}\left[ b_1 u(n - 1) + b_3 u(n - 1)|u(n - 1)|^2 \right] \\
&+ w_{1,3}\left[ b_1 u(n - 1)|u(n)|^2 + b_3 u(n - 1)|u(n - 1)|^2 |u(n)|^2 \right].
\end{aligned}
\]

(3.5)

The first tap coefficient \((k = 0)\) in (3.1), which is \(w_{0,1} + w_{0,3}|u(n)|^2\) in (3.5), can be considered equal to one; thus, the memoryless nonlinearity predistorter LUT \(F_m\) will not be repeated. (3.5) can then be reduced as

\[
\begin{aligned}
z(n) ={}& b_1 u(n) + b_3 u(n)|u(n)|^2 \\
&+ w_{1,1}\left[ b_1 u(n - 1) + b_3 u(n - 1)|u(n - 1)|^2 \right] \\
&+ w_{1,3}\left[ b_1 u(n - 1)|u(n)|^2 + b_3 u(n - 1)|u(n - 1)|^2 |u(n)|^2 \right].
\end{aligned}
\]

(3.6)
Clearly, (3.6) can be considered as the predistortion of a truncated Volterra series model for a power amplifier, which is able to compensate for nonlinearity and memory effects [85]; refer to Appendix A.

However, (3.6) is based on a polynomial representation of the proposed model, in order to demonstrate the piecewise pre-equalized LUT PD memory compensation. Similar to the Volterra series, the polynomial representation requires too many complex multiplications. The complexity is reduced when a piecewise pre-equalized LUT PD based approach, as shown in Figure 3.1, is utilized.

Figure 3.3 presents a graphical explanation of the piecewise pre-equalized LUT PD. A typical memoryless predistorter response is shown in Figure 3.3(a), while Figure 3.3(b) demonstrates the hysteresis created by the piecewise pre-equalizers divided into $N$ pieces. Since the hysteresis of the power amplifier is not necessarily uniformly distributed over the whole input magnitude range, we propose the piecewise pre-equalizers. The cascade of Figures 3.3(a) and 3.3(b) results in the piecewise pre-equalized LUT based PD, as represented in Figure 3.3(c). Figure 3.3(d) shows the response of a typical power amplifier with memory. The desired linear response is achieved after Figure 3.3(c) and Figure 3.3(d) are cascaded, as shown in Figure 3.3(e).
Figure 3.3: Piecewise pre-equalized LUT PD graphical expressions: (a) complex gain adjuster response, (b) piecewise equalizer response, (c) response of the cascaded complex gain adjuster and piecewise equalizers, (d) power amplifier response, and (e) desired response from (c) and (d).

### 3.3 The proposed predistorter algorithm

Cavers studied the optimal addressing of the LUT and showed that addressing via a uniform quantization achieves near optimal performance [20]. Therefore, we applied the linear magnitude addressing method for the LUT indexing as follows:

\[ m = \text{round}(|u(n)| \cdot N), \quad (3.7) \]

where \( \text{round}(\bullet) \) returns the nearest integer number, which is the index \( (m) \), and \( N \) is the LUT size. As shown in (3.1), the digital complex baseband input signal samples are multiplied prior to pre-equalization, by complex coefficients drawn from LUT entries

\[ x(n) = u(n) \cdot F_m(|u(n)|), \quad (3.8) \]

where \( F_m(|u(n)|) \) is the complex coefficient corresponding to an input signal magnitude, used for compensating AM/AM and AM/PM PA distortions.
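The addressing in (3.7), the complex gain in (3.8), and the piecewise pre-equalizer in (3.1) can be sketched together as follows. This is a minimal illustration under stated assumptions rather than the hardware implementation: the names `lut_index` and `predistort` are hypothetical, the input is assumed to be normalized so that $|u(n)| \le 1$ (which gives indices $0$ to $N$, i.e. $N+1$ table entries), and samples before the start of the block are treated as zero.

```python
def lut_index(u_n, N):
    """Linear magnitude addressing as in (3.7); assumes |u(n)| <= 1."""
    # Note: Python's round() sends exact halves to the even neighbour.
    return min(N, round(abs(u_n) * N))

def predistort(u, F, W):
    """Return z(n) for a block of samples u.

    F[m]    : complex LUT gain for index m (N + 1 entries, m = 0..N)
    W[m][k] : k-th FIR tap of the equalizer selected by index m
    """
    N = len(F) - 1
    x = [u_n * F[lut_index(u_n, N)] for u_n in u]               # (3.8)
    z = []
    for n, u_n in enumerate(u):
        taps = W[lut_index(u_n, N)]                              # chosen by |u(n)|
        z.append(sum(taps[k] * x[n - k]
                     for k in range(len(taps)) if n - k >= 0))   # (3.1)
    return z

# Toy tables (assumed values, for illustration only).
N = 4
F = [1 + 0j] * (N + 1)                      # identity LUT
W = [[1 + 0j, 0j] for _ in range(N + 1)]    # first tap fixed to 1, no memory
W[N] = [1 + 0j, 0.5 + 0j]                   # memory only in the top magnitude bin
u = [0.1 + 0j, 0.2j, 1.0 + 0j, 0.3 + 0j]
z = predistort(u, F, W)
```

Only the third sample falls in the top bin, so only `z[2]` picks up a contribution from the previous sample; this is the sense in which the equalization is piecewise in the input magnitude.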

After the digital-to-analog conversion of \( z(n) \), this signal is: a) up-converted to RF, b) amplified by the PA generating distortions, c) attenuated, d) down-converted to baseband, and e) converted from analog-to-digital and applied to the delay estimation algorithm. The feedback signal, i.e. the delayed PA output, \( y(n - \Delta) \) can be described by

\[ y(n - \Delta) = G(|z(n - \Delta)|) \cdot e^{j \Phi(|z(n - \Delta)|)}, \quad (3.9) \]

where \( G(\bullet) \) and \( \Phi(\bullet) \) are AM/AM and AM/PM PA distortions, respectively, and \( \Delta \) is the feedback loop delay. For estimating \( \Delta \), a correlation technique was applied as follows:

\[ R(d) = \frac{1}{N} \sum_{n=0}^{N-1} z(n) \cdot y^*(n + d), \quad (3.10) \]

where \( d \) is the delay variable and \( N \) is the block size to correlate. After the delay estimation, the memory-less LUT coefficients can be determined using the least mean square (LMS) algorithm with indirect learning (refer to Figure 3.4) [30]

\[ F_m(|u(n + 1)|) = F_m(|u(n)|) + \mu u(n)e(n), \quad (3.11) \]

where \( n \) is the iteration number, \( \mu \) is the stability factor, and \( e(n) \) is \( x(n) - y(n)F_m(|x(n)|) \). Note that the addressing already generated for (3.7) can be reused for indexing \( y(n) \); since \( y(n) \) is a distorted signal, deriving a separate index from it could compound errors due to incorrect indexing. The samples, \( x(n) \), should bypass the piecewise pre-equalizers during this procedure. After convergence of the indirect learning LMS algorithm, the equalizers are activated.

Figure 3.4: The indirect learning algorithm for the predistorter of the power amplifier.
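The LUT adaptation step can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the algorithm as implemented in this work: the amplifier model `pa` is a hypothetical memoryless nonlinearity with unit small-signal gain, the delay is taken as already compensated, and the correction term uses the conventional complex LMS form $\mu\, y^*(n)\, e(n)$, which may differ in detail from (3.11). The index computed from $|u(n)|$ is reused in the training branch, as described above.

```python
import cmath
import random

def pa(x):
    """Toy memoryless PA with mild AM/AM compression and AM/PM (assumed)."""
    s = abs(x) ** 2
    return x * (1 - 0.1 * s) * cmath.exp(0.1j * s)

def random_sample(rng):
    """Complex sample with magnitude in [0, 1) and uniform phase."""
    return cmath.rect(rng.random(), rng.uniform(-cmath.pi, cmath.pi))

def train_lut(num_samples, N=64, mu=0.2, seed=1):
    """Adapt the LUT gains F[m] by LMS in the indirect learning configuration."""
    rng = random.Random(seed)
    F = [1 + 0j] * (N + 1)
    for _ in range(num_samples):
        u = random_sample(rng)
        m = min(N, round(abs(u) * N))     # address from |u(n)|, as in (3.7)
        x = u * F[m]                      # predistorter output, (3.8)
        y = pa(x)                         # feedback sample
        e = x - y * F[m]                  # training-branch error, index m reused
        F[m] += mu * y.conjugate() * e    # conventional complex LMS correction
    return F

def mean_sq_error(F, N, num_samples=2000, seed=2):
    """Mean of |PA(u F[m]) - u|^2 over fresh test samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u = random_sample(rng)
        m = min(N, round(abs(u) * N))
        total += abs(pa(u * F[m]) - u) ** 2
    return total / num_samples

N = 64
err_no_pd = mean_sq_error([1 + 0j] * (N + 1), N)
err_pd = mean_sq_error(train_lut(40000, N), N)
```

In this toy setting the residual error after training is dominated by the finite table resolution, which is consistent with the compromise between quantization effects and memory size that governs the choice of LUT size.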

An indirect learning method with the LMS algorithm has also been used to adapt the piecewise filter coefficients. The input of the multiple equalizers in the feedback path, written in vector format, is

\[
y_{FI}(n) = [y_{F}(n) \ y_{F}(n-1) \ \cdots \ y_{F}(n-K+1)], \quad (3.12)
\]

where \( y_{F}(n) \) is the post-LUT output, i.e. \( y(n)F_{m}(|y(n)|) \). Therefore, the multiple FIR filter outputs, \( y_{FO}(n) \), can be derived in vector format using the following equations:

\[
y_{FO}(n) = W^{m} \cdot y_{FI}(n)^{T}, \quad (3.13)
\]

and

\[
W^{m} = [W^{m}_{0} \ W^{m}_{1} \ \cdots \ W^{m}_{K-1}], \quad (3.14)
\]

where \((\cdot)^{T}\) is the transpose operator.

The pre-equalizers’ tap coefficients are adapted as follows:

\[
W^{m}(|u(n+1)|) = W^{m}(|u(n)|) + \mu \cdot (y_{FI}(n)^{T})^{*} \cdot e(n), \quad (3.15)
\]

where \( e(n) \) is the error signal between \( z(n) \) and \( y_{FO}(n) \), \( \mu \) is the step size and \((\cdot)^{*}\) denotes the complex conjugate.
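A single adaptation step for the equalizer selected by the LUT index can be sketched as below. The function name `lms_tap_update`, the single-bin toy data, and the step size are assumptions made for illustration; in particular, the sketch adapts all $K$ taps of the selected filter and does not pin the first tap to one.

```python
import random

def lms_tap_update(W, m, y_fi, z_n, mu):
    """One LMS step for the equalizer selected by LUT index m.

    W[m] : list of K complex taps
    y_fi : [y_F(n), y_F(n-1), ..., y_F(n-K+1)]
    z_n  : reference sample z(n)
    Returns the error computed before the update.
    """
    y_fo = sum(w * v for w, v in zip(W[m], y_fi))        # filter output
    e = z_n - y_fo
    W[m] = [w + mu * v.conjugate() * e for w, v in zip(W[m], y_fi)]
    return e

# Toy check with a single bin: the reference is produced by known taps.
rng = random.Random(0)
true_taps = [1 + 0j, 0.3 - 0.1j]
W = [[1 + 0j, 0j]]
prev = 0j
for _ in range(5000):
    cur = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    y_fi = [cur, prev]
    z_n = sum(t * v for t, v in zip(true_taps, y_fi))
    lms_tap_update(W, 0, y_fi, z_n, mu=0.1)
    prev = cur
```

Because each update touches only the filter addressed by $m$, the cost per sample is that of one $K$-tap LMS step regardless of the number of LUT entries.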
For the memory polynomial PD, indirect learning using a recursive least squares (RLS) algorithm was applied as described in [30]

\[ a^n = a^{n-1} + K(n) \cdot e^*(n), \]  

(3.16)

where \( a \) is the coefficient column vector, \( e(n) \) is the error signal defined by

\[ e(n) = x(n) - y(n)^T \cdot a^{(n-1)}, \]  

(3.17)

\( y(n) \) is the vector equal to \( [y(n), y(n)|y(n)|, \ldots, y(n-1), y(n-1)|y(n-1)|, \ldots]^T \), and \( K(n) \) is the gain vector defined by

\[ K(n) = \frac{P(n-1) \cdot y^*(n)}{\lambda + y^T(n) \cdot P(n-1) \cdot y^*(n)}. \]  

(3.18)

\( \lambda \) is the forgetting factor, \(^*\) represents the complex conjugate, and \( P(n) \) is updated according to

\[ P(n) = \frac{P(n-1) - K(n) \cdot y^T(n) \cdot P(n-1)}{\lambda}. \]  

(3.19)
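The RLS recursion can be sketched in pure Python as follows. This is a minimal illustration, not the implementation used for the measurements: the function name `rls_step`, the initial value of $P$, the forgetting factor, and the toy coefficients are assumptions. The gain uses the standard RLS denominator $\lambda + y^T P y^*$, and with $e(n)$ defined as in (3.17) the sketch applies the correction as $K(n)e(n)$; the placement of the conjugate varies between references, so this should be read as one consistent convention rather than a statement about [30].

```python
import random

def rls_step(a, P, y_vec, x_n, lam):
    """One RLS iteration; a and P are updated in place.

    Returns the a-priori error e(n) = x(n) - y^T(n) a(n-1).
    """
    L = len(a)
    e = x_n - sum(yv * av for yv, av in zip(y_vec, a))
    Py = [sum(P[i][j] * y_vec[j].conjugate() for j in range(L))
          for i in range(L)]
    denom = lam + sum(y_vec[i] * Py[i] for i in range(L))
    K = [v / denom for v in Py]                              # gain vector
    yTP = [sum(y_vec[i] * P[i][j] for i in range(L)) for j in range(L)]
    for i in range(L):
        for j in range(L):
            P[i][j] = (P[i][j] - K[i] * yTP[j]) / lam        # as in (3.19)
        a[i] += K[i] * e
    return e

# Toy identification of a 2nd order, 1 delay memory polynomial (assumed).
rng = random.Random(0)
true_a = [0.9 + 0.1j, -0.2j, 0.05 + 0j, 0.02 - 0.01j]
a = [0j] * 4
P = [[(1e6 if i == j else 0.0) + 0j for j in range(4)] for i in range(4)]
prev = 0j
for _ in range(500):
    cur = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    y_vec = [cur, cur * abs(cur), prev, prev * abs(prev)]
    x_n = sum(yv * av for yv, av in zip(y_vec, true_a))
    rls_step(a, P, y_vec, x_n, lam=0.99)
    prev = cur
```

Each step costs on the order of $L^2$ complex operations for $L$ coefficients, which is the main reason the memory polynomial PD carries a larger computational load than the LMS based structures above.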

### 3.4 PA behavioral modeling

In order to simulate the performance of the proposed PD in MATLAB, behavioral modeling based on time domain measurement samples was first carried out. Among the PA models, the simplest truncated Volterra model is the diagonal Volterra model, termed the memory polynomial model, in which all off-diagonal terms are zero. This condition drastically reduces the number of model parameters to be estimated. However, when the off-diagonal terms are more important than the diagonal terms, neglecting them significantly decreases the model’s reliability. In this case, the condition should be relaxed to include near-diagonal terms.

Therefore, the selected behavioral model was based on the truncated Volterra model [85] as follows:

\[ \tilde{y}(n) = \sum_{i=0}^{K_1-1} h_1(i) \tilde{x}(n-i) + \sum_{i_1=0}^{K_3-1} \sum_{i_2=0}^{K_3-1} \sum_{i_3=0}^{K_3-1} h_3(i_1, i_2, i_3) \tilde{x}(n-i_1) \tilde{x}(n-i_2) \tilde{x}(n-i_3), \]  

(3.20)

where \( h_l(i_1, i_2, \ldots, i_l) \) is the \( l \)th order Volterra kernel and \( K_l \) represents the memory of the corresponding nonlinearity.
The model coefficients are estimated via the least-squares method, in order to minimize the cost function

$$F[e^2(t)] = \sum_{n} (y(n) - \hat{y}(n))^2.$$  \hspace{1cm} (3.21)

A 300 W peak envelope power (PEP) DPA, which used two 170 W push-pull type Motorola laterally diffused metal-oxide-semiconductor (LDMOS) transistors in the final stage, was designed and built, as shown in Figure 3.5 [22]. The DPA operates in the 2140 MHz band and has 61 dB of gain and 28% power added efficiency (PAE), with an average output power of 39 W. In order to construct the PA model based on measurements of the actual PA, the test bench shown in Figure 3.6 was utilized [19, 11].

For the measurements, a single downlink UMTS carrier was used as the input test signal, in order to extract the model coefficients; the test signal had 64 dedicated physical channels (DPCHs) of Test Model 1, with a chip rate of 3.84 Mchips/s and a crest factor of 9.8 dB. The signal was uploaded to an Agilent electronic signal generator (ESG) (model 4438C) through a local area network (LAN) cable. The RF signal generated by the ESG is applied to the DPA; the output of the DPA is then fed into a single channel vector signal analyzer (VSA) (model 89641A) after attenuation. Normalization and synchronization are then performed, in order to compare the complex envelope at both the input and output of the DPA.

Thirty-thousand time domain data samples, from the input and output of the DPA, were used to construct the behavioral model of the truncated Volterra series. The seventh order truncated Volterra series PA model, with a memory length of four, was constructed and extracted using in-house software implemented in MATLAB. The extracted behavioral model had -44.1 dB of normalized mean square error (NMSE), defined as

\[ \text{NMSE}[dB] = 10 \log_{10} \frac{\sum_n |y_{\text{meas}}(n) - y_{\text{mod}}(n)|^2}{\sum_n |y_{\text{meas}}(n)|^2}, \quad (3.22) \]

where \( y_{\text{meas}} \) and \( y_{\text{mod}} \) are the measured and modeled data samples, respectively. Figures 3.7 and 3.8 show the time domain results for the in-phase (I) and quadrature (Q) components, respectively, of the truncated Volterra series behavioral model. The frequency domain results of the behavioral model are shown in Figure 3.9. It can be seen that the behavioral model matches the measurement data in both the time and frequency domains.
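The NMSE figure of merit is straightforward to compute from the two sample sequences. The sketch below uses a hypothetical helper name, `nmse_db`, and made-up sample values; it shows that a model whose output is uniformly 1% too large scores exactly -40 dB, which gives a sense of scale for the -44.1 dB reported above.

```python
import math

def nmse_db(y_meas, y_mod):
    """Normalized mean square error in dB between measured and modeled samples."""
    err = sum(abs(a - b) ** 2 for a, b in zip(y_meas, y_mod))
    ref = sum(abs(a) ** 2 for a in y_meas)
    return 10 * math.log10(err / ref)

# Made-up samples: the "model" is off by 1 % in amplitude everywhere.
y_meas = [1 + 1j, -2 + 0.5j, 0.3 - 0.7j]
y_mod = [1.01 * v for v in y_meas]
value = nmse_db(y_meas, y_mod)   # 10*log10(0.01**2) = -40 dB
```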

### 3.5 Simulation results

Based on the behavioral model constructed in Section 3.4, we have simulated four types of PDs: 1) a memoryless LUT PD, 2) a Hammerstein PD, 3) the proposed piecewise pre-equalized PD, and 4) a memory polynomial PD. The adjacent channel leakage ratio (ACLR) performances are compared.

The simulations for the aforementioned PDs were performed in MATLAB, based on the behavioral model for the PA. The LUT size was fixed at 128 entries for all simulations; this size is a compromise between quantization effects and memory size. First, an eight tone signal with 500 kHz spacing, 9.03 dB of PAR and 4 MHz bandwidth, which is comparable to a WCDMA signal, was used for verifying the proposed method. Figures 3.10 and 3.11 show the results before and after linearization of the LUT PD and of the LUT Hammerstein PD, respectively. The Hammerstein PD deteriorates the performance above 10 MHz and improves it within a 10 MHz bandwidth. Note that the Hammerstein PD with LUT was not able to compensate for the memory effects. If the RF frequency response in the main signal path is quite flat, the Hammerstein PD is only able to correct frequency response memory effects. There is no obvious improvement for reducing spectral regrowth using the conventional Hammerstein PD. Experimental results in the next section show that the ability of the Hammerstein PD to suppress memory effect distortions is quite limited. This is in agreement with the simulation results and conclusion in [27]. In Figures 3.12 and 3.13, the performances of the proposed piecewise pre-equalized PD (with 2 taps) and the memory polynomial PD (of 5th order with 2 memory terms) are shown, respectively. The results show that the proposed PD is comparable to the memory polynomial PD in terms of ACLR performance.

Figure 3.7: In-phase signal modeling results.

Figure 3.8: Quadrature signal modeling results.

Figure 3.9: Frequency domain modeling results.

Figure 3.10: Linearization with a memory-less LUT PD.

A single WCDMA carrier was applied to the LUT PD, the LUT Hammerstein PD, the proposed PD, and the memory polynomial PD. Linearization results for the aforementioned PDs are given in Figure 3.14. The ACLR at a frequency offset of ±5 MHz is evaluated and summarized in Table 3.1. The conventional Hammerstein PD did not improve any memory effect distortions over the memoryless PD. Conversely, the proposed piecewise pre-equalized PD could suppress distortions due to nonlinearities, as well as memory effects of the PA.
Figure 3.12: Linearization with the proposed piecewise pre-equalized PD.

Figure 3.13: Linearization with the memory polynomial PD.
Figure 3.14: Linearization: (a) without PD, (b) LUT PD, (c) Hammerstein PD with a 5-tap FIR filter, (d) proposed piecewise pre-equalized PD with 2 taps, and (e) memory polynomial PD of 5th order with 2 memory terms.

Table 3.1: Summary of the ACLR simulations for the different PDs.

<table>
<thead>
<tr>
<th>PD Type</th>
<th>ACLR (Lower)</th>
<th>ACLR (Upper)</th>
</tr>
</thead>
<tbody>
<tr>
<td>No PD</td>
<td>-40dB</td>
<td>-38.7dB</td>
</tr>
<tr>
<td>LUT</td>
<td>-36.4dB</td>
<td>-34.5dB</td>
</tr>
<tr>
<td>Hammerstein</td>
<td>-36.7dB</td>
<td>-34.2dB</td>
</tr>
<tr>
<td>Proposed piecewise pre-equalized PD</td>
<td>-45.1dB</td>
<td>-44.1dB</td>
</tr>
<tr>
<td>Memory Polynomial</td>
<td>-46.3dB</td>
<td>-45.1dB</td>
</tr>
</tbody>
</table>

### 3.6 Experimental results

After simulating and verifying the ACLR performance of the proposed PD in MATLAB (based on the behavioral PA model), an experiment was performed using the actual DPA in our test bench. The experimental set-up used to evaluate the various PD performances is similar to the test bed in Figure 3.6, but without the switch. The transmitter prototype consists of an ESG, which has two digital to analog converters (DACs) and an RF up-converter, along with the PA. The receiver is comprised of an RF down-converter, a high speed analog to digital converter, and a digital down-converter. This receiver prototype can be realized with a VSA. For the host DSP, a personal computer (PC) with MATLAB and Agilent’s Advanced Design System (ADS) was used for both the delay compensation and predistortion algorithm.

For the measurements, two downlink WCDMA carriers were used as the input test signal, in order to verify the compensation performance of the different PDs; the test signal had 64 DPCHs of Test Model 1, with a chip rate of 3.84 Mchips/s and a crest factor of 9.8 dB. All of the PD coefficients were identified by an indirect learning algorithm, which is considered to be the inverse modeling of the PA. During the verification process, the following were used: a) a 256-entry LUT, b) a 5 tap FIR filter for the Hammerstein PD, c) the proposed piecewise pre-equalized PD (with 2 taps), and d) a 5th order, 2 delay memory polynomial. The value for the number of taps was optimized using several measurements. Figure 3.15 shows the linearization performance for each of the different PDs. The ACLR at the output of the prototype transmitter was calculated at frequency offsets of +5 MHz and -5 MHz from the center frequency. The ACLR values are summarized in Table 3.2.

For the transmitter with the Hammerstein PD comprised of a 5 tap FIR filter, the ACLR is approximately 1 dB better than the LUT PD on the upper ACLR (+5 MHz offset); the values are approximately the same at the lower ACLR (-5 MHz offset). The proposed PD and the 5th order, 2 delay memory polynomial PD exhibited similar compensation performance: they improved the ACLR by approximately 1 dB and 6 dB more than both the Hammerstein and memoryless LUT PDs, for the lower and upper ACLR values, respectively.
Figure 3.15: Experimental results: (a) without PD, (b) LUT PD, (c) Hammerstein PD with a 5-tap FIR filter, (d) proposed piecewise pre-equalized PD with 2 taps, and (e) memory polynomial PD of 5th order with 2 memory terms.

Table 3.2: Summary of the ACLR measurements for the different PDs.

<table>
<thead>
<tr>
<th>PD Type</th>
<th>ACLR (Lower)</th>
<th>ACLR (Upper)</th>
</tr>
</thead>
<tbody>
<tr>
<td>No PD</td>
<td>-35.5 dBc</td>
<td>-36.2 dBc</td>
</tr>
<tr>
<td>LUT</td>
<td>-45.6 dBc</td>
<td>-47.5 dBc</td>
</tr>
<tr>
<td>Hammerstein</td>
<td>-45.5 dBc</td>
<td>-48.5 dBc</td>
</tr>
<tr>
<td>Proposed piecewise pre-equalized PD</td>
<td>-51.8 dBc</td>
<td>-53.4 dBc</td>
</tr>
<tr>
<td>Memory Polynomial</td>
<td>-52.2 dBc</td>
<td>-54.5 dBc</td>
</tr>
</tbody>
</table>
3.7 Complexity evaluation

The complexity of the digital predistortion algorithm is a crucial issue. Hence, the complexity of both the proposed and the memory polynomial methods is evaluated. Note that the complexity calculations neglect the LUT reads, writes, and indexing, as well as the calculation of the square root (SQRT) of the signal magnitude. These are neglected since the LUT indexing depends on both the method and the indexing variable: for example, the magnitude, logarithm, power, and SQRT operations can be implemented in different ways. Therefore, the complexity is estimated only by counting the number of additions, subtractions, and multiplications per input sample. In order to reflect a real hardware implementation, complex operations are converted into real operations; memory size is also considered. For example, one complex multiplication requires two real additions and four real multiplications. If $N$ is the number of LUT entries, the memory size required is $2N$ (I&Q LUTs).
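The counting convention above can be illustrated with a short sketch. The Python below is not part of the original work; the function names are hypothetical and chosen only for illustration. It expands one complex multiplication into the four real multiplications and two real additions/subtractions quoted in the text, and evaluates the $2N$ memory figure for separate I and Q LUTs.

```python
def complex_multiply_real_ops(a_re, a_im, b_re, b_im):
    """Return (re, im) of (a_re + j*a_im) * (b_re + j*b_im) using real arithmetic only."""
    re = a_re * b_re - a_im * b_im  # 2 real MPY + 1 real SUB
    im = a_re * b_im + a_im * b_re  # 2 real MPY + 1 real ADD
    return re, im


# Cost of one complex multiplication after conversion to real operations.
REAL_MPY_PER_COMPLEX_MPY = 4
REAL_ADD_PER_COMPLEX_MPY = 2


def lut_memory_size(n_entries):
    """Memory words for separate I and Q LUTs with n_entries each (2N)."""
    return 2 * n_entries
```

For a 256-entry LUT, `lut_memory_size(256)` evaluates the $2N$ expression for the I and Q tables alone; any additional filter-coefficient storage is outside this sketch.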

3.7.1 The piecewise pre-equalized based LUT PD

Table 3.3 gives the complexity of the proposed piecewise pre-equalized based LUT PD, for $N$ entries and $L$ coefficients per filter. If the LUT has 256 entries and the filters have 2 taps, the PD requires 26 real additions (subtractions), 32 real multiplications per sample, and a memory size of 3070. Our proposed PD requires the same number of additions and multiplications as the traditional Hammerstein PD, but requires more memory.
Table 3.4: Complexity estimation of the memory polynomial PD.

<table>
<thead>
<tr>
<th>Method</th>
<th>Operation</th>
<th>No. Operations</th>
</tr>
</thead>
<tbody>
<tr>
<td>PD output</td>
<td>ADD/SUB</td>
<td>$4O - 2$</td>
</tr>
<tr>
<td></td>
<td>MPY</td>
<td>$6O - 2(K + 1)$</td>
</tr>
<tr>
<td>Memory size</td>
<td></td>
<td>$2O$</td>
</tr>
<tr>
<td>1 Iteration</td>
<td>ADD/SUB</td>
<td>$6O^2 + 2O + 2$</td>
</tr>
<tr>
<td></td>
<td>MPY</td>
<td>$16O^2 + 12O - 4$</td>
</tr>
<tr>
<td>Memory size</td>
<td></td>
<td>$2O$</td>
</tr>
<tr>
<td>Total Operations</td>
<td>ADD/SUB</td>
<td>$6O^2 + 6O$</td>
</tr>
<tr>
<td></td>
<td>MPY</td>
<td>$16O^2 + 18O - 2K - 6$</td>
</tr>
<tr>
<td>Memory size</td>
<td></td>
<td>$4O$</td>
</tr>
</tbody>
</table>

3.7.2 The memory polynomial PD

For the memory polynomial method using an RLS indirect learning algorithm [30], the number of arithmetic operations is given in Table 3.4, where $O$ is equal to $P(K + 1)$. For example, for $P = 5$ and $K = 1$, it requires 660 real additions (subtractions), 1772 real multiplications per sample, and a memory size of 40. In addition, the memory polynomial PD needs one real division, although this can be removed depending on the algorithm. The memory polynomial PD requires around 50 times more real multiplications per sample than the proposed PD. Therefore, the proposed method offers a significant reduction in complexity. In addition, the number of real multiplications for the memory polynomial method grows quadratically as the polynomial order and memory length increase.
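As a quick check of the figures quoted above, the expressions of Table 3.4 can be evaluated directly. The following Python sketch is illustrative only; the function name and the dictionary keys are hypothetical, while the formulas are copied from Table 3.4.

```python
def memory_polynomial_complexity(P, K):
    """Evaluate the Table 3.4 operation counts for order P and memory depth K."""
    O = P * (K + 1)                      # number of coefficients
    pd_add = 4 * O - 2                   # PD output, ADD/SUB
    pd_mpy = 6 * O - 2 * (K + 1)         # PD output, MPY
    it_add = 6 * O ** 2 + 2 * O + 2      # one RLS iteration, ADD/SUB
    it_mpy = 16 * O ** 2 + 12 * O - 4    # one RLS iteration, MPY
    return {
        "O": O,
        "total_add": pd_add + it_add,    # equals 6*O**2 + 6*O
        "total_mpy": pd_mpy + it_mpy,    # equals 16*O**2 + 18*O - 2*K - 6
        "memory": 2 * O + 2 * O,         # equals 4*O
    }


counts = memory_polynomial_complexity(P=5, K=1)
# Ratio against the 32 real multiplications quoted for the proposed PD.
mpy_ratio = counts["total_mpy"] / 32
```

For $P = 5$ and $K = 1$ this reproduces the 660 additions, 1772 multiplications, and memory size of 40 stated in the text, and the multiplication ratio comes out slightly above 55.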

3.8 Conclusions

The piecewise pre-equalized LUT based digital predistortion was described, simulated, measured, and compared with different PD structures. A MATLAB based behavioral model of the PA was constructed, using time domain measurements for a 300-W PEP DPA in our test bench. The results showed that a correction capability similar to that of the memory polynomial PD was achieved. In addition, the proposed PD performance is superior to the conventional Hammerstein approach, which has a limited capability, when eight tones with 500 kHz tone spacing and a single WCDMA carrier (with 3.84 MHz signal bandwidth) are employed. The proposed method and the various PDs were experimentally verified using an actual DPA in the same test bed. When two WCDMA carriers were applied, approximately 4 dB of additional correction was achieved, compared to the conventional Hammerstein PD. Moreover, the complexity of the proposed and the memory polynomial methods was estimated and compared. The proposed piecewise pre-equalized PD was found to perform equivalently to a memory polynomial PD while being significantly less complex. The effectiveness of the piecewise pre-equalized LUT approach for compensating frequency dependent memory effects was demonstrated in both simulations and measurements. In the future, the proposed method can be extended to correct long time constant memory effects, generated by the self heating of the transistors.
Chapter 4

Digital Baseband Derived RF Predistortion

4.1 Introduction

Reliable mobile communication services rely on clean and consistent transmission from base stations, under widely and rapidly changing conditions. The radio frequency (RF) power amplifier (PA) of the wireless communication system's base station has been the most critical and costly component. This is due to the stringent requirements on spectrum emissions and transmitter power efficiency. Recent advances in digital signal processors have allowed digital baseband predistortion to be successfully utilized to meet the various specifications. Baseband predistortion requires the entire transmit path to be several times wider than the signal bandwidth, in order to accommodate the predistorted input. This wideband transmit path demands a very accurate and fast digital-to-analog converter (DAC) [17]. Moreover, as the bandwidth of the input signal increases to accommodate multiple carriers, the processor speed requirement of the baseband predistortion system becomes an issue. Figure 4.1 illustrates these concerns. The advantage of the narrow up-converter bandwidth of the RF envelope digital predistorter is offset by the disadvantages of its additional components: for example, an additional analog-to-digital converter (ADC), an accurate envelope detector, and a costly large RF delay line [82] (refer to Figure 4.2).
Figure 4.1: Digital baseband predistortion architecture.
Figure 4.2: RF envelope digital predistortion architecture.
4.2 Digital baseband derived RF predistortion architecture

The block diagram of the proposed system is shown in Figure 4.3. The predistortion function, \( F \), is derived at baseband; however, it is applied at RF by a vector modulator. The lookup table is indexed by the instantaneous magnitude of the input signal, in order to determine the proper correction coefficients. The DAC in the main signal path should have at least twice the signal bandwidth. A baseband digital delay is able to compensate for the delay difference, \( \tau_d \), between the predistorting path and the main transmit path. A delay calibration procedure is required to compensate for the delay mismatch between the two signal paths.

4.3 Delay effects and calibration

To observe the delay mismatch effects with respect to the system performance, let the RF input, \( x(t) \), consist of two tones, with a tone spacing of \( (\omega_2 - \omega_1) \) and equal amplitude \( A \) as follows:

\[
\begin{aligned}
x(t) &= A \cos(\omega_1 t) + A \cos(\omega_2 t) \\
&= 2A \cos \left( \frac{\omega_2 - \omega_1}{2} t \right) \cos \left( \frac{\omega_2 + \omega_1}{2} t \right).
\end{aligned}
\tag{4.1}
\]

The predistortion function, \( F \), with a delay mismatch \( (\tau_d) \) between the two paths, can be written as

\[
\begin{aligned}
F(t - \tau_d) &= a_1 + a_3 |\tilde{x}(t - \tau_d)|^2 + a_5 |\tilde{x}(t - \tau_d)|^4 \\
&= \left(a_1 + \frac{1}{2}a_3 + \frac{3}{8}a_5\right) \\
&\quad + \frac{1}{2}(a_3 + a_5) \cos[(\omega_2 - \omega_1)(t - \tau_d)] \\
&\quad + \frac{1}{8}a_5 \cos[2(\omega_2 - \omega_1)(t - \tau_d)],
\end{aligned}
\tag{4.2}
\]

where \( \tilde{x}(t) \) is the envelope of the input signal, \( a_i \) are the complex coefficients, \( \tau_d \) is the delay mismatch, and \( A \) has been normalized to unity. Note that (4.2) demonstrates that the predistortion function requires a bandwidth of twice the envelope frequency, in order to compensate for up to fifth order inter-modulation distortions (IMDs).

Figure 4.3: Digital baseband derived RF predistortion architecture.

The predistorted input signal, \( x_{pd}(t) \), can then be derived as
\[
\begin{aligned}
x_{pd}(t) &= x(t)F(t - \tau_d) \\
&= b_1 \left[ \cos(\omega_1 t) + \cos(\omega_2 t) \right] \\
&\quad + b_3 \left[ \cos(\omega_1 t + (\omega_2 - \omega_1) \tau_d) + \cos(\omega_2 t - (\omega_2 - \omega_1) \tau_d) \right] \\
&\quad + b_3 \cos((2\omega_1 - \omega_2) t + (\omega_2 - \omega_1) \tau_d) \\
&\quad + b_3 \cos((2\omega_2 - \omega_1) t - (\omega_2 - \omega_1) \tau_d) \\
&\quad + b_5 \cos((2\omega_1 - \omega_2) t + 2(\omega_2 - \omega_1) \tau_d) \\
&\quad + b_5 \cos((2\omega_2 - \omega_1) t - 2(\omega_2 - \omega_1) \tau_d) \\
&\quad + b_5 \cos((3\omega_1 - 2\omega_2) t + 2(\omega_2 - \omega_1) \tau_d) \\
&\quad + b_5 \cos((3\omega_2 - 2\omega_1) t - 2(\omega_2 - \omega_1) \tau_d),
\end{aligned}
\tag{4.3}
\]

where the \( b_i \) are complex coefficients. Clearly, the various IMDs of the predistortion function have magnitudes and phases which are dependent on \( \tau_d \). Note that the delay mismatch will cause additional memory effects.
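The dependence of the IMD phases on \( \tau_d \) can be checked numerically. The sketch below is not from the original work: it assumes a unit-amplitude two-tone input placed on exact DFT bins, a real 3rd order coefficient, \( a_5 = 0 \), and an envelope normalized so that \( |\tilde{x}|^2 = \cos^2((\omega_2 - \omega_1)t/2) \). All function names and numerical values are hypothetical.

```python
import cmath
import math


def predistorted_two_tone(n_samples, k1, k2, a1, a3, delay):
    """Samples of x[n] * F[n - delay] for tones on DFT bins k1 and k2 (a5 = 0)."""
    w1 = 2.0 * math.pi * k1 / n_samples
    w2 = 2.0 * math.pi * k2 / n_samples
    samples = []
    for n in range(n_samples):
        x = math.cos(w1 * n) + math.cos(w2 * n)
        # Delayed, normalized squared envelope of the two-tone signal.
        env_sq = math.cos(0.5 * (w2 - w1) * (n - delay)) ** 2
        samples.append(x * (a1 + a3 * env_sq))
    return samples


def dft_bin(samples, k):
    """Single DFT bin; its phase equals the phase of a cosine sitting on bin k."""
    n_samples = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, s in enumerate(samples))


def im3_phases(n_samples, k1, k2, a1, a3, delay):
    """Phases (radians) of the lower and upper IM3 products of the predistorted input."""
    sig = predistorted_two_tone(n_samples, k1, k2, a1, a3, delay)
    lower = cmath.phase(dft_bin(sig, 2 * k1 - k2))
    upper = cmath.phase(dft_bin(sig, 2 * k2 - k1))
    return lower, upper
```

With a delay of zero both IM3 phases are zero in this setup; with a non-zero delay the lower IM3 phase moves by \( +(\omega_2 - \omega_1)\tau_d \) and the upper one by \( -(\omega_2 - \omega_1)\tau_d \), which is the source of the phase mismatch.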

For simplicity, we assume the delay is perfectly matched and only IMDs up to the 3rd order are considered. IMD cancellation performance (IMDc) for a 3rd order IMD (IM3) can be represented using the vector summation depicted in Figure 4.4: it can then be expressed as the following ratio
\[
\begin{aligned}
IMDc &= -20 \log_{10} \left( \frac{|IM3_{PD} + IM3_{PA}|}{|IM3_{PA}|} \right) \\
&= -10 \log_{10} \left( \left( \frac{|IM3_{PD}|}{|IM3_{PA}|} \right)^2 + 1 - 2 \frac{|IM3_{PD}|}{|IM3_{PA}|} \cos \theta \right),
\end{aligned}
\tag{4.4}
\]

where \( \theta \) denotes phase mismatch.

If perfect IM3 cancellation is assumed, i.e. \(|IM3_{PD}|\) is equal to \(|IM3_{PA}|\) and \( \theta \) is zero (the two components are then exactly out of phase), the argument of the logarithm in (4.4) goes to zero and \( IMDc \) grows without bound. However, due to the delay mismatch, the phase mismatch is no longer zero. In fact, it is dependent on \( \tau_d \) and the two-tone spacing, as shown in (4.3). Considering only up to 3rd order polynomial functions, i.e. there are no 5th order IMD components \( (a_5 = 0) \), the upper IM3 component phase decreases by \((\omega_2 - \omega_1) \tau_d \) and the lower IM3 component phase increases by an equal amount. This causes a phase mismatch, which degrades the performance at the output of the system.
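The relation between delay mismatch and achievable IM3 cancellation can be sketched numerically from (4.4). The Python below is an illustration rather than the procedure used to produce the figures; the function names are hypothetical, and the 1 MHz tone spacing in the example is an assumed value, while the 26 ns delay corresponds to one sample of the measurement system.

```python
import math


def imd_cancellation_db(magnitude_ratio, theta):
    """IMDc of (4.4); magnitude_ratio = |IM3_PD| / |IM3_PA|, theta in radians."""
    residual = magnitude_ratio ** 2 + 1.0 - 2.0 * magnitude_ratio * math.cos(theta)
    if residual <= 0.0:
        return math.inf  # perfect cancellation: nothing is left of the IM3 product
    return -10.0 * math.log10(residual)


def phase_mismatch(tone_spacing_hz, delay_s):
    """Phase error (omega_2 - omega_1) * tau_d caused by a delay mismatch."""
    return 2.0 * math.pi * tone_spacing_hz * delay_s


# Assumed example: 1 MHz tone spacing, one-sample (26 ns) delay mismatch,
# perfectly matched magnitudes.
theta_example = phase_mismatch(1.0e6, 26.0e-9)
cancellation_example = imd_cancellation_db(1.0, theta_example)
```

Even with perfectly matched magnitudes, the cancellation in this example is limited to a finite value by the phase error alone, and it shrinks further as the tone spacing or the delay mismatch grows.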

Using (4.4), the influence of the phase error resulting from delay mismatch on the IM3 cancellation performance can be plotted; refer to Figure 4.5. Imposing the condition that $a_5$ is not zero, the IM5 cancellation performance is given in Figure 4.6. Note that if IM5 is included, IM3 cancellation performance is affected by two IM3 terms and $a_3$; more generally, adding higher orders causes IM3 to be affected by more higher order terms. However, higher order IMDs have smaller weights. Therefore, they will have little influence on IM3 cancellation performance; refer to Figure 4.5, which shows the lower bound of the IM3 cancellation performance.

Figure 4.4: Representation of vector summation.

Figure 4.5: IM3 cancellation performance as a function of $\tau_d$.

Figure 4.6: IM5 cancellation performance as a function of $\tau_d$.

Figure 4.7 displays the delay calibration procedure, which matches the delays between the two paths. In step 1, the sum of the upper path delay and the feedback delay, i.e., $\tau_{d1} + \tau_f$, is determined by the correlation block; in step 2, the sum of the lower path delay and the feedback delay, i.e., $\tau_{d2} + \tau_f$, is determined. Therefore, the delay difference between the two paths can be estimated by simple subtraction, $\tau_{d1} - \tau_{d2}$. The correlation block determines coarse delay estimates, and fine (fractional) delay estimates are obtained via Lagrange interpolators.
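The coarse part of this two-step procedure can be sketched as follows. This is a minimal illustration, assuming that each path is captured separately through the same feedback path and that the delay is an integer number of samples; the function names and the synthetic test signal are hypothetical and do not describe the actual correlation block.

```python
import random


def coarse_delay(reference, observed, max_lag):
    """Integer lag (in samples) that maximizes the cross-correlation magnitude."""
    best_lag, best_metric = 0, -1.0
    for lag in range(max_lag + 1):
        acc = 0j
        for n, ref in enumerate(reference):
            if n + lag < len(observed):
                acc += observed[n + lag] * complex(ref).conjugate()
        if abs(acc) > best_metric:
            best_lag, best_metric = lag, abs(acc)
    return best_lag


def delay_difference(reference, upper_capture, lower_capture, max_lag):
    """Estimate tau_d1 - tau_d2; the common feedback delay tau_f cancels out."""
    upper = coarse_delay(reference, upper_capture, max_lag)  # tau_d1 + tau_f
    lower = coarse_delay(reference, lower_capture, max_lag)  # tau_d2 + tau_f
    return upper - lower


def noise_like_signal(n_samples, seed=0):
    """Synthetic complex test signal with a sharp autocorrelation peak."""
    rng = random.Random(seed)
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
```

For example, if the upper-path capture is the reference delayed by 7 samples and the lower-path capture is delayed by 4 samples, `delay_difference` returns 3 regardless of how much of each delay is due to the feedback path.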

4.4 Experimental results

A single carrier WCDMA signal with a 9.8 dB peak-to-average power ratio (PAPR) was used on the test bench for the proposed predistortion system architecture; refer to Figure 4.8.
Figure 4.8: Test bench for the proposed predistortion system.
The test bench consists of two electronic signal generators (Agilent E4433B and E4438C), a vector modulator (Analog Devices AD8341), a Doherty power amplifier with a 300-W peak envelope power (PEP), a vector signal analyzer (Agilent VSA3064A), and a personal computer with MATLAB and Advanced Design System (ADS) software. The baseband in-phase (I) and quadrature (Q) outputs on the rear panel of the E4438C are connected to the AD8341. The first source (E4433B) acts as the master and provides the 10 MHz reference (connected to the 10 MHz input of the slave). The RF input signal, \( x(t) \), and the baseband derived function, \( F \), are synchronized based on the following procedure. In the master source, a marker is placed at the beginning of the input signal file \( x(t) \); a pulse is sent on the EVENT1 output every time this marker is met. The EVENT1 output is connected to the pattern trigger input of the slave. To estimate the delay difference, a fine delay calibration was performed based on Section 4.3.

Figures 4.9 and 4.10 show the measurement results captured from the spectrum analyzer.
Figure 4.10: The captured spectrum after linearization, at 47 dBm of the average output power.
Figure 4.11: Measured spectral results for delay dependence of the proposed system: (a) without predistortion, (b) with one sample advance, (c) with one sample delay, and (d) with coarse delay match.

At 47 dBm of the average output power, the ACLR at the 5 MHz center frequency offset was approximately -30 dBc before linearization; after linearization, it was approximately -45 dBc for the new structure of the digital predistortion system.

The delay dependence of the performance is shown in Figure 4.11. A one-sample (26 ns) advance or a one-sample delay degrades the system performance by approximately 4 to 10 dB. In the proposed system, the delay can be perfectly matched via a digital delay; this is not possible with RF envelope digital predistortion, which utilizes an analog RF delay line.

As seen in Figure 4.11, there is an asymmetry between the upper and lower ACLR. This means that the coarse delay compensation is not sufficient to remove the asymmetry, since the residual delay mismatch causes memory effects in the system, which in turn result in an asymmetry in the frequency domain. Therefore, we applied Lagrange interpolators to refine the delay resolution by a factor of 10, so that the delay can be matched in steps of 2.6 ns.
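A fractional delay of this kind can be realized with a Lagrange interpolation FIR filter. The sketch below is illustrative only: the interpolator order is not stated in the text, so a 3rd order filter is assumed, and the function names are hypothetical. The 26 ns sample period and the tenfold resolution are taken from the text.

```python
SAMPLE_PERIOD_NS = 26.0   # one sample period of the measurement system
FRACTIONAL_STEPS = 10     # tenfold finer delay resolution
STEP_NS = SAMPLE_PERIOD_NS / FRACTIONAL_STEPS  # 2.6 ns per fractional step


def lagrange_coefficients(delay, order=3):
    """Taps h[k] = prod_{m != k} (delay - m) / (k - m) for a delay given in samples."""
    coeffs = []
    for k in range(order + 1):
        h = 1.0
        for m in range(order + 1):
            if m != k:
                h *= (delay - m) / (k - m)
        coeffs.append(h)
    return coeffs


def coefficients_for_step(step, order=3):
    """Taps for a bulk delay of (order - 1) // 2 samples plus step/10 of a sample."""
    bulk = (order - 1) // 2  # keeps the total delay near the filter centre
    return lagrange_coefficients(bulk + step / FRACTIONAL_STEPS, order)


def apply_fir(coeffs, samples):
    """y[n] = sum_k coeffs[k] * samples[n - k], with zeros before the first sample."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out
```

In this sketch, `coefficients_for_step(3)` delays the signal by 1.3 samples, i.e. three 2.6 ns steps beyond the one-sample bulk delay. The bulk delay would have to be absorbed into the coarse delay setting.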

In Figure 4.12, the effect of fractional delay synchronization is shown, at 44 dBm of the average output power. When a coarse delay was used (b), the predistorter suppressed distortions by around 11 dB for the lower ACLR and around 12 dB for the upper ACLR. With a fractional delay, the performance could be optimized; thus, approximately 4 dB and 3 dB more reduction was achieved for the lower and upper ACLR, respectively, as shown in (b) and (c) of Figure 4.12. In addition, the asymmetry between the lower and upper ACLR was reduced, which means that the delay mismatch was minimized with the fractional delay. As seen in Figure 4.13, similar performances are achieved using different fractional delays. At 46 dBm of the average output power, the measured results are shown in Figures 4.14 and 4.15. As illustrated in the previous results, the asymmetry was reduced by the fractional delay compensation. These performances are summarized in Table 4.1.

Figure 4.12: Measured spectral results for the proposed system, with fractional delay compensation at 44 dBm of the average output power: (a) without predistortion, (b) with predistortion and coarse delay match, and (c) with predistortion and fractional delay match.

Figure 4.13: Measured spectral results for the different fractional delays, at 44 dBm of the average output power: (a) with coarse delay, (b) with 3 fractional delay, (c) with 4 fractional delay, and (d) with 5 fractional delay.

Figure 4.14: Measured spectral results for the proposed system, with fractional delay compensation at 46 dBm of the average output power: (a) without predistortion, (b) with predistortion and coarse delay match, and (c) with predistortion and fractional delay match.

Figure 4.15: Measured spectral results for the different fractional delays, at 46 dBm of the average output power: (a) with coarse delay, (b) with 3 fractional delay, (c) with 4 fractional delay, and (d) with 5 fractional delay.

Table 4.1: Summary of the ACLR performance of the proposed system.

<table>
<thead>
<tr>
<th>Output power</th>
<th>Method</th>
<th>ACLR (Lower)</th>
<th>ACLR (Upper)</th>
</tr>
</thead>
<tbody>
<tr>
<td>44 dBm</td>
<td>No PD</td>
<td>-36.6 dBc</td>
<td>-36.7 dBc</td>
</tr>
<tr>
<td>46 dBm</td>
<td>No PD</td>
<td>-34.2 dBc</td>
<td>-34.7 dBc</td>
</tr>
<tr>
<td>44 dBm</td>
<td>PD with coarse</td>
<td>-47.2 dBc</td>
<td>-48.9 dBc</td>
</tr>
<tr>
<td>46 dBm</td>
<td>PD with coarse</td>
<td>-49.2 dBc</td>
<td>-45.7 dBc</td>
</tr>
<tr>
<td>44 dBm</td>
<td>PD with 3 fractional</td>
<td>-51.9 dBc</td>
<td>-51.0 dBc</td>
</tr>
<tr>
<td>46 dBm</td>
<td>PD with 3 fractional</td>
<td>-47.9 dBc</td>
<td>-48.4 dBc</td>
</tr>
<tr>
<td>44 dBm</td>
<td>PD with 4 fractional</td>
<td>-51.4 dBc</td>
<td>-51.5 dBc</td>
</tr>
<tr>
<td>46 dBm</td>
<td>PD with 4 fractional</td>
<td>-48.6 dBc</td>
<td>-48.4 dBc</td>
</tr>
<tr>
<td>44 dBm</td>
<td>PD with 5 fractional</td>
<td>-52.2 dBc</td>
<td>-51.1 dBc</td>
</tr>
<tr>
<td>46 dBm</td>
<td>PD with 5 fractional</td>
<td>-49.6 dBc</td>
<td>-47.7 dBc</td>
</tr>
</tbody>
</table>

A comparison between the three digital PD architectures is tabulated in Table 4.2. For example, if the main path DAC had 14 bits, a speed of 240 Msps would be required for the digital baseband (BB) PD; a digital BB/RF PD would require a 14-bit, 48 Msps DAC for the main path, along with two additional 8-bit, 96 Msps DACs. The purpose of the two additional DACs is to handle the lookup table coefficients. Therefore, they are not required to have the same resolution as the main DAC, as is the case for the digital BB PD. This implies that the cost of the two additional DACs is low.

4.5 Conclusions

In this chapter, we presented a predistortion architecture that removes the wideband requirements of the up-converter and eliminates the inaccurate and costly components of the RF envelope digital predistortion architecture.
Table 4.2: Comparison of the three digital PD architectures.

<table>
<thead>
<tr>
<th></th>
<th>Digital BB PD</th>
<th>RF Env./Digital PD</th>
<th>Digital BB/RF PD</th>
</tr>
</thead>
<tbody>
<tr>
<td>Main Path DAC</td>
<td>Wideband (5×Input BW)</td>
<td>1×Input BW</td>
<td>1×Input BW</td>
</tr>
<tr>
<td>Main Path BW</td>
<td>Wideband</td>
<td>Narrowband</td>
<td>Narrowband</td>
</tr>
<tr>
<td>Upconverter</td>
<td>Same</td>
<td>Same</td>
<td>Same</td>
</tr>
<tr>
<td>LPF</td>
<td>Wideband</td>
<td>Narrowband</td>
<td>Narrowband</td>
</tr>
<tr>
<td>Env. Det.</td>
<td>No</td>
<td>Yes(inaccurate)</td>
<td>No</td>
</tr>
<tr>
<td>RF delay lines</td>
<td>No</td>
<td>Yes(loss, cost)</td>
<td>No</td>
</tr>
<tr>
<td>Vector Modulator</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Additional DACs</td>
<td>No</td>
<td>Two(2×BW)</td>
<td>Two(2×BW)</td>
</tr>
<tr>
<td>Additional ADC</td>
<td>No</td>
<td>One</td>
<td>No</td>
</tr>
<tr>
<td>Comments</td>
<td>No additional components</td>
<td>No need for an access to baseband</td>
<td>Narrowband and accurate</td>
</tr>
</tbody>
</table>

The performance of the proposed system was verified using a 300-W Doherty Amplifier in our test bench. Experimental results demonstrated that the proposed architecture achieved an ACLR reduction comparable to that of the conventional baseband predistortion architecture.
Chapter 5

Crest Factor Reduction and Linearization

5.1 Introduction

Due to the increasing importance of spectral efficiency in mobile communications, effective modulation techniques have been used, including wideband code division multiple access (WCDMA) and orthogonal frequency division multiplexing (OFDM). These modulations have large envelope fluctuations, since the transmitted signal is generated by adding a large number of statistically independent signals. The resulting high peak-to-average power ratio (PAPR) sets strict requirements for the linearity of the power amplifier (PA). This leads to low power efficiency, since the PA must then operate in its linear region. The use of deliberate envelope clipping to digitally distort the signal, while still maintaining the signal quality at a sufficient level, is a simple and practical way to decrease the PAPR. Moreover, reducing the PAPR via clipping has the potential to allow for a more efficient utilization of the dynamic range of the digital-to-analog converter (DAC). The various PAPR reduction techniques can be categorized into two groups: linear techniques (modulation-and-coding-dependent) and nonlinear techniques (modulation-and-coding-independent). Using linear techniques for OFDM systems does not cause signal distortion in the time domain; hence, the spectral properties are not altered [59, 13, 43]. Conversely, nonlinear techniques modify the envelope of the time domain signal and are mainly based on clipping-filtering and windowing [60, 77].
To suppress peak re-growth when filtering the clipped signal's out-of-band distortion, iterative clipping and filtering for OFDM systems has been proposed in [3] and [46]. These works suggested that iterative clipping and filtering of the clipped pulses would reduce the convergence rate to the targeted PAPR. However, the repeated clipping and filtering techniques that have been implemented for OFDM systems require several iterations to converge to the desired PAPR level, implying that they are not efficient for hardware implementation.

5.2 Crest factor reduction for wideband applications

5.2.1 A scaled peak cancellation method

Figure 5.1 illustrates the general structure of the scaled peak cancellation technique. The clipper output, \( c_n \), can be written as follows:

\[
  c_n = \begin{cases}
    \frac{A}{|x_n|}, & \text{if } |x_n| > A; \\
    1, & \text{if } |x_n| \leq A,
  \end{cases}
  \tag{5.1}
\]

where \( A \) is the clipping threshold level. The clipped pulse or peak cancellation pulse, \( p_n \), can be written as

\[
  p_n = x_n - x_n \cdot c_n.
  \tag{5.2}
\]

Finally the PAPR reduced signal, \( z_n \), is described by

\[
\begin{aligned}
  z_n &= x_{n-d} - \alpha \cdot pf_n \\
      &= x_{n-d} - \alpha \cdot (p_n * h_n),
\end{aligned}
\tag{5.3}
\]
where $pf_n$, $h_n$, and $\alpha$ denote the output signal of the noise shaper (shown in Figure 5.2), the impulse response of the low pass filter (LPF), and the scale factor, respectively, $d$ is the delay that aligns the input signal with the filtered pulse, and $*$ denotes the convolution operation. For multi-carrier WCDMA applications, $p_n$ should be:

a) frequency translated by $\omega_n$, b) filtered, c) frequency translated back to baseband, and d) combined. These steps are necessary, since the out-of-band emissions reside between the different carriers. Therefore, they cannot be filtered out by one LPF. Conversely, in the single carrier application, only one finite impulse response (FIR) filter is required. The coefficients for the multi-carrier and single carrier FIR filters are the same. Note that there is peak re-growth beyond the clipped signal, since $p_n$ is filtered by the noise shaper and subsequently subtracted from the delayed input signal. This has the net effect of increasing the peaks beyond that of the clipped signal.
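The single-stage processing of (5.1) to (5.3) can be sketched for the single carrier case as follows. This is a minimal illustration, not the implementation used for the reported results: it assumes one linear-phase FIR noise shaper whose group delay supplies the delay $d$, and all function names are hypothetical.

```python
def clipper_gain(x, threshold):
    """c_n of (5.1): A/|x_n| above the threshold A, 1 otherwise."""
    return [threshold / abs(v) if abs(v) > threshold else 1.0 for v in x]


def cancellation_pulse(x, threshold):
    """p_n of (5.2): the part of each sample that exceeds the threshold."""
    return [v - v * c for v, c in zip(x, clipper_gain(x, threshold))]


def fir_filter(h, p):
    """Causal convolution pf_n = (p * h)_n, truncated to the input length."""
    out = []
    for n in range(len(p)):
        acc = 0j
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * p[n - k]
        out.append(acc)
    return out


def peak_cancel(x, threshold, h, alpha=1.0):
    """z_n of (5.3): delayed input minus the scaled, filtered cancellation pulse."""
    d = (len(h) - 1) // 2  # group delay of an odd-length linear-phase FIR (assumed)
    pf = fir_filter(h, cancellation_pulse(x, threshold))
    delayed = [0j] * d + list(x[:len(x) - d])
    return [xd - alpha * v for xd, v in zip(delayed, pf)]
```

With a one-tap filter `h = [1.0]` the output is simply the hard-clipped signal. With a smoothing filter such as `h = [0.25, 0.5, 0.25]`, an isolated peak of magnitude 2 and a threshold of 1 leave a residual peak of 1.5, which illustrates the peak re-growth described above.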

In order to reduce the PAPR and increase the convergence rate to the desired threshold level, the repeated pulse cancellation (RPC) technique was proposed for OFDM systems [84], which is based on clipping-filtering ($\alpha = 1$).

We propose a novel algorithm for RPC in WCDMA downlink systems, called the scaled RPC (SRPC). Let $z_n$ in (5.3) be the output after the first iteration, $z_n^{(1)}$, and let $z_n^{(0)}$ denote the (delayed) input signal. After the $i$th iteration, the resulting signal can be represented by
\[
\begin{aligned}
    z_n^{(1)} &= z_n^{(0)} - \alpha^{(1)} \cdot pf_n^{(1)} \\
    z_n^{(2)} &= z_n^{(1)} - \alpha^{(2)} \cdot pf_n^{(2)} = z_n^{(0)} - \alpha^{(1)} \cdot pf_n^{(1)} - \alpha^{(2)} \cdot pf_n^{(2)} \\
    & \vdots \\
    z_n^{(i)} &= z_n^{(i-1)} - \alpha^{(i)} \cdot pf_n^{(i)} = z_n^{(0)} - \sum_{j=1}^{i} \alpha^{(j)} \cdot pf_n^{(j)}.
\end{aligned}
\tag{5.4}
\]

The scale factor, $\alpha$, at the $i$th iteration can be calculated as
\[
\alpha^{(i)} = \frac{\max(|pf_n^{(i)}|)}{\max(|pf_n^{(0)}|)}.
\tag{5.5}
\]

According to the central limit theorem, the in-phase and quadrature components of the input signal approach Gaussian distributions, so its envelope is approximately Rayleigh distributed. Once the threshold level is set, it may therefore be possible to find the maximum clipping pulse magnitude numerically, and hence also the maximum magnitude of the filtered pulse.

The scaling factor reduces the computational load, which saves hardware resources during implementation. Numerical simulations found that two or three iterations of the SRPC are sufficient. The details are provided in Section 5.3.
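The iteration in (5.4) can be sketched as a loop over stages. The Python below is an illustration under stated assumptions: the scale factors are passed in as a list rather than computed, each stage adds the group delay of the (assumed linear-phase) noise shaper, and the function names are hypothetical. Setting every scale factor to 1 gives the unscaled repeated pulse cancellation.

```python
def _cancellation_pulse(z, threshold):
    """Part of each sample exceeding the threshold (zero below it)."""
    return [v - v * (threshold / abs(v)) if abs(v) > threshold else 0.0 for v in z]


def _fir_filter(h, p):
    """Causal convolution (p * h)_n, truncated to the input length."""
    out = []
    for n in range(len(p)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * p[n - k]
        out.append(acc)
    return out


def scaled_rpc(x, threshold, h, alphas):
    """Apply one pulse cancellation stage per entry of alphas, following (5.4)."""
    d = (len(h) - 1) // 2  # per-stage delay of the noise shaper (assumed)
    z = list(x)
    for alpha in alphas:
        pf = _fir_filter(h, _cancellation_pulse(z, threshold))
        delayed = [0.0] * d + z[:len(z) - d]
        z = [zd - alpha * v for zd, v in zip(delayed, pf)]
    return z


def peak(z):
    return max(abs(v) for v in z)
```

For a toy input with one isolated peak of magnitude 2, a threshold of 1, and `h = [0.25, 0.5, 0.25]`, one unscaled stage leaves a peak of 1.5 and two unscaled stages leave 1.25, whereas a single stage with a hand-picked scale factor of 2 reaches the threshold directly. The value 2 is chosen by hand for this toy input and is not an evaluation of (5.5); it only illustrates why scaling the filtered pulse can reduce the number of iterations.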

5.2.2 Peak windowing technique

As detailed in [77], the windowing method replaces the clipper output $c_n$ in (5.1) with the smoothed function
\[
    b_n = 1 - \sum_{k=-\infty}^{\infty} a_k \cdot w_{n-k},
\tag{5.6}
\]

where $w_n$ is the window function and $a_k$ is a coefficient weight. In order to achieve the desired clipping level, the function $b_n$ must satisfy the inequality
\[
    1 - \sum_{k=-\infty}^{\infty} a_k \cdot w_{n-k} \leq c_n.
\tag{5.7}
\]

To minimize the resultant error in the time domain, the inequality must be as close to equality as possible. This is dependent on the type and length of the window. The resultant function is then multiplied by the delayed input signal. The output signal according to the peak windowing (PW) method can be defined as
\[
    y_n = x_{n-i} \cdot b_n.
\tag{5.8}
\]
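One simple way to satisfy the inequality (5.7) is sketched below. It is an assumption made for illustration and not necessarily the weight selection of [77]: every sample uses the weight $a_k = 1 - c_k$ with a symmetric window whose centre value is 1, which guarantees $b_n \leq c_n$ but can over-attenuate when several neighbouring samples exceed the threshold. The window is applied non-causally, so no input delay is needed in this sketch, and the function names are hypothetical.

```python
import math


def hamming(length):
    """Symmetric Hamming window of odd length; the centre tap equals 1."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * m / (length - 1))
            for m in range(length)]


def peak_window_gain(x, threshold, window):
    """b_n of (5.6) with the assumed weights a_k = 1 - c_k (zero below threshold)."""
    half = len(window) // 2
    a = [1.0 - threshold / abs(v) if abs(v) > threshold else 0.0 for v in x]
    b = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(max(0, n - half), min(len(x), n + half + 1)):
            acc += a[k] * window[n - k + half]  # a_k * w_{n-k}
        b.append(max(0.0, 1.0 - acc))           # clamp to keep the gain non-negative
    return b


def peak_window(x, threshold, window):
    """y_n of (5.8) for a centred (zero-delay) window."""
    return [v * g for v, g in zip(x, peak_window_gain(x, threshold, window))]
```

Because neighbouring samples are attenuated along with the peak, the gain changes smoothly in time. This is what limits spectral splatter in the windowing approach, at the cost of modifying samples that were below the threshold.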

5.3 Simulation results

The 3rd Generation Partnership Project (3GPP) standard specifications state that the error vector magnitude (EVM) and adjacent channel leakage ratio (ACLR) at a 5 MHz offset should be less than 17.5 % and -45 dBc, respectively. The scrambling codes and time offsets of the time slot duration, for the multi-carrier test model 1 (TM1) of the WCDMA downlink system, are summarized in Table 5.1. This table is based on 3GPP Technical Specification (TS) 25.141, Section 6.1.1 of Release 6 (2002-12) [1]. Numerical simulations were performed using a TM1 signal, with 64 dedicated physical channels (DPCHs) and 614,400 input samples (one radio frame at 61.44 Msamples/s); all samples were processed in MATLAB. Figure 5.4 shows the complementary cumulative distribution function (CCDF) before and after setting up the scrambling codes and time offsets. This signal setup (Table 5.1) demonstrates a PAPR reduction of 2.61 dB at 0.01 % of CCDF, as shown in Figure 5.4.
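The PAPR and CCDF quantities used throughout this section can be computed from the complex baseband samples as sketched below. The Python is illustrative (the simulations themselves were run in MATLAB), and the function names are hypothetical; the 10 ms radio frame duration and the 61.44 Msamples/s rate give the frame length quoted above.

```python
import math


def papr_db(x):
    """Peak-to-average power ratio of a sample sequence, in dB."""
    powers = [abs(v) ** 2 for v in x]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


def ccdf(x, threshold_db):
    """Fraction of samples whose instantaneous power exceeds the average by threshold_db."""
    powers = [abs(v) ** 2 for v in x]
    mean_power = sum(powers) / len(powers)
    limit = mean_power * 10.0 ** (threshold_db / 10.0)
    return sum(1 for p in powers if p > limit) / len(powers)


SAMPLE_RATE_HZ = 61.44e6
FRAME_DURATION_S = 10e-3  # one WCDMA radio frame
SAMPLES_PER_FRAME = round(SAMPLE_RATE_HZ * FRAME_DURATION_S)
```

The PAPR "at 0.01 % of CCDF" is then the threshold at which `ccdf` drops to 1e-4; with 614,400 samples per frame this corresponds to roughly 61 samples above the threshold.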
Figure 5.4: The CCDF plot for four WCDMA carriers, before and after the setup in Table 5.1.

Figure 5.5: PAPR versus EVM for RPC and PW, with four WCDMA carriers.

A low pass FIR filter with 129 taps was designed to meet out-of-band distortion specifications of -77 dBc. Figure 5.5 illustrates the PAPR with respect to EVM for a seven-stage pulse cancellation (PC), i.e. RPC, and for PW with an 85-tap Hamming window. The solid line with diamond markers represents the performance with just clipping; this sets the lower bound on the PAPR and EVM, but clearly a large out-of-band spectral radiation then exists. The seven-stage PC compressed the PAPR by 1.3 dB more than the single stage, at an EVM of 10 %; the PW reduced the PAPR by 0.5 dB when compared to the single stage. In Figure 5.5, the PW’s performance (solid line) is comparable to that of a 2-stage RPC (dashed line with square markers), for up to an EVM of 10 %. Beyond an EVM of 10 %, the RPC performance exceeds that of the PW technique. Using the RPC technique, the PAPR can be suppressed to approximately 6 dB at a fixed 10 % of EVM; using the PW method based on a four WCDMA carrier input signal, 6.7 dB is achievable.
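The EVM axis in these comparisons measures how far the PAPR-reduced signal departs from the original. A waveform-level approximation is sketched below; it is an assumption for illustration, since a conformance EVM measurement additionally involves receive filtering and symbol-level comparison, and the function name is hypothetical.

```python
import math


def evm_percent(reference, processed):
    """RMS error between two time-aligned sample sequences, in percent of the reference RMS."""
    error_power = sum(abs(p - r) ** 2 for r, p in zip(reference, processed))
    reference_power = sum(abs(r) ** 2 for r in reference)
    return 100.0 * math.sqrt(error_power / reference_power)


EVM_LIMIT_PERCENT = 17.5  # limit quoted above from the 3GPP specifications
```

When comparing a delayed output such as the one in (5.3), the reference has to be delayed by the same number of samples before the error is computed.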

In Figure 5.6, the performance of the proposed method (SRPC) is shown in terms of PAPR versus EVM. Note that even a single stage of the proposed algorithm outperforms the PW technique. In addition, it requires only two iterations to obtain the same performance that the previous RPC method requires seven iterations to achieve. The proposed SRPC method attains a PAPR of 5.71 dB at 10 % of EVM, after only three stages.

To the best of our knowledge, this performance is state of the art for WCDMA applications [84, 83, 73]. It should be noted that relaxing the ACLR characteristics can further reduce the PAPR.

Figure 5.7 illustrates the critical disadvantage of the PW technique when compared with the RPC and SRPC: it degrades the ACLR. The original input signal has an ACLR of approximately -77 dBc. Also note that the RPC and SRPC techniques deteriorate the ACLR by up to approximately 2 dB when the clipping threshold is reduced. This is due to the decrease in the average power as clipping becomes more significant. Simulations were performed for different numbers of carriers and the results are tabulated in Table 5.2. For a single carrier, all three techniques show a similar ability in terms of EVM and PAPR. However, the PW method still allows the ACLR to be compromised, unlike the other two methods. The RPC technique requires more than five iterations, which increases its complexity; the proposed SRPC method requires only two iterations. For the three and four carrier cases, it is not possible for the PW method to achieve a PAPR of 5.5 dB, even without considering EVM and ACLR. In this case, the window significantly alters many input samples due to the large clipping, which significantly changes the average power.

The probability density function (PDF) of the PW method is illustrated in Figure 5.8. The solid line shows the PDF of the original input signal and the dashed-line indicates the PW compressed signal. The degradation of the EVM can be explained by the difference between the two PDFs.

In Figure 5.9, the PDF of the technique based on the single PC method is plotted (dashed-line). Compared to the PW case, this approach should be more heavily clipped, since the PC method regenerates the peaks. The smaller magnitude samples are to some extent affected. As seen in Figure 5.9, the PDF difference is more significant near the clipping threshold level. However, by using the RPC algorithm, this difference can be minimized for samples with magnitude less than 1 V. This becomes clear in Figure 5.10, where the PDF is illustrated at each of the three stages for the RPC technique. The same PDF pattern is also obtained for our new approach, SRPC. Among the three techniques, the PC based methods are the most desirable choice for maximizing the system performance. Moreover, the new SRPC technique achieves better performance in only three iterations, whereas the RPC algorithm requires seven. Hence, the SRPC technique has a reduced system complexity.

Table 5.2: Performance for different carrier numbers.

<table>
<thead>
<tr>
<th>No. of Carriers</th>
<th>Method</th>
<th>RMS EVM at 7 dB PAPR</th>
<th>RMS EVM at 6 dB PAPR</th>
<th>RMS EVM at 5.5 dB PAPR</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>PW</td>
<td>4.69 %</td>
<td>8.89 %</td>
<td>12.16 %</td>
</tr>
<tr>
<td></td>
<td>RPC</td>
<td>4.76 %</td>
<td>9.26 %</td>
<td>12.03 %</td>
</tr>
<tr>
<td></td>
<td>SRPC</td>
<td>4.74 %</td>
<td>8.7 %</td>
<td>11.65 %</td>
</tr>
<tr>
<td>3</td>
<td>PW</td>
<td>6.46 %</td>
<td>14.44 %</td>
<td>-</td>
</tr>
<tr>
<td></td>
<td>RPC</td>
<td>5.15 %</td>
<td>9.38 %</td>
<td>13 %</td>
</tr>
<tr>
<td></td>
<td>SRPC</td>
<td>4.64 %</td>
<td>8.5 %</td>
<td>11.7 %</td>
</tr>
<tr>
<td>4</td>
<td>PW</td>
<td>7.6 %</td>
<td>18.66 %</td>
<td>-</td>
</tr>
<tr>
<td></td>
<td>RPC</td>
<td>5.5 %</td>
<td>9.5 %</td>
<td>13.1 %</td>
</tr>
<tr>
<td></td>
<td>SRPC</td>
<td>5 %</td>
<td>8.26 %</td>
<td>11.4 %</td>
</tr>
</tbody>
</table>

Figure 5.8: PDF of the PW for four WCDMA carriers.

Figure 5.9: PDF of the single PC for four WCDMA carriers.

Figure 5.10: PDF of the RPC for four WCDMA carriers.
5.4 Experimental results

### 5.4.1 A class AB power amplifier

The new PAPR reduction technique was applied to a 240-W peak envelope power (PEP) class AB laterally diffused metal oxide semiconductor (LDMOS) power amplifier, which comprised three stages (MHL21386, MRF21045, and MRF21240). The PA had a gain of 58 dB at 2100 MHz. The test bench included Agilent Technologies’ Advanced Design System (ADS), Mathworks’ MATLAB, a personal computer, an arbitrary programmable signal generator (Agilent ESG E4389C), the PA, and a spectrum analyzer (Agilent E1440A); refer to Figure 5.11(a). The personal computer functions as a digital signal processor for implementing the crest factor reduction (CFR) algorithm.

Four TM1 WCDMA carriers with 64 DPCH were processed through the PA, in order to obtain PAPR, ACLR and efficiency measurements.

Figure 5.12 shows the output spectrum of the PA as a function of the input signal’s PAPR. The average output power is fixed at 20 W, so that the ACLR characteristics can be compared. Note that the PA was optimized for an operating average output power of 20 W for two WCDMA carriers. Table 5.3 summarizes the measured PAPR, ACLR, and dc current consumption at a fixed average output power of 20 W. The measurements demonstrate a 6 to 14 dB improvement in ACLR. A 2 to 6 dB ACLR asymmetry is also observed; this is a result of PA memory effects. To evaluate the SRPC method in terms of PA efficiency, the ACLR is held constant at the level of the 12.4 dB PAPR signal (approximately -28 dBc). The results are summarized in Table 5.4. A PA efficiency boost of 6.6% occurs when the SRPC method is utilized. By fixing the ACLR to -37 dBc and -33 dBc, a 2.9% efficiency enhancement is achieved, as shown in Table 5.5.

<table>
<thead>
<tr>
<th>PAPR (dB)</th>
<th>ACLR [Lower, Upper] (dBc)</th>
<th>Current (A)</th>
</tr>
</thead>
<tbody>
<tr>
<td>12.4</td>
<td>-29.37, -27.74</td>
<td>6.27</td>
</tr>
<tr>
<td>9.8</td>
<td>-37.35, -35.57</td>
<td>6.37</td>
</tr>
<tr>
<td>5.7</td>
<td>-33.93, -37.24</td>
<td>6.40</td>
</tr>
</tbody>
</table>

Table 5.3: Measurement results for a fixed 20 W output power.
Figure 5.11: The test bench for the CFR algorithm.
Figure 5.12: Output spectrum of the PA for four WCDMA carriers with different PAPRs.

Table 5.4: Measurement results with a first fixed ACLR.

| PAPR (dB) | Output Power (W) | Current (A) | Efficiency (%) |
|-----------|------------------|-------------|----------------|
| 12.4      | 20               | 6.27        | 11.8           |
| 9.8       | 35.7             | 8.16        | 16.2           |
| 5.7       | 46.4             | 9.32        | 18.4           |

Table 5.5: Measurement results with a second fixed ACLR.

| PAPR (dB) | Output Power (W) | Current (A) | Efficiency (%) |
|-----------|------------------|-------------|----------------|
| 9.8       | 20               | 6.47        | 11.6           |
| 5.7       | 39.2             | 7.67        | 14.5           |
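
The efficiency figures quoted from Tables 5.4 and 5.5 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the quoted 6.6% and 2.9% are differences in percentage points and that efficiency is defined as output power divided by dc supply power. The supply voltage is not stated in this section, so the value computed here is only what the tabulated numbers imply; the function names are hypothetical.

```python
def efficiency_gain(eff_reference_pct, eff_cfr_pct):
    """Efficiency improvement in percentage points."""
    return eff_cfr_pct - eff_reference_pct

def implied_supply_voltage(p_out_w, current_a, efficiency_pct):
    """Supply voltage implied by efficiency = P_out / (V_dc * I_dc) (assumed definition)."""
    return p_out_w / (current_a * efficiency_pct / 100.0)

# Efficiency columns: first and last rows of Table 5.4 and of Table 5.5.
gain_table_5_4 = efficiency_gain(11.8, 18.4)
gain_table_5_5 = efficiency_gain(11.6, 14.5)

# 9.8 dB and 5.7 dB PAPR rows of Table 5.4: (output power W, current A, efficiency %).
v_row_9p8 = implied_supply_voltage(35.7, 8.16, 16.2)
v_row_5p7 = implied_supply_voltage(46.4, 9.32, 18.4)

print(f"Efficiency gain: {gain_table_5_4:.1f} and {gain_table_5_5:.1f} percentage points")
print(f"Implied supply voltage: {v_row_9p8:.1f} V and {v_row_5p7:.1f} V")
```

Both rows of Table 5.4 imply nearly the same supply voltage, which suggests that the table columns are mutually consistent under the assumed definition.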
5.4.2 A Doherty amplifier with digital predistortion and feed-forward linearization

Two carriers of a WCDMA TCH signal with 64 DPCH are utilized in the measurements. The average 30-W Doherty feed-forward linear power amplifier (DFFLPA) has been demonstrated to increase the power efficiency by 2.2%, when compared to the conventional class AB feed-forward linear power amplifier (FFLPA) [22]. In addition, a linearly optimized Doherty power amplifier (DPA) has been developed, which has a better ACLR [23]. The test bench for the Doherty predistortion (PD) and the DFFLPA consists of a personal computer for digital signal processing (DSP), an electronic signal generator (Agilent E1428C), the 300-W PEP DPA, a vector signal analyzer (Agilent VSA89601A) for capturing the feedback signals, and a spectrum analyzer (Agilent E4406A); refer to Figure 5.11(b) and (c) [12]. The predistortion algorithm applied here is based on a memory polynomial architecture (32 order with three memory terms), using indirect learning [28]. The CFR is processed on the personal computer (MATLAB), in order to generate different PAPRs: 9.8 dB (no CFR), 6.5 dB, and 5.5 dB. The DFFLPA has a gain of 60 dB at 2140 MHz.
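
As a reminder of what a memory polynomial predistorter computes, the sketch below evaluates the commonly used form x(n) = Σ_q Σ_k a_{q,k} u(n-q) |u(n-q)|^k. The coefficients, the polynomial order, and the number of memory taps are illustrative assumptions, not the configuration used in the measurements, and the indirect learning step that estimates the coefficients is not shown.

```python
def memory_polynomial(u, coeffs):
    """Evaluate x(n) = sum_q sum_k coeffs[q][k] * u(n-q) * |u(n-q)|**k.

    coeffs[q] holds the polynomial coefficients of memory tap q (hypothetical layout).
    """
    x = []
    for n in range(len(u)):
        acc = 0j
        for q, taps in enumerate(coeffs):
            if n - q < 0:
                continue  # samples before the start of the record are treated as zero
            s = u[n - q]
            for k, a in enumerate(taps):
                acc += a * s * abs(s) ** k
        x.append(acc)
    return x

# Illustrative coefficients: a cubic term on the current sample and one linear memory tap.
coeffs = [[1.0, 0.0, -0.1],
          [0.05, 0.0, 0.0]]
u = [1 + 0j, 0.5j, -0.5 + 0j]
x = memory_polynomial(u, coeffs)
print(x)
```

The output is linear in the coefficients, which is what allows indirect learning to estimate them with a least-squares fit on captured PA input and output samples.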
Figure 5.14: Experimental results using the 6.5 dB PAPR signal.

Figure 5.15: Experimental results using the 5.5 dB PAPR signal.
Figure 5.13 shows the measured ACLR, output power, and efficiency for a 9.8 dB PAPR input signal. At an ACLR of -45 dBc, the efficiency improved to 24.5% using PD. The efficiency calculation includes 10 W of power consumption for the DSP based memory polynomial implementation [36]. The DFFLPA achieved an efficiency of 17% for the same ACLR value of -45 dBc. For an ACLR specification of -55 dBc, the Doherty PD and DFFLPA have efficiencies of approximately 15% and 12%, respectively. In Figure 5.14, for an input signal with a 6.5 dB PAPR (EVM of 7.01%), the Doherty PD and DFFLPA achieved efficiencies of 27% and 19.5%, respectively, at an ACLR of -45 dBc. An important result is that, for an ACLR specification of -55 dBc, the efficiencies of the Doherty PD and DFFLPA are almost identical. As the PAPR is further reduced to 5.5 dB (EVM of 11.8%), as seen in Figure 5.15, the Doherty PD’s efficiency approaches 29%, while the DFFLPA achieves 21%. Once again, the efficiencies of the two systems are almost identical for an ACLR specification of -55 dBc.

As the input signal PAPR decreased, the measured results demonstrated that the Doherty PD required more back-off than the DFFLPA. This implies that, when using the same LDMOS output device, the DFFLPA with CFR can achieve a higher output power than the Doherty PD with CFR. Ultimately, this results in a lower cost for the DFFLPA in comparison to the Doherty PD.

5.5 Conclusions

A new CFR algorithm, the SRPC, was proposed; it reduces system complexity and improves system performance in terms of EVM, ACLR and PAPR. Simulation results validate that the proposed method compresses the PAPR by over 4 dB for four carrier WCDMA applications. Clipping-filtering based techniques were demonstrated to have a superior performance compared to the PW technique, when two or more stages are utilized. Experimental results showed that the SRPC technique enhanced the PA power efficiency by 2.9% to 6.6%, when a 240-W PEP class AB power amplifier was used in the test bench. In addition, the SRPC technique improved the ACLR by 6 to 14 dB, for a fixed 20-W average output power. This technique is simple to implement in hardware and is modulation and coding independent. Further efficiency enhancement is achievable through digital linearization. A performance comparison was given between the Doherty PD and DFFLPA techniques with CFR, when two WCDMA carriers are applied. The results show that for a relaxed ACLR specification of -45 dBc, the Doherty PD with CFR achieved up to 29% power efficiency; the DFFLPA with CFR obtained 21%. However, as the ACLR specification becomes more stringent, the efficiency gap between the two systems decreases. The DFFLPA system was always able to obtain a higher output power than the Doherty PD approach, which ultimately factors into the cost.
Chapter 6

An FPGA Testbed for Digital Predistortion

6.1 Introduction

The wireless communication industry has a great interest in the design of highly efficient, reliable, and low-cost power amplifiers; the high power amplifier (HPA) of the transmitter has to meet stringent performance requirements, in order to achieve the overall system specification. With wideband code division multiple access (WCDMA), the challenge becomes even greater, as linearity must be maintained over a wider bandwidth. There are a number of techniques that can be used for linearization. Digital predistortion is one such technique that can potentially meet the requirements of a WCDMA transmitter; it is more efficient and cheaper than the conventional feed-forward technique. There has been a lot of research concerning digital predistortion; however, it has mostly been in the simulation domain. Many of these systems are too complex to implement in real hardware. Thus, implementing digital predistortion in a field programmable gate array (FPGA) has become a primary issue. In this chapter, an FPGA test bed for the digital predistorter is implemented using an Altera digital signal processing (DSP) development board, in order to verify the performance of the digital predistorter.
6.2 Design flow

Most DSP designs are first simulated using MATLAB, until the design specifications are met. This code is then converted into C, which can be run by a DSP, or into a hardware description language (HDL) for hardware configuration. However, by employing Mathworks’ Simulink, we can specify a design using a graphical user interface (GUI). Simulink removes many of the logical programming errors that can occur, by providing predefined intellectual property (IP) simulation blocks. Since Altera blocks can be synthesized, Simulink can generate HDL code for functional simulation in either Altera’s Quartus II or Mentor Graphics’ ModelSim. Once the functional testing is complete, the synthesis process maps the HDL code to the functional logic gates and flip-flops (FF), which constitute the FPGA. Next, the place and route software must efficiently map the gates to the Stratix DSP/FPGA hardware.
6.3 Digital predistortion test bed, using an Altera DSP development FPGA board

The test bench using the Altera DSP development FPGA board is shown in Figure 6.1. The Altera Stratix EP1S80 DSP development board includes two 12-bit 125-MHz analog-to-digital converters (ADCs), two 14-bit 165-MHz digital-to-analog converters (DACs), 2 MBytes of synchronous SRAM, 64 Mbits of flash memory, an on-board 80-MHz oscillator, and the Stratix device EP1S80B956. The features of the Stratix device are summarized in Table 6.1, and a top view of the board components and interfaces is shown in Figure 6.2.

The test bench operates as follows. First, WCDMA samples generated in the Advanced Design System (ADS) software are stored in the memory of the FPGA; the lookup table (LUT) coefficients are initially set to unity, so that \( I_{\text{pre}} \) and \( Q_{\text{pre}} \) are the same as the original input samples. Secondly, the analog I and Q signals, after passing through the DACs, are directly up-converted by the quadrature modulator (ESG4433B signal generator). The up-converted RF signal drives a Doherty PA; the attenuated PA output signal is then fed back to the vector signal analyzer (VSA), which replaces the down-converter, the ADC, the digital down-converter (DDC), and memory. Thirdly, the signals stored in the FPGA and in the VSA memory are synchronized and compared to update the LUT coefficients. In this test bed, the LUT coefficient update algorithm runs on the PC as digital signal processing. These three steps constitute one iteration.
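
The iteration described above can be sketched as a closed-loop simulation. Everything specific in the sketch is an assumption made for illustration: the memoryless PA model, the 16-entry magnitude-indexed LUT, the averaging update rule, and the step size `mu`. The update algorithm actually run on the PC is not specified in this section, and the PA output is assumed to be already time-aligned and gain-normalized.

```python
import cmath

N_BINS = 16      # hypothetical LUT size
MAX_MAG = 1.0    # hypothetical full-scale input magnitude

def pa_model(x):
    """Hypothetical memoryless PA with gain compression and AM/PM rotation."""
    r2 = abs(x) ** 2
    return x * (1.0 - 0.10 * r2) * cmath.exp(0.15j * r2)

def lut_index(x):
    """Address the LUT by the magnitude of the input sample."""
    return min(int(abs(x) / MAX_MAG * N_BINS), N_BINS - 1)

def predistort(samples, lut):
    """Multiply each sample by the complex gain stored for its magnitude bin."""
    return [x * lut[lut_index(x)] for x in samples]

def update_lut(lut, samples, pa_out, mu=0.5):
    """One iteration: shift each addressed entry by mu times its mean gain error."""
    err_sum = [0j] * N_BINS
    count = [0] * N_BINS
    for x, y in zip(samples, pa_out):
        if abs(x) > 1e-12:
            k = lut_index(x)
            err_sum[k] += 1.0 - y / x  # zero when the PD and PA cascade has unit gain
            count[k] += 1
    return [g + mu * err_sum[k] / count[k] if count[k] else g
            for k, g in enumerate(lut)]

def mean_sq_error(samples, pa_out):
    return sum(abs(y - x) ** 2 for x, y in zip(samples, pa_out)) / len(samples)

# Deterministic stand-in for the stored input samples (not a WCDMA waveform).
samples = [(0.05 + 0.85 * (n % 50) / 49) * cmath.exp(0.37j * n) for n in range(200)]

lut = [1.0 + 0j] * N_BINS  # unity LUT: predistorted samples equal the input samples
error_before = mean_sq_error(samples, [pa_model(x) for x in predistort(samples, lut)])
for _ in range(20):
    pa_out = [pa_model(x) for x in predistort(samples, lut)]  # drive PA, capture output
    lut = update_lut(lut, samples, pa_out)                    # compare and update
error_after = mean_sq_error(samples, [pa_model(x) for x in predistort(samples, lut)])

print(f"Mean squared error: {error_before:.2e} -> {error_after:.2e}")
```

Entries whose magnitude bin is never addressed keep their initial unity value, and the residual error after convergence is set mainly by the LUT resolution.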
Figure 6.2: Stratix EP1S80 DSP development board components and interfaces.
6.4 Experimental results

The Doherty power amplifier (PA), with an average 30-W output power, is utilized in this digital predistortion (PD) test bench. A single WCDMA carrier, with an approximately 10-dB PAPR, is stored in the FPGA memory, in order to drive the Doherty PA. Figure 6.3 shows the adjacent channel leakage ratio (ACLR) performance with no PD (a), with PD (b), and for the ideal linear output (c), at an average 43-dBm output power. With the addition of the PD in the FPGA test bench, the ACLR improved by up to approximately 11 dB; this performance is close to that of the original input signal. Figure 6.4 illustrates the ACLR performance at an average 45-dBm output power without PD (a), with PD (b), and for the linear output (c). After PD, there is a reduction in the output spectrum on the order of 15 dB.
Figure 6.4: Measurement results at the average 45 dBm output power: (a) without PD, (b) with PD, and (c) with a linear output.
6.5 Conclusions

In this chapter, a digital PD test bed was implemented on the FPGA development board. Using Altera’s DSP Builder software embedded in Simulink, most of the system was easily implemented in hardware without any knowledge of HDL or Verilog. Simulink does not have library blocks as extensive as those of the Quartus II software. Thus, it would be beneficial if the target system were implemented using both DSP Builder and Quartus II.
Chapter 7

Summary and Future Research

The research presented in this dissertation was motivated by the need to enhance the efficiency and linearity of wireless transmitters. Currently, the power amplifier in the wireless transmitter is inherently nonlinear and the efficiency of the power amplifier is low.

Three methods were considered, in order to increase the efficiency and improve the linearity for wireless communication systems: 1) a lookup table based digital predistortion system, with memory effect compensation, 2) a baseband derived radio frequency (RF) digital predistortion system, and 3) a novel crest factor reduction algorithm, applied to a class AB power amplifier, a Doherty power amplifier, a Doherty feed-forward linear power amplifier, and a digitally predistorted Doherty power amplifier.

Chapter 3 presented the piecewise pre-equalizer based lookup table predistortion linearization system, which compensated for memory effects and nonlinearity. The method was computationally efficient compared to the memory polynomial predistortion system.

Chapter 4 provided the novel baseband derived RF digital predistortion system. The system exhibited the narrowband advantage of an RF envelope digital predistortion system and the accuracy of a baseband digital predistortion system.

The third contribution was the novel crest factor reduction method, which saved hardware resources. This method was applied to the class AB power amplifier, the Doherty power amplifier, the Doherty feed-forward linear power amplifier, and the digitally predistorted Doherty power amplifier.

The following points illustrate potential future research topics. Although the most important envelope memory effects were compensated for in this dissertation, other memory effects, such as thermal effects or the RF frequency response of the system, may pose problems depending on the application. Consequently, an FPGA hardware implementation of a high speed predistorter, which considers all memory effects, along with a DSP board for algorithm implementation, would be a practical research topic. Another interesting research topic is the design of wideband adaptive biasing of the power amplifier, in order to further enhance efficiency when combined with memory effects predistortion. This may be the direction to proceed for future wireless communication systems.
Appendix A

Predistortability Analysis of the Proposed Predistortion

If the PA obeys the truncated Volterra model, it can be expressed by (3.20). In order to simplify the analysis, a truncated Volterra model of the PA can be defined as

\[
y(n) = x(n) + c_3 x(n)^3 + c_{1,3} x(n - 1)^3 + c_{1,2,3} x(n - 1)x(n)^2. \tag{A.1}
\]

If the PD follows the structure expressed in (3.4) and, without loss of generality, is reduced as much as possible, it follows that

\[
x(n) = b_1 u(n) + b_3 u(n)^3 + w_{1,1} b_3 u(n-1)^3 + w_{1,3} b_1 u(n-1)u(n)^2. \tag{A.2}
\]

The cancellation is easily observed if we replace \( b_1 \) with 1, \( b_3 \) with \( d_3 \), \( w_{1,1} b_3 \) with \( d_{1,3} \), and \( w_{1,3} \) with \( d_{1,2,3} \). After inserting \( x(n) \) into \( y(n) \) and removing the negligible higher-order and even-order terms, we arrive at

\[
y(n) = u(n) + (d_3 + c_3) u(n)^3 + (d_{1,3} + c_{1,3}) u(n-1)^3 + (d_{1,2,3} + c_{1,2,3}) u(n-1) u(n)^2. \tag{A.3}
\]

Thus, the desired output \( u(n) \) can be achieved via the PD, when \( d_3 \), \( d_{1,3} \), and \( d_{1,2,3} \) cancel out \( c_3 \), \( c_{1,3} \), and \( c_{1,2,3} \), respectively.
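
The cancellation can also be checked numerically. The sketch below uses real-valued samples, hypothetical PA coefficients, a low-amplitude sinusoidal test input, and zero initial conditions, all of which are assumptions for illustration. It sets \( d_3 = -c_3 \), \( d_{1,3} = -c_{1,3} \), and \( d_{1,2,3} = -c_{1,2,3} \), and compares the distortion with and without the PD.

```python
import math

C3, C13, C123 = -0.10, 0.05, 0.03  # hypothetical PA coefficients c_3, c_{1,3}, c_{1,2,3}

def pa(x):
    """Truncated Volterra PA model of (A.1), with x(-1) taken as zero."""
    y, prev = [], 0.0
    for s in x:
        y.append(s + C3 * s ** 3 + C13 * prev ** 3 + C123 * prev * s ** 2)
        prev = s
    return y

def pd(u, d3, d13, d123):
    """Reduced predistorter with b_1 = 1, written with the coefficients d of (A.3)."""
    x, prev = [], 0.0
    for s in u:
        x.append(s + d3 * s ** 3 + d13 * prev ** 3 + d123 * prev * s ** 2)
        prev = s
    return x

def peak_error(u, y):
    """Largest deviation of the output from the desired output u(n)."""
    return max(abs(b - a) for a, b in zip(u, y))

u = [0.1 * math.sin(0.3 * n) for n in range(64)]  # low-amplitude test input
err_without_pd = peak_error(u, pa(u))
err_with_pd = peak_error(u, pa(pd(u, -C3, -C13, -C123)))

print(f"Peak error without PD: {err_without_pd:.2e}, with PD: {err_with_pd:.2e}")
```

The error that remains with the PD comes from the fifth-order and higher terms that the analysis neglects, so it shrinks much faster than the uncorrected distortion as the input amplitude is reduced.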
Bibliography




