vlad present

8/12/2019 Vlad Present


Implementation of Linear Predictive Coding (LPC) of Speech

213A class project

Spring 2000

Jean François Frigon and Vladislav Teplitsky

Implementation of LPC

Outline

Introduction to Speech Modeling

Architecture Overview

System Demonstration

LPC Algorithm

Pitch Detection



Speech Modeling - Non-stationary

Speech is a highly non-stationary signal

It changes dynamically over time

Changes occur very quickly

Speech Modeling - Frame Blocking

Need to analyze the signal over many short segments, called frames

Apply a short-duration (usually 20-30 msec) overlapping window (usually Hamming) to the speech signal in order to segment it into frames

A single frame of speech is approximately stationary, so analysis can be performed on it




Speech Modeling - LTI Model

Source -> Transfer Function -> Radiation -> sounds

LTI Model is valid for

moderately loud sounds

short speech segments, i.e. frames (20-30 msec)

Speech Modeling - Source (Voiced)

Sounds are either voiced or unvoiced

Voiced sounds (e.g. all vowels) are generated by vibrations of the vocal cords

These vibrations are periodic in time, and are thus approximated by an impulse train

In time, the impulses are spaced by the pitch period, 1/F0, where F0 (in Hz) is the pitch



Speech Modeling - Source (Unvoiced)

Unvoiced sounds (e.g. /sh/, /s/, /p/) are generated without vocal cord vibrations

The excitation is modeled by a White Gaussian Noise source

Unvoiced sounds have no pitch since they are excited by a non-periodic signal
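The two excitation sources can be sketched in a few lines of Python. This is only an illustration of the source model: the function names, the 8000 Hz sampling rate, the 100 Hz pitch and the fixed random seed are assumptions made for the example, not values taken from the project.

```python
import random

def impulse_train(num_samples, pitch_period):
    """Voiced excitation: a unit impulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def white_noise(num_samples, seed=0):
    """Unvoiced excitation: zero-mean, unit-variance Gaussian samples."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

fs = 8000            # sampling rate in Hz (assumed for the example)
f0 = 100             # example pitch in Hz (assumed)
period = fs // f0    # number of samples between impulses
voiced = impulse_train(240, period)
unvoiced = white_noise(240)
```

With these numbers the impulses fall on samples 0, 80 and 160 of a 240-sample frame, while the noise source has no repeating structure at all.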


Speech Modeling - Transfer Function

Transfer function models the effects of the vocal tract on the source signal

Transfer function is either all-pole (vowel model) or pole-and-zero (consonant model)

Poles of the transfer function correspond to the resonances of the vocal tract, which are called formants

Human auditory system is much more sensitive to poles than to zeros of the transfer function





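We will consider only an all-pole transfer function of the form:

$$H(z) = \frac{G}{\prod_{i=1}^{p/2} (1 - a_i z^{-1})(1 - a_i^* z^{-1})}$$

where G is the gain, p is the order (number of poles), and $a_i$ and its complex conjugate $a_i^*$ are the poles.

p ≈ 2 × (bandwidth of signal in kHz) + [2, 3, 4]

e.g. BW = 4 kHz, then

p = 2 × 4 + [2, 3, 4] = [10, 11, 12]

Example: a 10th order transfer function model (p = 10, i.e. five complex-conjugate pole pairs).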
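As a rough illustration of the order rule and of how poles create resonances, the sketch below evaluates the magnitude of an all-pole model on the unit circle. The resonance frequencies, the pole radius of 0.97 and the function name are hypothetical values chosen for the example; they are not the project's vocal tract parameters.

```python
import cmath
import math

def all_pole_magnitude(gain, pole_pairs, freq_hz, fs):
    """|H(z)| at z = exp(j*2*pi*f/fs) for H(z) = G / prod (1 - a z^-1)(1 - a* z^-1)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    denom = complex(1.0, 0.0)
    for a in pole_pairs:
        # each entry contributes the pole and its complex conjugate
        denom *= (1 - a * z_inv) * (1 - a.conjugate() * z_inv)
    return abs(gain / denom)

fs = 8000
bandwidth_khz = fs // 2 // 1000                      # 4 kHz of signal bandwidth
candidate_orders = [2 * bandwidth_khz + extra for extra in (2, 3, 4)]

# Hypothetical resonances in Hz, one complex-conjugate pole pair each
resonances_hz = [500, 1500, 2500, 3300, 3800]
pole_pairs = [cmath.rect(0.97, 2 * math.pi * f / fs) for f in resonances_hz]
order = 2 * len(pole_pairs)                          # total number of poles

peak = all_pole_magnitude(1.0, pole_pairs, 500, fs)    # at a resonance
valley = all_pole_magnitude(1.0, pole_pairs, 1000, fs) # between two resonances
```

The response at a pole frequency is several times larger than the response between two poles, which is the formant behaviour described on the transfer function slide.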



Speech Modeling - Radiation

Models how sound is radiated by the lips

Usually approximated by a digital differentiator:

$$R(z) = 1 - z^{-1}$$

Radiation is not important for classification of a sound

Thus, we will omit it from our implementation


Architecture Overview

Voice -> Segmentation -> LPC and Pitch Detection -> Parameters -> Channel -> LPC Synthesizer

Parameters:

Silence

LPC Coeff.

Gain

Voiced/Unvoiced

Pitch Frequency




Voice Segmentation

8000 samples/sec

20 ms step size (160 samples)

30 ms window (240 samples)

Overlap of 10 ms (80 samples) between consecutive windows

Process 240 samples in 20 ms
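The segmentation numbers above translate directly into a framing loop. The following is a minimal sketch, assuming the samples are available as a Python list; the function name and the choice to drop a trailing partial frame are assumptions made for the example.

```python
def split_into_frames(samples, frame_len=240, step=160):
    """Cut a signal into overlapping frames: 240-sample windows every 160 samples."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):   # a trailing partial frame is dropped
        frames.append(samples[start:start + frame_len])
        start += step
    return frames

fs = 8000
one_second = list(range(fs))      # stand-in for one second of speech samples
frames = split_into_frames(one_second)
```

One second of input yields 49 frames, and the last 80 samples of each frame are the first 80 samples of the next one.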


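Voice Segmentation - Filtering and Windowing

Segment Samples -> filter $1 - 0.98 z^{-1}$ (delay $z^{-1}$, coefficient -0.98) -> multiply by Hamming Window Coefficients -> To Silence Detection, LPC and Pitch Detection

$$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

[Figure: Hamming window w(n), amplitude 0 to 1, sample index 0 to 250]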
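The filtering and windowing stage can be sketched as follows. The first-order filter with coefficient -0.98 is read off the block diagram and acts as a pre-emphasis filter; treating the sample before the frame as zero is an assumption of this example (a streaming implementation would more likely carry over the last sample of the previous segment). Function names are illustrative.

```python
import math

def pre_emphasize(frame, coeff=0.98):
    """y(n) = x(n) - coeff * x(n-1), with x(-1) assumed to be 0."""
    out = []
    prev = 0.0
    for x in frame:
        out.append(x - coeff * prev)
        prev = x
    return out

def hamming(num_points):
    """w(n) = 0.54 - 0.46 cos(2*pi*n / (N-1)) for 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (num_points - 1))
            for n in range(num_points)]

def filter_and_window(frame):
    """Apply the first-order filter, then weight the frame by the Hamming window."""
    window = hamming(len(frame))
    return [y * w for y, w in zip(pre_emphasize(frame), window)]

w = hamming(240)
flat = pre_emphasize([1.0, 1.0, 1.0])
processed = filter_and_window([1.0] * 240)
```

The window tapers to 0.08 at both ends of the frame, and a constant input is almost entirely removed by the filter after the first sample.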




Voice Segmentation - Silence Detection

Compute R(0)

Is R(0) > R(0) for Background Noise?

Yes: Compute LPC and Pitch Detection

No: Silence Period: Stop Algorithm and set $G^2 = 0$
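The silence test compares the frame energy R(0) with the energy of the background noise. A minimal sketch is shown below; how the background-noise energy is measured is not stated on the slide, so here it is simply passed in as a parameter, and the function names are illustrative.

```python
def frame_energy(frame):
    """R(0): the sum of squared samples of the frame."""
    return sum(x * x for x in frame)

def is_silence(frame, noise_energy):
    """A frame counts as silence unless its R(0) exceeds the background-noise R(0)."""
    return frame_energy(frame) <= noise_energy

# Hypothetical reference: energy of a frame assumed to contain only background noise
noise_energy = frame_energy([0.02] * 240)

quiet = is_silence([0.01] * 240, noise_energy)
loud = is_silence([0.5] * 240, noise_energy)
```

For a silent frame the encoder would skip the LPC and pitch stages and send a gain of zero.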


LPC - Motivation

Speech Difference Equation for a pth order filter:

$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)$$

Want to minimize the mean-squared prediction error:

$$e(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

For a single input impulse or stationary white noise, the obtained coefficients $\alpha_k$ are identical to the $a_k$'s




LPC - Autocorrelation (1)

If we assume that s(n) is zero outside the interval $0 \le n \le N-1$, we then need to solve the following set of linear equations:

$$\sum_{k=1}^{p} \alpha_k\, R(|i-k|) = R(i), \quad 1 \le i \le p$$

Where:

$$R(k) = \sum_{m=0}^{N-1-k} s(m)\, s(m+k)$$
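The autocorrelation sum can be written directly from its definition. This is a plain sketch rather than an optimized routine; the function name and the `max_lag` parameter are assumptions for the example.

```python
def autocorrelation(frame, max_lag):
    """R(k) = sum_{m=0}^{N-1-k} s(m) s(m+k) for k = 0 .. max_lag."""
    n = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(n - k))
            for k in range(max_lag + 1)]

r = autocorrelation([1, 2, 3], 2)
```

For the three-sample frame above, R(0) = 1 + 4 + 9 = 14, R(1) = 2 + 6 = 8 and R(2) = 3. For a pth order model only lags 0 through p are needed.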


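LPC - Autocorrelation (2)

In matrix form the set of linear equations can be expressed as:

$$\begin{bmatrix} R(0) & R(1) & R(2) & \cdots & R(p-1) \\ R(1) & R(0) & R(1) & \cdots & R(p-2) \\ R(2) & R(1) & R(0) & \cdots & R(p-3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R(p-1) & R(p-2) & R(p-3) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \\ R(3) \\ \vdots \\ R(p) \end{bmatrix}$$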




LPC - Levinson-Durbin Algorithm (1)

By exploiting:

the Toeplitz structure of the matrix;

the particular structure of the right-hand side of the linear system of equations

we can use the efficient Levinson-Durbin recursive procedure to solve this particular system of equations.


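LPC - Levinson-Durbin Algorithm (2)

The Levinson-Durbin recursive procedure is given by:

$$E^{(0)} = R(0)$$

For $1 \le i \le p$:

$$k_i = \frac{R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j)}{E^{(i-1)}}$$

$$\alpha_i^{(i)} = k_i$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$

$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$$

The final solution is given by:

$$\alpha_j = \alpha_j^{(p)}, \quad 1 \le j \le p$$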
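The recursion can be sketched in Python as below. This follows the standard Levinson-Durbin steps for the autocorrelation method; the function name, the list-based storage and the returned tuple are choices made for the example, and the code assumes R(0) > 0 (silent frames would have to be screened out beforehand).

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz system for alpha_1..alpha_p from R(0)..R(p).

    Returns (alphas, reflection_coeffs, final_error), where final_error is E^(p).
    """
    alpha = [0.0] * (order + 1)    # alpha[j] holds alpha_j^(i); index 0 is unused
    error = r[0]                   # E^(0) = R(0), assumed to be non-zero
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))
        k = acc / error            # k_i
        reflection.append(k)
        updated = alpha[:]
        updated[i] = k             # alpha_i^(i) = k_i
        for j in range(1, i):
            updated[j] = alpha[j] - k * alpha[i - j]
        alpha = updated
        error = (1.0 - k * k) * error   # E^(i)
    return alpha[1:], reflection, error

# Autocorrelation of a first-order process with coefficient 0.5 (illustrative values)
alphas, ks, final_error = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the second coefficient comes out as zero, because a first-order predictor already explains the data, and the final error is 0.75.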




LPC - Gain Coefficient

It can be shown that the gain coefficient is given by:

$$G^2 = R(0) - \sum_{k=1}^{p} \alpha_k R(k) = E_n$$

Where $E_n$ is the minimum mean-squared prediction error and is given by $E^{(p)}$ from the Levinson-Durbin Algorithm.

We will transmit $G^2$.
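The gain formula is a one-line computation once the predictor coefficients are known. The sketch below uses illustrative autocorrelation values and coefficients for a second-order example; the function name is an assumption.

```python
def gain_squared(r, alphas):
    """G^2 = R(0) - sum_{k=1}^{p} alpha_k R(k)."""
    return r[0] - sum(a * r[k] for k, a in enumerate(alphas, start=1))

# Illustrative values: R(0..2) and the matching predictor coefficients
g2 = gain_squared([1.0, 0.5, 0.25], [0.5, 0.0])
```

Here the result is 1 - 0.5 × 0.5 = 0.75, which is the same value the Levinson-Durbin recursion reaches as its final error for these autocorrelation values.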


LPC Algorithm

From Segmentation: s(n) and R(0)

Compute R(i), $1 \le i \le p$

Levinson-Durbin Algorithm: Find $\alpha_i$, $1 \le i \le p$, and $G^2$

Transmit to decoder



Pitch Detection - Motivation

Recall that the source can be either a periodic impulse train with pitch F0 or random noise

Autocorrelation function of a speech frame:

$$R(k) = \sum_{m=0}^{N-1-k} x(m)\, x(m+k)$$

If x(n) is periodic, then R(k) is also periodic with the same period

Thus, we can compute R(k) and check if it is periodic


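Pitch Detection - Motivation

First we clip the frame using a 3-level center clipping function:

[Figure: clipper output C[x(n)] versus input x(n), with thresholds at $-C_L$ and $C_L$ and output levels -1 and +1]

That is:

$$C[x(n)] = \begin{cases} +1 & \text{if } x(n) > C_L \\ -1 & \text{if } x(n) < -C_L \\ 0 & \text{otherwise} \end{cases}$$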
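The 3-level center clipper maps every sample to +1, -1 or 0. A minimal sketch, with the clipping level passed in explicitly and an illustrative function name:

```python
def center_clip(frame, clip_level):
    """3-level center clipping: +1 above C_L, -1 below -C_L, 0 in between."""
    clipped = []
    for x in frame:
        if x > clip_level:
            clipped.append(1)
        elif x < -clip_level:
            clipped.append(-1)
        else:
            clipped.append(0)
    return clipped

clipped = center_clip([0.9, 0.1, -0.5, -0.2, 0.3], clip_level=0.3)
```

Samples whose magnitude does not exceed the clipping level, including one exactly at the level, are set to zero, so only the strongest peaks of the frame survive.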





Next we compute the modified autocorrelation function:

$$R_n(k) = \sum_{m=0}^{N-1-k} x(m)\, x(m+k)$$

where $x(m)\,x(m+k)$ can have only 3 different values:

$$x(m)\,x(m+k) = \begin{cases} 0 & \text{if } x(m) = 0 \text{ or } x(m+k) = 0 \\ +1 & \text{if } x(m) = x(m+k) \\ -1 & \text{if } x(m) \ne x(m+k) \end{cases}$$

We don't need to compute $R_n(k)$ for all values of k (i.e. $0 \le k \le N$)

         F0 (Hz) min   F0 (Hz) max
men      80            200
women    150           350

Thus we only need to look in the range: 80 Hz ≤ F0 ≤ 350 Hz
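The pitch range converts into a range of lags, and the modified autocorrelation only has to be evaluated there. The sketch below assumes the frame has already been clipped to the values -1, 0 and +1; rounding the lag bounds inward to whole samples is a choice made for the example, and the function names are illustrative.

```python
import math

def lag_range(fs, f0_min=80, f0_max=350):
    """Lags (in samples) that correspond to pitches between f0_min and f0_max."""
    return math.ceil(fs / f0_max), math.floor(fs / f0_min)

def modified_autocorrelation(clipped, lag):
    """R_n(k) on a clipped frame; every product is -1, 0 or +1."""
    n = len(clipped)
    return sum(clipped[m] * clipped[m + lag] for m in range(n - lag))

k_min, k_max = lag_range(8000)

pattern = [1, 0, -1, 0, 1, 0, -1, 0]      # clipped toy signal with period 4
at_period = modified_autocorrelation(pattern, 4)
at_half_period = modified_autocorrelation(pattern, 2)
```

At 8000 samples/sec the search covers lags 23 to 100. In the toy signal the value at the true period is positive and the value at half the period is negative, which is what makes the peak search work.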



Pitch Detection Algorithm

Speech Frame x(n)

LPF, Fc = 900 Hz

$C_L$ = 30% of max{x(n)}

Clip x(n)

Compute AC $R_n(k)$ for Fs/350 ≤ k ≤ Fs/80

Compute R = max{$R_n(k)$}

Compute $R_n(0)$

if R ≥ 30% of $R_n(0)$ then frame is voiced, output pitch period = k + Fs/350 (k is the index of the maximum, counted from the start of the search range)

else frame is unvoiced, output 0
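The clipping, autocorrelation and voicing-decision steps of the flow above can be sketched together as follows. Several things here are assumptions of the example rather than details from the slides: the 900 Hz low-pass filter is left out (the input frame is assumed to be filtered already), the clipping level is taken from the maximum absolute sample value, and the lag is returned as an absolute number of samples, which is equivalent to the offset index plus Fs/350.

```python
import math

def detect_pitch(frame, fs=8000, f0_min=80, f0_max=350,
                 clip_ratio=0.3, voicing_ratio=0.3):
    """Return the pitch period in samples, or 0 if the frame is judged unvoiced."""
    # Assumption: clipping level from max |x(n)|; frame is already low-pass filtered
    clip_level = clip_ratio * max(abs(x) for x in frame)
    clipped = [1 if x > clip_level else -1 if x < -clip_level else 0
               for x in frame]
    n = len(clipped)

    def r_n(lag):
        return sum(clipped[m] * clipped[m + lag] for m in range(n - lag))

    k_min = math.ceil(fs / f0_max)
    k_max = min(math.floor(fs / f0_min), n - 1)
    best_lag = max(range(k_min, k_max + 1), key=r_n)   # lag of the largest R_n(k)

    energy = r_n(0)
    if energy > 0 and r_n(best_lag) >= voicing_ratio * energy:
        return best_lag
    return 0

# Test signal: a 100 Hz sine sampled at 8000 Hz, one 240-sample frame
sine = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(240)]
period = detect_pitch(sine)
silent = detect_pitch([0.0] * 240)
```

For the 100 Hz test tone the detected period is 80 samples, i.e. 8000 / 80 = 100 Hz, and an all-zero frame is reported as unvoiced.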



LPC Synthesizer

Pitch Period -> Impulse Train Generator

Random Noise Generator

Voiced/Unvoiced Switch (selects one of the two generators)

Gain (from $G^2$)

Time-Varying IIR Filter (coefficients $\alpha_i$) -> s'
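A decoder along the lines of this block diagram can be sketched as below. The code drives the all-pole difference equation with either an impulse train or Gaussian noise, scaled by the square root of the transmitted gain. The 160-sample output length, the convention that a pitch period of 0 means unvoiced, restarting the impulse train at the start of each frame, and the fixed seed are all simplifying assumptions of the example.

```python
import math
import random

def synthesize_frame(alphas, gain_sq, pitch_period, num_samples=160,
                     state=None, seed=0):
    """Generate one frame with s(n) = sum_k alpha_k s(n-k) + G u(n).

    `state` holds the previous outputs s(n-1), s(n-2), ... carried between frames.
    Returns (samples, new_state).
    """
    gain = math.sqrt(gain_sq)
    rng = random.Random(seed)
    history = list(state) if state else [0.0] * len(alphas)
    out = []
    for n in range(num_samples):
        if pitch_period > 0:                        # voiced: impulse train
            u = 1.0 if n % pitch_period == 0 else 0.0
        else:                                       # unvoiced: white Gaussian noise
            u = rng.gauss(0.0, 1.0)
        s = gain * u + sum(a * h for a, h in zip(alphas, history))
        out.append(s)
        history = [s] + history[:-1]                # shift in the newest output
    return out, history

# One voiced frame from a first-order filter (illustrative coefficients)
samples, state = synthesize_frame([0.5], 1.0, pitch_period=4, num_samples=4)
# A silence frame: gain of zero produces zero output from a zero state
quiet, _ = synthesize_frame([0.5], 0.0, pitch_period=0, num_samples=3)
```

Passing the returned state into the next call keeps the filter memory continuous across frames while the coefficients change, which is what makes the filter time-varying.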



