Draft                                                     Andre Adrian
Document: draft-aec-03.txt                  DFS Deutsche Flugsicherung
Category: Experimental                             December 13th, 2004
Expires: ?


            Voice over Internet Acoustic Echo Cancellation

Status of this Memo

   This document specifies an Acoustic Echo Cancellation implementation
   for hands-free Voice over Internet telephony and requests discussion
   and suggestions for improvements. Distribution of this memo is
   unlimited.

Copyright Notice

   Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.

   You are allowed to use this source code in any open source or closed
   source software you want. You are allowed to use the algorithms for
   a hardware solution. You are allowed to modify the source code.
   You are not allowed to remove the name of the author from this memo
   or from the source code files. You are not allowed to monopolize the
   source code or the algorithms behind the source code as your
   intellectual property. This source code is free of royalty and comes
   with no warranty.

Abstract

   This document specifies an acoustic echo cancellation (AEC)
   algorithm for voice over IP. AEC is necessary because of the large
   latency in VoIP communication (tens to hundreds of milliseconds).
   The presented implementation is based on the well-known Normalized
   Least Mean Square (NLMS) and Geigel Double Talk Detector (DTD)
   algorithms. To improve performance, a pre-whitening filter is used.
   The presented algorithm therefore belongs to the NLMS-pw family,
   which is known to give good echo cancellation for moderate
   processing resources. The algorithm has a complexity of O(3*L),
   where L is the number of taps in the NLMS filter.

Table of Contents

   1. INTRODUCTION
   2. AEC PRINCIPLES
   3. AEC algorithms
      3.1. Finite Impulse Response (FIR) Highpass Filter
      3.2. Geigel Double Talk Detector
      3.3. Normalized Least Mean Square - Pre-Whitening Filter
   4. References
   A. The C++ Source Code
      A.1 aec.h
      A.2 aec.cpp
      A.3 aec_test.cpp
      A.4 Compile source code
      A.5 Test source code

1. INTRODUCTION

   A hands-free telephone or full-duplex intercom system has a feedback
   or echo problem because the output from the loudspeaker feeds into
   the microphone. Several methods can be used to reduce or eliminate
   the problem:

   1.) Reduce the overall amplification. If the system amplification is
       less than 1, feedback dies away. This solution leads to poor
       volume.

   2.) Use Acoustic Echo Suppression. Echo Suppression is realized with
       speech activated switches. Suppression reduces the full-duplex
       telephone to half-duplex. The switches can even "switch away"
       the beginnings of words.

   3.) Use Acoustic Echo Cancellation. This is realized with an
       adaptive or learning filter. First the filter learns the
       acoustics from the given microphone and loudspeaker signals.
       After learning, the filter can calculate an estimated microphone
       signal from the loudspeaker signal. This estimated mic signal is
       subtracted from the real mic signal. The difference signal no
       longer contains the loudspeaker signal - the feedback loop is
       broken.

   The Least Mean Square (LMS) algorithm from Widrow and Hoff has been
   known since 1960. Unfortunately the LMS is a slow learner. The
   learning speed or convergence rate is controlled by a constant
   value. This value can only be optimized either for loud signals or
   for weak signals. Optimizing for loud signals produces slow
   convergence with weak signals. Optimizing for weak signals gives
   divergence with loud signals. Divergence means that the filter
   increases the echo instead of reducing it, and it sounds very ugly.

   The Normalized LMS (NLMS) has a constant convergence rate for loud
   and weak signals because the parameter that controls the convergence
   rate is derived from the signal energy. For a white noise signal,
   where all frequencies have the same energy, the NLMS performs well.
   But human speech has more energy in low frequencies than in high
   frequencies. Therefore, an NLMS gives good echo cancellation for low
   frequencies and poor echo cancellation for high frequencies.
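   The difference between a fixed LMS step size and a normalized NLMS
   step size can be illustrated with a minimal C++ sketch. The function
   names lms_step and nlms_step, their parameters and the small
   regularization term eps are assumptions made for this illustration
   only; they are not taken from the source code in Appendix A.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner product of two equally long vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// One LMS update with a fixed step size mu (hypothetical helper).
// Returns the error e = d - x'w measured before the update.
double lms_step(std::vector<double>& w, const std::vector<double>& x,
                double d, double mu) {
    const double e = d - dot(w, x);
    for (std::size_t i = 0; i < w.size(); ++i) w[i] += 2.0 * mu * e * x[i];
    return e;
}

// One NLMS update: the step size is divided by the signal energy x'x,
// so the relative correction does not depend on the signal level.
// alpha is the tuning constant; eps (an assumption of this sketch)
// guards against division by zero for silent input.
double nlms_step(std::vector<double>& w, const std::vector<double>& x,
                 double d, double alpha, double eps) {
    const double e = d - dot(w, x);
    const double mu = alpha / (dot(x, x) + eps);
    for (std::size_t i = 0; i < w.size(); ++i) w[i] += mu * e * x[i];
    return e;
}
```

   With a single tap and a true echo gain of 0.5, an LMS step size of
   0.25 shrinks the error for an input level of 1 but makes it grow
   for an input level of 1000. The NLMS step with a tuning constant of
   0.7 shrinks the error by the same factor at both levels.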

   A pre-whitening filter in front of the echo cancellation filter
   transforms human speech into something more "white noise" like - the
   energy of high frequency signals becomes similar to the energy of
   low frequency signals. The presented algorithm uses the simplest
   pre-whitening filter possible, a first order or one pole highpass
   filter with a transfer frequency equal to half of the sample
   frequency (4kHz for the narrowband sample frequency of 8kHz).
   Because the pre-whitening filter is fixed, the complexity of this
   NLMS-pw filter is still the same as for the NLMS filter.

   One important point should be remembered: The AEC in your telephony
   device helps your telephony partner to hear no echo. Therefore AEC
   is an altruistic algorithm.

2. AEC PRINCIPLES

   The core of the acoustic echo cancellation is described in the
   introduction. Next to the NLMS-pw, three more blocks are used:

   1.) A highpass filter for the microphone signal. Telephone users are
       used to a frequency range between 300Hz and 3400Hz. Narrowband
       VoIP can give 0Hz to 4000Hz. After hearing a VoIP signal with
       frequencies below 300Hz, testers complained about the bad
       quality. With a 300Hz cut-off filter the sound is limited as in
       a telephone. The highpass filter in use is a 13 taps finite
       impulse response (FIR) filter. A FIR filter was used because of
       its stability.

   2.) A double talk detector. The AEC filter should only learn if the
       signal from the microphone is determined by the loudspeaker
       signal only. If the local or near-end user is talking, the
       filter can no longer learn successfully. Detection of user
       talking is done by comparing the volume levels of loudspeaker
       and microphone. This implementation uses the well-known Geigel
       DTD.

   3.) An Acoustic Echo Suppressor (AES) or Non Linear Processor (NLP).
       If the Double Talk Detector (DTD) detects "no talking", the
       microphone signal gets attenuated by 6dB. This is done to
       suppress echo artefacts.

   AEC block diagram.
   Sin is the microphone signal, Rout and Rin are the loudspeaker
   signal. Sout is the echo-cancelled microphone signal:

             +--+            +         +---+
   Sin -->---|HP|--+------->(+)----+-->|NLP|--->-- Sout
             +--+  |        /|\    |   +---+
                   |        -|     |
                  \|/        |     |
                 +---+     +----+  |
                 |DTD|---->|NLMS|<-+
                 +---+     +----+
                  /|\       /|\
                   |         |
                   |         |
   Rout -<---------+---------+-----------------<-- Rin

   Figure 1.) AEC block diagram

3. AEC algorithms

   This chapter gives the mathematical background to the source code.
   This document will not give derivations of the algorithms or proofs.
   See the references for more information.

3.1. Finite Impulse Response (FIR) Highpass Filter

   There are three reasons for the microphone highpass filter. First,
   ambient noises are often prominent in the frequency range below
   300Hz. Typical examples are fans (2800 Rpm are 46.7Hz) and hard
   disks (7200 Rpm are 120Hz). Second, small loudspeakers often have a
   resonance frequency around 80Hz. This acts as a non-linearity for
   the echo cancellation. Third, and maybe most important, the users
   are used to telephone quality with a 300Hz cut-off.

   The FIR filter has 13 taps. With symmetric coefficients this gives
   a group delay of 6 samples, or about 0.8ms at 8kHz. A FIR filter
   was used because of its stability.

3.2. Geigel Double Talk Detector

   Talk detection can be done with a threshold for the microphone
   signal only. This approach is very sensitive to the threshold level.
   A more robust approach is to compare the microphone level with the
   loudspeaker level. The threshold in this solution is a relative one.
   Because we deal with echo, it is not sufficient to compare only the
   actual levels; we have to consider previous levels, too.

   The Geigel DTD brings these ideas into one simple formula: The last
   L levels of the loudspeaker signal (index 0 for the current sample,
   index L-1 for the oldest sample) are compared to the actual
   microphone signal. To avoid problems with phase, the absolute values
   are used.
   Double talk is declared if:

      |d| >= c * max(|x[0]|, |x[1]|, .., |x[L-1]|)

   with
      |d|       is the absolute level of the actual microphone signal,
      c         is a threshold value (typical value 0.5 for -6dB or
                0.71 for -3dB),
      |x[0]|    is the absolute level of the actual loudspeaker signal,
      |x[L-1]|  is the absolute level of the loudspeaker signal L-1
                samples ago.

   See references 3, 7, 9.

3.3. Normalized Least Mean Square - Pre-Whitening Filter

   The NLMS-pw, NLMS and LMS belong to the family of gradient descent
   based algorithms. The good features of gradient descent based
   algorithms are simplicity and robustness.

   First we look at the "echo cancelling" formula, the convolution.
   This formula is used to subtract the microphone signal estimated
   from the loudspeaker signal from the real microphone signal:

      e = d - X' * W

   with
      e   is the linear error signal or echo-cancelled microphone
          signal,
      d   is the desired signal or the microphone signal with echo,
      X'  is the transpose of the loudspeaker signals vector,
      W   is the adaptive weights vector.

   With a matching vector W the echo cancellation can be perfect.
   Unfortunately, learning the vector W has limitations. The
   loudspeaker is not the only audio source during filter learning.
   Ambient sounds and noises, system internal amplifier and converter
   noises and non-linearities of loudspeaker and microphone have a
   negative impact on learning.

   Due to the simplicity of the LMS, all elements of W are updated with
   the same "mikro * e" term. This simple approach makes the LMS robust
   and moderate in its demand for processing resources, but the "one
   term fits all" approach prevents "perfect" learning, too.

   The LMS algorithm has the update formula:

      W[n+1] = W[n] + 2*mikro*e*X[n]

   with
      W[n+1]  is the new adaptive weights vector,
      W[n]    is the previous adaptive weights vector,
      mikro   is the step size (constant or variable),
      e       is the error signal,
      X[n]    is the loudspeaker signals vector.

   The constant scalar mikro becomes a variable in NLMS.
   This variable is calculated from the loudspeaker signals vector
   with:

                 1
      mikro = ------
              X' * X

   with
      X'  is the transpose of the loudspeaker signals vector,
      X   is the loudspeaker signals vector.

   Note: The vector dot product is a scalar. It is the sum of the
   element-wise multiplication of both vectors.

   The constant value 2 in the LMS formula changes into a stability
   "tuning" constant. For stable adaptation this constant should be
   between 0 and 1. This NLMS-pw uses a value of 0.7.

   For the weights vector update and the calculation of mikro, the
   NLMS-pw uses highpass-filtered values of e and X. The filtered
   values are used because the NLMS converges best with white noise
   signals, and human voice is not white noise. The fixed highpass
   filter approach used in this NLMS-pw does not increase the overall
   complexity. With

      ef = highpass(e)
      Xf = highpass(X)

   we get our NLMS-pw weights vector update formulas:

                 0.7
      mikro = --------
              Xf' * Xf

      W[n+1] = W[n] + mikro*ef*Xf[n]

   with
      ef  is the highpass-filtered value of e,
      Xf  is the highpass-filtered value of X,
   and the other values are as above. Both filters are first order
   (one pole) IIR filters with a transfer frequency of 4000Hz.

   For other pre-whitening algorithms see references 6, 8, 9. For
   non-LMS echo cancellation algorithms see references 6 and 9.

4. References

   [1]  B. Widrow, M. E. Hoff Jr., "Adaptive switching circuits",
        Western Electric Show and Convention Record, Part 4,
        pp. 96-104, Aug. 1960

   [2]  B. Widrow, et al., "Stationary and Nonstationary Learning
        Characteristics of the LMS Adaptive Filter", Proc. of the IEEE,
        vol. 64, no. 8, pp. 1151-1162, Aug. 1976

   [3]  D. L. Duttweiler, "A twelve-channel digital echo canceller",
        IEEE Trans. Commun., vol. 26, pp. 647-653, May 1978

   [4]  B. Widrow, S. D. Stearns, "Adaptive Signal Processing",
        Prentice-Hall, 1985

   [5]  D. Messerschmitt, D. Hedberg, C. Cole, A. Haoui, P. Winship,
        "Digital Voice Echo Canceller with a TMS32020", Application
        report SPRA129, Texas Instruments, 1989

   [6]  R. Storn, "Echo Cancellation Techniques for Multimedia
        Applications - a Survey", TR-96-046, International Computer
        Science Institute, Berkeley, Nov. 1996

   [7]  J. Nikolic, "Implementing a Line Echo Canceller using the block
        update and NLMS algorithms on the TMS320C54x DSP", Application
        report SPRA188, Texas Instruments, Apr. 1997

   [8]  M. G. Siqueira, "Adaptive Filtering Algorithms in Acoustic Echo
        Cancellation and Feedback Reduction", Ph.D. thesis, University
        of California, Los Angeles, 1998

   [9]  T. Gaensler, S. L. Gay, M. M. Sondhi, J. Benesty, "Double-Talk
        robust fast converging algorithms for network echo
        cancellation", IEEE Trans. on Speech and Audio Processing,
        vol. 8, no. 6, Nov. 2000

   [10] M. Hutson, "Acoustic Echo Cancellation using Digital Signal
        Processing", Bachelor of Engineering (Honours) thesis, The
        School of Information Technology and Electrical Engineering,
        The University of Queensland, Nov. 2003

   [11] A. Adrian, "Audio Echo Cancellation", Free Software/Open Source
        Telephony Summit 2004, German Unix User Group, Geilenkirchen,
        Germany, Jan. 16-20, 2004

Appendix A. The C++ Source Code

/***************** A.1 APPENDIX aec.h *****************/

/* aec.h
 *
 * Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
 *
 * Acoustic Echo Cancellation NLMS-pw algorithm
 *
 * Version 1.3 filter created with www.dsptutor.freeuk.com
 */

#ifndef _AEC_H                  /* include only once */

#include <string.h>             /* memset(), memmove() */

// use double if your CPU does software-emulation of float
typedef float REAL;

/* dB Values */
const REAL M0dB = 1.0f;
const REAL M3dB = 0.71f;
const REAL M6dB = 0.50f;
const REAL M9dB = 0.35f;
const REAL M12dB = 0.25f;
const REAL M18dB = 0.125f;
const REAL M24dB = 0.063f;

/* dB values for 16bit PCM */
/* MxdB_PCM = 32767 * 10 ^(-x / 20) */
const REAL M10dB_PCM = 10362.0f;
const REAL M20dB_PCM = 3277.0f;
const REAL M25dB_PCM = 1843.0f;
const REAL M30dB_PCM = 1026.0f;
const REAL M35dB_PCM = 583.0f;
const REAL M40dB_PCM = 328.0f;
const REAL M45dB_PCM = 184.0f;
const REAL M50dB_PCM = 104.0f;
const REAL M55dB_PCM = 58.0f;
const REAL M60dB_PCM = 33.0f;
const REAL M65dB_PCM = 18.0f;
const REAL M70dB_PCM = 10.0f;
const REAL M75dB_PCM = 6.0f;
const REAL M80dB_PCM = 3.0f;
const REAL M85dB_PCM = 2.0f;
const REAL M90dB_PCM = 1.0f;

const REAL MAXPCM = 32767.0f;

/* Design constants (Change to fine tune the algorithms) */

/* The following values are for hardware AEC and studio quality
 * microphone */

/* maximum NLMS filter length in taps. A longer filter length gives
 * better Echo Cancellation, but slower convergence speed and
 * needs more CPU power (Order of NLMS is linear) */
#define NLMS_LEN  (80*8)

/* convergence speed. Range: >0 to <1 (0.2 to 0.7). Larger values give
 * more AEC in lower frequencies, but less AEC in higher frequencies. */
const REAL Stepsize = 0.7f;

/* minimum energy in xf. Range: M70dB_PCM to M50dB_PCM. Should be equal
 * to microphone ambient Noise level */
const REAL Min_xf = M75dB_PCM;

/* Double Talk Detector Speaker/Microphone Threshold. Range <=1
 * Large value (M0dB) is good for Single-Talk Echo cancellation,
 * small value (M12dB) is good for Double-Talk AEC */
const REAL GeigelThreshold = M6dB;

/* Double Talk Detector hangover in taps.
 * Not relevant for Single-Talk AEC */
const int Thold = 30 * 8;

/* for Non Linear Processor. Range >0 to 1. Large value (M0dB) is good
 * for Double-Talk, small value (M12dB) is good for Single-Talk */
const REAL NLPAttenuation = M12dB;

/* Below this line there are no more design constants */


/* Exponential Smoothing or IIR Infinite Impulse Response Filter */
class IIR_HP {
  REAL x;

public:
  IIR_HP() {
    x = 0.0f;
  };
  REAL highpass(REAL in) {
    const REAL a0 = 0.01f;      /* controls Transfer Frequency */
    /* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
    x += a0 * (in - x);
    return in - x;
  };
};

/* 13 taps FIR Finite Impulse Response filter
 * Coefficients calculated with
 * www.dsptutor.freeuk.com/KaiserFilterDesign/KaiserFilterDesign.html
 */
class FIR_HP13 {
  REAL z[14];

public:
  FIR_HP13() {
    memset(this, 0, sizeof(FIR_HP13));
  };
  REAL highpass(REAL in) {
    const REAL a[14] = {
      // Kaiser Window FIR Filter, Filter type: High pass
      // Passband: 300.0 - 4000.0 Hz, Order: 12
      // Transition band: 100.0 Hz, Stopband attenuation: 10.0 dB
      -0.043183226f, -0.046636667f, -0.049576525f, -0.051936015f,
      -0.053661242f, -0.054712527f, 0.82598513f, -0.054712527f,
      -0.053661242f, -0.051936015f, -0.049576525f, -0.046636667f,
      -0.043183226f, 0.0f       // 14th tap is padding for the unrolled loop
    };
    memmove(z+1, z, 13*sizeof(REAL));
    z[0] = in;
    REAL sum0 = 0.0, sum1 = 0.0;
    int j;

    for (j = 0; j < 14; j+= 2) {
      // optimize: partial loop unrolling
      sum0 += a[j] * z[j];
      sum1 += a[j+1] * z[j+1];
    }
    return sum0+sum1;
  }
};

/* Recursive single pole IIR Infinite Impulse response filter
 * Coefficients calculated with
 * http://www.dsptutor.freeuk.com/IIRFilterDesign/IIRFiltDes102.html
 */
class IIR1 {
  REAL x, y;

public:
  IIR1() {
    memset(this, 0, sizeof(IIR1));
  };
  REAL highpass(REAL in) {
    // Chebyshev IIR filter, Filter type: HP
    // Passband: 3700 - 4000.0 Hz
    // Passband ripple: 1.5 dB, Order: 1
    const REAL a0 = 0.105831884f;
    const REAL a1 = -0.105831884f;
    const REAL b1 = 0.78833646f;
    REAL out = a0 * in + a1 * x + b1 * y;
    x = in;
    y = out;
    return out;
  }
};

/*
 * Recursive two pole IIR Infinite Impulse Response filter
 * Coefficients calculated with
 * http://www.dsptutor.freeuk.com/IIRFilterDesign/IIRFiltDes102.html
 */
class IIR2 {
  REAL x[2], y[2];

public:
  IIR2() {
    memset(this, 0, sizeof(IIR2));
  };
  REAL highpass(REAL in) {
    // Butterworth IIR filter, Filter type: HP
    // Passband: 2000 - 4000.0 Hz, Order: 2
    const REAL a[] = { 0.29289323f, -0.58578646f, 0.29289323f };
    const REAL b[] = { 1.3007072E-16f, 0.17157288f };
    REAL out =
      a[0] * in + a[1] * x[0] + a[2] * x[1] - b[0] * y[0] - b[1] * y[1];

    x[1] = x[0];
    x[0] = in;
    y[1] = y[0];
    y[0] = out;
    return out;
  }
};

// Extension in taps to reduce mem copies
#define NLMS_EXT  (10*8)

// block size in taps to optimize DTD calculation
#define DTD_LEN   16

class AEC {
  // Time domain Filters
  IIR_HP hp00, hp1;             // DC-level remove Highpass
  FIR_HP13 hp0;                 // 300Hz cut-off Highpass
  IIR1 Fx, Fe;                  // pre-whitening Highpass for x, e

  // Geigel DTD (Double Talk Detector)
  REAL max_max_x;               // max(|x[0]|, .. |x[L-1]|)
  int hangover;
  // optimize: less calculations for max()
  REAL max_x[NLMS_LEN / DTD_LEN];
  int dtdCnt;
  int dtdNdx;

  // NLMS-pw
  REAL x[NLMS_LEN + NLMS_EXT];  // tap delayed loudspeaker signal
  REAL xf[NLMS_LEN + NLMS_EXT]; // pre-whitening tap delayed signal
  REAL w[NLMS_LEN];             // tap weights
  int j;                        // optimize: less memory copies
  int lastupdate;               // optimize: iterative dotp(x,x)
  double dotp_xf_xf;            // double to avoid loss of precision
  double Min_dotp_xf_xf;
  REAL s0avg;

public:
  AEC();

  /* Geigel Double-Talk Detector
   *
   * in d: microphone sample (PCM as floating point value)
   * in x: loudspeaker sample (PCM as floating point value)
   * return: 0 for no talking, 1 for talking
   */
  int dtd(REAL d, REAL x);

  /* Normalized Least Mean Square Algorithm pre-whitening (NLMS-pw)
   * The LMS algorithm was developed by Bernard Widrow
   * book: Widrow/Stearns, Adaptive Signal Processing, Prentice-Hall, 1985
   *
   * in mic: microphone sample (PCM as floating point value)
   * in spk: loudspeaker sample (PCM as floating point value)
   * in update:
   *   0 for convolve only, 1 for convolve and update
   * return: echo cancelled microphone sample
   */
  REAL nlms_pw(REAL mic, REAL spk, int update);

  /* Acoustic Echo Cancellation and Suppression of one sample
   * in d: microphone signal with echo
   * in x: loudspeaker signal
   * return: echo cancelled microphone signal
   */
  int doAEC(int d, int x);

  float getambient() {
    return s0avg;
  };
  void setambient(float Min_xf) {
    dotp_xf_xf = Min_dotp_xf_xf = NLMS_LEN * Min_xf * Min_xf;
  };
};

#define _AEC_H
#endif

/***************** A.2 APPENDIX aec.cpp *****************/

/* aec.cpp
 *
 * Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
 *
 * Acoustic Echo Cancellation NLMS-pw algorithm
 *
 * Version 1.3 filter created with www.dsptutor.freeuk.com
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#include "aec.h"

/* Vector Dot Product */
REAL dotp(REAL a[], REAL b[])
{
  REAL sum0 = 0.0, sum1 = 0.0;
  int j;

  for (j = 0; j < NLMS_LEN; j+= 2) {
    // optimize: partial loop unrolling
    sum0 += a[j] * b[j];
    sum1 += a[j+1] * b[j+1];
  }
  return sum0+sum1;
}

AEC::AEC()
{
  max_max_x = 0.0f;
  hangover = 0;
  memset(max_x, 0, sizeof(max_x));
  dtdCnt = dtdNdx = 0;

  memset(x, 0, sizeof(x));
  memset(xf, 0, sizeof(xf));
  memset(w, 0, sizeof(w));
  j = NLMS_EXT;
  lastupdate = 0;
  s0avg = M80dB_PCM;
  setambient(Min_xf);
}

REAL AEC::nlms_pw(REAL mic, REAL spk, int update)
{
  REAL d = mic;                 // desired signal
  x[j] = spk;
  xf[j] = Fx.highpass(spk);     // pre-whitening of x

  // calculate error value
  // (mic signal - estimated mic signal from spk signal)
  REAL e = d - dotp(w, x + j);
  REAL ef = Fe.highpass(e);     // pre-whitening of e

  // optimize: iterative dotp(xf, xf)
  dotp_xf_xf += (xf[j]*xf[j] - xf[j+NLMS_LEN-1]*xf[j+NLMS_LEN-1]);

  if (update) {
    // calculate variable step size
    REAL mikro_ef = Stepsize * ef / dotp_xf_xf;

    // update tap weights (filter learning)
    int i;
    for (i = 0; i < NLMS_LEN; i += 2) {
      // optimize: partial loop unrolling
      w[i] += mikro_ef*xf[i+j];
      w[i+1] += mikro_ef*xf[i+j+1];
    }
  }

  if (--j < 0) {                // optimize: decrease
    // number of memory copies
    j = NLMS_EXT;
    memmove(x+j+1, x, (NLMS_LEN-1)*sizeof(REAL));
    memmove(xf+j+1, xf, (NLMS_LEN-1)*sizeof(REAL));
  }

  return e;
}

int AEC::dtd(REAL d, REAL x)
{
  // optimized implementation of max(|x[0]|, |x[1]|, .., |x[L-1]|):
  // calculate max of block (DTD_LEN values)
  x = fabsf(x);
  if (x > max_x[dtdNdx]) {
    max_x[dtdNdx] = x;
    if (x > max_max_x) {
      max_max_x = x;
    }
  }
  if (++dtdCnt >= DTD_LEN) {
    dtdCnt = 0;
    // calculate max of max
    max_max_x = 0.0f;
    for (int i = 0; i < NLMS_LEN/DTD_LEN; ++i) {
      if (max_x[i] > max_max_x) {
        max_max_x = max_x[i];
      }
    }
    // rotate Ndx
    if (++dtdNdx >= NLMS_LEN/DTD_LEN) dtdNdx = 0;
    max_x[dtdNdx] = 0.0f;
  }

  // The Geigel DTD algorithm with Hangover timer Thold
  if (fabsf(d) >= GeigelThreshold * max_max_x) {
    hangover = Thold;
  }

  if (hangover) --hangover;

  return (hangover > 0);
}

int AEC::doAEC(int d, int x)
{
  REAL s0 = (REAL)d;
  REAL s1 = (REAL)x;

  // Mic Highpass Filter - to remove DC
  s0 = hp00.highpass(s0);

  // Mic Highpass Filter - telephone users are used to 300Hz cut-off
  s0 = hp0.highpass(s0);

  // ambient mic level estimation
  s0avg += 1e-4f*(fabsf(s0) - s0avg);

  // Spk Highpass Filter - to remove DC
  s1 = hp1.highpass(s1);

  // Double Talk Detector
  int update = !dtd(s0, s1);

  // Acoustic Echo Cancellation
  s0 = nlms_pw(s0, s1, update);

  // Acoustic Echo Suppression
  if (update) {
    // Non Linear Processor (NLP): attenuate low volumes
    s0 *= NLPAttenuation;
  }

  // Saturation
  if (s0 > MAXPCM) {
    return (int)MAXPCM;
  } else if (s0 < -MAXPCM) {
    return (int)-MAXPCM;
  } else {
    return (int)roundf(s0);
  }
}

/***************** A.3 APPENDIX aec_test.cpp *****************/

/* aec_test.cpp
 *
 * Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
 *
 * Test stub for Acoustic Echo Cancellation NLMS-pw algorithm
 * Author: Andre Adrian, DFS Deutsche Flugsicherung
 *
 * compile
c++ -O2 -o aec_test aec_test.cpp aec.cpp -lm
 *
 * Version 1.3 set/get ambient in dB
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#include "aec.h"

#define TAPS  (80*8)

typedef signed short MONO;

typedef struct {
  signed short l;
  signed short r;
} STEREO;

float dB2q(float dB)
{
  /* Decibel to Ratio */
  return powf(10.0f, dB / 20.0f);
}

float q2dB(float q)
{
  /* Ratio to Decibel */
  return 20.0f * log10f(q);
}

/* Read a raw audio file (8KHz sample frequency, 16bit PCM, stereo)
 * from stdin, echo cancel it and write it to stdout
 */
int main(int argc, char *argv[])
{
  STEREO inbuf[TAPS], outbuf[TAPS];

  fprintf(stderr, "usage: aec_test [ambient in dB] <in.raw >out.raw\n");

  AEC aec;

  if (argc >= 2) {
    aec.setambient(MAXPCM*dB2q(atof(argv[1])));
  }

  int taps;
  while ((taps = (int)fread(inbuf, sizeof(STEREO), TAPS, stdin)) > 0) {
    int i;
    for (i = 0; i < taps; ++i) {
      int s0 = inbuf[i].l;      /* left channel microphone */
      int s1 = inbuf[i].r;      /* right channel speaker */

      /* and do NLMS */
      s0 = aec.doAEC(s0, s1);

      /* copy back */
      outbuf[i].l = 0;          /* left channel silence */
      outbuf[i].r = s0;         /* right channel echo cancelled mic */
    }

    fwrite(outbuf, sizeof(STEREO), taps, stdout);
  }

  float ambient = aec.getambient();
  float ambientdB = q2dB(ambient / 32767.0f);
  fprintf(stderr, "Ambient = %2.0f dB\n", ambientdB);

  fflush(NULL);
  return 0;
}

/***************** A.4 APPENDIX Compile source code *****************/

On a Linux system with GNU C++ compiler enter:

  g++ aec_test.cpp aec.cpp -o aec_test -lm

/***************** A.5 APPENDIX Test source code *****************/

The microphone and loudspeaker signals have to be synchronized on a
sample-to-sample basis to make acoustic echo cancellation work. An
AC97 conformant on-board soundcard in a Personal Computer can be set
to a special stereo mode: The left channel records the microphone
signal and the right channel records the loudspeaker signal.
To set up a Linux PC with the ALSA sound system, the microphone
connected to Mic in and the loudspeaker connected to right Line out,
enter:

  amixer -q set 'Master',0 50% unmute
  amixer -q set 'PCM',0 80% unmute
  amixer -q set 'Line',0 0% mute
  amixer -q set 'CD',0 0% mute
  amixer -q set 'Mic',0 0% mute
  amixer -q set 'Video',0 0% mute
  amixer -q set 'Phone',0 0% mute
  amixer -q set 'PC Speaker',0 0% mute
  amixer -q set 'Aux',0 0% mute
  amixer -q set 'Capture',0 50%,0%
  amixer -q set 'Mic Boost (+20dB)',0 1
  amixer -q cset iface=MIXER,name='Capture Source' 0,5
  amixer -q cset iface=MIXER,name='Capture Switch' 1

To test the acoustic echo cancellation we simulate a real telephone
conversation in 5 steps:

  (1) record the far-end speaker
  (2) perform acoustic echo cancellation (this should change nothing)
  (3) play back the far-end speaker and at the same time record the
      near-end speaker
  (4) perform acoustic echo cancellation
  (5) play back the near-end speaker (far-end speech should be
      cancelled)

To record 10 seconds of speech into the file b.raw enter:

  arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b.raw

To perform AEC at the far-end enter:

  ./aec_test <b.raw >b1.raw

To play back file b1.raw and simultaneously record b2.raw enter both
commands in one go:

  aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b1.raw &
  arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b2.raw

To perform AEC at the near-end enter:

  ./aec_test <b2.raw >b3.raw

To play back the echo-cancelled near-end enter:

  aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b3.raw
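
The test steps above need a sound card. For a first dry run without
audio hardware, a synthetic stereo input in the same layout (left
channel microphone, right channel loudspeaker) could be generated. The
following is only a sketch under stated assumptions: the names
StereoFrame and make_echo_frames are hypothetical, and the echo path is
reduced to a pure delay with a fixed gain, which is far simpler than a
real room.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stereo frame in the layout read by aec_test:
// left = microphone sample, right = loudspeaker sample.
// (Hypothetical type for this sketch, mirrors STEREO in aec_test.cpp.)
struct StereoFrame {
    std::int16_t l;
    std::int16_t r;
};

// Build one frame per loudspeaker sample. The microphone channel holds
// the loudspeaker signal delayed by `delay` samples and scaled by
// `gain`, i.e. a simulated echo without near-end speech.
// Assumption: gain lies in [0, 1], so the result always fits 16 bit.
std::vector<StereoFrame> make_echo_frames(
        const std::vector<std::int16_t>& spk,
        std::size_t delay, double gain) {
    assert(gain >= 0.0 && gain <= 1.0);
    std::vector<StereoFrame> frames(spk.size());
    for (std::size_t i = 0; i < spk.size(); ++i) {
        frames[i].r = spk[i];
        frames[i].l = (i >= delay)
            ? static_cast<std::int16_t>(gain * spk[i - delay])
            : static_cast<std::int16_t>(0);
    }
    return frames;
}
```

The frames could then be written to a file with fwrite() and fed to
aec_test on stdin. The delay should stay below NLMS_LEN (640 taps, or
80ms at 8kHz), because the filter cannot model an echo path longer than
its own length.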