The IASTED International Conference on
Signal and Image Processing and Applications
SIPA 2011

June 22 – 24, 2011
Crete, Greece

TUTORIAL SESSION

Speech Modeling and Enhancement in Nonstationary Noise Environments

Prof. Israel Cohen
Technion - Israel Institute of Technology, Israel
icohen@ee.technion.ac.il

Prof. Sharon Gannot
Bar-Ilan University, Israel
gannot@eng.biu.ac.il

Duration

3 hours

Abstract

Enhancement of speech signals is of great interest in many applications, ranging from speech recognition to hearing aids and hands-free mobile communication. Although this problem has attracted significant research effort for several decades, many aspects remain open and require further research. Among them is single-channel reduction of nonstationary noise. Traditional speech enhancement approaches usually consist of two components: estimation of the noise power spectrum and estimation of the desired clean speech signal. In single-channel applications, spectral information is usually exploited for the estimation of the noise. In particular, the noise signal is assumed to be stationary or slowly varying compared to the speech signal. A common approach is to estimate the noise spectrum by averaging the noisy measurement over periods where speech is absent, and then to estimate the speech signal based on some statistical model.
The assumption of stationary noise poses a major limitation on traditional speech enhancement algorithms, making them inadequate in many nonstationary noise environments, e.g., noise originating from engines, keyboard typing, construction operations, bells, knocking, rings, hammering, etc.
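The conventional noise estimate described above can be sketched in a few lines. This is a minimal illustration for a single frequency bin, not the presenters' algorithm: the function name, the smoothing factor `alpha`, and the speech-absence flags (which would come from some voice-activity detector) are all assumptions made for the example.

```python
def estimate_noise_psd(periodogram, speech_absent, alpha=0.9):
    """Recursively average |Y|^2 over frames flagged as speech-absent.

    periodogram   : list of |Y(l)|^2 values for one frequency bin, one per frame
    speech_absent : list of booleans from a hypothetical voice-activity detector
    alpha         : smoothing factor in [0, 1); an assumed value for illustration
    """
    if len(periodogram) != len(speech_absent):
        raise ValueError("inputs must have the same number of frames")
    estimate = None
    track = []
    for power, absent in zip(periodogram, speech_absent):
        if absent:
            if estimate is None:
                estimate = power  # initialise from the first noise-only frame
            else:
                estimate = alpha * estimate + (1.0 - alpha) * power
        # While speech is present the estimate is held at its last value.
        track.append(estimate)
    return track


# Synthetic data: the noise power jumps from 1.0 to 4.0 while speech is
# present (frames 3 to 5), so the change cannot be observed until frame 6.
periodogram = [1.0, 1.0, 1.0, 9.0, 12.0, 10.0, 4.0, 4.0]
speech_absent = [True, True, True, False, False, False, True, True]
track = estimate_noise_psd(periodogram, speech_absent)
```

In this synthetic run the estimate is frozen near 1.0 throughout the speech segment and is still well below the new noise level of 4.0 two frames after speech ends, which illustrates why a stationarity assumption breaks down when the noise changes abruptly.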
In this tutorial, we focus on statistical speech enhancement methods and describe some of the fundamental components that constitute a speech spectral enhancement system. The tutorial consists of six major parts. In the first part, we formulate the problem of spectral enhancement, address the time-frequency correlation of spectral coefficients for speech and noise signals, and present statistical models that conform to these characteristics. The second part presents diffusion maps, a data analysis method for structural multiscale geometric organization of raw data and dimensionality reduction. The third part addresses the problem of noise spectrum estimation in nonstationary and transient noise environments. The fourth part describes useful estimators for the a priori SNR under speech presence uncertainty. We present the decision-directed approach, which is heuristically motivated, and a recursive estimation approach, which is based on the generalized autoregressive conditional heteroskedasticity (GARCH) model. The fifth part addresses the problem of speech spectral estimation under speech presence uncertainty, based on various fidelity criteria. Finally, the sixth part provides a detailed example of a speech enhancement algorithm and demonstrates its performance in various noise environments.
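As a rough illustration of the decision-directed approach mentioned in the fourth part, the sketch below estimates the a priori SNR for one frequency bin as a weighted sum of the previous frame's clean-speech power estimate and the current maximum-likelihood term max(gamma - 1, 0), where gamma is the a posteriori SNR. The function name, the values of `alpha` and `xi_min`, and the use of a Wiener gain in place of the MMSE estimators covered in the tutorial are assumptions made for brevity.

```python
def decision_directed(noisy_power, noise_psd, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimate for one frequency bin.

    noisy_power : list of |Y(l)|^2 per frame
    noise_psd   : list of noise power estimates per frame (assumed given)
    alpha       : weighting factor (assumed value for illustration)
    xi_min      : lower bound on the a priori SNR (assumed value)
    Returns (xi_track, gain_track). The Wiener gain xi / (1 + xi) is a
    simple stand-in for the spectral estimators discussed in the tutorial.
    """
    xi_track, gain_track = [], []
    prev_clean_power = 0.0  # |X_hat(l-1)|^2, taken as zero before frame 0
    for y2, lam in zip(noisy_power, noise_psd):
        gamma = y2 / lam  # a posteriori SNR
        # Simplification: the current noise estimate normalises both terms.
        xi = alpha * prev_clean_power / lam + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)               # Wiener gain
        prev_clean_power = (gain ** 2) * y2  # power of the enhanced coefficient
        xi_track.append(xi)
        gain_track.append(gain)
    return xi_track, gain_track


# Synthetic data: a speech onset at frame 2 in unit-power noise.
noisy_power = [1.0, 1.0, 20.0, 20.0, 20.0, 1.0]
noise_psd = [1.0] * len(noisy_power)
xi_track, gain_track = decision_directed(noisy_power, noise_psd)
```

With this input the a priori SNR estimate stays at its floor during the noise-only frames and then grows over several frames after the onset, reflecting the one-frame feedback that gives the decision-directed estimator its smoothing behaviour.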

Timeline

  1. Background (20m)
    • Formulation of the speech enhancement problem
    • Statistical models of speech
    • Statistical model of transient noise
    • Quality measures
  2. Diffusion maps (40m)
    • Graph construction
    • Markov random walk
    • Dimensionality reduction
    • Diffusion filtering
  3. Noise spectrum estimation (40m)
    • Noise power spectrum estimation under speech presence uncertainty
    • Time-varying recursive averaging
    • Minima-controlled estimation
    • Estimation of transient noise using diffusion maps
  4. A priori SNR estimation (30m)
    • Decision-directed estimation
    • Estimation using a GARCH model
    • Relation between the GARCH model and decision-directed estimation
  5. Signal estimation (25m)
    • MMSE spectral estimation
    • MMSE log-spectral amplitude estimation
    • Nonlocal diffusion filtering
    • Training-based estimation
  6. Speech enhancement algorithm (20m)
    • Implementation
    • Performance evaluation in various noisy environments
  7. Discussion on selection of speech enhancement techniques (5m)

Target Audience


Background Knowledge Expected of the Participants

Basics of signal processing and random signals

Presenters

Qualifications of the Instructor(s)

Israel Cohen is an Associate Professor in the Department of Electrical Engineering at the Technion - Israel Institute of Technology. He received the B.Sc. (Summa Cum Laude), M.Sc. and Ph.D. degrees in electrical engineering from the Technion in 1990, 1993 and 1998, respectively.
From 1990 to 1998, he was a Research Scientist with RAFAEL Research Laboratories, Haifa, Israel Ministry of Defense. From 1998 to 2001, he was a Postdoctoral Research Associate with the Computer Science Department, Yale University, New Haven, CT. In 2001 he joined the Electrical Engineering Department of the Technion.
His research interests are statistical signal processing, analysis and modeling of acoustic signals, speech enhancement, noise estimation, microphone arrays, source localization, blind source separation, system identification and adaptive filtering.
He is a coeditor of the Multichannel Speech Processing section of the Springer Handbook of Speech Processing (Springer, 2008), a coauthor of Noise Reduction in Speech Processing (Springer, 2009), a coeditor of Speech Processing in Modern Communication: Challenges and Perspectives (Springer, 2010), and a general co-chair of the 2010 International Workshop on Acoustic Echo and Noise Control (IWAENC).
Dr. Cohen is a recipient of the Alexander Goldberg Prize for Excellence in Research, and the Muriel and David Jacknow award for Excellence in Teaching. He served as Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing and IEEE Signal Processing Letters, and as Guest Editor of a special issue of the EURASIP Journal on Advances in Signal Processing on Advances in Multi-microphone Speech Processing and a special issue of the EURASIP Speech Communication Journal on Speech Enhancement.

Sharon Gannot is an Associate Professor in the School of Engineering at Bar-Ilan University, Israel. He received his B.Sc. degree (summa cum laude) from the Technion – Israel Institute of Technology in 1986, and the M.Sc. (cum laude) and Ph.D. degrees from Tel-Aviv University, Israel, in 1995 and 2000, respectively, all in electrical engineering.
In 2001 he held a post-doctoral position at the Department of Electrical Engineering (SISTA) at K.U.Leuven, Belgium. From 2002 to 2003 he held a research and teaching position at the Faculty of Electrical Engineering, Technion - Israel Institute of Technology, Haifa, Israel.
Dr. Gannot is the recipient of the Bar-Ilan University Outstanding Lecturer Award for 2010. He is a coeditor of the Speech Enhancement section of the Springer Handbook of Speech Processing (Springer, 2008), and a coeditor of Speech Processing in Modern Communication: Challenges and Perspectives (Springer, 2010). Dr. Gannot serves as Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing, and is a member of the IEEE Audio and Acoustic Signal Processing Technical Committee. He is also Associate Editor of the EURASIP Journal on Advances in Signal Processing, an editor of two special issues of the same journal on Multi-microphone Speech Processing, and a guest editor of the Elsevier Speech Communication journal. He has been a member of the Technical and Steering Committee of the International Workshop on Acoustic Echo and Noise Control (IWAENC) since 2005 and was a general co-chair of IWAENC 2010, held in Tel-Aviv, Israel. His research interests include parameter estimation, statistical signal processing, and speech processing using either single- or multimicrophone arrays.