The Ninth IASTED International Conference on
Signal Processing, Pattern Recognition and Applications
SPPRA 2012

June 18 – 20, 2012
Crete, Greece

TUTORIAL SESSION

Adaptive Sinusoidal Modeling of Speech and Audio with Applications to Voice Function Assessment

Yannis Stylianou
University of Crete, Greece
yannis@csd.uoc.gr

Duration

3 Hours

Abstract

Sinusoidal models are widely used in speech and audio signal processing. For speech, they have found applications in speech synthesis and modification, speech coding, speech enhancement, voice quality assessment, and more. All these models assume that speech, and audio in general, is locally stationary during the estimation of their parameters. Speech, however, is not stationary; especially during fast transitions between phonemes and voicing states, it can be considered highly non-stationary. Similar observations hold for music and other audio signals. To address the non-stationary nature of audio signals, novel models have recently been suggested in which the audio signal is projected onto a set of non-parametric basis functions that adapt to the local characteristics of the input signal. A central theme of both the established and the new speech and audio models is frequency estimation.
In this tutorial, I will address the frequency estimation problem in the context of speech and audio signal processing; more specifically, I will focus on the passage from non-adaptive, stationary speech and audio models to new adaptive, non-stationary representations. I will then present applications of these new representations to the analysis of speech, with an emphasis on pathological speech.
The tutorial is split into two parts. The first part presents novel algorithms for adaptive audio analysis and shows how they relate to the well-known sinusoidal representation as well as to non-linear frequency estimators such as Gauss-Newton. The second part is dedicated to applications such as tremor estimation, the estimation of jitter and shimmer through a mathematical, sinusoidal-based description, and the objective evaluation of spasmodic dysphonia and vocal fatigue.
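To fix ideas, the stationary baseline that the tutorial starts from can be sketched as FFT-based peak picking on a short windowed frame, yielding one (amplitude, frequency, phase) triplet per sinusoid. This is only an illustrative sketch of the classical approach, not the adaptive methods covered in the tutorial; the function name and parameters are the author's own choices.

```python
import numpy as np

def sinusoidal_analysis(frame, fs, n_peaks=5):
    """Estimate (amplitude, frequency, phase) triplets for a short,
    assumed-stationary frame by picking peaks of its FFT magnitude."""
    n = len(frame)
    win = np.hanning(n)
    spec = np.fft.rfft(frame * win)
    mag = np.abs(spec)
    # local maxima of the magnitude spectrum
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k - 1] < mag[k] >= mag[k + 1]]
    # keep the n_peaks strongest peaks, in ascending frequency order
    peaks = sorted(sorted(peaks, key=lambda k: mag[k])[-n_peaks:])
    return [(2.0 * mag[k] / win.sum(),   # window-gain-compensated amplitude
             k * fs / n,                 # bin index -> Hz
             np.angle(spec[k]))
            for k in peaks]

# Usage: a 440 Hz tone should yield one dominant peak near 440 Hz.
fs = 8000
t = np.arange(1024) / fs
frame = 0.8 * np.cos(2 * np.pi * 440 * t)
amp, freq, phase = max(sinusoidal_analysis(frame, fs), key=lambda p: p[0])
```

Note that the frequency estimate is quantized to the FFT bin spacing (fs/N), which is precisely the kind of limitation that motivates the non-linear and adaptive estimators discussed in Part I.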

Objectives

The main objective of this tutorial is to present to students, researchers, and engineers working in signal, speech, and audio processing the recent developments in sinusoidal modeling, frequency estimation, and non-linear speech and audio signal processing. On the applications side, the tutorial shows how these novel signal processing algorithms can be applied to voice function assessment and the analysis of pathological voices.

Timeline

Part I: Adaptive and Non-stationary Sinusoidal Speech Modeling (2 hours)
* Stationary Sinusoidal Modeling of Speech and Audio
* Frequency Estimation in Audio Signals:
  * One-/multi-tone estimation (non-adaptive)
    * Linear approaches: FFT-based
    * Non-linear approaches: Gauss-Newton, Prony-based, …
    * Estimation in Noise (SNR, Cramér-Rao Bound)
  * Signal-adaptive frequency estimators
    * Parametric approaches
    * Non-parametric approaches
* Adaptive and Non-stationary Modeling of Speech

Part II: Applications (1 hour)
* Voice Function Assessment
* Jitter and Shimmer in Sinusoidal Modeling
  * Spectral Jitter
* Estimation of Tremor
* Spasmodic Dysphonia
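For orientation, the jitter and shimmer measures listed above are, in their classical time-domain form, cycle-to-cycle perturbation statistics over the pitch periods and peak amplitudes of voiced speech. The sketch below shows these conventional "local" definitions, assuming the period and amplitude sequences have already been extracted; it is not the sinusoidal-based formulation presented in the tutorial, and the function names are illustrative.

```python
import numpy as np

def jitter_percent(periods):
    """Local jitter: mean absolute cycle-to-cycle period difference,
    as a percentage of the mean period."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.abs(np.diff(p)).mean() / p.mean()

def shimmer_percent(amplitudes):
    """Local shimmer: mean absolute cycle-to-cycle peak-amplitude
    difference, as a percentage of the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.abs(np.diff(a)).mean() / a.mean()

# Usage: a perfectly periodic voice has zero jitter; small
# cycle-to-cycle perturbations give a small positive percentage.
steady = [0.010] * 10                                # 10 ms pitch periods
perturbed = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102]
```

Pathological voices typically show elevated values of both measures, which is why robust estimation of the underlying period and amplitude sequences matters for voice function assessment.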

Background Knowledge Expected of the Participants

The main background necessary for following the tutorial is a working knowledge of basic signal processing theory. The necessary information about the nature of speech and audio signals, as well as the main definitions used in the tutorial, will be provided in advance.

Qualifications of the Instructor(s)

Yannis Stylianou is a Professor at the Department of Computer Science, University of Crete (CSD UOC), an Associated Researcher in the Signal Processing Laboratory of the Institute of Computer Science (ICS) at FORTH, and a visiting Professor at AHOLAB, University of the Basque Country, Bilbao, Spain (2011-2012). He received the Diploma in Electrical Engineering from the National Technical University of Athens (N.T.U.A.) in 1991 and the M.Sc. and Ph.D. degrees in Signal Processing from the École Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1992 and 1996, respectively. From 1996 until 2001 he was with AT&T Labs Research (Murray Hill and Florham Park, NJ, USA) as a Senior Technical Staff Member. In 2001 he joined Bell Labs, Lucent Technologies (now Alcatel-Lucent), in Murray Hill, NJ, USA. Since 2002 he has been with the Computer Science Department at the University of Crete and the Institute of Computer Science at FORTH.
His current research focuses on speech signal processing algorithms for speech analysis, statistical signal processing (detection and estimation), and time-series analysis and modelling. He has (co-)authored more than 100 scientific publications and 9 US patents, which have received more than 1600 citations (excluding self-citations), with an H-index of 20. He co-edited the book “Progress in Non-Linear Speech Processing” (Springer-Verlag, 2007), and at Interspeech 2007 he gave a tutorial on Voice Conversion. He is a co-organizer of the IEEE Signal Processing Society Winter School on Speech and Audio Processing for Immersive Environments and Future Interfaces (16-20 January 2012, Heraklion, Crete, Greece - http://www.s3p-saie.eu/ ). He has been the P.I. and scientific director of several European and Greek research programs and has participated as a leader in US research programs.
Among other projects, he is currently P.I. of the FP7-FET-OPEN project LISTA: “The Listening Talker”, whose goal is to develop scientific foundations for spoken language technologies based on human communicative strategies. In LISTA, he is in charge of speech modelling and speech modification, aiming at novel techniques for the generation of artificial and natural spoken output.
He has created a lab for voice function assessment equipped with high-quality instruments for speech and voice recordings (e.g., a high-speed camera), for the purpose of basic research in speech and voice as well as for services, in collaboration with the Medical School at the University of Crete.
He is on the Board of the International Speech Communication Association (ISCA) and of the IEEE Multimedia Communications Technical Committee, and was a member of the IEEE Speech and Language Technical Committee. He is on the Editorial Board of the Digital Signal Processing journal (Elsevier) and of the Journal of Electrical and Computer Engineering (Hindawi JECE), and an Associate Editor of the EURASIP Journal on Audio, Speech, and Music Processing (ASMP) and of the EURASIP Research Letters in Signal Processing (RLSP). He was an Associate Editor for the IEEE Signal Processing Letters, Vice-Chair of COST Action 2103 “Advanced Voice Function Assessment” (VOICE), and on the Management Committee of COST Action 277 “Nonlinear Speech Processing”.