Reverberation is generally desirable in music reproduction but can be detrimental to speech-related applications. For the human listener, early reflections help to improve speech intelligibility, whereas late reflections have been shown to impair perceived speech quality. For speech processing technologies such as automatic speech recognizers, reverberation reduces accuracy and performance. Dereverberation is therefore an important research topic, with interest driven by the increasing availability of communication devices and by consumer demand.

One approach to dereverberation computes a set of equalizing filters from estimates of the acoustic impulse responses (AIRs) between the source and the microphones, and then applies these filters to the multichannel inputs. However, estimation errors are inevitable in practice, so robust channel equalizers are required. This thesis aims to develop such robust algorithms in a manner tailored specifically to speech dereverberation.
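As a rough illustration of what computing equalizing filters from AIR estimates can look like, the sketch below uses a textbook least-squares multichannel inverse (in the style of the multiple-input/output inverse theorem, MINT). This is not the robust method developed in the thesis. The function names (`convolution_matrix`, `mint_equalizer`, `equalized_response`), the random toy AIRs, and the filter length are all assumptions made for the example.

```python
import numpy as np


def convolution_matrix(h, n_cols):
    """Build C so that C @ g equals np.convolve(h, g) for len(g) == n_cols."""
    n_rows = len(h) + n_cols - 1
    C = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        C[j:j + len(h), j] = h
    return C


def mint_equalizer(airs, filter_len, delay=0):
    """Least-squares equalizing filters for AIRs of shape (channels, taps).

    Solves sum_m conv(airs[m], g[m]) ~= delayed unit impulse.
    (Hypothetical helper written for this illustration.)
    """
    n_channels, n_taps = airs.shape
    # Stack the per-channel convolution matrices side by side.
    H = np.hstack([convolution_matrix(h, filter_len) for h in airs])
    # Target equalized response: a unit impulse at the chosen delay.
    d = np.zeros(n_taps + filter_len - 1)
    d[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g.reshape(n_channels, filter_len)


def equalized_response(airs, filters):
    """Sum over channels of each AIR convolved with its equalizing filter."""
    return sum(np.convolve(h, g) for h, g in zip(airs, filters))


# Toy data: two random 16-tap "AIRs" (an assumption; real AIRs are much longer).
rng = np.random.default_rng(0)
airs = rng.standard_normal((2, 16))

# With 2 channels and 16 taps, 15-tap filters give a square system.
filters = mint_equalizer(airs, filter_len=15)
eir = equalized_response(airs, filters)
```

With exact AIRs and channels that share no common zeros, `eir` is numerically a unit impulse. If the filters are designed from perturbed AIRs and then applied to the true ones, the equalized response generally departs from an impulse, which is the sensitivity that motivates robust equalizer designs.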

The framework of channel shortening is used, having previously been shown to give promising results. For the speech enhancement task, the proposed dynamic feature constraint helps to improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics, while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation significantly improves automatic speech recognition (ASR) performance for acoustic models trained on clean-condition data.

Speech Dereverberation with Context-aware Recurrent Neural Networks

Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation. Beamforming, Deep neural networks, Dynamic features, Feature adaptation, Robust speech recognition, Reverberation challenge, Speech enhancement.

Singular spectrum-based matrix completion for time series recovery and prediction. Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.

