Binaural-Projection MWF Audio Samples

Previous research in binaural speech enhancement has demonstrated a demand for binaural cue preservation beyond the requirements of noise suppression and speech quality. The binaural state-of-the-art is frequently grouped into two classes: spatio-temporal optimum filters, whose composite cost functions trade off competing requirements, and common-gain spectral filters, which preserve the cues exactly by construction. In this paper, we pursue spatio-temporal filtering by convex MMSE estimation constrained to strict binaural cue preservation. To this end, we rely on a frequency-domain representation of the well-known interaural level differences (ILD) and interaural time differences (ITD) for setting up a complex-valued constraint. It is then demonstrated that the sought spatial filter effectively falls into the class of common-gain spectral filtering, where the gain consists of a new arrangement of two spectral weightings related to the acoustic transfer function (ATF) and the power-spectral density (PSD), respectively. Moreover, its equivalence to an unconstrained multiple-input/multiple-output multichannel Wiener filter (MIMO-MWF) with binaural projection onto the original spatial cues is shown, hence the naming of the proposed solution as a binaural-projection multichannel Wiener filter (BP-MWF). Experimental results in terms of ILD/ITD spectral histograms and distance metrics confirm that BP-MWF meets the goal of spatial cue preservation. Regarding noise suppression and speech quality, BP-MWF improves the instrumental segSNR, PESQ and STOI metrics over binaural state-of-the-art methods, such as the partial-noise-estimation forms of MVDR and MWF, and is competitive with the unconstrained MIMO-MWF as an upper bound. The results are finally supported by a formal listening test including various SNRs, source directions, and noise types.
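The claim that common-gain spectral filters preserve binaural cues by construction can be illustrated with a minimal sketch. It is not the BP-MWF itself: the spectra are random stand-ins for left-ear and right-ear STFTs, the array shape is arbitrary, and `interaural_cues` is a hypothetical helper introduced only for this illustration. It assumes that the ILD is measured as the magnitude ratio in dB and that the interaural phase difference (IPD) serves as the frequency-domain counterpart of the ITD.

```python
import numpy as np

def interaural_cues(left, right):
    """Return (ILD in dB, IPD in radians) per time-frequency bin.

    left, right: complex STFT arrays of identical shape. Bins with zero
    magnitude would need a small floor, which is omitted here for brevity.
    """
    ild = 20.0 * np.log10(np.abs(left) / np.abs(right))
    # Ignoring phase wrapping, the ITD at frequency f is about IPD / (2*pi*f).
    ipd = np.angle(left * np.conj(right))
    return ild, ipd

rng = np.random.default_rng(0)
shape = (257, 100)  # assumed layout: 257 frequency bins, 100 frames

# Random complex spectra standing in for the two ear signals.
left = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
right = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# One real-valued, positive gain per bin, applied identically to both ears.
gain = rng.uniform(0.1, 1.0, shape)

ild_in, ipd_in = interaural_cues(left, right)
ild_out, ipd_out = interaural_cues(gain * left, gain * right)

# Counter-example: independent gains per ear change the level difference.
gain_right = rng.uniform(0.1, 1.0, shape)
ild_bad, _ = interaural_cues(gain * left, gain_right * right)

print("common gain keeps ILD:", np.allclose(ild_in, ild_out))
print("common gain keeps IPD:", np.allclose(ipd_in, ipd_out))
print("independent gains keep ILD:", np.allclose(ild_in, ild_bad))
```

Because the same real, positive gain multiplies both ear spectra, it cancels in the magnitude ratio and leaves the phase of the cross-spectrum untouched, so both cues are preserved regardless of how the gain itself is computed.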

Audio samples:

In the following, we provide audio samples processed with our proposed binaural-projection multichannel Wiener filter (BP-MWF), as well as with several comparison methods: the cue-preserving MMSE (CP-MMSE), the multichannel Wiener filter (MIMO-MWF), its partial-noise-estimation form (MWF-N), and the partial-noise-estimation form of the minimum variance distortionless response beamformer (MVDR-N). The unprocessed noisy speech (noisy) was obtained by degrading binaural clean speech utterances, with a direction of arrival (DOA) of 0° (in front of the listener) or 45° (front-right of the listener), with pink noise, babble noise, or directional noise.

Pink noise: