Binaural-Projection Multichannel Wiener Filter for
Cue-Preserving Binaural Speech Enhancement

Stefan Thaleiser, Gerald Enzner

Previous research in binaural speech enhancement has demonstrated a demand for binaural cue preservation beyond the requirements of noise suppression and speech quality. The binaural state of the art is frequently grouped into two classes: spatio-temporal optimum filters, whose composite cost functions strike a compromise between simultaneous requirements, and common-gain spectral filters, which preserve cues exactly by construction. In this paper, we pursue spatio-temporal filtering by convex MMSE estimation constrained to strict binaural cue preservation. To this end, we rely on a frequency-domain representation of the well-known interaural level differences (ILD) and interaural time differences (ITD) to set up a complex-valued constraint. We then demonstrate that the sought spatial filter effectively falls into the class of common-gain spectral filters, where the gain consists of a new arrangement of two spectral weightings related to the acoustic transfer function (ATF) and the power spectral density (PSD), respectively. Moreover, we show its equivalence to an unconstrained multiple-input/multiple-output multichannel Wiener filter (MIMO-MWF) with binaural projection onto the original spatial cues, hence the name binaural-projection multichannel Wiener filter (BP-MWF) for the proposed solution. Experimental results in terms of ILD/ITD spectral histograms and distance metrics confirm that BP-MWF achieves the desired spatial cue preservation. Regarding noise suppression and speech quality, BP-MWF improves the instrumental segSNR, PESQ, and STOI metrics over the binaural state of the art, such as the partial-noise-estimation forms of MVDR and MWF, and is competitive with the unconstrained MIMO-MWF, which serves as an upper bound. The results are finally supported by a formal listening test covering various SNRs, source directions, and noise types.
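To make "exact cue preservation by construction" concrete, the sketch below reads ILD and ITD per frequency bin from the interaural transfer function X_L / X_R. This is an assumed, commonly used frequency-domain cue representation, not the paper's exact complex-valued constraint. The sketch shows that any real, positive gain applied identically to both ears leaves both cues unchanged, whereas independent per-ear gains alter the ILD. It is not an implementation of BP-MWF: the paper's ATF- and PSD-related weightings are not reproduced here. The helper names `interaural_cues` and `apply_common_gain`, the toy spectrum, and all parameter values are hypothetical.

```python
import numpy as np


def interaural_cues(x_left, x_right, freqs, eps=1e-12):
    """Per-bin ILD (dB) and phase-derived ITD (s) from complex spectra.

    Assumption: cues are read from the interaural transfer function
    X_L / X_R; the phase-derived ITD is unambiguous only while |phase| < pi.
    """
    ild_db = 10.0 * np.log10((np.abs(x_left) ** 2 + eps) / (np.abs(x_right) ** 2 + eps))
    ipd = np.angle(x_left * np.conj(x_right))  # interaural phase difference
    itd_s = ipd / (2.0 * np.pi * freqs)        # requires freqs > 0
    return ild_db, itd_s


def apply_common_gain(x_left, x_right, gain):
    """Hypothetical helper: one real, positive gain per bin for both ears."""
    return gain * x_left, gain * x_right


# Toy binaural spectrum (hypothetical): right ear = attenuated, delayed left ear.
fs, n_fft = 16000, 512
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)[1:]  # drop the DC bin (f = 0)
rng = np.random.default_rng(0)
x_left = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
level_ratio, delay_s = 0.5, 5e-5  # 50 us keeps the phase below pi up to fs/2
x_right = level_ratio * x_left * np.exp(-2j * np.pi * freqs * delay_s)

ild_in, itd_in = interaural_cues(x_left, x_right, freqs)

# Arbitrary per-bin gain in [0.1, 1), standing in for some spectral weighting.
gain = rng.uniform(0.1, 1.0, freqs.size)
y_left, y_right = apply_common_gain(x_left, x_right, gain)
ild_out, itd_out = interaural_cues(y_left, y_right, freqs)

# For contrast: independent per-ear gains change the level ratio, hence the ILD.
z_left, z_right = gain * x_left, rng.uniform(0.1, 1.0, freqs.size) * x_right
ild_ind, _ = interaural_cues(z_left, z_right, freqs)

print(np.allclose(ild_in, ild_out), np.allclose(itd_in, itd_out))
print(np.allclose(ild_in, ild_ind))
```

The common gain cancels in the ratio X_L / X_R, so both cues survive. The cost is that such a filter cannot exploit spatial differences between speech and noise, which is the trade-off the spatio-temporal filter class addresses.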


Audio samples:

In the following, we provide audio samples processed with our proposed binaural-projection multichannel Wiener filter (BP-MWF), as well as with several comparison methods: the cue-preserving MMSE (CP-MMSE), the unconstrained multiple-input/multiple-output multichannel Wiener filter (MIMO-MWF), its partial-noise-estimation form (MWF-N), and the partial-noise-estimation form of the minimum variance distortionless response beamformer (MVDR-N). To generate the unprocessed noisy speech (noisy), binaural clean speech utterances with a direction of arrival (DOA) of either 0° (in front of the listener) or 45° (front-right of the listener) were degraded by pink noise, babble noise, or directional noise at SNRs of -10, 0, 10, and 20 dB.
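The noisy conditions combine clean binaural speech with binaural noise at a prescribed SNR. The minimal sketch below shows one way to do this. It assumes the SNR is the ratio of mean signal powers computed jointly over both ears. The helper `mix_at_snr` and the sinusoidal and white-noise stand-in signals are hypothetical, and the SNR definition actually used for these samples may differ (for example, per ear or over speech-active segments only).

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale binaural noise so the speech-to-noise power ratio equals snr_db.

    speech, noise: arrays of shape (2, n_samples) for the left/right ears.
    Assumption: SNR is measured over both ears jointly.
    """
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = scale * noise  # same factor on both ears
    return speech + scaled_noise, scaled_noise


rng = np.random.default_rng(1)
fs = 16000
t = np.arange(fs) / fs
# Hypothetical stand-ins for a binaural utterance and a binaural noise recording.
speech = np.stack([np.sin(2 * np.pi * 440 * t), 0.8 * np.sin(2 * np.pi * 440 * t)])
noise = rng.standard_normal((2, fs))

for snr_db in (-10, 0, 10, 20):
    noisy, scaled_noise = mix_at_snr(speech, noise, snr_db)
    measured = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
    print(snr_db, round(float(measured), 2))
```

Scaling both noise channels by the same factor keeps the noise's own interaural cues intact. This matters for the directional-noise condition.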


Pink noise:

Processing method | pink noise, -10 dB SNR, 0° DOA | pink noise, 0 dB SNR, 0° DOA | pink noise, 10 dB SNR, 0° DOA | pink noise, 20 dB SNR, 0° DOA
noisy
MIMO-MWF
MVDR-N
MWF-N
CP-MMSE
BP-MWF

Processing method | pink noise, -10 dB SNR, 45° DOA | pink noise, 0 dB SNR, 45° DOA | pink noise, 10 dB SNR, 45° DOA | pink noise, 20 dB SNR, 45° DOA
noisy
MIMO-MWF
MVDR-N
MWF-N
CP-MMSE
BP-MWF



Babble noise:

Processing method | babble noise, -10 dB SNR, 0° DOA | babble noise, 0 dB SNR, 0° DOA | babble noise, 10 dB SNR, 0° DOA | babble noise, 20 dB SNR, 0° DOA
noisy
MIMO-MWF
MVDR-N
MWF-N
CP-MMSE
BP-MWF

Processing method | babble noise, -10 dB SNR, 45° DOA | babble noise, 0 dB SNR, 45° DOA | babble noise, 10 dB SNR, 45° DOA | babble noise, 20 dB SNR, 45° DOA
noisy
MIMO-MWF
MVDR-N
MWF-N
CP-MMSE
BP-MWF



Directional noise:

Processing method | directional noise, -10 dB SNR, 0° DOA | directional noise, 0 dB SNR, 0° DOA | directional noise, 10 dB SNR, 0° DOA | directional noise, 20 dB SNR, 0° DOA
noisy
MIMO-MWF
MVDR-N
MWF-N
CP-MMSE
BP-MWF

Processing method | directional noise, -10 dB SNR, 45° DOA | directional noise, 0 dB SNR, 45° DOA | directional noise, 10 dB SNR, 45° DOA | directional noise, 20 dB SNR, 45° DOA
noisy
MIMO-MWF
MVDR-N
MWF-N
CP-MMSE
BP-MWF
