International journal of audiology

Matrix sentence intelligibility prediction using an automatic speech recognition system.

PMID 26383042


The feasibility of predicting the outcome of the German matrix sentence test for different types of stationary background noise using an automatic speech recognition (ASR) system was studied. Speech reception thresholds (SRTs) for 50% intelligibility were predicted in seven noise conditions. The ASR system used Mel-frequency cepstral coefficients as a front-end and whole-word hidden Markov models as a back-end. The ASR system was trained and tested with noisy matrix sentences over a broad range of signal-to-noise ratios. The ASR-based predictions were compared to data from the literature (Hochmuth et al, 2015) obtained with 10 native German listeners with normal hearing, and to predictions of the speech intelligibility index (SII). The ASR-based predictions showed a high and significant correlation (R² = 0.95, p < 0.001) with the empirical data across the different noise conditions, outperforming the SII-based predictions, which showed no correlation with the empirical data (R² = 0.00, p = 0.987). The SRTs for the German matrix test for listeners with normal hearing in different stationary noise conditions could thus be predicted well from the acoustical properties of the speech and noise signals. Only minimal assumptions about human speech processing were made, namely those already incorporated in a reference-free, ordinary ASR system.
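
The two quantities reported above, an SRT at 50% intelligibility and an R² between predicted and measured SRTs, can be illustrated with a minimal sketch. The abstract does not state how the SRT was derived from the ASR recognition rates, so the linear interpolation used here is an assumption, as are the function names `estimate_srt` and `r_squared` and all numeric values, which are hypothetical and not taken from the study.

```python
from typing import Sequence


def estimate_srt(snrs_db: Sequence[float],
                 word_scores: Sequence[float],
                 target: float = 0.5) -> float:
    """Return the SNR (dB) at which the recognition rate crosses `target`.

    Assumption: linear interpolation between neighbouring measurement
    points; the study itself may have used a different procedure.
    """
    points = sorted(zip(snrs_db, word_scores))
    for (snr_lo, score_lo), (snr_hi, score_hi) in zip(points, points[1:]):
        # Look for the first rising segment that brackets the target rate.
        if score_lo <= target <= score_hi and score_hi > score_lo:
            frac = (target - score_lo) / (score_hi - score_lo)
            return snr_lo + frac * (snr_hi - snr_lo)
    raise ValueError("target recognition rate is not bracketed by the data")


def r_squared(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Squared Pearson correlation between predicted and measured values."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_p * var_m)


# Hypothetical recognition rates of a recognizer at four SNRs (not study data).
example_snrs = [-12.0, -9.0, -6.0, -3.0]
example_scores = [0.0, 0.25, 0.75, 1.0]
example_srt = estimate_srt(example_snrs, example_scores)
```

In this sketch, one such SRT would be estimated per noise condition, and `r_squared` would then be applied to the resulting list of predicted SRTs and the corresponding listener SRTs.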