Monday, July 2, 2018

Looking on the waves


Here is the question: a perfect-looking sound file that is transcribed with 10% accuracy. Sounds crazy, doesn't it? Click on it to enlarge. No noise, no accent.



Because of that, I'm looking at the state of the art in channel normalization, especially for non-linear channel distortions. No good solution yet; I've only found a description of the problem in a very old paper:


Pedro J. Moreno and Richard M. Stern, "Sources of Degradation of Speech Recognition in the Telephone Network," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Adelaide, Australia, Vol. I, pp. 109-112, April 1994.

There is CDCN normalization, a few CMN improvements, RASTA, and even the recently invented HN normalization. Surprisingly, CDCN is available in SphinxTrain, but nobody uses it. Well, it gives no improvement, but it's an interesting approach worth documenting one day. The idea of collecting statistics from the speech and applying them later sounds nice.
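To make the "collect statistics, then apply them" idea concrete, here is a minimal sketch of plain cepstral mean normalization (CMN), the simplest member of this family. It is not CDCN and not the Sphinx implementation; the function names and the toy two-coefficient frames are made up for illustration. The assumption behind it is a linear, time-invariant channel, which shows up as a constant offset added to every cepstral frame, so subtracting the per-coefficient mean cancels it. A non-linear distortion does not reduce to such an offset, which is exactly why this alone does not solve the problem above.

```python
from typing import List, Sequence

Frame = Sequence[float]


def cepstral_mean(frames: Sequence[Frame]) -> List[float]:
    """Collect the statistic: the per-coefficient mean over all frames."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / n for i in range(dim)]


def apply_cmn(frames: Sequence[Frame], mean: Sequence[float]) -> List[List[float]]:
    """Apply the statistic: subtract the mean from every frame."""
    return [[c - m for c, m in zip(frame, mean)] for frame in frames]


# Toy "cepstra": two frames with two coefficients each (hypothetical values).
clean = [[1.0, 2.0], [3.0, 4.0]]

# A linear channel modeled as a constant cepstral offset (an assumption).
channel = [0.5, -0.25]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

clean_norm = apply_cmn(clean, cepstral_mean(clean))
distorted_norm = apply_cmn(distorted, cepstral_mean(distorted))
```

In this toy case `clean_norm` and `distorted_norm` come out identical, because the constant offset moves the mean by the same amount it moves every frame. Keeping the mean as a separate value is what allows it to be estimated on one stretch of speech and applied to another.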

There are also model-level approaches, various feature transforms, and adaptation techniques. They do not really look that attractive. Most papers now deal with channel compensation for speaker recognition, not speech recognition. I must admit the topic is too large to survey in a few weeks.

Luckily, I can also spend time looking at waves like the one on the right. Somewhat more pleasant, I would say.


