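1. Hangzhou Vocational & Technical College, Hangzhou, Zhejiang Province, 310020;
2. Mizuda Group Co., Ltd. (美欣达集团有限公司), Huzhou, Zhejiang Province, 313000.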
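Abstract: This paper discusses the distinctive challenges of speech recognition for elderly users, including their frequent use of dialects and the way age-related changes in vocal characteristics affect data requirements. It then reviews the main recent developments in automatic speech recognition: the traditional Gaussian mixture model-hidden Markov model (GMM-HMM) approach, hybrid models based on deep neural networks, and end-to-end methods such as CTC-attention and the Transformer architecture, together with their application to Chinese dialect speech recognition.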
Keywords: elderly-oriented speech recognition; convolutional neural networks; end-to-end models
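About the author: Yuan Wenchen (袁稳沉), born June 1991, male, Han ethnicity, from Zhejiang; Ph.D.; lecturer at Hangzhou Vocational & Technical College. Research interests: artificial intelligence and computational mechanics.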