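Interspeech 2022 paper roundup: research at the frontier of speech technology
Of the more than 40 papers the center published at this year's Interspeech, automatic speech recognition (ASR) and text-to-speech (TTS) account for about half. The rest cover topics including acoustic watermarking, automatic dubbing, quantization, and fairness.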
Acoustic watermarking
- Practical over-the-air perceptual acoustic watermarking
Ameya Agaskar
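Audio classification
- CNN-based audio event recognition for automated violence classification and rating of Prime Video content
Tarun Gupta, Mayank Sharma, Kenny Qiu, Xiang Hao, Raffay Hamid
- Impact of acoustic event tagging on scene classification in a multi-task learning framework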
Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas
Automatic dubbing
- Isochrony-aware neural machine translation for automatic dubbing
Derek Tam, Surafel Melaku Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico
- Prosodic alignment for off-screen automatic dubbing
Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote
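Automatic speech recognition
- Compute cost amortized Transformer for streaming ASR
Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant Strimel
The proposed mechanism dynamically switches components of Transformer blocks on and off to use compute resources more efficiently.
- Content-context factorized representations for automated speech recognition
David M. Chan, Shalini Ghosh
- Convolution-augmented recurrent neural network transducers for streaming speech recognition
Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris
- Directed speech separation for automatic speech recognition of long-form conversational speech
Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero
- Domain prompts: Toward memory- and compute-efficient domain adaptation of ASR systems
Saket Dingliwa, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff
- Incremental learning for RNN-Transducer-based speech recognition models
Deepak Baby, Pasquale D’Alterio, Valentin Mendelev
- Knowledge distillation via module replacing for automatic speech recognition with recurrent neural network transducers
Kaiqi Zhao, Hieu Duy Nguyen, Animesh Jain, Nathan Susanj, Athanasios Mouchtaris, Lokesh Gupta, Ming Zhao
- Learning to rank with BERT-based confidence models in ASR rescoring
Ting-Wei Wu, I-Fan Chen, Ankur Gandhe
- Reducing geographic disparities in automatic speech recognition via elastic weight consolidation
Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas
- RefTextLAS: Reference-text-biased listen, attend, and spell (LAS) model for accurate reading evaluation
Phani Sankar Nidadavolu, Na Xu, Nick Jutila, Ravi Teja Gadde, Aswarth Abhilash Dara, Joseph Savold, Sapan Patel, Aaron Hoff, Veerdhawal Pande, Kevin Crews, Ankur Gandhe, Ariya Rastrow, Roland Maas
- RNN-T lattice enhancement by grafting of pruned paths
Mirek Novak, Pavlos Papadopoulos
- Using data augmentation and consistency regularization to improve semi-supervised speech recognition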
Ashtosh Sapru
Dialogue systems
- Contextual acoustic barge-in classification for spoken dialogue systems
Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff
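Fairness
- Fairness in speech recognition: Discovery and mitigation of performance disparities
Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke
- Adversarial reweighting for speaker verification fairness
Minho Jin, Chelsea J.-T. Ju, Zeya Chen, Yi Chieh Liu, Jasha Droppo, Andreas Stolcke
An adversarial network identifies low-performing groups in the speaker verification dataset (green) and adjusts their contribution to the training loss (bottom).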
Keyword spotting
- Latency control for keyword spotting
Christin Jose, Joe Wang, Grant Strimel, Mohammad Omar Khursheed, Yuriy Mishchenko, Brian Kulis
Language identification
- A multimodal strategy for singing language identification
Wo Jae Lee, Emanuele Coviello
Multi-device processing
- Challenges and opportunities in multi-device speech processing
Gregory Ciccarelli, Jarred Barber, Arun Nair, Israel Cohen, Tao Zhang
Multiparty speech
- Separator-transducer-segmenter: Streaming recognition and segmentation of multi-party speech
Ilya Sklyar, Anna Piunova, Christian Osendorfer
Natural-language understanding
- Phonetic embedding for ASR robustness in entity resolution
Xiaozhou Zhou, Ruying Bao, William M. Campbell
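Quantization
- Squashed weight distribution for low-bit quantization of deep models
Nikko Ström, Haidar Khan, Wael Hamza
- Sub-8-bit quantization-aware training for 8-bit neural network accelerators with on-device speech recognition
Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow
Training behavior of an algorithm that optimizes the weights to reduce quantization loss.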
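Signal processing
- Clock-skew-robust acoustic echo cancellation
Karim Helwani, Erfan Soltanmohammadi, Michael M. Goodwin, Arvindh Krishnaswamy
- Real-time packet loss concealment with a mixed generative and predictive model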
Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy
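Speaker identification/verification
- Graph-based multi-view fusion and local adaptation: Mitigating within-household confusability for speaker identification
Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke
Labels are propagated through a graph in which nodes represent utterances and weighted edges quantify the similarity between utterances.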
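The caption above describes label propagation over an utterance graph. As a rough illustration only, here is a minimal sketch of generic label propagation in plain Python. It is not the paper's multi-view fusion or local-adaptation method; the function name `propagate_labels`, the speaker names, and the similarity values are all hypothetical and chosen for illustration.

```python
def propagate_labels(weights, seed_labels, num_iters=20):
    """Generic label propagation (illustrative, not the paper's method).

    weights[i][j] is an assumed similarity between utterances i and j.
    seed_labels maps a node index to its known speaker label.
    Returns one predicted label per node (None if no label reaches it).
    """
    n = len(weights)
    labels = sorted(set(seed_labels.values()))
    # Per-node score for each label; labeled nodes start at 1.0 for their label.
    scores = [{lab: 0.0 for lab in labels} for _ in range(n)]
    for i, lab in seed_labels.items():
        scores[i][lab] = 1.0

    for _ in range(num_iters):
        new_scores = []
        for i in range(n):
            total = sum(weights[i][j] for j in range(n) if j != i)
            if i in seed_labels or total == 0:
                # Keep known labels fixed; isolated nodes stay unchanged.
                new_scores.append(dict(scores[i]))
                continue
            # Each unlabeled node takes the similarity-weighted average
            # of its neighbors' scores.
            new_scores.append({
                lab: sum(weights[i][j] * scores[j][lab]
                         for j in range(n) if j != i) / total
                for lab in labels
            })
        scores = new_scores

    result = []
    for s in scores:
        best = max(labels, key=lambda lab: s[lab])
        result.append(best if s[best] > 0 else None)
    return result


# Four utterances: 0 and 3 are labeled, 1 and 2 are not (made-up similarities).
weights = [
    [0.0, 0.9, 0.2, 0.0],
    [0.9, 0.0, 0.1, 0.1],
    [0.2, 0.1, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
predicted = propagate_labels(weights, {0: "alice", 3: "bob"})
print(predicted)
```

In this toy graph, utterance 1 is most similar to the labeled utterance 0 and utterance 2 to the labeled utterance 3, so each unlabeled node inherits the label of its strongest labeled neighbor.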
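Spoken-language understanding
- Learning under label noise for robust spoken-language-understanding systems
Anoop Kumar, Pankaj Sharma, Aravind Illa, Sriram Venkatapathy, Subhrangshu Nandi, Pritam Varma, Anurag Dwarakanath, Aram Galstyan
- Joint training with interfaces for spoken-language understanding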
Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow
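Text-to-speech
- Automatic evaluation of speaker similarity
Kamil Deja, Ariadna Sanchez, Julian Roth, Marius Cotescu
- CopyCat2: A single model for multi-speaker TTS and many-to-many fine-grained prosody transfer
Sri Karlapati, Penny Karanasou, Mateusz Lajszczak, Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman
- Creating new voices using normalizing flows
Piotr Biliński, Tom Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa
The generated voices (green) are spread across the embedding space of the training-set voices (blue), confirming that the method can generate diverse new voices.
- Cross-lingual style transfer with conditional-prior VAE and style loss
Dino Ratcliffe, You Wang, Alex Mansbridge, Penny Karanasou, Alexis Moinet, Marius Cotescu
- End-to-end LPCNet: A neural vocoder with fully differentiable LPC estimation
Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy
- Expressive, variable, and controllable duration modeling in TTS
Ammar Abbas, Tom Merritt, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Simon Slangen, Elia Gatti, Thomas Drugman
- GlowVC: Mel-spectrogram disentangling model for language-independent, text-free voice conversion
Magdalena Proszewska, Grzegorz Beringer, Daniel Saez Trigueros, Tom Merritt, Abdelhamid Ezzerg, Roberto Barra-Chicote
- L2-GEN: A neural phoneme-paraphrasing approach to L2 speech synthesis for mispronunciation diagnosis
Daniel Zhang, Ashwinkumar Ganesan, Sarah Campbell, Daniel Korzekwa
- Low data? No problem: Low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabrys, Jaime Lorenzo Trueba
- Mix and match: An empirical study of training corpus composition for multilingual text-to-speech
Ziyao Zhang, Alessio Falai, Ariadna Sanchez, Orazio Angelini, Kayoko Yanagisawa
- Simple and effective multi-sentence TTS with expressive and coherent prosody
Peter Makarov, Ammar Abbas, Mateusz Lajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou
- Unify and conquer: How phonetic feature representation affects multilingual text-to-speech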
Ariadna Sanchez, Alessio Falai, Ziyao Zhang, Orazio Angelini, Kayoko Yanagisawa