Vector quantization technology based on clustering features in speaker recognition

Computer Engineering and Applications, 2007, Vol. 43, Issue (27): 196-198.

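XU Li-min, TANG Zhen-min, HE Ke-ke, QIAN Bo

School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China

Online: 2007-09-21  Published: 2007-09-21
Corresponding author: XU Li-min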

Abstract

Speaker recognition based on vector quantization (VQ) suffers from distortion. To address this problem, and taking into account the pronunciation characteristics of Mandarin Chinese speech, this paper proposes combining VQ with clustering of speech features: before the VQ codebook is trained, the speakers' training feature vectors are first clustered and filtered. Experiments show that, for 4 s test segments and a recognition rate of about 95%, conventional VQ requires a codebook size of 64, whereas the proposed method requires only 8, an eight-fold reduction. The results indicate that the method partly alleviates the distortion caused by insufficient training samples and achieves good recognition with fewer codewords, which improves recognition efficiency.

Key words: speaker recognition, vector quantization, clustering features, Mel-frequency cepstral coefficients (MFCC)
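The abstract describes the pipeline only at a high level. As a minimal sketch under stated assumptions, the code below shows a conventional VQ baseline (one LBG-trained codebook per speaker, closed-set identification by minimum average distortion) together with a cluster-then-filter step applied before codebook training. The filtering rule in `cluster_filter` (drop frames whose k-means cluster holds less than `min_share` of all frames) is a hypothetical stand-in, not the rule the authors derive from Mandarin pronunciation characteristics. All function names, the parameters `eps`, `k` and `min_share`, the additive split used in LBG, the squared Euclidean distortion, and the synthetic 2-D "frames" standing in for MFCC vectors are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty set of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(v: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to v (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def lloyd(vectors: Sequence[Vector], codebook: Sequence[Vector], iters: int = 20) -> List[Vector]:
    """Refine codewords with k-means (Lloyd) iterations; empty cells keep their codeword."""
    book = [list(c) for c in codebook]
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in book]
        for v in vectors:
            cells[nearest(v, book)].append(v)
        new = [centroid(cell) if cell else book[i] for i, cell in enumerate(cells)]
        if new == book:  # codewords unchanged: converged
            break
        book = new
    return book


def lbg_codebook(vectors: Sequence[Vector], size: int, eps: float = 0.01) -> List[Vector]:
    """LBG training: split every codeword into c+eps and c-eps, then refine, until `size` codewords."""
    if size < 1 or size & (size - 1):
        raise ValueError("codebook size must be a power of two")
    book = [centroid(vectors)]
    while len(book) < size:
        book = [[x + s for x in c] for c in book for s in (eps, -eps)]
        book = lloyd(vectors, book)
    return book


def avg_distortion(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Mean squared quantization error of the frames against a codebook."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(test_vectors: Sequence[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Closed-set identification: the speaker whose codebook distorts the test frames least."""
    return min(codebooks, key=lambda spk: avg_distortion(test_vectors, codebooks[spk]))


def cluster_filter(vectors: Sequence[Vector], k: int = 4, min_share: float = 0.1) -> List[Vector]:
    """HYPOTHETICAL pre-filter (not the paper's rule): cluster the frames and
    drop every frame whose cluster holds less than `min_share` of all frames."""
    seeds = [list(vectors[0])]
    while len(seeds) < min(k, len(vectors)):
        # Farthest-point seeding: add the frame farthest from all current seeds.
        seeds.append(list(max(vectors, key=lambda v: min(sq_dist(v, s) for s in seeds))))
    centers = lloyd(vectors, seeds)
    labels = [nearest(v, centers) for v in vectors]
    keep = {i for i in range(len(centers)) if labels.count(i) >= min_share * len(vectors)}
    return [list(v) for v, lab in zip(vectors, labels) if lab in keep]


if __name__ == "__main__":
    random.seed(0)

    def frames(mx: float, my: float, n: int) -> List[Vector]:
        # Synthetic 2-D stand-ins for MFCC frames; real MFCC vectors have more dimensions.
        return [[random.gauss(mx, 1.0), random.gauss(my, 1.0)] for _ in range(n)]

    train = {"A": frames(0, 0, 200), "B": frames(5, 5, 200)}
    # Filter each speaker's training frames first, then train an 8-codeword codebook.
    books = {spk: lbg_codebook(cluster_filter(f), 8) for spk, f in train.items()}
    print(identify(frames(0, 0, 50), books), identify(frames(5, 5, 50), books))  # prints: A B
```

Codebook sizes are restricted to powers of two here because each LBG split doubles the codebook, which covers the sizes 8 and 64 compared in the abstract. Swapping `cluster_filter` for a different selection rule leaves the training and identification steps unchanged.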

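XU Li-min, TANG Zhen-min, HE Ke-ke, QIAN Bo. Vector quantization technology based on clustering features in speaker recognition[J]. Computer Engineering and Applications, 2007, 43(27): 196-198.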