Computer Engineering and Applications (计算机工程与应用), 2008, Vol. 44, Issue (17): 18-20.

• Doctoral Forum •

Phrase break identification based on maximum entropy model (基于最大熵模型的汉语短语间停顿识别)

QIAN Yi-li (钱揖丽)1,2, XUN En-dong (荀恩东)3

  1. College of Computer Science and Technology, Beijing University of Technology, Beijing 100022, China
  2. College of Computer Science and Information Technology, Shanxi University, Taiyuan 030006, China
  3. College of Information Sciences, Beijing Language and Culture University, Beijing 100083, China
  • Received: 2008-01-25 Revised: 2008-03-12 Online: 2008-06-11 Published: 2008-06-11
  • Corresponding author: QIAN Yi-li

Abstract: Marking phrase breaks correctly plays an important role in improving the naturalness of speech synthesized by text-to-speech (TTS) systems. This paper presents a method that uses a maximum entropy model to identify breaks between Chinese phrases automatically from real, natural speech. The model's feature set contains two types of features, acoustic and lexical, and is obtained semi-automatically. First, a candidate feature set is designed by hand on the basis of experience. A feature selection algorithm then filters the candidates and keeps the more effective ones as the final feature set, on which the maximum entropy model for Chinese phrase break identification is trained. The results of three groups of experiments show that the model achieves fairly satisfactory phrase break identification performance.
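The pipeline outlined in the abstract (hand-designed candidate features, automatic feature selection, maximum entropy training) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three binary feature names, the toy data, the greedy forward selection by log-likelihood gain, and the `min_gain` threshold are all hypothetical, since the abstract does not say which features or which selection algorithm the authors used. For a two-class decision (break or no break), a maximum entropy model has the same form as logistic regression, which is what the code fits.

```python
import math

def project(X, cols):
    """Keep only the feature columns listed in cols."""
    return [[row[j] for j in cols] for row in X]

def predict_proba(w, x):
    """P(break | x) under a binary maxent model; w[-1] is the bias weight."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(X, y, n_iter=300, lr=0.5):
    """Fit the weights by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = label - predict_proba(w, x)  # observed minus expected
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    """Log-likelihood of the labels y under the model with weights w."""
    total = 0.0
    for x, label in zip(X, y):
        p = predict_proba(w, x)
        total += math.log(p if label == 1 else 1.0 - p)
    return total

def select_features(X, y, min_gain=1.0):
    """Greedy forward selection (an assumed stand-in for the paper's
    unspecified algorithm): repeatedly add the candidate feature that
    raises the training log-likelihood most, and stop once the best
    gain falls below min_gain."""
    selected, remaining = [], list(range(len(X[0])))
    base = project(X, [])
    best_ll = log_likelihood(train_maxent(base, y), base, y)
    while remaining:
        gains = {}
        for j in remaining:
            Xj = project(X, selected + [j])
            gains[j] = log_likelihood(train_maxent(Xj, y), Xj, y) - best_ll
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        best_ll += gains[best]
    return selected

# Hypothetical candidate features: one acoustic, two lexical.
FEATURES = ["long_silence", "next_word_is_noun", "two_char_word"]
# Toy word boundaries; label 1 means a phrase break follows the word.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
     [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected = select_features(X, y)
weights = train_maxent(project(X, selected), y)
print("selected features:", [FEATURES[j] for j in selected])
```

In this toy data only the first candidate carries information about the label, so the selection step is expected to discard the other two. A real system would measure gain on held-out data and use a regularized trainer rather than plain gradient ascent.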