Meaningful string discovery algorithm for chapter-novel corpora

doi:10.3778/j.issn.1002-8331.2010.04.041

Computer Engineering and Applications, 2010, Vol. 46, Issue (4): 129-131. DOI: 10.3778/j.issn.1002-8331.2010.04.041

• Database, Signal and Information Processing •


LI Hai-tao, MA Zhen-hua, SHEN Wen-hua

Department of Computer Engineering, Taizhou Vocational & Technical College, Taizhou, Zhejiang 318000, China

Received: 2008-08-27 Revised: 2008-11-17 Online: 2010-02-01 Published: 2010-02-01
Contact: LI Hai-tao


Abstract

Existing meaningful-string discovery algorithms are designed to mine meaningful strings that occur frequently in large-scale corpora. On small corpora, or for meaningful strings that occur infrequently, they perform poorly. Based on how meaningful strings are distributed in chapter novels, this paper proposes a locality principle for meaningful strings, together with an effective method for measuring the locality of a string. Locality is combined with pragmatic independence, and the two jointly describe the likelihood that a string is meaningful. Experiments show that, for meaningful-string discovery in chapter novels, the method achieves higher precision than existing algorithms and discovers more low-frequency meaningful strings effectively.
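
This page gives only the abstract, so it does not reproduce the paper's actual locality measure or its pragmatic-independence measure. The Python sketch below only illustrates the general idea of scoring candidate strings by combining a locality term with an independence term. Everything in it is an assumption. Locality is approximated as one minus the normalised entropy of a string's occurrences across chapters, so strings concentrated in a few chapters score higher. Independence is replaced by left/right branching entropy, a common heuristic in Chinese new-word discovery that is not necessarily the authors' definition. The two terms are simply multiplied. All function names (`collect_stats`, `locality`, `independence`, `score_strings`) and the `min_freq` threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of the distribution given by a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log(c / total) for c in counter.values() if c > 0)


def collect_stats(chapters, n):
    """For every n-character substring, record its per-chapter counts and the
    characters immediately to its left and right (None marks a chapter edge)."""
    per_chapter = defaultdict(Counter)
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for ci, text in enumerate(chapters):
        for i in range(len(text) - n + 1):
            s = text[i:i + n]
            per_chapter[s][ci] += 1
            left[s][text[i - 1] if i > 0 else None] += 1
            right[s][text[i + n] if i + n < len(text) else None] += 1
    return per_chapter, left, right


def locality(chapter_counts, num_chapters):
    """Hypothetical locality: 1 minus the normalised entropy of the string's
    spread over chapters (1.0 = all occurrences in one chapter,
    0.0 = spread evenly over every chapter)."""
    if num_chapters < 2:
        return 1.0
    return 1.0 - entropy(chapter_counts) / math.log(num_chapters)


def independence(left_ctx, right_ctx):
    """Stand-in independence: the smaller of the left and right branching
    entropies (a common heuristic, not necessarily the paper's measure)."""
    return min(entropy(left_ctx), entropy(right_ctx))


def score_strings(chapters, n, min_freq=2):
    """Score every n-character string seen at least min_freq times by
    locality * independence; a higher score suggests a likelier meaningful string."""
    per_chapter, left, right = collect_stats(chapters, n)
    scores = {}
    for s, counts in per_chapter.items():
        if sum(counts.values()) < min_freq:
            continue
        scores[s] = locality(counts, len(chapters)) * independence(left[s], right[s])
    return scores


if __name__ == "__main__":
    # Toy corpus of two "chapters"; a real run would use Chinese chapter texts.
    chapters = ["xAByzABw", "qrst"]
    scores = score_strings(chapters, 2)
    print(sorted(scores, key=scores.get, reverse=True))
```

In the toy `__main__` input, only `"AB"` occurs at least twice. Both occurrences fall in the first chapter, giving locality 1.0, and each side has two distinct neighbours, so the string scores ln 2 ≈ 0.693. Because the two terms are multiplied, a string ranks highly only if it is both locally concentrated and freely combinable with its context. A weighted sum would be an equally plausible way to combine them.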


CLC Number: TP301.6

LI Hai-tao, MA Zhen-hua, SHEN Wen-hua. Meaningful string discovery algorithm for chapter-novel corpora[J]. Computer Engineering and Applications, 2010, 46(4): 129-131.

