Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (28): 154-158.

• Database, Signal and Information Processing •

频率相似度算法在审计规则库中的应用

谢岳山1,樊晓平1,廖志芳2,邱丽霞2   

  1. 中南大学 信息科学与工程学院,长沙 410075
  2. 中南大学 软件学院,长沙 410002
  • Publication date: 2012-10-01 Release date: 2012-09-29

Application of frequency similarity algorithm in audit rulebase

XIE Yueshan1, FAN Xiaoping1, LIAO Zhifang2, QIU Lixia2   

  1. College of Information Science and Engineering, Central South University, Changsha 410075, China
  2. College of Software, Central South University, Changsha 410002, China
  • Online: 2012-10-01 Published: 2012-09-29

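Abstract (translated from the Chinese original): Existing algorithms for computing sentence semantic similarity do not account for the different degrees to which individual words contribute to the similarity between sentences, so their results are unsatisfactory. To address this, an improved word similarity algorithm based on a frequency function is proposed. The algorithm takes a function of each word's frequency in a corpus as a weight and introduces it into the word similarity computation for sentences, which reduces the share of high-frequency words in the sentence similarity value and improves the algorithm's precision. Because current audit methods are scattered, disorganized, and duplicated, an audit rule base is built from the audit methods so that existing methods can be reused more easily. On this basis, the improved semantic similarity algorithm is used to compute the similarity between user input and the audit rules, and the audit methods corresponding to the rules that satisfy the similarity threshold are returned. The user then selects a suitable method from those returned to carry out the audit. Practical application shows that the algorithm reduces the time spent manually searching for audit methods and improves audit efficiency.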

Keywords (translated from the Chinese original): semantic similarity, audit method, audit rule base, rule matching

Abstract: Sentence semantic similarity algorithms based on HowNet ignore the fact that different words contribute differently to the similarity between sentences, so the results they produce are often unreasonable. To solve this problem, the paper improves the traditional semantic similarity algorithm by introducing a frequency function of each word in a corpus as a weight factor in the final sentence semantic similarity computation. This function reduces the share that high-frequency words contribute to the sentence similarity value and improves accuracy. On the application side, the audit methods summarized by audit administrations are often duplicated and unclear. To make these audit methods reusable, the paper builds an audit rule base from them. On this basis, it uses the improved semantic similarity algorithm to compute the similarity between user input and the rules in the audit rule base. The audit methods corresponding to the rules whose similarity satisfies a given threshold are returned, and the auditor picks the most appropriate one to audit the project. The application shows that the algorithm improves audit efficiency.

Key words: semantic similarity, audit method, audit rule base, rule matching
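
The frequency weighting and threshold matching described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual frequency function, the HowNet word similarity, or the sentence-level formula, so everything here is an assumption made for illustration: `frequency_weight` uses a hypothetical logarithmic damping, `word_similarity` is an exact-match placeholder standing in for a HowNet-based measure, sentences are assumed to be already segmented into word lists, and the rule base and corpus frequencies are made-up toy data.

```python
import math

def frequency_weight(word, corpus_freq):
    """Hypothetical weight: shrinks as the word's corpus frequency grows."""
    return 1.0 / (1.0 + math.log1p(corpus_freq.get(word, 0)))

def word_similarity(a, b):
    """Placeholder for a HowNet-based word similarity in [0, 1]."""
    return 1.0 if a == b else 0.0

def directed_similarity(src, dst, corpus_freq):
    """Frequency-weighted average of each src word's best match in dst."""
    if not src or not dst:
        return 0.0
    weights = [frequency_weight(w, corpus_freq) for w in src]
    best = [max(word_similarity(w, v) for v in dst) for w in src]
    return sum(wt * s for wt, s in zip(weights, best)) / sum(weights)

def sentence_similarity(s1, s2, corpus_freq):
    """Symmetric score: mean of the two directed similarities."""
    return 0.5 * (directed_similarity(s1, s2, corpus_freq)
                  + directed_similarity(s2, s1, corpus_freq))

def match_rules(query, rule_base, corpus_freq, threshold):
    """Return (method, score) pairs meeting the threshold, best first."""
    scored = [(method, sentence_similarity(query, rule, corpus_freq))
              for rule, method in rule_base]
    hits = [(m, s) for m, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data (invented for illustration only).
corpus_freq = {"the": 1000, "check": 50, "payment": 8, "invoice": 5, "duplicate": 3}
rule_base = [
    (["check", "duplicate", "invoice"], "method A"),
    (["check", "the", "payment"], "method B"),
]
query = ["the", "duplicate", "invoice"]
hits = match_rules(query, rule_base, corpus_freq, threshold=0.5)
```

With this toy data, the high-frequency word "the" receives a much smaller weight than "invoice" or "duplicate", so the rule that shares only "the" with the query falls below the threshold and only "method A" is returned. The auditor would then choose among whatever methods are returned.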