信息网络安全 ›› 2021, Vol. 21 ›› Issue (11): 9-16.doi: 10.3969/j.issn.1671-1122.2021.11.002

• 技术研究 • 上一篇    下一篇

基于机器学习分类算法的前提选择技术研究

熊焰1, 程传虎2(), 武建双3, 黄文超1   

  1. 1.中国科学技术大学计算机科学与技术学院,合肥 230026
    2.中国科学技术大学网络空间安全学院,合肥 230026
    3.合肥天帷信息安全技术有限公司,合肥 230009
  • 收稿日期:2021-06-21 出版日期:2021-11-10 发布日期:2021-11-24
  • 通讯作者: 程传虎 E-mail:cchpersonal@163.com
  • 作者简介:熊焰(1960—),男,安徽,教授,博士,主要研究方向为计算机网络与信息安全、移动计算与移动网络、分布式处理|程传虎(1993—),男,河南,硕士研究生,主要研究方向为软件自动化验证、机器学习|武建双(1984—),男,山西,硕士,主要研究方向为网络安全等级保护|黄文超(1982—),男,湖北,副教授,博士,主要研究方向为信息安全、人工智能、移动计算、网络与系统安全自动化验证
  • 基金资助:
    国家自然科学基金(61972369)

Research on Premise Selection Technology Based on Machine Learning Classification Algorithm

XIONG Yan1, CHENG Chuanhu2(), WU Jianshuang3, HUANG Wenchao1   

  1. 1. Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
    2. Department of Cyberspace Security, University of Science and Technology of China, Hefei 230026, China
    3. Hefei Tianwei Information Security Technology Co., Ltd, Hefei 230009, China
  • Received:2021-06-21 Online:2021-11-10 Published:2021-11-24
  • Contact: CHENG Chuanhu E-mail:cchpersonal@163.com

摘要:

前提选择是提高定理自动证明成功率的关键技术,可以根据证明目标的相关性选择最有可能成功证明当前猜想的引理。已有的前提选择算法推荐的引理相关度不高,无法进一步提高定理的自动证明能力。针对以上问题,文章提出一种基于机器学习分类算法的组合方案,从公式结构和符号之间的依赖关系出发,提取有效特征向量集,并在k-近邻算法和朴素贝叶斯算法的基础上引入LDA主题词提取技术,进一步捕捉符号和依赖项之间的相关性,使得最后的组合算法预测的准确性更高。实验结果表明,该方案推荐的引理比现有的前提选择算法相关性更高,可以有效提高定理自动证明的成功率。

关键词: 定理自动证明, 前提选择, LDA主题提取, Coq证明助手

Abstract:

Premise selection is the key technology to improve the success rate of automatic theorem provf. It can choose the lemma which is most likely to prove the current conjecture successfully according to the relevance of the proving goal. However, the relevance of the lemmas recommended by the existing premise selection algorithm is not high, and the automatic proof ability of the theorem cannot be further improved. To solve the above problems, a combination algorithm based on machine learning classification is proposed. The scheme starts from the relationship between formula structure and symbols, extracts effective feature vector set, and introduces LDA topics extraction techniques on the basis of k-nearest neighbor algorithm and naive Bayes algorithm to further capture the correlation between symbols and dependencies, which makes the final combination algorithm more accurate. Experimental results show that this method has higher accuracy than that the existing premise selection algorithm, and can effectively improve the success rate of automatic theorem provf.

Key words: automatic theorem proving, premise selection, LDA topics extraction, Coq proof assistant

中图分类号: