Research on Premise Selection Technology Based on Machine Learning Classification Algorithm

doi:10.3969/j.issn.1671-1122.2021.11.002

Abstract

Premise selection is a key technique for improving the success rate of automatic theorem proving. Given a proof goal, it selects the lemmas that are most likely to help prove the current conjecture, based on their relevance to that goal. However, the lemmas recommended by existing premise selection algorithms are often only weakly relevant, which limits further improvement in automatic proving ability. To address this problem, a combined algorithm based on machine learning classification is proposed. The scheme starts from the relationship between formula structure and symbols to extract an effective set of feature vectors, and then introduces LDA topic extraction on top of the k-nearest neighbor and naive Bayes algorithms to further capture the correlation between symbols and dependencies, which makes the final combined algorithm more accurate. Experimental results show that this method achieves higher accuracy than existing premise selection algorithms and can effectively improve the success rate of automatic theorem proving.
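
The abstract does not give implementation details, so the following is only a minimal sketch of how a k-nearest neighbor ranking and a naive Bayes ranking over symbol features might be combined into one premise ranking. The toy library, the IDF weighting, the Laplace smoothing, the min-max normalisation and the mixing weight `alpha` are all assumptions made for illustration, not the authors' method, and the LDA topic step is omitted entirely.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy library (not from the paper): each proved theorem maps to
# (symbol features of its statement, premises used in its proof).
LIBRARY = {
    "add_comm": ({"nat", "add", "eq"}, {"add_succ", "add_zero"}),
    "add_assoc": ({"nat", "add", "eq"}, {"add_succ"}),
    "mul_comm": ({"nat", "mul", "eq"}, {"mul_succ", "add_comm"}),
    "app_nil": ({"list", "app", "eq"}, {"app_cons"}),
}


def idf_weights(library):
    """Weight rare symbols more heavily (inverse document frequency)."""
    n = len(library)
    df = Counter(f for feats, _ in library.values() for f in feats)
    return {f: math.log(n / c) + 1.0 for f, c in df.items()}


def knn_scores(goal_feats, library, k=2):
    """Score premises by the dependencies of the k most similar theorems."""
    weights = idf_weights(library)
    sims = sorted(
        ((sum(weights[f] for f in goal_feats & feats), name)
         for name, (feats, _) in library.items()),
        reverse=True,
    )
    scores = defaultdict(float)
    for sim, name in sims[:k]:
        deps = library[name][1]
        for premise in deps:
            scores[premise] += sim / len(deps)
    return dict(scores)


def nb_scores(goal_feats, library):
    """Naive Bayes log-score of each premise given the goal's features."""
    n = len(library)
    uses = Counter()              # how many proofs use each premise
    co = defaultdict(Counter)     # premise -> feature co-occurrence counts
    for feats, deps in library.values():
        for premise in deps:
            uses[premise] += 1
            co[premise].update(feats)
    scores = {}
    for premise, u in uses.items():
        s = math.log(u / n)
        for f in goal_feats:
            # Laplace smoothing so unseen features do not give log(0).
            s += math.log((co[premise][f] + 1) / (u + 2))
        scores[premise] = s
    return scores


def normalise(scores):
    """Min-max scale scores to [0, 1] so the two rankers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {p: (s - lo) / span if span else 1.0 for p, s in scores.items()}


def select_premises(goal_feats, library, k=2, alpha=0.5, top=3):
    """Combine both rankers with weight alpha and return the top premises."""
    knn = normalise(knn_scores(goal_feats, library, k))
    nb = normalise(nb_scores(goal_feats, library))
    combined = {p: alpha * knn.get(p, 0.0) + (1 - alpha) * nb.get(p, 0.0)
                for p in set(knn) | set(nb)}
    return sorted(combined, key=lambda p: (-combined[p], p))[:top]


print(select_premises({"nat", "add", "eq"}, LIBRARY))
```

In this sketch the two rankers are mixed linearly after normalisation. A rank-based combination would be an equally plausible choice, since the abstract does not say how the component scores are merged.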

Key words: automatic theorem proving, premise selection, LDA topic extraction, Coq proof assistant

CLC Number: TP309

XIONG Yan, CHENG Chuanhu, WU Jianshuang, HUANG Wenchao. Research on Premise Selection Technology Based on Machine Learning Classification Algorithm[J]. Netinfo Security, 2021, 21(11): 9-16.

Figures/Tables 7

	Math Classes	CoRN
Number of theorems (lemmas)	2730	4941
Number of features	63629	90378
Number of dependencies	30053 (2415 unique)	30531 (4363 unique)
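
As a rough reading aid for the dataset statistics above, the average number of dependencies per theorem can be worked out directly from the counts. This is a small sketch that assumes the dependency count is the total number of (theorem, premise) pairs in each library. The page itself does not state this, and the dictionary layout is hypothetical.

```python
# Counts copied from the table above; the dictionary layout is hypothetical.
STATS = {
    "Math Classes": {"theorems": 2730, "dependencies": 30053},
    "CoRN": {"theorems": 4941, "dependencies": 30531},
}


def avg_dependencies(entry):
    """Average dependencies per theorem, assuming 'dependencies' counts
    every (theorem, premise) pair once."""
    return entry["dependencies"] / entry["theorems"]


for name, entry in STATS.items():
    print(f"{name}: {avg_dependencies(entry):.2f} dependencies per theorem")
```

Under that assumption, Math Classes proofs draw on noticeably more premises per theorem than CoRN proofs, even though CoRN is the larger library.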