|
Algorithm of Chinese Keywords Extraction based on Multi-feature
PAN Li-min, WU Jun-hua, LIN Meng, LUO Sen-lin
2014, 14 (8):
40-44.
doi: 10.3969/j.issn.1671-1122.2014.08.007
In text processing area, key words has become a critical technique for a long time. Key words extraction is aimed to extract the vital words or phrases which can summarize the literature content. Considering the influence of 6 factors (such as term frequency, term length, part of speech, position, internet-dictionary and stop word list) to the weight of keywords in text, we propose a new algorithm of Chinese keywords extraction in this paper. The proposed algorithm combines linear weighting, and compound word construction and filtering. The experimental data consist of 10 categories of literature which are downloaded from China National Knowledge Infrastructure, namely environment, information technology, transportation, education, economics, culture and history, chemistry, medicine, agriculture and politics. The results show when the value of candidate words equals 5, the approximate matching precision is 54.8%, the recall rate is 65.1%. The proposed method can not only solves the problem of low recall coursed by word-segmentation in keyword extraction, but also extract words which are not high-frequency but important for the text meaning effectively.
References |
Related Articles |
Metrics
|