Programming Research Group
Research Report RR-07-01
Learning to Extract Significant Phrases from Text
Yuan J. Lui
February 2007, 14pp.
Abstract
Prospective readers can quickly determine whether a document is relevant to their information need if the
significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many
documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly.
Therefore, there is a need for automatic keyphrase extraction. This report introduces a new domain-independent
keyphrase extraction algorithm. The algorithm approaches the problem of keyphrase extraction as a classification
task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes,
and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that
this algorithm performs at least as well as other keyphrase extraction tools and that it significantly outperforms
Microsoft Word 2000's AutoSummarize feature.
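
To illustrate what treating keyphrase extraction as a classification task can look like, here is a minimal sketch. It is not the algorithm of this report: the abstract does not specify the attributes or the learning method, so the two attributes (relative frequency and relative position of first occurrence) and the nearest-centroid classifier below are stand-in assumptions, and all function names are hypothetical.

```python
import re
from collections import Counter


def candidates(text, max_len=3):
    """Collect candidate phrases (word n-grams up to max_len words).

    Returns a dict mapping each phrase to the list of word positions
    where it starts, plus the total number of words in the text.
    """
    words = re.findall(r"[a-z]+", text.lower())
    found = {}
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            found.setdefault(phrase, []).append(i)
    return found, len(words)


def features(positions, total_words):
    """Assumed attributes: relative frequency and relative first occurrence."""
    return (len(positions) / total_words, positions[0] / total_words)


def train_centroids(examples):
    """Average the feature vectors of each class (keyphrase / non-keyphrase).

    examples is a list of (feature_tuple, is_keyphrase) pairs.
    """
    sums = {True: [0.0, 0.0], False: [0.0, 0.0]}
    counts = Counter()
    for feats, label in examples:
        counts[label] += 1
        for k, value in enumerate(feats):
            sums[label][k] += value
    return {label: [s / counts[label] for s in sums[label]] for label in counts}


def classify(feats, centroids):
    """Label a candidate with the class whose centroid is closest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


if __name__ == "__main__":
    text = ("Keyphrase extraction helps readers. Keyphrase extraction is a "
            "classification task. Readers like summaries.")
    found, total = candidates(text)
    # Hand-labelled toy training data, invented for this illustration only.
    training = [
        (features(found["keyphrase extraction"], total), True),
        (features(found["like summaries"], total), False),
    ]
    model = train_centroids(training)
    for phrase in ("keyphrase", "summaries"):
        print(phrase, classify(features(found[phrase], total), model))
```

A real system would train on many documents with author-assigned keyphrases and would filter candidates (for example, by removing phrases that begin or end with stopwords) before classification.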
This paper is available as a 103,337-byte PDF file.