Answer 1 (2011-06-05)
Chinese word segmentation is the foundation of Chinese information processing. It plays an irreplaceable role in natural language understanding, linguistic research, automatic indexing of Chinese text, information retrieval, machine translation, and related fields. Research on Chinese word segmentation is therefore very important.
However, research on Chinese word segmentation lags far behind that of related technologies and has become a bottleneck restricting their development. Research on Chinese word segmentation faces the following problems. On the linguistic side, new words constantly emerge, ambiguities must be resolved, and segmentation standards are not unified. On the computational side, there is no suitable formal model of natural language and no effective way to formalize and understand semantics. These problems restrict the development of Chinese word segmentation.
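To make the ambiguity problem concrete, here is a minimal sketch of forward maximum matching, a common string-matching segmenter chosen purely for illustration (it is not presented as a method from the thesis). The function name `forward_max_match`, the toy dictionary, and the `max_len=4` default are assumptions. With a dictionary that contains both 研究 and 研究生, greedy matching splits 研究生命起源 ("researching the origin of life") incorrectly.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching (illustrative sketch, not the thesis's method).

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when no dictionary word matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match (or one character) remains.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


toy_dict = {"研究", "研究生", "生命", "起源"}  # hypothetical toy dictionary
print(forward_max_match("研究生命起源", toy_dict))  # ['研究生', '命', '起源']
```

The greedy choice of 研究生 ("graduate student") at position 0 forces the wrong split 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源; a purely dictionary-driven matcher has no context with which to prefer the correct reading.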
Building on a comprehensive analysis of existing research on Chinese word segmentation and focusing on graph-based segmentation, this thesis proposes a Chinese word segmentation method based on shortest paths in an S-EK graph. The main contents of the study are as follows:
1. The main Chinese word segmentation algorithms are studied. The three commonly used approaches are compared and analyzed: segmentation based on string matching, segmentation based on statistics, and segmentation based on knowledge understanding; the advantages and disadvantages of each are summarized. Finally, the thesis discusses the evaluation of Chinese word segmentation and its significance.
2. The combination of directed graphs with Chinese word segmentation is studied in depth. The directed graph used by the N-shortest-path segmentation algorithm is improved, and the S-EK graph is proposed. An N-gram statistical model is used to compute the probability of a word in a given context; this probability is smoothed, and the resulting value serves as the edge weight of the S-EK graph.
3. Exploiting the advantages of the proposed S-EK graph, a rough-segmentation shortest path algorithm, S-EK, is put forward. This algorithm is compared with the N-shortest-path algorithm and Dijkstra's algorithm, and both experiments and theoretical derivation show that it has certain advantages and practical value.
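The abstract does not specify how the S-EK graph or the S-EK algorithm is built, so the following is only a minimal sketch of the general idea items 2 and 3 build on: a word lattice whose edges carry smoothed N-gram costs, searched for a minimum-cost path. Assumptions: a bigram model (N = 2), add-one (Laplace) smoothing, an edge weight of -log P(word | previous word), and plain Dijkstra (one of the baselines named in item 3) standing in for the S-EK algorithm. The names `train_bigram`, `edge_cost`, and `segment` and the toy corpus are hypothetical.

```python
import heapq
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over pre-segmented sentences (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def edge_cost(prev, word, unigrams, bigrams):
    """-log P(word | prev) with add-one smoothing (assumed scheme); never negative."""
    vocab = len(unigrams)
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return -math.log(p)


def segment(text, unigrams, bigrams, max_len=4):
    """Minimum-cost path through the word lattice using Dijkstra (not the S-EK algorithm).

    A search state is (end position, last word), so each edge knows its bigram context.
    Candidate words are known words from training plus every single character.
    """
    n = len(text)
    start, goal = (0, "<s>"), (n, "</s>")
    dist, back = {start: 0.0}, {start: None}
    heap = [(0.0, 0, "<s>")]
    while heap:
        d, pos, prev = heapq.heappop(heap)
        state = (pos, prev)
        if state == goal:
            break  # first pop of the goal is optimal because all costs are >= 0
        if d > dist[state]:
            continue  # stale heap entry
        if pos == n:
            moves = [goal]  # only the end-of-sentence edge remains
        else:
            moves = [(pos + k, text[pos:pos + k])
                     for k in range(1, min(max_len, n - pos) + 1)
                     if k == 1 or text[pos:pos + k] in unigrams]
        for nxt in moves:
            nd = d + edge_cost(prev, nxt[1], unigrams, bigrams)
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                back[nxt] = state
                heapq.heappush(heap, (nd, nxt[0], nxt[1]))
    # Follow back-pointers from </s> to <s>, collecting the words in reverse.
    words, state = [], back[goal]
    while state is not None and state[1] != "<s>":
        words.append(state[1])
        state = back[state]
    return words[::-1]


uni, bi = train_bigram([["研究", "生命", "起源"], ["研究生", "上课"]])  # toy corpus
print(segment("研究生命起源", uni, bi))  # ['研究', '生命', '起源']
```

On this toy corpus the path 研究 / 生命 / 起源 has probability 1/288 under the smoothed bigram model, while the greedy split 研究生 / 命 / 起源 scores only 1/1008, so the lattice search resolves the ambiguity that pure maximum matching cannot. Because every edge cost is -log of a probability no greater than 1, the weights are non-negative and Dijkstra's algorithm applies; keeping the previous word in the search state is what lets a bigram (context-dependent) weight sit on each edge.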
Keywords: Chinese word segmentation; information processing; S-EK graph; shortest path; statistical model