
3M2-IOS-3b-4 Extracting Transliteration Pairs from Classical Chinese Buddhist Literature

*Please refrain from unauthorized video streaming of the sessions.


June 14 (Thu), 13:30-18:00, Room M (Yamaguchi Prefectural Jichi Kaikan / Large Conference Room (80))
3M2-IOS-3b International Organized Session: "Special Session on Web Intelligence & Data Mining (2)"

Presentation No.: 3M2-IOS-3b-4
Title: Extracting Transliteration Pairs from Classical Chinese Buddhist Literature
Authors: Yu-Chun Wang (Department of Computer Science and Information Engineering, National Taiwan University)
Tsai Richard Tzong-Han (Yuan Ze University)
Time: June 14 (Thu), 15:00-15:30
Abstract: Transliteration pair extraction, which identifies the transliterations that correspond to foreign loanwords in a body of literature, is a key and very challenging task in several research fields, such as historical linguistics and digital humanities. In this paper, we focus on one important type of historical literature: classical Chinese Buddhist texts. We propose an approach that automatically identifies transliteration pairs in classical Chinese texts. Our approach comprises two stages: transliteration extraction and transliteration pair identification. To extract as many transliterations as possible without introducing too many false positives, we adopt a hybrid method consisting of a machine-learning-based extraction method that uses phonological features of the transliteration characters and a suffix-array-based extraction method with filtering rules. Next, the extracted transliteration candidates are compared with one another for phonetic similarity, based on their pronunciations as recorded in the Middle Chinese rime book "Guangyun"; the ALINE algorithm is employed to measure this phonetic similarity and thereby identify transliteration pairs. To evaluate our method, we construct an evaluation set from several Buddhist texts, such as the Samyukta Agama and the Mahavibhasa, which were translated into Chinese in different eras. We use precision and recall to measure the effectiveness of our method.
Paper (PDF)
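The abstract mentions a suffix-array-based extraction method with filtering rules but does not spell the rules out. The sketch below illustrates the general technique under stated assumptions: it builds a naive suffix array and LCP array, collects substrings of 2 to 6 characters that occur at least twice, and then applies two hypothetical filters (membership in a small hand-picked character set, `ALLOWED_CHARS`, and removal of candidates contained in a longer candidate with the same frequency). The filters, the length bounds, the function names, and the toy input are illustrative assumptions, not the authors' rules or data.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes, sorted lexicographically (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def repeated_substrings(text: str, min_len: int = 2, max_len: int = 6,
                        min_freq: int = 2) -> dict[str, int]:
    """Map each substring of min_len..max_len chars occurring >= min_freq times to its count."""
    if min_freq < 2:
        raise ValueError("min_freq must be >= 2 (a lone suffix may be shorter than the target length)")
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    found: dict[str, int] = {}
    for length in range(min_len, max_len + 1):
        k = 0
        while k < len(sa):
            # Suffixes sharing a prefix of `length` chars form a contiguous run in the suffix array.
            j = k + 1
            while j < len(sa) and lcp[j] >= length:
                j += 1
            if j - k >= min_freq:
                found[text[sa[k]:sa[k] + length]] = j - k
            k = j
    return found


# Hypothetical filter input: a tiny hand-picked set of characters common in Buddhist
# transliterations. The authors' actual filtering rules are not given in the abstract.
ALLOWED_CHARS = set("阿那律須菩提摩訶迦葉舍利弗")


def filter_candidates(found: dict[str, int], allowed: set[str]) -> dict[str, int]:
    # Hypothetical rule 1: every character must belong to the allowed set.
    kept = {s: f for s, f in found.items() if all(c in allowed for c in s)}
    # Hypothetical rule 2: drop a candidate contained in a longer kept candidate with the same count.
    return {s: f for s, f in kept.items()
            if not any(s != t and s in t and f == kept[t] for t in kept)}


text = "阿那律說法阿那律入城須菩提住須菩提"  # toy input, not taken from any corpus
print(filter_candidates(repeated_substrings(text), ALLOWED_CHARS))  # {'阿那律': 2, '須菩提': 2}
```

The naive `sorted` construction is fine for a toy string; for corpus-scale text a linear-time construction such as SA-IS would be the usual choice.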
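The abstract uses ALINE to measure phonetic similarity between candidates. The following is a simplified stand-in, not ALINE itself: a Smith-Waterman-style local alignment over phoneme segments, where the substitution score decreases with a summed articulatory feature distance and unaligned segments pay a fixed skip penalty. ALINE additionally uses expansion/compression operations and feature salience weights, which are omitted here. The feature values in `CONSONANTS` and `VOWELS`, the constants `MAX_SUB` and `SKIP`, and the toy romanized inputs are all hypothetical; in the described method the segments would come from Guangyun-based Middle Chinese readings, which this sketch does not model.

```python
# Hypothetical articulatory feature vectors (each value in [0, 1]); loosely inspired by
# multivalued phonetic features and NOT ALINE's published tables.
CONSONANTS = {  # (place, manner, voice)
    "p": (1.0, 1.0, 0.0), "b": (1.0, 1.0, 1.0), "m": (1.0, 0.8, 1.0),
    "t": (0.85, 1.0, 0.0), "d": (0.85, 1.0, 1.0), "n": (0.85, 0.8, 1.0),
    "s": (0.85, 0.6, 0.0), "k": (0.6, 1.0, 0.0), "g": (0.6, 1.0, 1.0),
}
VOWELS = {  # (height, backness, rounding)
    "i": (1.0, 0.0, 0.0), "e": (0.5, 0.0, 0.0), "a": (0.0, 0.5, 0.0),
    "o": (0.5, 1.0, 1.0), "u": (1.0, 1.0, 1.0),
}
MAX_SUB = 3.0  # score of an identical pair (three features, each distance <= 1)
SKIP = -1.0    # hypothetical penalty for leaving a segment unaligned


def sub_score(x: str, y: str) -> float:
    """Substitution score: MAX_SUB minus the summed feature distance."""
    for table in (CONSONANTS, VOWELS):
        if x in table and y in table:
            return MAX_SUB - sum(abs(a - b) for a, b in zip(table[x], table[y]))
    return -MAX_SUB  # consonant-vowel (or unknown) pairs are strongly discouraged


def local_similarity(s: str, t: str) -> float:
    """Best local-alignment score between two segment strings (one char = one segment)."""
    dp = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    best = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            dp[i][j] = max(0.0,
                           dp[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]),
                           dp[i - 1][j] + SKIP,
                           dp[i][j - 1] + SKIP)
            best = max(best, dp[i][j])
    return best


def normalized_similarity(s: str, t: str) -> float:
    """Scale to [0, 1] by the smaller self-similarity, so identical inputs score 1.0."""
    denom = min(local_similarity(s, s), local_similarity(t, t))
    return local_similarity(s, t) / denom if denom else 0.0


print(normalized_similarity("pana", "bana"), normalized_similarity("pana", "siku"))
```

Under these assumptions "pana" and "bana" differ only in voicing of the first segment and score close to 1.0, while "pana" and "siku" score much lower; a threshold on this normalized score could then decide which candidate pairs to keep.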
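Precision and recall over the identified pairs follow the standard definitions: precision = |predicted ∩ gold| / |predicted| and recall = |predicted ∩ gold| / |gold|. A minimal sketch, assuming each transliteration pair is unordered (stored as a `frozenset`) and using placeholder strings rather than real evaluation data:

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted vs. gold transliteration pairs."""
    # Treat (x, y) and (y, x) as the same pair.
    pred = {frozenset(p) for p in predicted}
    ref = {frozenset(p) for p in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


predicted = [("A", "B"), ("C", "D"), ("E", "F")]           # placeholder pairs
gold = [("B", "A"), ("C", "D"), ("G", "H"), ("I", "J")]    # placeholder pairs
print(precision_recall(predicted, gold))
```

Here two of the three predicted pairs are in the gold set, so precision is 2/3 and recall is 2/4.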