May 23 (Tue) 13:50-15:30, Room L (WINC Aichi, 10F, Conference Room 1003)
Presentation No. | 1L1-5 |
---|---|
Title | Combining Multiple Dictionaries to Improve Tokenization of Ainu Language |
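Authors | Michal Ptaszynski (Kitami Institute of Technology), Yuka Ito (National Institute of Technology, Kushiro College), Karol Nowakowski (Independent Researcher), Hirotoshi Honma (Department of Information Engineering, National Institute of Technology, Kushiro College), Yoko Nakajima (National Institute of Technology, Kushiro College), Fumito Masui (Department of Information Systems Engineering, Kitami Institute of Technology) |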
Time | May 23 (Tue) 15:10-15:30 |
Abstract | In this paper we present our research on improving a tokenizer for the Ainu language. Tokenization is the process of separating a sentence into basic elements, such as words or morphemes. Ainu is a critically endangered language of the Ainu people living in the northern parts of Japan. Since the Ainu language originally did not have a writing system, documents in Ainu are usually transcribed in a systematized way. To allow effective processing and contribute to further revitalization of the Ainu language, we combine multiple official Ainu language dictionaries to improve tokenization of such documents. We also compare a state-of-the-art tokenizer with a custom one developed for the needs of this research. |
Paper | PDF file |
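
The idea summarized in the abstract, merging several dictionaries so that a tokenizer recognizes more words, can be illustrated with a minimal sketch. This is not the authors' tokenizer: the greedy longest-match strategy, the function names `combine_dictionaries` and `tokenize`, and the two tiny word lists are all assumptions made for illustration only.

```python
def combine_dictionaries(*dictionaries):
    """Merge several word lists into one lexicon (a set of lowercase entries)."""
    lexicon = set()
    for entries in dictionaries:
        lexicon.update(entry.lower() for entry in entries)
    return lexicon


def tokenize(text, lexicon):
    """Greedy longest-match segmentation (an assumed strategy, not the paper's).

    Characters that match no lexicon entry are emitted as single-character tokens.
    """
    text = text.lower()
    max_len = max((len(entry) for entry in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, shrinking until a lexicon entry matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
        else:
            # No entry matched at this position: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy word lists standing in for two separate dictionaries (hypothetical data).
dictionary_a = ["kamuy", "cise"]
dictionary_b = ["kotan", "kor"]

combined = combine_dictionaries(dictionary_a, dictionary_b)
single_result = tokenize("kamuykotan", set(dictionary_a))
combined_result = tokenize("kamuykotan", combined)
print(single_result)
print(combined_result)
```

With only the first word list, the second half of the input falls back to single characters; with the combined lexicon, the whole string is covered by dictionary entries. A real system would also need to handle spelling variants across dictionaries, which this sketch ignores.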