June 14 (Thu) 13:30-18:00, Room M (山口県自治会館 / Large Conference Room (80))
Presentation No. | 3M2-IOS-3b-2 |
---|---|
Title | Utilising Bilingual Lexical Resources for Technical Term Extraction |
Authors | Chaimongkol Panot (The University of Tokyo), Pontus Stenetorp (The University of Tokyo), Akiko Aizawa (The University of Tokyo) |
Time | June 14 (Thu) 14:00-14:30 |
Abstract | Technical Term Extraction (TTE) is the task of detecting mentions of technical terms in scientific texts, and can thus be framed as a special case of Named Entity Recognition (NER). TTE is a stepping stone towards semantic analysis of scientific texts and is essential for information extraction and knowledge retrieval. For NER, annotated resources are commonly coupled with supervised learning methods to produce and evaluate state-of-the-art systems. However, the current lack of annotated resources for TTE hampers further research efforts. As a preliminary study, we induce annotations by exploiting author keywords assigned to scientific texts. We construct a baseline system by training a Conditional Random Field model with a set of well-established NER features. Furthermore, we examine the potential benefits of incorporating additional linguistic resources for TTE, namely bilingual dictionaries. Dictionaries alone, however, are not enough to identify technical terms; notational variation, polysemy, homography, and other ambiguities must be resolved using word co-occurrence or context information. Our hypothesis is that bilingual dictionaries are promising for disambiguating meanings through cross-language information. We incorporate features derived from bilingual dictionaries, evaluate the resulting model against our baseline, and find that the proposed model offers potential benefits. |
Paper | PDF file |
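
The abstract mentions inducing annotations from author keywords and adding bilingual dictionary features to a sequence labeller. The following is a minimal sketch of how that could look, not the authors' implementation: the function names `induce_bio_labels` and `token_features`, the `TERM` label, the exact-match strategy, and the `in_bilingual_dict` feature are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def induce_bio_labels(tokens: List[str], keywords: List[str]) -> List[str]:
    """Assign B-TERM / I-TERM / O labels by exact, case-insensitive keyword matching.

    Hypothetical helper: the abstract does not specify the matching procedure.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Match longer keywords first so a short keyword cannot block a longer one.
    keyword_tokens = sorted(
        (kw.lower().split() for kw in keywords if kw.strip()),
        key=len,
        reverse=True,
    )
    for kw in keyword_tokens:
        n = len(kw)
        for start in range(len(tokens) - n + 1):
            window_free = all(labels[i] == "O" for i in range(start, start + n))
            if window_free and lowered[start:start + n] == kw:
                labels[start] = "B-TERM"
                for i in range(start + 1, start + n):
                    labels[i] = "I-TERM"
    return labels


def token_features(tokens: List[str], i: int, dictionary_terms: Set[str]) -> Dict[str, object]:
    """Build a feature dict for token i, in the style commonly fed to CRF toolkits.

    `dictionary_terms` stands in for lowercased headwords of a bilingual
    dictionary; it is an assumed, simplified representation.
    """
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:],
        "in_bilingual_dict": word.lower() in dictionary_terms,
    }


tokens = "We train a conditional random field model for named entity recognition .".split()
keywords = ["Conditional Random Field", "named entity recognition"]
labels = induce_bio_labels(tokens, keywords)
features = [token_features(tokens, i, {"conditional", "entity"}) for i in range(len(tokens))]
```

The resulting `labels` and `features` lists are aligned token by token, which is the input shape a CRF trainer typically expects. Exact matching is the simplest possible choice here; it will miss inflected or abbreviated forms of a keyword.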