
3M2-IOS-3b-2 Utilising Bilingual Lexical Resources for Technical Term Extraction

*Please refrain from unauthorized video streaming of sessions.


June 14 (Thu) 13:30-18:00, Room M (Yamaguchi-ken Jichi Kaikan, Large Conference Room (80))
3M2-IOS-3b International Organized Session "Special Session on Web Intelligence & Data Mining (2)"

Presentation number: 3M2-IOS-3b-2
Title: Utilising Bilingual Lexical Resources for Technical Term Extraction
Authors: Chaimongkol Panot (The University of Tokyo)
Pontus Stenetorp (The University of Tokyo)
Akiko Aizawa (The University of Tokyo)
Time: June 14 (Thu) 14:00-14:30
Abstract: Technical Term Extraction (TTE) is the task of detecting mentions of technical terms in scientific texts; it can thus be framed as a special case of Named Entity Recognition (NER). TTE is a stepping stone towards semantic analysis of scientific texts and is essential for information extraction and knowledge retrieval. For NER, annotated resources are commonly coupled with supervised learning methods to produce and evaluate state-of-the-art systems. However, the current lack of annotated resources for TTE hampers further research efforts. As a preliminary study, we induce annotations by exploiting author keywords assigned to scientific texts. We construct a baseline system by training a Conditional Random Field model with a set of well-established NER features. Furthermore, we examine the potential benefits of incorporating extra linguistic resources for TTE, specifically bilingual dictionary resources. Dictionaries alone, however, are not enough to identify technical terms; notation variation, polysemy, homography, and other ambiguities must be resolved using information from word co-occurrence or context. It is our hypothesis that bilingual dictionaries are promising for disambiguating meanings by looking at cross-language information. We incorporate features from bilingual dictionaries, evaluate the resulting model against our baseline model, and find that there are potential benefits for our proposed model.
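The annotation-induction and feature steps described in the abstract might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how keywords are matched or which dictionary features are used, so whitespace tokenization, case-insensitive longest-match-first matching, the B/I/O tag scheme, and the names `induce_bio_tags`, `token_features`, and `bilingual_dict` are all hypothetical choices made here. The resulting per-token feature dictionaries and tags are the kind of input a Conditional Random Field toolkit typically expects.

```python
from typing import Dict, List, Sequence, Set


def induce_bio_tags(tokens: Sequence[str], keywords: Sequence[str]) -> List[str]:
    """Tag tokens with B/I/O labels by matching author keywords.

    Assumption: case-insensitive exact token matching, longest keyword first.
    """
    # Split each keyword into lower-cased tokens; try longer keywords first.
    patterns = sorted(
        (kw.lower().split() for kw in keywords if kw.strip()),
        key=len,
        reverse=True,
    )
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for pat in patterns:
            if lowered[i:i + len(pat)] == pat:
                tags[i] = "B"  # first token of a matched keyword
                for j in range(i + 1, i + len(pat)):
                    tags[j] = "I"  # remaining tokens of the keyword
                i += len(pat)
                break
        else:
            i += 1  # no keyword starts at this position
    return tags


def token_features(
    tokens: Sequence[str], i: int, bilingual_dict: Dict[str, Set[str]]
) -> Dict[str, object]:
    """Build a feature dict for token i.

    The dictionary features are hypothetical: membership in the bilingual
    dictionary and the number of listed translations (a rough ambiguity cue).
    """
    word = tokens[i]
    translations = bilingual_dict.get(word.lower(), set())
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:],
        "in_dict": bool(translations),
        "n_translations": len(translations),
    }


if __name__ == "__main__":
    tokens = "We train a conditional random field model".split()
    keywords = ["Conditional Random Field"]
    toy_dict = {"field": {"分野", "場"}}  # toy entries, for illustration only
    tags = induce_bio_tags(tokens, keywords)
    for idx, (tok, tag) in enumerate(zip(tokens, tags)):
        print(tok, tag, token_features(tokens, idx, toy_dict))
```

Trying longer keywords first keeps a short keyword from splitting a longer one that contains it. A real system would likely need a more tolerant matcher (for example, handling inflection and hyphenation), which this sketch leaves out.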
Paper PDF file