Presentation No. | 3B1-01 |
---|---|
Title | Classifying biomedical text abstracts using binary and multi-class Support Vector Machine |
Authors | Dollah Rozilawati (Toyohashi University of Technology), Masaki Aono (Toyohashi University of Technology) |
Time | June 13 (Fri), 09:00-09:20 |
Abstract | "An overwhelming number of biomedical paper abstracts accumulate week after week on the PubMed website. The site is therefore a rich source of life-science and biomedical text, but its size also makes it challenging to retrieve and classify conceptually similar abstracts by their content alone, rather than by their predefined categories or by their linguistic similarity. We have observed that quite a few abstracts carry two or more different categories; for instance, a single abstract may discuss both HIV/AIDS and cancer. We therefore cannot rely entirely on categorical information that is derived from the linguistic similarity of the abstracts alone. In this paper, we describe a method that classifies biomedical paper abstracts by content-based similarity, rather than by linguistic similarity, using a multi-class SVM, taking four differently categorized diseases as examples. Specifically, we collected abstracts that originally belong to the HIV/AIDS, cancer, hepatitis, and thyroid categories, and we merge and re-classify them with the proposed method. Finally, we compare our results with the well-known MeSH terms, the predefined vocabulary available on the PubMed website that provides different terminology for the same concepts." |
Paper | PDF file |
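The abstract names binary and multi-class SVMs and notes that one abstract can belong to several disease categories, but it does not specify features, solver, or how the multi-class decision is formed. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a TF-IDF bag-of-words representation, builds one binary linear SVM per category (one-vs-rest), and trains each one with Pegasos (stochastic sub-gradient descent on the hinge loss), a solver chosen here for brevity. All names (`TfIdf`, `train_binary_svm`, `OneVsRestSVM`, `predict_multi`) and the toy documents are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer that strips simple punctuation."""
    return [t for t in (raw.strip(".,;:()\"'") for raw in text.lower().split()) if t]


class TfIdf:
    """Minimal TF-IDF (smoothed IDF, L2-normalised); vectors are sparse {index: weight} dicts."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))
        n = len(docs)
        self.vocab = {term: i for i, term in enumerate(sorted(df))}
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
        return self

    def transform(self, doc):
        # Terms not seen during fit are ignored.
        counts = Counter(t for t in tokenize(doc) if t in self.vocab)
        vec = {self.vocab[t]: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {i: v / norm for i, v in vec.items()}


def score(w, x):
    """Linear decision value; the last weight is a bias on a constant feature."""
    return sum(w[j] * v for j, v in x.items()) + w[-1]


def train_binary_svm(xs, ys, dim, lam=0.1, iters=10000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss; ys are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * (dim + 1)
    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        x, y = xs[i], ys[i]
        eta = 1.0 / (lam * t)
        violated = y * score(w, x) < 1.0
        # Regularisation step: shrink every weight.
        w = [wj * (1.0 - eta * lam) for wj in w]
        # Hinge-loss step: move toward the example only if it was inside the margin.
        if violated:
            for j, v in x.items():
                w[j] += eta * y * v
            w[-1] += eta * y
    return w


class OneVsRestSVM:
    """One binary SVM per category; decision values are compared across categories."""

    def fit(self, docs, labels):
        self.vectorizer = TfIdf().fit(docs)
        xs = [self.vectorizer.transform(d) for d in docs]
        dim = len(self.vectorizer.vocab)
        self.labels = sorted(set(labels))
        self.models = {}
        for k in self.labels:
            ys = [1 if lab == k else -1 for lab in labels]
            self.models[k] = train_binary_svm(xs, ys, dim)
        return self

    def scores(self, doc):
        x = self.vectorizer.transform(doc)
        return {k: score(w, x) for k, w in self.models.items()}

    def predict(self, doc):
        """Multi-class reading: the single category with the largest decision value."""
        s = self.scores(doc)
        return max(s, key=s.get)

    def predict_multi(self, doc):
        """Multi-label reading: every category whose binary SVM scores above zero."""
        return [k for k, v in self.scores(doc).items() if v > 0]


# Toy stand-ins for abstracts (hypothetical, not data from the paper).
train = [
    ("HIV antiretroviral therapy AIDS", "HIV/AIDS"),
    ("AIDS HIV viral load", "HIV/AIDS"),
    ("tumor chemotherapy cancer metastasis", "cancer"),
    ("cancer tumor radiation oncology", "cancer"),
    ("hepatitis liver HBV infection", "hepatitis"),
    ("liver hepatitis HCV cirrhosis", "hepatitis"),
    ("thyroid hormone goiter thyroxine", "thyroid"),
    ("thyroid gland hormone hypothyroidism", "thyroid"),
]
model = OneVsRestSVM().fit([d for d, _ in train], [lab for _, lab in train])
print(model.predict("HIV and AIDS cohort"))
print(model.predict_multi("HIV AIDS tumor cancer"))
```

The one-vs-rest layout is used here because it offers both readings the abstract alludes to: `predict` forces a single category (multi-class), while `predict_multi` keeps every category whose binary SVM fires, so an abstract that discusses both HIV/AIDS and cancer can receive both labels. Whether it actually does so depends on the learned weights. Real experiments would need far more abstracts and a tuned regularisation strength `lam`.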