June 14 (Thu) 13:30-18:00, Room M (Yamaguchi-ken Jichi Kaikan / Large Conference Room (80))
Presentation No. | 3M2-IOS-3b-3 |
---|---|
Title | On Chinese Postal Address and Associated Information Extraction |
Authors | Chia-Hui Chang (National Central University, Taiwan), Chia-Yi Huang (National Central University, Taiwan), Yueng-Sheng Su (National Central University, Taiwan) |
Time | June 14 (Thu) 14:30-15:00 |
Abstract | Address information is closely linked to people's daily lives. People often need to look up the addresses of shopping malls, schools, and organizations, and use map-marking services to confirm the actual location. MapMarker is a service that extracts English postal addresses from general web pages and marks them on a map together with their associated information. This paper extends the idea to the extraction of Chinese postal addresses from the Web and improves the extraction of associated information for each address using hierarchical clustering. We show how to prepare the training data and perform full address extraction using both BIEO and IO tagging. We compare the results with and without Yahoo Chinese word segmentation. The results show that Chinese postal addresses can be extracted with a high F-measure of 0.97 using BIEO tagging without word segmentation, since incorrect segmentation can lead to worse labeling of address tokens. In addition, the associated information for each address is identified by clustering the addresses into address blocks, which improves the F-measure from 0.90 to 0.92. |
Paper | PDF file |
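
The two tagging schemes named in the abstract can be illustrated with a minimal sketch. It assumes character-level tokens (that is, no word segmentation) and a known address span; the function names, the sample sentence, and the handling of single-character spans are illustrative assumptions, not the authors' implementation or training pipeline.

```python
def bieo_tags(text, spans):
    """Label each character B (begin), I (inside), E (end) or O (outside).

    `spans` holds (start, end) pairs with `end` exclusive.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end - 1):
            tags[i] = "I"
        if end - start > 1:
            tags[end - 1] = "E"
        # Assumption: a one-character span keeps the single label "B".
    return tags


def io_tags(text, spans):
    """Label each character I (inside an address) or O (outside)."""
    tags = ["O"] * len(text)
    for start, end in spans:
        for i in range(start, end):
            tags[i] = "I"
    return tags


def decode_bieo(text, tags):
    """Recover address strings from a B ... E tag sequence."""
    found, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            found.append(text[start:i + 1])
            start = None
        elif tag == "O":
            start = None  # an unfinished span is discarded
    return found


# Hypothetical sample sentence containing one postal address.
text = "地址:桃園縣中壢市中大路300號,電話"
address = "桃園縣中壢市中大路300號"
start = text.index(address)
end = start + len(address)

bieo = bieo_tags(text, [(start, end)])
io = io_tags(text, [(start, end)])

for ch, b, o in zip(text, bieo, io):
    print(ch, b, o)
print(decode_bieo(text, bieo))
```

BIEO marks the address boundaries explicitly, so two adjacent addresses remain separable, whereas IO merges them into one run of I labels. In practice the labels would be predicted by a trained sequence labeler rather than derived from known spans as done here.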