
3M2-IOS-3b-3 On Chinese Postal Address and Associated Information Extraction



June 14 (Thu) 13:30-18:00, Room M (Yamaguchi Prefectural Jichi Kaikan / Large Conference Room (80))
3M2-IOS-3b International Organized Session "Special Session on Web Intelligence & Data Mining (2)"

Title: On Chinese Postal Address and Associated Information Extraction
Authors: Chia-Hui Chang (National Central University, Taiwan)
Chia-Yi Huang (National Central University, Taiwan)
Yueng-Sheng Su (National Central University, Taiwan)
Time: June 14 (Thu) 14:30-15:00
Abstract: Address information is closely linked to people's daily lives. People often need to look up the addresses of shopping malls, schools, and organizations, and use map-marking services to confirm their actual locations. MapMarker is a service that extracts English postal addresses from general web pages and marks them, together with their associated information, on a map. This paper extends the idea to Chinese postal address extraction on the Web and improves the extraction of associated information for each address with hierarchical clustering. We show how to prepare the training data and perform full address extraction using both BIEO and IO tagging, and we compare results with and without Yahoo Chinese word segmentation. The results show that Chinese postal addresses can be extracted with a high F-measure of 0.97 using BIEO tagging without word segmentation, since incorrect segmentation can lead to mislabeled address tokens. Associated information for each address is also identified by clustering the addresses into address blocks; for associated information, the F-measure improves from 0.90 to 0.92.
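As a rough illustration of the two tagging schemes named in the abstract, the sketch below labels every character of a string as inside or outside an address, with no word segmentation, and then decodes the tags back into address spans. The sample text, its offsets, and the function names (`to_bieo`, `to_io`, `decode_bieo`, `decode_io`) are hypothetical; this is not the authors' implementation, and a real extractor would predict these tags with a trained sequence labeler rather than derive them from known spans.

```python
from typing import List, Tuple

# (start, end) character offsets of an address, end exclusive.
Span = Tuple[int, int]


def to_bieo(text: str, spans: List[Span]) -> List[str]:
    """Character-level BIEO tags: B=begin, I=inside, E=end, O=outside."""
    tags = ["O"] * len(text)
    for start, end in spans:
        # Assumption of this sketch: addresses are at least 2 characters
        # long, so no separate single-character tag is needed.
        if end - start < 2:
            raise ValueError("sketch assumes addresses of 2+ characters")
        tags[start] = "B"
        for i in range(start + 1, end - 1):
            tags[i] = "I"
        tags[end - 1] = "E"
    return tags


def to_io(text: str, spans: List[Span]) -> List[str]:
    """Character-level IO tags: I=inside an address, O=outside."""
    tags = ["O"] * len(text)
    for start, end in spans:
        for i in range(start, end):
            tags[i] = "I"
    return tags


def decode_bieo(tags: List[str]) -> List[Span]:
    """Recover spans from BIEO tags; each B...E run is one address."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.append((start, i + 1))
            start = None
        elif tag == "O":
            start = None
    return spans


def decode_io(tags: List[str]) -> List[Span]:
    """Recover spans from IO tags; each maximal run of I is one address."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "I" and start is None:
            start = i
        elif tag != "I" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    # Made-up example: two addresses written back to back.
    text = "地址：台北市中山路1號台中市中正路2號"
    addr1, addr2 = "台北市中山路1號", "台中市中正路2號"
    s1, s2 = text.index(addr1), text.index(addr2)
    spans = [(s1, s1 + len(addr1)), (s2, s2 + len(addr2))]

    print([text[s:e] for s, e in decode_bieo(to_bieo(text, spans))])
    # ['台北市中山路1號', '台中市中正路2號']
    print([text[s:e] for s, e in decode_io(to_io(text, spans))])
    # ['台北市中山路1號台中市中正路2號']
```

The example shows one general difference between the schemes: because BIEO marks where each address begins and ends, two adjacent addresses remain separate, whereas IO tags merge them into a single run. Whether this accounts for the difference in the reported results is not stated in the abstract.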