Presentation No. | 1G2-5 |
---|---|
Title | WebSim: A Web-based Semantic Similarity Measure |
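Authors | Danushka Bollegala (Graduate School of Information Science and Technology, The University of Tokyo), Yutaka Matsuo (National Institute of Advanced Industrial Science and Technology / Stanford University), Mitsuru Ishizuka (Graduate School of Information Science and Technology, The University of Tokyo) |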
Time | June 20 (Wed) 16:00-16:20 |
Abstract | Semantic similarity measures are important for numerous tasks in natural language processing, such as word sense disambiguation, automatic synonym extraction, language modelling, and document clustering. We propose a method to measure the semantic similarity between two words using information available on the Web. We extract page counts and snippets for the AND query of the two words from a Web search engine. We define numerous similarity scores based on page counts and lexico-syntactic patterns. These similarity scores are integrated using support vector machines to form a robust semantic similarity measure. The proposed method outperforms all existing Web-based semantic similarity measures on the Miller-Charles benchmark dataset, achieving a high correlation coefficient of 0.834 with human ratings. |
Paper | PDF file |
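
The abstract mentions similarity scores based on page counts without naming them. As a rough illustration only, the sketch below computes two standard set-overlap coefficients (Jaccard and Dice) from search-engine page counts. The choice of these two coefficients, the function names, and the page counts in the usage lines are all assumptions made for illustration, not details taken from the abstract.

```python
def web_jaccard(count_p: int, count_q: int, count_pq: int) -> float:
    """Jaccard coefficient over page counts (illustrative; hypothetical name).

    count_p, count_q: page counts for each word queried alone.
    count_pq: page count for the AND query of both words.
    """
    denom = count_p + count_q - count_pq
    return count_pq / denom if denom > 0 else 0.0


def web_dice(count_p: int, count_q: int, count_pq: int) -> float:
    """Dice coefficient over page counts (illustrative; hypothetical name)."""
    denom = count_p + count_q
    return 2 * count_pq / denom if denom > 0 else 0.0


def page_count_features(count_p: int, count_q: int, count_pq: int) -> list[float]:
    """Collect the scores into one feature vector.

    A vector like this could be passed to a classifier such as an SVM,
    which is the kind of integration the abstract describes.
    """
    return [
        web_jaccard(count_p, count_q, count_pq),
        web_dice(count_p, count_q, count_pq),
    ]


# Made-up page counts, not real search results.
features = page_count_features(count_p=100, count_q=50, count_pq=25)
print(features)
```

Both scores lie in the range 0 to 1 and grow as the two words co-occur on a larger share of the pages that mention either one. The lexico-syntactic pattern scores mentioned in the abstract would need snippet text and are not covered by this sketch.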