
3B1-R-2-1 A Distance Between Text Documents based on Topic Models and Ground Metric Learning



June 14 (Thu) 09:00-12:20, Room B (Yamaguchi Prefectural Education Hall, Training Room 1 (141))
3B1-R-2 Machine Learning: "Machine Learning (1)"

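Title: A Distance Between Text Documents based on Topic Models and Ground Metric Learning
Authors: 金 涛 (Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University)
Cuturi Marco (Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University)
山本 章博 (Graduate School of Informatics, Kyoto University)
Time: June 14 (Thu) 09:00-09:20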
Abstract: We propose a new distance between text documents that builds on two techniques. We first represent each document in a database as a histogram of topics using the Latent Dirichlet Allocation (LDA) topic model. We then compare two documents by computing the Earth Mover's Distance (EMD) between their respective topic histograms. The EMD parameter, which in this case is a metric matrix between topics (the ground metric), is estimated using Ground Metric Learning. Experiments on several text databases illustrate the merits of our approach.
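
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the topic histograms are already available (for example, from an LDA model) and that the ground metric is a hand-picked matrix rather than one estimated by Ground Metric Learning. The function name `earth_movers_distance`, the example histograms, and the matrix `ground` are all hypothetical. The EMD is computed here as a min-cost flow using successive shortest paths, one of several standard ways to solve the underlying transportation problem.

```python
def earth_movers_distance(p, q, ground, eps=1e-12):
    """EMD between histograms p and q (equal total mass) under cost matrix `ground`."""
    n, m = len(p), len(q)
    src, sink = n + m, n + m + 1
    num_nodes = n + m + 2
    edges = []  # each edge: [head node, residual capacity, unit cost]
    adj = [[] for _ in range(num_nodes)]

    def add_edge(u, v, cap, cost):
        # Forward edge gets an even index; its reverse edge is the next odd index.
        adj[u].append(len(edges))
        edges.append([v, cap, cost])
        adj[v].append(len(edges))
        edges.append([u, 0.0, -cost])

    for i in range(n):
        add_edge(src, i, p[i], 0.0)  # mass supplied by topic i of p
        for j in range(m):
            add_edge(i, n + j, float("inf"), ground[i][j])
    for j in range(m):
        add_edge(n + j, sink, q[j], 0.0)  # mass demanded by topic j of q

    total_cost = 0.0
    while True:
        # Bellman-Ford: cheapest src -> sink path in the residual graph.
        dist = [float("inf")] * num_nodes
        via = [-1] * num_nodes
        dist[src] = 0.0
        for _ in range(num_nodes - 1):
            updated = False
            for u in range(num_nodes):
                if dist[u] == float("inf"):
                    continue
                for k in adj[u]:
                    v, cap, cost = edges[k]
                    if cap > eps and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], via[v], updated = dist[u] + cost, k, True
            if not updated:
                break
        if dist[sink] == float("inf"):
            break  # no mass left to move

        # Walk back from the sink to collect the path, then push the bottleneck flow.
        path, v = [], sink
        while v != src:
            path.append(via[v])
            v = edges[via[v] ^ 1][0]
        flow = min(edges[k][1] for k in path)
        for k in path:
            edges[k][1] -= flow
            edges[k ^ 1][1] += flow
        total_cost += flow * dist[sink]
    return total_cost


# Hypothetical ground metric over 3 topics: topics 0 and 2 are assumed to be close.
ground = [[0.0, 1.0, 0.2],
          [1.0, 0.0, 1.0],
          [0.2, 1.0, 0.0]]

# Hypothetical topic histograms for three documents.
doc_a = [1.0, 0.0, 0.0]
doc_b = [0.0, 1.0, 0.0]
doc_c = [0.0, 0.0, 1.0]

print(earth_movers_distance(doc_a, doc_b, ground))
print(earth_movers_distance(doc_a, doc_c, ground))
```

Under this assumed metric, doc_a is closer to doc_c than to doc_b, even though all three histograms have disjoint support. This is the effect a ground metric between topics is meant to capture: a bin-by-bin comparison would treat the three documents as equally far apart.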