
1K2-IOS-1b-4 A new classification for multiclass imbalanced datasets based on clustering approach

* Please refrain from streaming video of sessions without permission.


June 12 (Tue) 15:30-20:00, Room K (ゆ~あいプラザ山口県社会福祉会館, Conference Room 1 (81))
1K2-IOS-1b International Organized Session「Application Oriented Principles of Machine Learning and Data Mining (2)」

Presentation No.: 1K2-IOS-1b-4
Title: A new classification for multiclass imbalanced datasets based on clustering approach
Authors: Prachuabsupakij Wanthanee (Department of Computer Science, Faculty of Science, Kasetsart University)
Nuanwan Soonthornphisaj (Kasetsart University, Bangkok, Thailand)
Time: June 12 (Tue) 17:00-17:30
Abstract: The aim of this paper is to improve classification performance on multiclass imbalanced datasets. We introduce a new classification technique based on a Clustering approach for Imbalanced Multiclass datasets (CIM). CIM uses clustering to create a new training set for each cluster and applies two re-sampling techniques to re-balance the class distribution. CIM works in three steps. First, k-means is used to split the set of instances into two clusters. Then, for each cluster, two re-sampling techniques (oversampling and undersampling) are applied to the training set in order to balance the class distribution. Finally, ensemble approaches are used to combine the models obtained with our method through a majority vote. We have conducted experiments on many multiclass datasets from the UCI repository. These datasets have two types of class distribution: balanced and imbalanced. We use different classifiers in order to observe the performance and suitability of our approach with each classifier. The experimental study uses several well-known algorithms such as Decision Trees, Naïve Bayes, and K-Nearest Neighbors. Performance is measured by G-mean and F-measure. The experimental results show that the proposed method achieves higher performance than the baseline algorithms (One-Against-One, One-Against-All, and Error-Correcting-Output-Coding (ECOC)) and than the baselines with oversampling, for many classifiers. Moreover, the empirical results show that CIM is a practical algorithm, since it can be applied to both balanced and imbalanced datasets. The proposed method was successfully applied to many datasets because CIM creates new training sets that consist of instances with similar characteristics, and these instances are relabeled.
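The re-sampling, majority-vote, and G-mean steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the re-sampling targets are chosen, so the sketch assumes random oversampling up to the largest class and random undersampling down to the smallest class, and it assumes the common multiclass definition of G-mean as the geometric mean of per-class recalls. The function names (`rebalance`, `majority_vote`, `g_mean`) are hypothetical, and the k-means split and classifier training are left out.

```python
import random
from collections import Counter
from math import prod


def rebalance(rows, labels, mode, rng):
    """Randomly re-sample so that every class has the same number of instances.

    mode="over"  duplicates instances up to the largest class size (assumption).
    mode="under" keeps a random subset down to the smallest class size (assumption).
    """
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    sizes = [len(members) for members in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            # Undersample (or keep as-is when already at the target size).
            chosen = rng.sample(members, target)
        else:
            # Oversample by drawing extra copies with replacement.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def majority_vote(predictions):
    """Combine per-model prediction lists; ties go to the label seen first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed multiclass definition)."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return prod(recalls) ** (1 / len(recalls))
```

In a full pipeline one would presumably call `rebalance` on the training instances of each cluster, train one model per re-sampled set, and pass the models' predictions to `majority_vote`. Because G-mean multiplies the recalls, a single class with zero recall drives the score to zero, which is why it is a popular measure for imbalanced data.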
Paper PDF file