LEARNING OPTIMAL THRESHOLD FOR BAYESIAN POSTERIOR PROBABILITIES TO MITIGATE THE CLASS IMBALANCE PROBLEM
Class imbalance is one of the problems that degrade a classifier's performance. Researchers have introduced many methods to tackle this problem, including pre-processing, internal classifier processing, and post-processing, which mainly relies on posterior probabilities. The Bayesian Network (BN) is known as a classifier that produces good posterior probabilities. This study proposes two methods that use Bayesian posterior probabilities to deal with imbalanced data.
In the first method, we optimize the threshold on the posterior probabilities produced by BNs to maximize the F1-Measure. Once the optimal threshold is found, we use it for the final classification. We investigate this method on several Bayesian classifiers: Naive Bayes (NB), BN, TAN, BAN, and Markov Blanket BN. In the second method, instead of learning a threshold for each classifier separately as in the first, we combine these classifiers in a voting ensemble. The experimental results on 20 benchmark imbalanced datasets collected from the UCI repository show that our methods significantly outperform the baseline NB. These methods also perform as well as the state-of-the-art sampling methods and significantly better in certain
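As an illustration of the first method, the following minimal Python sketch searches for the threshold on positive-class posterior probabilities that maximizes F1. It is not the authors' implementation: the function names, the choice of candidate thresholds (the distinct posterior values themselves), and the use of averaged posteriors as a stand-in for the voting ensemble are all assumptions made for this example, and the posteriors are taken as given rather than produced by a real Bayesian classifier.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (minority) class, which is labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int], posteriors: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, F1) with the highest F1 on the given data.

    Assumption: an example is predicted positive when its posterior
    is >= threshold, and only observed posterior values are tried.
    Ties are resolved in favour of the smallest threshold.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(posteriors)):
        preds = [1 if p >= t else 0 for p in posteriors]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def average_posteriors(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Assumed soft vote: mean posterior per example across models."""
    n_models = len(per_model)
    return [sum(column) / n_models for column in zip(*per_model)]


# Toy data (hypothetical): 6 negatives, 2 positives, low posteriors overall.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
posteriors = [0.05, 0.10, 0.20, 0.15, 0.30, 0.45, 0.35, 0.40]

default_preds = [1 if p >= 0.5 else 0 for p in posteriors]
print("F1 at 0.5:", f1_score(y_true, default_preds))
print("Best (threshold, F1):", best_threshold(y_true, posteriors))
```

On this toy data the default threshold of 0.5 predicts no positives at all, while a lower learned threshold recovers both minority examples. In practice the threshold would be chosen on training or validation data and then applied to unseen data; for the ensemble variant, one option under the same assumptions is to pass the output of `average_posteriors` to `best_threshold`.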
Authors who publish with Vietnam Journal of Science and Technology agree to the following terms:
- The manuscript is not under consideration for publication elsewhere. When a manuscript is accepted for publication, the author agrees to automatic transfer of the copyright to the editorial office.
- The manuscript should not be published elsewhere in any language without the consent of the copyright holders. Authors have the right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of their work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are encouraged to post their work online (e.g., in institutional repositories or on their websites) prior to or during the submission process, as it can lead to productive exchanges and/or a greater number of citations of the to-be-published work (See The Effect of Open Access).