| Type | Diploma thesis |
| Code | DIPL-2006-15 |
| Title | Learning methods of classifiers from positive examples with numeric attributes |
| Author | Nikolaos Trogkanis |
| Year | 2006 |
| Keywords | Machine Learning, Classification, Supervised Learning, Semi-supervised Learning, Data Description, Learning from positive and unlabeled data, Imbalance Datasets Problem |
| Abstract | In this work we study two main areas of learning for the classification problem: supervised learning and semi-supervised learning. Special emphasis is given to the case where labeled data are available from only one class, the class we are interested in (the positive class). This gives rise to data description in the supervised setting and to learning from positive and unlabeled data in the semi-supervised setting. From the latter area we describe the following algorithms: Naive Bayes Positive (NBP), Naive Bayes Multinomial Positive (NBMP), and Biased-SVM. We developed the NBP algorithm so that it can be used on any dataset with nominal and/or numeric attributes. The main shortcoming of the two Naive Bayes Positive algorithms is that they require the user to supply the positive class probability, which is hard to provide in practice. For this reason we have developed four methods for estimating it. These algorithms are evaluated on seven imbalanced datasets and compared with their supervised learning counterparts. The experimental results show that they are very competitive: using a smaller number of labeled training examples, drawn only from the positive class, together with a large set of unlabeled examples, it is possible to build a classifier with equal or even better performance. The benefit is considerable, since training data are mostly labeled manually, a labor-intensive and very time-consuming procedure. The experimental results also show that the methods for estimating the positive class probability perform very well. |
| Category | Other |