A semisupervised classification algorithm combining noise learning theory and a disagreement cotraining framework

Yang, Zaoli, Zhang, Weijian, Han, Chunjia, Li, Yuchen, Yang, Mu and Ieromonachou, Petros ORCID: https://orcid.org/0000-0002-5842-9585 (2022) A semisupervised classification algorithm combining noise learning theory and a disagreement cotraining framework. Information Sciences, 622. pp. 889-902. ISSN 0020-0255 (doi:10.1016/j.ins.2022.11.115)

PDF (AAM)
38230_IEROMONACHOU_A_semisupervised_classification_algorithm_combining_noise_learning_theory.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: https://doi.org/10.1016/j.ins.2022.11.115

Abstract

In the era of big data, many business scenarios produce data with a small number of labelled samples and a large number of unlabelled samples. Such data are difficult to classify, which makes it hard to provide effective decision support for a business. A common approach in this setting is disagreement-based semisupervised learning, in which multiple models exchange their high-confidence samples as pseudolabelled samples to improve each model’s classification performance. Because such pseudolabelled samples inevitably contain label noise, they may interfere with subsequent model learning and damage the robustness of the ensemble model. To solve this problem, a semisupervised classification algorithm based on noise learning theory and a disagreement cotraining framework is proposed. First, probably approximately correct (PAC) estimation theory under label noise conditions is applied to analyse the relationship between the label noise level and robust model estimation over multiple rounds of cotraining. Based on this theoretical relationship, a disagreement elimination algorithm framework is then constructed around the cotraining of multiple models: the feature argument and select (FANS) algorithm and the L1 penalized logistic regression (PLR) algorithm. The experimental results show that the proposed algorithm yields not only a high-confidence sample set that meets the upper bound constraint on the label noise level but also a robust ensemble model capable of resisting sampling bias. This work provides a new research perspective for disagreement-based semisupervised learning theory.
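
The pseudolabel exchange described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: simple nearest-centroid models stand in for FANS and PLR, the fixed confidence threshold replaces the PAC-derived noise bound, and all names (`VIEWS`, `cotrain`, `ensemble_predict`, the toy data) are hypothetical. Each model sees a different feature, labels the unlabelled pool, and hands its high-confidence predictions to the other model as pseudolabelled training samples.

```python
# Illustrative sketch only: nearest-centroid models stand in for FANS and PLR.
VIEWS = [(0,), (1,)]  # assumption: each model sees one feature, so the models can disagree

def project(x, view):
    """Keep only the features that belong to one model's view."""
    return [x[j] for j in view]

def fit_centroids(samples, view):
    """Fit one class-mean vector per label (0 and 1) on the given view."""
    cents = {}
    for c in (0, 1):
        rows = [project(x, view) for x, y in samples if y == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict_with_confidence(cents, z):
    """Return the nearest-centroid label and a confidence in [0, 1]."""
    d = {c: sum((a - b) ** 2 for a, b in zip(z, cents[c])) for c in cents}
    label = min(d, key=d.get)
    total = d[0] + d[1]
    conf = abs(d[0] - d[1]) / total if total else 0.0
    return label, conf

def cotrain(labelled, pool, rounds=3, threshold=0.6):
    """Exchange high-confidence pseudolabels between two models for a few rounds."""
    train = [list(labelled), list(labelled)]  # one training set per model
    pool = list(pool)
    for _ in range(rounds):
        models = [fit_centroids(t, v) for t, v in zip(train, VIEWS)]
        kept = []
        for x in pool:
            moved = False
            for i, (m, v) in enumerate(zip(models, VIEWS)):
                label, conf = predict_with_confidence(m, project(x, v))
                if conf >= threshold:
                    train[1 - i].append((x, label))  # teach the *other* model
                    moved = True
            if not moved:
                kept.append(x)  # still unlabelled; retry next round
        pool = kept
    models = [fit_centroids(t, v) for t, v in zip(train, VIEWS)]
    return models, train, pool

def ensemble_predict(models, x):
    """Combine both models by summing their signed confidences."""
    score = 0.0
    for m, v in zip(models, VIEWS):
        label, conf = predict_with_confidence(m, project(x, v))
        score += conf if label == 1 else -conf
    return 1 if score > 0 else 0

# Toy, well-separated data: class 0 lies in [0, 2) and class 1 in [5, 7) on both features.
class0 = [(0.1 * i, 0.1 * ((7 * i) % 20)) for i in range(20)]
class1 = [(a + 5.0, b + 5.0) for a, b in class0]
labelled = [(class0[0], 0), (class0[1], 0), (class1[0], 1), (class1[1], 1)]
pool = class0[2:] + class1[2:]
models, train, remaining = cotrain(labelled, pool)
```

On this toy data the classes are cleanly separated, so the exchanged pseudolabels carry no noise. On real data some of them would be wrong, which is the situation the paper's noise-level bound is meant to control; a fixed threshold such as the one assumed here offers no such guarantee.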

Item Type:	Article
Uncontrolled Keywords:	semisupervised classification; noise learning theory; disagreement cotraining; feature argument and select algorithm; L1 penalized logistics regression algorithm
Subjects:	H Social Sciences > HB Economic Theory; H Social Sciences > HF Commerce; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty / School / Research Centre / Research Group:	Faculty of Business; Faculty of Business > Department of Systems Management & Strategy; Faculty of Business > Networks and Urban Systems Centre (NUSC); Faculty of Business > Networks and Urban Systems Centre (NUSC) > Connected Cities Research Group; Greenwich Business School > Networks and Urban Systems Centre (NUSC); Greenwich Business School > Networks and Urban Systems Centre (NUSC) > Connected Cities Research Group (CCRG)
Related URLs:	https://www.sciencedirect.com/science/ar...
Last Modified:	06 Dec 2024 03:00
URI:	http://gala.gre.ac.uk/id/eprint/38230