AI-Driven Feature-Enhanced Stacking Ensemble with Global-Context Vision Transformers for Breast Cancer Classification in Ultrasound Images
Vo, Nghia Trong, Duong, Hoang Phi Yen, Nguyen, Tuan Thanh ORCID: https://orcid.org/0000-0003-0055-8218, Le, Nhan Duc and Duong, Trung Q.
(2026)
AI-Driven Feature-Enhanced Stacking Ensemble with Global-Context Vision Transformers for breast cancer classification in ultrasound images.
IEEE Internet of Things Journal.
ISSN 2327-4662 (Online)
(In Press)
(doi:10.1109/JIOT.2026.3679487)
PDF (Author's Accepted Manuscript): 52782 NGUYEN_AI-Driven_Feature-Enhanced_Stacking_Ensemble_With_Global-Context_Vision_(AAM)_2026.pdf - Accepted Version (12MB)
Abstract
Breast cancer remains a leading cause of death among women worldwide. Early detection is a crucial step towards improving survival rates for patients affected by the disease and is typically performed with the help of ultrasound imaging. Rapid advances in artificial intelligence (AI) research have produced a plethora of machine learning methods that aid in building automated diagnostic assistance systems for early cancer detection, including breast cancer detection. While deep learning has shown promise in medical image analysis, most existing approaches rely on single models or simple ensemble methods that fail to fully exploit complementary feature representations across architectures. This paper introduces a novel feature-enhanced stacking ensemble framework that combines a state-of-the-art global-context vision transformer (GCViT) with well-established convolutional neural network (CNN) architectures (ResNet-50V2, ConvNeXt-Tiny, and EfficientNetV2-B3) for automated breast cancer classification from ultrasound images. Unlike conventional ensembles that aggregate only prediction probabilities, our approach extracts deep feature embeddings from a dedicated CNN branch and concatenates them with the base models' predictions as input to a meta-learner, a multi-layer perceptron (MLP), enabling the ensemble to leverage both decision-level and feature-level information. By incorporating feature representations from the CNN-based feature extractor into the meta-model, our framework outperforms prior works across multiple metrics, achieving 94.23% accuracy and 95.47% AUC-ROC. To further evaluate the robustness and generalizability of our approach, we conduct additional experiments on the melanoma cancer image dataset and achieve 95.4% accuracy.
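The feature-enhanced stacking step described above can be sketched as follows. This is a minimal illustration with random stand-in data, not the authors' implementation: the array shapes, the 64-dimensional embedding, and the use of scikit-learn's `MLPClassifier` as the meta-learner are all assumptions for demonstration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-ins for outputs of the four trained base models on a validation
# split; each row corresponds to one ultrasound image (3 classes assumed).
n_samples = 200
p_gcvit    = rng.random((n_samples, 3))  # GCViT class probabilities
p_resnet   = rng.random((n_samples, 3))  # ResNet-50V2 probabilities
p_convnext = rng.random((n_samples, 3))  # ConvNeXt-Tiny probabilities
p_effnet   = rng.random((n_samples, 3))  # EfficientNetV2-B3 probabilities
embeddings = rng.standard_normal((n_samples, 64))  # CNN-branch deep features

# Feature-enhanced stacking: concatenate decision-level information
# (base-model probabilities) with feature-level information (embeddings)
# to form the meta-learner's input.
X_meta = np.concatenate([p_gcvit, p_resnet, p_convnext, p_effnet, embeddings], axis=1)
y = rng.integers(0, 3, n_samples)  # e.g. benign / malignant / normal labels

# MLP meta-learner trained on the combined representation.
meta_learner = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)
meta_learner.fit(X_meta, y)
final_pred = meta_learner.predict(X_meta)
```

The key design point is that `X_meta` carries more information than prediction probabilities alone, which is what distinguishes this setup from conventional probability-averaging ensembles.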
We provide a comprehensive explainability analysis through Shapley additive explanations (SHAP) for feature attribution, permutation importance for quantifying model contributions, and saliency maps for visual interpretation of both the base models and the end-to-end ensemble, explaining their contributions to the final predictions.
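Of the three explainability tools mentioned, permutation importance is the simplest to illustrate: shuffle one input feature at a time and measure the resulting drop in score. The sketch below uses synthetic data and scikit-learn's `permutation_importance`; the feature layout and model are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Toy stand-in for stacked meta-features (columns are illustrative only):
# in the paper's setting these would be base-model probabilities plus
# CNN embeddings.
X = rng.standard_normal((300, 20))
y = (X[:, 0] + 0.5 * X[:, 5] > 0).astype(int)  # two genuinely informative columns

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)

# Permutation importance: shuffle each feature n_repeats times and record
# the mean drop in accuracy; larger drops indicate larger contributions.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranked = np.argsort(result.importances_mean)[::-1]  # features, most important first
```

In the ensemble setting, the same procedure applied to the meta-learner's input quantifies how much each base model's predictions (or the embedding block) contribute to the final decision.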
| Item Type: | Article |
|---|---|
| Additional Information: | © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
| Uncontrolled Keywords: | ensemble learning, vision transformer, breast cancer classification, deep learning |
| Subjects: | Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer) |
| Faculty / School / Research Centre / Research Group: | Faculty of Engineering & Science; Faculty of Engineering & Science > School of Computing & Mathematical Sciences (CMS) |
| Last Modified: | 30 Mar 2026 11:10 |
| URI: | https://gala.gre.ac.uk/id/eprint/52782 |