Preprocessing framework for scholarly big data management

Khan, Samiya ORCID: 0000-0003-0837-5125 and Alam, Mansaf (2022) Preprocessing framework for scholarly big data management. Multimedia Tools and Applications, 82. pp. 39719-39743. ISSN 1380-7501 (Print), 1573-7721 (Online) (doi:

PDF (AAM (uncorrected proof))
44549_KHAN_Preprocessing_framework_for_scholarly_big_data_management.pdf - Accepted Version


Big data technologies have found applications in disparate domains. One of the largest sources of textual big data is scientific documents and papers. Scholarly big data has been used in numerous ways to develop innovative applications such as collaborator discovery, expert finding and research management systems. With the evolution of machine and deep learning techniques, the efficacy of such applications has risen manifold. However, the biggest challenge in developing deep learning models for scholarly applications in cloud-based environments is the under-utilization of resources caused by the excessive time required for textual preprocessing. This paper presents a preprocessing pipeline that uses Spark for data ingestion and Spark ML for preprocessing tasks. The proposed approach is evaluated through a case study that uses LSTM-based text summarization to generate titles or summaries from the abstracts of scholarly articles. Results indicate a substantial reduction in ingestion, preprocessing and cumulative time for the proposed approach, which should in turn reduce development time and costs.

Item Type: Article
Uncontrolled Keywords: deep learning applications; preprocessing pipeline; scholarly big data; scholarly data applications; Spark ML
Subjects: L Education > L Education (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty / School / Research Centre / Research Group: Faculty of Engineering & Science
Faculty of Engineering & Science > School of Computing & Mathematical Sciences (CMS)
Last Modified: 23 Oct 2023 10:16
