An event-driven serverless ETL pipeline on AWS

Pogiatzis, Antreas ORCID: 0000-0001-8887-0139 and Samakovitis, Georgios ORCID: 0000-0002-0076-8082 (2020) An event-driven serverless ETL pipeline on AWS. Applied Sciences, 11 (1):191. ISSN 2076-3417 (Online) (doi:https://doi.org/10.3390/app11010191)

PDF (Open Access Article)
30902 SAMAKOVITIS_An_Event-driven_Serverless_ETL_Pipeline_On_AWS_(OA)_2020.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

This work presents a serverless architecture for an event-driven Extract, Transform, and Load (ETL) pipeline and evaluates its performance over a range of dataflow tasks of varying frequency, velocity, and payload size. We design an experiment using generated tabular data across varying data volumes, event frequencies, and processing power in order to measure: (i) the consistency of pipeline executions; (ii) the reliability of data delivery; (iii) the maximum payload size per pipeline; and (iv) economic scalability (cost of chargeable tasks). We run 92 parameterised experiments on a simple AWS architecture, avoiding any AWS-enhanced platform features, to allow for an unbiased assessment of our model’s performance. Our results indicate that our reference architecture can achieve time-consistent data processing of event payloads of more than 100 MB, with a throughput of 750 KB/s across four event frequencies. We also observe that, although using an SQS queue for data transfer enables easy concurrency control and data slicing, it becomes a bottleneck for large event payloads. Finally, we develop and discuss a candidate pricing model for usage of our reference architecture.
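
The data slicing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `slice_rows`, the JSON encoding, and the 256 KiB per-message cap are all assumptions made for illustration, and the actual SQS quota should be checked before relying on any fixed value. The sketch only packs tabular rows into size-bounded message bodies; sending them to a queue is left out.

```python
import json

# Assumed per-message size cap in bytes (hypothetical default; verify against
# the current SQS quota for your account and region).
MAX_MESSAGE_BYTES = 256 * 1024


def slice_rows(rows, max_bytes=MAX_MESSAGE_BYTES):
    """Yield JSON array strings, each at most max_bytes when UTF-8 encoded."""
    batch = []
    size = 2  # bytes for the enclosing "[" and "]"
    for row in rows:
        encoded = json.dumps(row, separators=(",", ":"))
        row_size = len(encoded.encode("utf-8"))
        if row_size + 2 > max_bytes:
            raise ValueError("a single row exceeds the message size limit")
        # One extra byte for the comma separating this row from the previous one.
        extra = row_size + (1 if batch else 0)
        if size + extra > max_bytes:
            # Current batch is full: emit it and start a new one with this row.
            yield "[" + ",".join(batch) + "]"
            batch, size = [], 2
            extra = row_size
        batch.append(encoded)
        size += extra
    if batch:
        yield "[" + ",".join(batch) + "]"


if __name__ == "__main__":
    # Generated tabular data, sliced with a deliberately small cap for demonstration.
    rows = [{"id": i, "value": "x" * 50} for i in range(1000)]
    messages = list(slice_rows(rows, max_bytes=1024))
    print(len(messages), "messages, largest is",
          max(len(m.encode("utf-8")) for m in messages), "bytes")
```

Because every event payload has to be split into messages of bounded size, the number of messages grows linearly with payload size, which is consistent with the abstract's observation that the queue becomes a bottleneck for large payloads.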

Item Type: Article
Uncontrolled Keywords: serverless, FaaS, event-driven, distributed, AWS, ETL, architecture
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty / Department / Research Group: Faculty of Liberal Arts & Sciences
Faculty of Liberal Arts & Sciences > School of Computing & Mathematical Sciences (CAM)
Last Modified: 19 Jan 2021 11:03
Selected for GREAT 2016: None
Selected for GREAT 2017: None
Selected for GREAT 2018: None
Selected for GREAT 2019: None
Selected for REF2021: None
URI: http://gala.gre.ac.uk/id/eprint/30902

