Modeling arsenic in European topsoils with a coupled semiparametric (GAMLSS-RF) model for censored data

Fendrich, Arthur Nicolaus, Van Eynde, Elise, Stasinopoulos, Dimitrios M., Rigby, Robert A., Mezquita, Felipe Yunta and Panagos, Panos (2024) Modeling arsenic in European topsoils with a coupled semiparametric (GAMLSS-RF) model for censored data. Environment International, 185:108544. pp. 1-14. ISSN 0160-4120 (Print), 1873-6750 (Online) (doi:

47073_RIGBY_Modeling_arsenic_in_European_topsoils_with_a_coupled_semiparametric_GAMLSS-RF_model_for_censored_data.pdf - Published Version
Available under License Creative Commons Attribution.

Arsenic (As) is a versatile heavy metalloid trace element extensively used in industrial applications. As is a carcinogen, poses health risks through both inhalation and ingestion, and is associated with an increased risk of liver, kidney, lung, and bladder tumors. In the agricultural context, the repeated application of arsenical products leads to elevated soil concentrations, which are also affected by environmental and management variables. Since exposure to As poses risks, effective assessment tools to support environmental and health policies are needed. However, the most comprehensive soil As data available, the Land Use/Cover Area frame statistical Survey (LUCAS) database, has severe limitations due to high detection limits. Although they are within International Organization for Standardization standards, the detection limits preclude the adoption of standard methodologies for data analysis. The present work focused on developing a new method to model As contamination in European soils using LUCAS soil samples. We introduce the GAMLSS-RF model, a novel approach that couples Random Forests with Generalized Additive Models for Location, Scale, and Shape. The semiparametric model can capture non-linear interactions among input variables while accommodating censored and non-censored observations, and it can be calibrated to include information from other campaign databases. After fitting and validating a spatial model, we produced European-scale As concentration maps at a 250 m spatial resolution and evaluated the patterns against reference values (i.e., two action levels and a background concentration). We found significant variability of As concentration across the continent, with lower concentrations in Northern countries and higher concentrations in Portugal, Spain, Austria, France and Belgium. By overcoming limitations in existing databases and methodologies, the present approach provides an alternative way to handle highly censored data. The model also constitutes a valuable probabilistic tool for assessing As contamination risks in soils, contributing to informed policy-making for environmental and health protection.
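
To illustrate what accommodating censored and non-censored observations means in practice, the following is a minimal sketch of a likelihood for left-censored data, written in Python. It is not the authors' GAMLSS-RF implementation: the log-normal distribution, the function names, and the sample values are all assumptions chosen for illustration. A sample reported below the detection limit contributes the probability of falling below that limit, while a measured sample contributes the usual density.

```python
import math


def normal_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_lognormal_loglik(values, censored, mu, sigma):
    """Log-likelihood of a log-normal model with left-censored observations.

    values[i]   : measured concentration, or the detection limit if censored
    censored[i] : True when the sample was reported below the detection limit
    mu, sigma   : mean and standard deviation of the log-concentration
    (The log-normal choice is an assumption made for this sketch.)
    """
    total = 0.0
    for y, is_censored in zip(values, censored):
        z = (math.log(y) - mu) / sigma
        if is_censored:
            # Only the upper bound is known: use P(Y < detection limit).
            total += math.log(normal_cdf(z))
        else:
            # Fully observed: log-normal log-density at y.
            total += normal_logpdf(z) - math.log(sigma) - math.log(y)
    return total


# Hypothetical concentrations in mg/kg, not taken from LUCAS.
values = [2.0, 2.0, 3.5, 7.1, 12.4]
censored = [True, True, False, False, False]
loglik = censored_lognormal_loglik(values, censored, mu=1.2, sigma=0.8)
```

In a full model the parameters would depend on covariates rather than being fixed constants; this sketch only shows how the two kinds of observation enter the same likelihood.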

Item Type: Article
Uncontrolled Keywords: arsenic; GAMLSS; random forest; soil contamination; statistical modeling; trace element
Subjects: G Geography. Anthropology. Recreation > GE Environmental Sciences
H Social Sciences > HA Statistics
Q Science > Q Science (General)
Faculty / School / Research Centre / Research Group: Faculty of Engineering & Science
Faculty of Engineering & Science > School of Computing & Mathematical Sciences (CMS)
Last Modified: 14 May 2024 12:41
