Cross-model evaluation of LLMs for generating formal specification of distributed Industrial Control Systems

Tools

Raptis, George E., Khan, Muhammad Taimoor ORCID: https://orcid.org/0000-0002-5752-6420, Koulamas, Christos and Serpanos, Dimitrios (2025) Cross-model evaluation of LLMs for generating formal specification of distributed Industrial Control Systems. Procedia Computer Science, 270. pp. 5846-5854. ISSN 1877-0509 (Online) (doi:10.1016/j.procs.2025.10.053)

[thumbnail of Open Access Conference Article]

Preview

PDF (Open Access Conference Article)
51785 KHAN_Cross-Model_Evaluation_Of_LLMs_For_Generating_Formal_Specification_Of_Distributed_Industrial_Control_Systems_(OA)_2025.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Download (537kB) | Preview

Official URL: https://doi.org/10.1016/j.procs.2025.10.053

Abstract

The increasing complexity and decentralization of Industrial Control Systems (ICS) have expanded the attack surface for cyber-security threats, particularly in critical infrastructure domains. Inline monitoring using formal annotations like the Java Modeling Language (JML) offers a lightweight yet precise method to detect behavioral anomalies. However, the manual creation of such specifications is resource-intensive and requires domain expertise. This paper explores generative artificial intelligence (AI), specifically large language models (LLMs), to automate the synthesis of inline formal security monitors. We benchmark three state-of-the-art LLMs (GPT-4o, DeepSeek-V3, and Gemini 2.5 Flash) on their ability to generate JML annotations for ICS software drawn from the ASM2S water distribution system. Our evaluation across five dimensions (syntax, semantics, property coverage, alarm semantics, and effort savings) reveals distinct trade-offs. GPT-4o demonstrates strong syntactic and structural alignment with ASM2S, while DeepSeek-V3 offers richer behavioral modeling. Gemini 2.5 Flash showcases conceptual depth but introduces non-verifiable constructs. These findings demonstrate the potential of LLMs as co-pilots in secure-by-design ICS development and underscore the need for syntax-aware fine-tuning and interactive verification workflows.

Item Type:	Article
Additional Information:	29th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2025)
Uncontrolled Keywords:	inline security monitoring, industrial control systems, distributed ICS, water distribution systems, JML specifications, generative AI, large language models, cybersecurity, formal verification, runtime anomaly detection
Subjects:	Q Science > Q Science (General) Q Science > QA Mathematics Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty / School / Research Centre / Research Group:	Faculty of Engineering & Science Faculty of Engineering & Science > School of Computing & Mathematical Sciences (CMS)
Last Modified:	26 Nov 2025 10:00
URI:	https://gala.gre.ac.uk/id/eprint/51785

Actions (login required)

View Item

Downloads

Downloads per month over past year

View more statistics

Altmetric