Digital forensic analysis of internet history using principal component analysis

Gresty, David W., Gan, Diane ORCID: 0000-0002-0920-7572 and Loukas, George ORCID: 0000-0003-3559-5182 (2014) Digital forensic analysis of internet history using principal component analysis. In: PGNET Proceedings of the 15th Annual Postgraduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting (PGNet 2014). Liverpool John Moores University, Liverpool, UK. ISBN 9781902560281

PDF (Author's Accepted Manuscript)
14948_Loukas_Digital forensic analysis (AAM) 2014.pdf - Accepted Version

Abstract

A modern Digital Forensic examination, even of a small-scale home computer, typically involves searching large hard disk drives, a variety of host and web-based applications which may or may not be known to the investigator, and a proliferation of web-based Internet history artefacts that may be highly significant in showing the motivation of a suspect. Faster keyword searching and larger, more accurate sets of file hashes may point the investigator to relevant artefacts, but when dealing with the new or the unknown, or when there is a need to holistically profile the activity of the computer, the investigator is left with a manual and labour-intensive investigation. This paper proposes using an unsupervised statistical learning technique, Principal Component Analysis, as a novel approach to the analysis of Digital Forensic Internet history. The approach groups and analyses artefacts to produce a high-level contextual view of the timeline data, and the appropriate number of Principal Components is selected using the Scree test method. A case study of the approach is shown, first using a simulated test data set comprising 820 Mozilla Internet history artefacts and then using a set of 5900 Internet Explorer history artefacts from real-world browser data. The results of the analysis are presented in a tabular format that provides an accessible overall view of the activity within the timeline. They show a promising approach to representing large quantities of timeline data effectively and simply at a high level, where basic patterns of usage can be determined. Further work on enhancing the proposed approach to include low-level pattern rules is discussed.
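As a rough illustration of the two techniques named in the abstract, the sketch below runs Principal Component Analysis on a small matrix of counts and then picks a number of components with a simple numerical stand-in for the Scree test. It is not the paper's implementation. The feature layout (rows as time slots, columns as categories of visited site), the sample numbers, and the function names `pca_scree` and `scree_elbow` are all assumptions made for this example; the abstract does not specify how artefacts are encoded, and the Scree test is normally read by eye from a plot rather than computed.

```python
import numpy as np


def pca_scree(counts):
    """PCA via SVD of the column-centred matrix.

    Returns eigenvalues (descending), principal axes (rows), and scores.
    """
    X = np.asarray(counts, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each column (feature)
    # Rows of Vt are the principal axes; s holds the singular values.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s ** 2 / (X.shape[0] - 1)  # variance along each axis
    scores = Xc @ Vt.T  # each row projected onto the axes
    return eigenvalues, Vt, scores


def scree_elbow(eigenvalues):
    """Hypothetical heuristic: keep the components before the sharpest bend.

    The bend is taken as the point of largest second difference in the
    eigenvalue sequence, a crude substitute for inspecting a scree plot.
    """
    ev = np.asarray(eigenvalues, dtype=float)
    if ev.size < 3:
        return int(ev.size)
    second_diff = ev[:-2] - 2 * ev[1:-1] + ev[2:]
    return int(np.argmax(second_diff)) + 1


# Made-up example: 6 time slots (rows) by 4 site categories (columns).
history_counts = [
    [12, 0, 3, 1],
    [10, 1, 2, 0],
    [0, 9, 1, 2],
    [1, 11, 0, 3],
    [11, 0, 4, 1],
    [0, 10, 1, 2],
]
eigenvalues, axes, scores = pca_scree(history_counts)
keep = scree_elbow(eigenvalues)
print("eigenvalues:", np.round(eigenvalues, 3))
print("components kept:", keep)
print("scores on kept components:\n", np.round(scores[:, :keep], 2))
```

The scores on the retained components could then be tabulated per time slot, in the spirit of the tabular view the abstract describes; how the paper actually builds that table is not stated here.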

Item Type: Conference Proceedings
Title of Proceedings: PGNET Proceedings of the 15th Annual Postgraduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting (PGNet 2014)
Additional Information: 15th Anniversary of the National Annual Symposium on the Convergence of Telecommunications, Networking and Broadcasting, 23-24 June 2014, Liverpool, UK
Uncontrolled Keywords: Digital forensics, statistical learning
Faculty / School / Research Centre / Research Group: Faculty of Engineering & Science > School of Computing & Mathematical Sciences (CMS)
Last Modified: 26 Nov 2020 22:35
URI: http://gala.gre.ac.uk/id/eprint/14948
