
Transformer-based identification of stochastic information cascades in social networks using text and image similarity

Kasnesis, Panagiotis, Heartfield, Ryan, Liang, Xing, Toumanidis, Lazaros, Sakellari, Georgia ORCID: 0000-0001-7238-8700, Patrikakis, Charalampos and Loukas, George ORCID: 0000-0003-3559-5182 (2021) Transformer-based identification of stochastic information cascades in social networks using text and image similarity. Applied Soft Computing:107413. ISSN 1568-4946 (In Press) (doi:https://doi.org/10.1016/j.asoc.2021.107413)

PDF (Author Accepted Manuscript)
32167 LOUKAS_Transformer-Based_Identification_(AAM)_2021.pdf - Accepted Version
Restricted to Registered users only until 22 April 2022.
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Identifying the origin of information posted on social media, and how it may have changed over time, can be very helpful to users in deciding whether to trust it. This currently requires disproportionate effort from the average social media user, who instead has to rely on fact-checkers or other intermediaries to identify information provenance for them. We show that it is possible to disintermediate this process by providing an automated mechanism for determining the information cascade to which a post belongs. We employ a transformer-based language model, as well as a pretrained ResNet50 model for image similarity, to decide whether two posts are sufficiently similar to belong to the same cascade. By using semantic similarity, and images in addition to text, we increase accuracy where there is no explicit diffusion of reshares. On a new dataset of 1,200 news items on Twitter, our approach increases clustering performance by more than 7% and 4.5% on the validation and test sets respectively over the previous state of the art. Moreover, we employ a probabilistic subsampling mechanism that significantly reduces cascade creation time without affecting the performance of large-scale semantic text analysis or the quality of information cascade generation. We have implemented a prototype that offers this new functionality to the user and have deployed it in our own instance of the social media platform Mastodon.
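
As a rough illustration of the idea in the abstract, the sketch below assigns a post to a cascade by comparing embedding vectors. It is not the authors' implementation. It assumes text and image embeddings have already been computed (for example by a transformer sentence encoder and a ResNet50 feature extractor), and the names `post_similarity` and `assign_to_cascade`, the equal text/image weighting, the 0.8 threshold, and the uniform random subsampling are all hypothetical choices made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def post_similarity(p, q, text_weight=0.5):
    """Blend text and image similarity (weighting is an assumption).

    Falls back to text similarity alone when either post has no image.
    """
    text_sim = cosine(p["text_vec"], q["text_vec"])
    if p.get("image_vec") is None or q.get("image_vec") is None:
        return text_sim
    image_sim = cosine(p["image_vec"], q["image_vec"])
    return text_weight * text_sim + (1 - text_weight) * image_sim


def assign_to_cascade(post, cascades, threshold=0.8, sample_size=3, rng=random):
    """Add post to the most similar cascade, or start a new one.

    Each cascade is a list of posts. To limit the number of comparisons,
    at most sample_size randomly chosen members of each cascade are
    checked (a simple stand-in for probabilistic subsampling).
    Returns the index of the cascade the post ended up in.
    """
    best_idx, best_sim = None, threshold
    for idx, members in enumerate(cascades):
        if len(members) <= sample_size:
            sample = members
        else:
            sample = rng.sample(members, sample_size)
        sim = max(post_similarity(post, m) for m in sample)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    if best_idx is None:
        cascades.append([post])
        return len(cascades) - 1
    cascades[best_idx].append(post)
    return best_idx
```

Each post here is a plain dictionary with a `text_vec` entry and an optional `image_vec` entry. Calling `assign_to_cascade` once per incoming post builds up the list of cascades incrementally, and a smaller `sample_size` trades some matching accuracy for fewer similarity computations.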

Item Type: Article
Uncontrolled Keywords: Information cascade; Semantic textual similarity; Image similarity; Deep learning
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty / Department / Research Group: Faculty of Liberal Arts & Sciences
Faculty of Liberal Arts & Sciences > Internet of Things and Security (ISEC)
Faculty of Liberal Arts & Sciences > School of Computing & Mathematical Sciences (CAM)
Last Modified: 24 Apr 2021 23:54
Selected for GREAT 2016: None
Selected for GREAT 2017: None
Selected for GREAT 2018: None
Selected for GREAT 2019: None
Selected for REF2021: None
URI: http://gala.gre.ac.uk/id/eprint/32167
