PhonEthic: advancing AI-powered defence against voice-based phishing
Voice-based phishing (vishing) attacks, one of the most effective social engineering methods in modern cybercrime, have become increasingly widespread. In simple terms, vishing is a type of attack in which scammers impersonate a trustworthy person or organisation over a voice channel in order to obtain personal information. These attacks can be carried out through phone calls, automated calls, voice messages, or malicious software. The primary motivation is financial gain: scammers create a sense of urgency and pressure so that victims are unable to make rational decisions.
As one of the four use cases within the ITEA project VESTA, Turkish project partner Orion Innovation is developing a library and tool called PhonEthic. This is an audio analysis system designed to process audio files sent via email, voicemail, and other communication channels, and to perform various AI-based analyses to prevent voice-based phishing attacks. The work is being carried out in collaboration with Kocaeli University and Istanbul Technical University, and the results have been published in nine conference papers and a preprint article, RansomTrack: A Hybrid Behavioral Analysis Framework for Ransomware Detection.
PhonEthic technology
PhonEthic processes incoming audio streams, recorded calls, or audio files through a multi-stage AI workflow. First, audio inputs from sources such as voicemails, email attachments, or live call streams are pre-processed and segmented. These segments are then analysed across four complementary research dimensions:
- First, file validation checks whether the submitted file is genuinely an audio file. If anything suspicious is detected, malware analysis modules evaluate whether the content is associated with malicious payloads or related attack vectors.
- Next, deepfake detection models examine whether the audio signal was synthetically generated or manipulated.
- Speech analysis modules identify the spoken language, accent, and emotional tone to detect inconsistencies or anomalies compared to expected caller profiles.
- Call disruption detection components evaluate signal quality and transmission flaws that may indicate fraud or tampering in VoIP communications.
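The first check in the list above, whether a file really is audio, can be illustrated with a signature test on the file's leading bytes. This is a minimal sketch and not PhonEthic's actual implementation: the function name `sniff_audio_format` is hypothetical, and only a handful of well-known container signatures are covered.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the audio container from the first bytes of a file.

    Hypothetical helper for illustration only. Returns None when no
    known audio signature matches, which a pipeline could treat as a
    reason to escalate the file to malware analysis.
    """
    # WAV: RIFF container whose form type (bytes 8..11) is "WAVE".
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # FLAC streams start with the marker "fLaC".
    if header[:4] == b"fLaC":
        return "flac"
    # Ogg pages (Vorbis, Opus) start with the capture pattern "OggS".
    if header[:4] == b"OggS":
        return "ogg"
    # MP3 files commonly begin with an ID3v2 tag.
    if header[:3] == b"ID3":
        return "mp3"
    # MPEG audio frame sync: 11 set bits. This also matches ADTS AAC,
    # so the result is deliberately labelled generically.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mpeg-audio"
    return None


# A file named "voicemail.wav" that actually starts with a Windows
# executable header ("MZ") is not recognised as audio.
print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(sniff_audio_format(b"MZ\x90\x00\x03\x00\x00\x00"))
```

A signature match only shows that the file claims to be audio; it does not rule out an embedded payload, which is why the description above follows validation with a separate malware analysis step.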
Finally, PhonEthic combines the information obtained from these four dimensions to generate a comprehensive risk assessment score, enabling the early detection of voice-based phishing attempts and providing actionable alerts to users or security systems.
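One simple way to combine per-dimension results into a single score is a weighted average. The sketch below assumes each dimension reports a value between 0 (benign) and 1 (highly suspicious); the class, field names, weights, and alert threshold are all hypothetical, since the article does not describe how PhonEthic actually fuses its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension suspicion scores in [0, 1] (hypothetical structure)."""
    malware: float
    deepfake: float
    speech_anomaly: float
    call_degradation: float


# Illustrative weights only; a real system would tune or learn these.
WEIGHTS = {
    "malware": 0.35,
    "deepfake": 0.35,
    "speech_anomaly": 0.15,
    "call_degradation": 0.15,
}


def risk_score(scores: DimensionScores) -> float:
    """Return the weighted average of the four dimension scores."""
    weighted_sum = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
        weighted_sum += weight * value
    # Normalise so the result stays in [0, 1] even if weights are changed.
    return weighted_sum / sum(WEIGHTS.values())


def should_alert(scores: DimensionScores, threshold: float = 0.5) -> bool:
    """Raise an alert when the combined risk reaches the (assumed) threshold."""
    return risk_score(scores) >= threshold


call = DimensionScores(malware=0.0, deepfake=0.9,
                       speech_anomaly=0.4, call_degradation=0.2)
print(round(risk_score(call), 3), should_alert(call))
```

A weighted average is easy to explain to analysts, but it lets one very strong signal be diluted by three weak ones; taking the maximum across dimensions, or training a classifier on the four scores, are common alternatives.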
PhonEthic library
Voice-based phishing is a highly complex problem that spans many different areas of analysis. The PhonEthic library was therefore created to group the relevant studies into four dimensions:
- Audio deepfake detection
  - Audio Deepfake Detection by using Machine and Deep Learning (https://ieeexplore.ieee.org/document/10323004/)
- Spoken language, accent and emotion recognition
  - End-to-End Spoken Language Recognition Using Self-Attention Speech Models (https://ieeexplore.ieee.org/document/11095447/)
  - Automatic Language Identification from Speech using Transformer-Based Models (https://ieeexplore.ieee.org/document/11017199/)
  - Spoken Accent Detection in English using Audio-Based Transformer Models (https://ieeexplore.ieee.org/document/10773414/)
  - Assessing Audio-Based Transformer Models for Speech Emotion Recognition (https://ieeexplore.ieee.org/document/10391313/)
- Call corruption detection
  - Identifying Degradation in Voice over Internet Protocol (VoIP) Communications through Audio-Based Transformer Models (https://ieeexplore.ieee.org/document/10757222/)
  - Degradation Detection on Voice over Internet Protocol (VoIP) Calls using Deep Learning Techniques (https://ieeexplore.ieee.org/document/10779216/)
- Malware detection
  - The Recent Trends in Ransomware Detection and Behaviour Analysis (https://ieeexplore.ieee.org/document/10871663/)
  - Dynamic Ransomware Analysis using CAPEv2 and Retrieval-Augmented Generation (https://ieeexplore.ieee.org/document/11206796/)
  - RansomTrack: A Hybrid Behavioral Analysis Framework for Ransomware Detection (https://arxiv.org/abs/2604.08739)
The PhonEthic library and tool will be released as an open-source project and will continue to evolve as new methods and technologies are developed. The rapid advancement of generative AI is making both the offensive and defensive sides of voice-based phishing more complex, so the continuous development and implementation of new approaches is essential.
Orion Innovation's work in the ITEA project VESTA is supported by TÜBİTAK.