The Development of a Deep Voice Model Implemented for Cyber Warfare


Payap Sirinam
Prasong Praneetpolgrang

Abstract



The objectives of this research were 1) to study implementations of deep voice technology, 2) to investigate an appropriate model for developing a deep voice model for the Thai language, 3) to analyze and evaluate the effectiveness of the deep voice model when presented to Internet users, and 4) to present guidelines for using deep voice technology in the field of cyber warfare.


This study showed that text-to-speech synthesis models could be applied to deep voice technology (voice cloning using deep learning) for general tasks, particularly for English speech. However, these models had significant limitations, including the need for parallel training data and the time required to prepare and train a Thai-language model. In this work, the researchers studied and presented guidelines for developing and applying deep voice models for cyber warfare using generative adversarial network (GAN) models, which overcome the limitations of text-to-speech synthesis models because the GAN-based voice conversion models used here do not require parallel data. The results revealed that the StarGAN-VC (StarGAN Voice Conversion) and CycleGAN-VC (CycleGAN Voice Conversion) models could convert the voices of ordinary people into those of target individuals, such as politicians and national leaders, to create fake news in cyber warfare. Moreover, the spoofed voices generated by these models achieved a highest mean opinion score (MOS) of 3.59 and could potentially deceive Internet users into believing that a spoofed voice belonged to the real target. In the most severe case, 40% of the respondents in the sample were deceived by the fake voices. These findings highlight new threats that may affect national security in the future and underscore the need to find ways to detect and prevent them.
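The abstract reports its results as a mean opinion score (MOS) and as the share of listeners who were deceived. As a rough illustration of how such figures are usually derived, the sketch below assumes a 1-5 absolute category rating scale (the common MOS convention discussed by Streijl et al., 2016, not a detail stated in this abstract). It treats "highest MOS" as the maximum of per-system mean ratings and computes the deception rate as the fraction of listeners who judged a spoofed clip to be genuine. The system names, ratings, and judgements are hypothetical placeholders, not the study's data, and the confidence interval uses a simple normal approximation rather than any method the authors report.

```python
from math import sqrt
from statistics import mean, stdev


def mos(ratings):
    """Mean opinion score: arithmetic mean of listener ratings.

    Assumes a 1-5 rating scale (an assumption, not taken from the study).
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings)


def ci95(ratings):
    """Approximate 95% confidence interval for the MOS (normal approximation, z = 1.96)."""
    m = mean(ratings)
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return m - half_width, m + half_width


def deception_rate(judgements):
    """Fraction of listeners who believed a spoofed clip was the real target's voice.

    judgements: list of bools, True meaning the listener judged the clip genuine.
    """
    return sum(judgements) / len(judgements)


# Hypothetical ratings for two conversion systems (illustrative only).
ratings = {
    "system_a": [4, 3, 4, 3, 4],
    "system_b": [3, 3, 2, 4, 3],
}
scores = {name: mos(r) for name, r in ratings.items()}
best = max(scores, key=scores.get)  # system with the highest MOS

print(best, scores[best], ci95(ratings[best]))
print(deception_rate([True, False, False, True, False]))
```

With the placeholder data, system_a has the highest MOS (3.6) and two of five listeners (0.4) are deceived; substituting real listening-test results yields the same two kinds of quantities the abstract reports.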

Article Details

Section
Research Articles

References

Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236.

Cooke, N. A. (2017). Posttruth, truthiness, and alternative facts: Information behavior and critical information consumption for a new age. The Library Quarterly, 87(3), 211-221.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139-144.

Jia, Y., Zhang, Y., Weiss, R. J., Wang, Q., Shen, J., Ren, F., ... & Wu, Y. (2018). Transfer learning from speaker verification to multispeaker text-to-speech synthesis. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 1-11).

Kameoka, H., Kaneko, T., Tanaka, K., & Hojo, N. (2018). StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 266-273). IEEE.

Kaneko, T., & Kameoka, H. (2018). CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks. In 26th European Signal Processing Conference (EUSIPCO) (pp. 2100-2104).

Karlsen, R., & Aalberg, T. (2021). Social Media and Trust in News: An Experimental Study of the Effect of Facebook on News Story Credibility. Digital Journalism, 1-17.

Kietzmann, J., Lee, L. W., McCarthy, I. P., & Kietzmann, T. C. (2020). Deepfakes: Trick or treat? Business Horizons, 63(2), 135-146.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521(7553), 436-444.

Li, J. (2018). Cyber security meets A.I.: A survey. Frontiers of Information Technology & Electronic Engineering, 19(12), 1462-1474.

Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). LibriSpeech: An ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5206-5210). IEEE.

Ren, Y., Hu, C., Tan, X., Qin, T., Zhao, S., Zhao, Z., ... & Liu, T. (2021). FastSpeech 2: Fast and high-quality end-to-end text to speech. In International Conference on Learning Representations (ICLR).

Sindermann, C., Cooper, A., & Montag, C. (2020). A short review on susceptibility to falling for fake political news. Current Opinion in Psychology, 36, 44-48.

Streijl, R. C., Winkler, S., & Hands, D. S. (2016). Mean opinion score (MOS) revisited: methods and applications, limitations and alternatives. Multimedia Systems, 22(2), 213-227.