Title |
A 1D CNN-LSTM using Wav2vec 2.0 for Violent Scene Discrimination |
Authors |
Huiyong Bak; Sangmin Lee |
DOI |
https://doi.org/10.5573/IEIESPC.2022.11.2.92 |
Keywords |
Violent scene discrimination; Wav2vec 2.0; Audio signal processing |
Abstract |
This paper proposes an effective system for discriminating violent scenes in movies from audio signals alone. Automatic discrimination of violent scenes is one of the most crucial aspects of media filtering, which protects users from undesired media. Previous studies have performed violent scene discrimination using a mel spectrogram and 2D convolutional neural networks (CNNs); however, the mel spectrogram cannot extract mutual information from audio, and 2D CNNs are unsuitable for audio. As a result, these models do not perform well. The proposed system extracts audio features using Wav2vec 2.0, which can extract mutual information from audio. The extracted features are fed into a 1D CNN and a long short-term memory (LSTM) network, both of which are suited to audio, and violent scenes are discriminated through fully connected and softmax layers. The proposed system is evaluated on the Violent Movie Scenes Dataset (VMD), where it discriminates violent scenes with an accuracy of 96.25%, outperforming previous studies. |
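
The pipeline described in the abstract (Wav2vec 2.0 features, then a 1D CNN, an LSTM, a fully connected layer, and a softmax) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `CnnLstmClassifier`, the layer sizes, the kernel size, and the pooling step are all assumptions chosen for illustration, and a random tensor stands in for the Wav2vec 2.0 features so that the example runs without downloading a pretrained model. The feature dimension of 768 matches the hidden size of the base Wav2vec 2.0 model, but the abstract does not state which model variant or output layer is used.

```python
import torch
import torch.nn as nn


class CnnLstmClassifier(nn.Module):
    """Hypothetical 1D CNN -> LSTM -> fully connected -> softmax classifier."""

    def __init__(self, feat_dim=768, conv_channels=128, lstm_hidden=64, num_classes=2):
        super().__init__()
        # 1D convolution along the time axis of the feature sequence
        # (all sizes here are illustrative assumptions).
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        # The LSTM models temporal dependencies across the pooled frames.
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True)
        # Fully connected layer mapping to the two classes
        # (non-violent, violent).
        self.fc = nn.Linear(lstm_hidden, num_classes)

    def forward(self, feats):
        # feats: (batch, time, feat_dim), e.g. frame-level audio features.
        x = feats.transpose(1, 2)      # (batch, feat_dim, time) for Conv1d
        x = self.conv(x)               # (batch, conv_channels, time // 2)
        x = x.transpose(1, 2)          # (batch, time // 2, conv_channels)
        _, (h_n, _) = self.lstm(x)     # h_n: (num_layers, batch, lstm_hidden)
        logits = self.fc(h_n[-1])      # final hidden state of the last layer
        return torch.softmax(logits, dim=-1)


torch.manual_seed(0)
model = CnnLstmClassifier().eval()

# Random stand-in for Wav2vec 2.0 output: 4 clips, 49 frames, 768 dimensions.
dummy_feats = torch.randn(4, 49, 768)

with torch.no_grad():
    probs = model(dummy_feats)

print(probs.shape)  # one probability pair per clip
```

In a training setup, one would typically return the raw logits and apply `nn.CrossEntropyLoss`, which includes the softmax internally; the explicit softmax is kept here only to mirror the description in the abstract.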