Visual Design of Emotional Expressions of Music Art on Mobile Devices
Hou Yihao1,*
Lin Zongzhe2
1 (College of Music, Guangxi Arts University, Nanning, 530022, China)
2 (Fielding School of Public Health, UCLA, California, 90024, US)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Mobile terminal, Music visualization, Emotion recognition, Convolutional neural network
1. Introduction
Music is an indispensable part of human activities; it links rhythm with expressive
content and generates emotions [1]. Emotional expression is one of music's creative purposes, and its visualization is
an important branch of information visualization. The visual design of the emotional
expression of music art has therefore become a new research hotspot [2]. At the same time, with the rapid development of electronic and information technology,
mobile terminal devices are used more and more widely. With the development of communication
technology and the mobile Internet, these terminals not only carry different mobile
applications but also offer more intelligent and diversified functions. This makes
the combination of music emotion visualization with mobile terminals a promising
development trend.
Recognizing the emotion in music is a prerequisite for the visual design of
music emotional expression. However, current researchers often use classical machine
learning methods such as support vector machines to classify music emotions. The
lack of unified standards for emotion recognition systems and the difficulty of music
feature analysis hinder the further development of music emotion recognition
technology and visual design [3]. Therefore, this research presents a method of music visual design on mobile terminals
and innovatively applies a deep neural network to the recognition of music emotion.
The goal was to improve the accuracy of music emotion recognition and provide further
technical support for the realization of music visual design.
2. Related Works
Music emotion recognition technology is a key element in the design of visualization
of emotion expressions in music art, and by studying music emotion recognition models,
we can lay the necessary technical foundation for visualization design [4]. In current music emotion recognition research, the main directions are extracting
emotion features from music more effectively and improving the performance of emotion
recognition classifiers. Deep learning has good feature extraction and recognition capabilities, and
research on neural networks provides a strong reference for improving music
emotion recognition technology.
Liu et al. designed an interference recognition framework consisting of convolutional
neural networks and feature fusion. They obtained recognition input through preprocessing,
introduced a residual neural network to extract deep features, and used a fully connected
layer to output the recognition content of interference signals. The results showed
that it effectively reduced the loss of potential features and improved the generalization
ability to deal with uncertainty [5]. Xing's team proposed a convolutional neural network-based recognition
framework in the field of fraudulent phone call recognition. A deep learning method
based on convolutional neural networks that learns call behavior and phone number
features was shown to improve classification accuracy [6].
Luo and other professionals combined an extreme learning machine with a deep convolutional
neural network when dealing with poor generalization performance and low accuracy
in finger vein recognition. They removed the fully connected layer from the deep convolutional
network and added an extreme learning layer to recognize the extracted feature vectors.
The results showed that it could automatically extract finger vein features and reduce
the loss of valid information with high accuracy and generalization ability [7].
Pustokhina et al. proposed a recognition method based on optimal K-means and convolutional
neural networks for intelligent license plate recognition in traffic processes. This
method divides the license plate recognition process into three stages: license
plate detection, license plate image segmentation, and license plate number recognition.
The simulation results show that this method has high operational efficiency [8]. Nandankar and other scholars proposed a prediction model based on long short-term
memory networks for pneumonia-related disease in order to achieve statistical analysis
of disease data. Different hidden layers were used to process the data.
The results show that the fine-tuned LSTM model can accurately predict the relevant
results of pneumonia data [9].
Applications of recurrent neural networks in various recognition fields also provide
reference ideas for musical emotion recognition. Taeseung's team developed a skeleton-less gesture
signal detection algorithm for traffic control gesture recognition. It used recurrent
neural networks to process gesture time length variations mixed with noise and random
pauses as a way to recognize six gesture signals. The results showed that its accuracy
was as high as 91% [10].
Wu et al. combined a progressive scale expansion network and a convolutional
neural network to detect and identify video image content and obtained the
image serial number. The results showed that the recognition accuracy was 96% [11]. Bah and other researchers designed an end-to-end recognition system that used deep
residual networks for emotion and facial expression recognition. The test results on the FERGIT
dataset showed accuracies of 75% and 97% in classifying facial emotions [12]. Yang et al. developed a weighted hybrid deep neural network for automatic
facial expression recognition, which fused the outputs of two channels
in a weighted manner. The final recognition results were calculated using the softmax
function. The test results showed that it was able to recognize six basic facial expressions
with a high average accuracy of 92.3% [13].
Wang’s research team improved an artificial neural network with three layers. They
used solar radiation and temperature as inputs and five physical parameters of a single
diode model as outputs. The results showed that it was able to accurately recognize
current changes [14]. Yu et al. proposed a student emotion classification model based on a neural network
to reduce the difficulty of understanding sentence-level text expression. A regularization
method was introduced into the LSTM so that the output at any time step has
a different correlation with the output of the previous time step. The results indicate
that this model has better comprehensive performance than traditional models and can
correctly classify student emotions [15].
In summary, in practical tests applying convolutional neural networks and recurrent
neural networks to various fields, such as image feature extraction, face recognition,
and signal recognition, most researchers have improved these networks accordingly and achieved
high accuracy. However, there has been less research on the extraction and recognition of musical
emotion features. Therefore, this research improves deep neural networks
for the recognition of musical emotion expressions, establishes a better design model
for the visualization of musical emotion expressions, and further develops the design
of music visualization on mobile terminals.
3. Visual Design of Emotional Expressions of Music Art on Mobile Terminals
3.1 Music Visualization Design and Emotion Modeling on Mobile Terminals
To realize music visualization design on mobile terminals, we first analyzed its characteristics.
Music visualization design presents music in visual form on the mobile terminal,
transforming auditory information into visual information so that the emotional and
physical properties of music (its two main aspects) convey the corresponding music
information. In practical applications, the visual design of music is expressed in
various ways; according to the type of visual element, these expressions can be classified
as graphics, text, pictures, or videos [16]. To achieve better design results, we took functional objectives and design objectives
as the main service objects of visual design and took harmony, accuracy, readability,
and beauty as the macro design objectives.
Music visualization design on mobile terminals has similarities with conventional
data visualization, but music information has certain particularities, so its methods
and processes are more specialized [17]. Music visualization design should build conversion rules according to the functional
objectives and specify the types of data to be converted. Music physical data types
generally include timbre, pitch, and intensity, while music emotional data are matched
in a fixed mode according to the music content.
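As a simple illustration of such conversion rules, a mapping from music data types to visual attributes might look like the following sketch (a hypothetical Python example; the specific attribute choices are our own and are not a rule set defined in this study):

```python
# Hypothetical conversion rules mapping music data types to visual attributes
# (illustrative only; not the rule set defined in this study).
CONVERSION_RULES = {
    "pitch":     "vertical position of the graphic element",
    "intensity": "size and brightness of the graphic element",
    "timbre":    "shape or texture of the graphic element",
    "emotion":   "color theme and motion speed of the whole scene",
}


def visual_attribute(data_type: str) -> str:
    """Look up which visual attribute a given music data type drives."""
    return CONVERSION_RULES.get(data_type, "no mapping defined")


print(visual_attribute("pitch"))
```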
Based on the selected music data type, we match it with the appropriate visual representation
and formulate the final visual conversion rules. Then, we need to cooperate with the
interface and interaction design to embed the visualization into the whole application
and achieve a high degree of cooperation with the product in the two aspects of interaction
function and visual interface. Finally, it is necessary to provide relevant instructions
for the visual design to reduce users' learning cost [18]. The proposed music visualization design strategy for mobile terminals is shown in
Fig. 1.
Fig. 1. Music visualization design strategy on mobile terminals.
Fig. 2. V-A emotion model diagram.
The visual design of music on mobile terminals poses significant challenges in practical
operation. Many musical visualizations are seen as mere works of art lacking more
systematic and scientific research. In the visualization of musical information, the
first task is to convert the information into a visual form that facilitates organization
and presentation. Dealing with information that lacks visual images requires making
an artificial visual image to establish a link between the visual image and the musical
message.
The visualization of emotional expressions in music requires an understanding of the
definition of and access to emotion in music [19]. The first step is to establish a common, systematic model of emotion in music by
defining and consistently quantifying the different emotional factors. The second
step is to collect relevant music data based on the established emotion model, analyze
the emotion characteristics, and finally output the emotion information results (i.e.,
music emotion recognition) [20].
The valence-arousal (V-A) music emotion model proposed by Russell was used, which represents
emotional states as points in a two-dimensional space spanned by activation (arousal)
and valence. The horizontal axis represents valence, and the vertical axis represents
activation. Valence reflects the degree of negative or positive emotion, with smaller
values indicating a more negative musical emotion, and vice versa. The different discrete
points in the V-A emotion model are obtained according to the relationship between a
specific emotion and its valence (horizontal axis) and activation (vertical axis).
For example, the activation dimension ranges from calm to energetic and indicates the
intensity of an emotion: higher activation corresponds to more excited emotions, lower
activation to calmer emotions, and the distance from an emotion to the origin along
this axis represents its degree of calm or excitement. The V-A emotional model is shown
in Fig. 2.
The V-A two-dimensional space is mapped into four discrete categories: (-V+A), (+V+A),
(+V-A), and (-V-A). The four discrete categories correspond to the four typical emotions
contained in the emotion model to obtain the musical emotion categories. The relationships
corresponding to the four musical emotions are shown in Table 1.
Table 1. Correspondence of four categories of music emotion.
Category | Emotion | V-A value
The first kind of emotion | Happy | +V+A
The second kind of emotion | Anxious | -V+A
The third kind of emotion | Sentimental | -V-A
The fourth kind of emotion | Relaxed | +V-A
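As a concrete illustration of Table 1, the mapping from a (valence, arousal) point to one of the four emotion categories can be sketched as follows (a minimal Python sketch; the function name and the treatment of the origin as the quadrant boundary are our own assumptions, not part of the original model):

```python
def va_to_emotion(valence: float, arousal: float) -> str:
    """Map a point in the V-A plane to one of the four emotion
    categories listed in Table 1 (origin treated as the boundary)."""
    if valence >= 0 and arousal >= 0:
        return "Happy"        # +V+A, first kind of emotion
    if valence < 0 and arousal >= 0:
        return "Anxious"      # -V+A, second kind of emotion
    if valence < 0 and arousal < 0:
        return "Sentimental"  # -V-A, third kind of emotion
    return "Relaxed"          # +V-A, fourth kind of emotion


print(va_to_emotion(0.6, -0.3))  # -> "Relaxed"
```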
After confirming the music emotion model, the mel spectrogram was chosen as the
sound spectrum feature. Feature extraction first requires a short-time Fourier
transform of the music's sound signal [21]. Then, the frequencies of the amplitude spectrum are transformed to the mel scale,
and the amplitudes are transformed by a mel filter bank to obtain a mel spectrum
representation for each frame. Finally, the spectra within the analysis
window length are spliced to obtain the corresponding mel spectrogram.
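This STFT-plus-mel-filter-bank pipeline can be sketched as follows (a minimal sketch using librosa; the library choice, file name, and the frame, hop, and filter-bank sizes are illustrative assumptions rather than the settings used in this study):

```python
import librosa
import numpy as np

# Load a music clip (mono) at a fixed sampling rate.
y, sr = librosa.load("clip.wav", sr=22050, mono=True)

# Short-time Fourier transform -> amplitude spectrum per frame.
stft = np.abs(librosa.stft(y, n_fft=1024, hop_length=512))

# Apply a mel filter bank to map linear frequencies to the mel scale.
mel = librosa.feature.melspectrogram(S=stft**2, sr=sr, n_mels=128)

# Log compression; frames within one analysis window are concatenated
# along the time axis to form the mel spectrogram input.
log_mel = librosa.power_to_db(mel)
print(log_mel.shape)  # (n_mels, n_frames)
```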
To improve the efficiency of mining the emotional features of the music, a weighted
combination of the residual phase (RP) and mel-frequency cepstral coefficients (MFCC)
is introduced. The RP is the cosine of the phase function of the resolved (analytic)
signal derived from the linear prediction residual of the music signal. The music
sample at moment $t$ can be estimated as a linear combination of multiple past
samples [22]. The predicted music sample is shown in Eq. (1).
In Eq. (1), $p$ represents the prediction order, $\hat{s}(t)$ is the
predicted music sample, $s(t)$ represents the actual value, and $a_{k}$ represents
the set of linear prediction coefficients. The prediction error formula is shown in
Eq. (2).
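From the variable definitions above, Eqs. (1) and (2) take the standard linear-prediction form sketched below (the negative sign convention on the coefficients $a_{k}$ is an assumption of this reconstruction):

```latex
% Eqs. (1)-(2): p-th order linear prediction and the prediction error
\begin{align*}
\hat{s}(t) &= -\sum_{k=1}^{p} a_{k}\, s(t-k) \tag{1} \\
e(t) &= s(t) - \hat{s}(t) = s(t) + \sum_{k=1}^{p} a_{k}\, s(t-k) \tag{2}
\end{align*}
```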
In Eq. (2), $e(t)$ represents the prediction error. The linear prediction coefficients
are obtained by minimizing the prediction error, and the prediction error itself is
the linear prediction residual of the music signal. From this, the resolved signal
is calculated as shown in Eq. (3).
In Eq. (3), $r(t)$ represents the linear prediction residual of the music signal, $r_{a}(t)$
is the resolved signal, $R_{h}(w)$ represents the Fourier transform of $r(t)$, $r_{h}(t)$
represents the Hilbert transform of $r(t)$, and $IFT$ is the inverse Fourier transform.
The resolved signal can be expressed as shown in Eq. (4).
There is much information related to musical emotion in the linear prediction residuals,
and it is beneficial to extract emotion-specific information from the musical signal
by calculating the residual phase, which is also known as the cosine of the phase
of the resolved signal, as shown in Eq. (5).
In Eq. (5), $\cos (\theta (t))$ represents the cosine of the phase of the resolved signal. The
RP features and MFCC features are weighted and combined to form the final output, thus
improving the model's ability to extract emotion features.
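The computation of the residual phase and its fusion with the MFCC can be sketched as follows (a minimal sketch assuming librosa and SciPy; the LPC order, frame settings, per-frame averaging of the RP sequence, and the weighting factor alpha are illustrative assumptions rather than the settings used in this study):

```python
import numpy as np
import librosa
from scipy.signal import hilbert, lfilter

# Load a music clip (the file name is a placeholder).
y, sr = librosa.load("clip.wav", sr=22050, mono=True)

# Eqs. (1)-(2): p-th order linear prediction coefficients and the
# prediction-error (residual) signal obtained by inverse filtering.
order = 12
a = librosa.lpc(y, order=order)      # a[0] == 1
residual = lfilter(a, [1.0], y)      # e(t), the LP residual

# Eqs. (3)-(4): analytic ("resolved") signal of the residual via the
# Hilbert transform; Eq. (5): residual phase = cosine of its phase.
analytic = hilbert(residual)
rp = np.cos(np.angle(analytic))

# Average the RP sequence per frame so it aligns with the MFCC frames.
frame_length, hop = 1024, 512
rp_frames = librosa.util.frame(rp, frame_length=frame_length,
                               hop_length=hop).mean(axis=0)

# Standard MFCC features computed with the same hop size.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20,
                            n_fft=frame_length, hop_length=hop)

# Weighted combination of the two feature streams (alpha is an assumption).
alpha = 0.7
n = min(mfcc.shape[1], rp_frames.shape[0])
fused = np.vstack([alpha * mfcc[:, :n], (1.0 - alpha) * rp_frames[None, :n]])
print(fused.shape)  # (n_mfcc + 1, n frames)
```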
3.2 Deep Neural Network-based Music Emotion Recognition
After the sound spectrogram is used as the feature input, deep learning is applied to music
emotion recognition. Deep learning can learn the relationship between high-level concepts
and underlying features from audio data and thus bridge the gap between the emotional
semantics of the music and the features of the audio signal for the purpose of emotion
recognition [23]. Among deep learning networks, recurrent neural networks (RNNs) and convolutional
neural networks (CNNs) have shown strong ability in extracting composite image features
and time-series features [24]. A CNN can effectively identify the underlying patterns in the data and obtain more
abstract features by stacking convolution kernels, while one-dimensional convolution
is often used for the analysis of sensor data or time series and is suitable for the
analysis of audio signal data [25]. The original audio signal is converted to a sound spectrum, which is represented
as a grey-scale image, whose convolution is calculated as shown in Eq. (6).
In Eq. (6), $i$ and $j$ index the height and width of the feature map, $a_{i,j}$ is the activation
output of the convolution layer, $f_{h}$ is the height of the convolution kernel,
$b$ is the bias of the convolution, $f_{w}$ is the width of the convolution kernel,
$w$ is the weight matrix of the convolution kernel, and $x$ is the data input to the
convolution kernel. The height of the convolution kernel is set equal to the frequency
range of the sound spectrum, and the convolution operation is shown in Eq. (7).
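From these definitions, Eq. (6) takes the standard form of a two-dimensional convolution (the exact indexing convention shown here is an assumption of this reconstruction):

```latex
% Eq. (6): activation at position (i, j) of the feature map
\begin{equation}
a_{i,j} = f\!\left(\sum_{m=1}^{f_{h}} \sum_{n=1}^{f_{w}} w_{m,n}\, x_{i+m-1,\, j+n-1} + b\right) \tag{6}
\end{equation}
```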
The output of the convolution kernel is represented by $R$. The convolution operation
is simplified as shown in Eq. (8).
In Eq. (8), $B$ represents the bias matrix. The width of the output of the convolution kernel
is shown in Eq. (9).
In Eq. (9), $t$ represents the width of the sound spectrum, $q$ is the padding size, and
$R_{w}$ is the width of $R$. Since the one-dimensional convolution only translates
along the time dimension of the sound spectrum, the frequency dimension of the sound
spectrum becomes 1 after convolution. A gated linear unit is then added, and its expression
is shown in Eq. (10).
In Eq. (10), $Conv1D_{1}$ and $Conv1D_{2}$ represent one-dimensional convolutions with
identical structures that do not share weights, $\sigma$ represents the sigmoid activation
function, $I$ represents the output, and $L$ represents the sound spectrum sequence
to be processed. The gated linear unit is combined with the convolution to form a
one-dimensional gated convolution unit, and then a residual structure is introduced
to cope with the vanishing gradient problem, as shown in Eq. (11).
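From the surrounding definitions, Eqs. (9)-(11) can be written in their standard forms (a stride of 1 is assumed in Eq. (9), and $\otimes$ denotes element-wise multiplication; this is a reconstruction from the variable definitions, not the original rendering):

```latex
% Eqs. (9)-(11): output width, gated linear unit, and residual connection
\begin{align*}
R_{w} &= t - f_{w} + 2q + 1 \tag{9} \\
I &= \mathrm{Conv1D}_{1}(L) \otimes \sigma\!\left(\mathrm{Conv1D}_{2}(L)\right) \tag{10} \\
x_{l+1} &= x_{l} + F\!\left(x_{l},\, W_{l}\right) \tag{11}
\end{align*}
```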
In Eq. (11), $x_{l+1}$ represents the output of the $l$-th layer of the network (whose
input is $x_{l}$), and $F(x_{l},W_{l})$ is the network mapping (the convolution operation
in the case of convolutional networks). The residual gated convolution unit is shown in Eq. (12).
The resulting residual gating unit also enables information to be transmitted over
multiple channels, as shown in Eq. (13).
Fig. 3. Basic network structure diagram of RNN.
The stacking of convolutional layers enables the extraction of more abstract acoustic
spectral features, but the music signal is ultimately temporal information and still
has a serial nature in the temporal dimension after conversion to a mel sound spectrum.
Therefore, it is combined with a recurrent neural network. In a recurrent neural network,
the output state of the hidden layer is related to the input at the current moment
and the state of the hidden layer at the previous moment with memory-like properties
[25]. The basic network structure is shown in Fig. 3.
The state of the hidden layer at step $i$ is expressed as $H_{i}$ and is calculated
as shown in Eq. (14).
In Eq. (14), $H_{i-1}$ represents the state of the hidden layer at the previous moment, $b$ represents
the bias term, $f(\cdot )$ represents the non-linear activation function (usually
$\tanh $), and $X_{i}$ represents the input of step $i$. The output of the network
at step $i$ is represented by $O_{i}$ and is calculated as shown in Eq. (15).
In Eq. (15), $U$ is the connection matrix, and $d$ is the bias term. When the sequence is too
long, it is difficult for the RNN to transfer information from earlier time steps to
later ones, and the RNN suffers from the vanishing gradient problem in backpropagation,
so a long short-term memory (LSTM) network was added. The LSTM structure is shown in Fig. 4.
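From the definitions above, the basic RNN recurrence of Eqs. (14) and (15) takes the standard form sketched below (the input and recurrent weight matrices $W_{x}$ and $W_{h}$ are named here for clarity and are implicit in the original; any output activation in Eq. (15) is omitted):

```latex
% Eqs. (14)-(15): hidden state and output of the simple RNN in Fig. 3
\begin{align*}
H_{i} &= f\!\left(W_{x} X_{i} + W_{h} H_{i-1} + b\right) \tag{14} \\
O_{i} &= U H_{i} + d \tag{15}
\end{align*}
```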
Fig. 4. LSTM network structure unit diagram.
Fig. 5. Convolutional recurrent neural network music emotion recognition model diagram.
To better extract the multi-directional dependencies in the musical feature sequences
and closely match the way the brain perceives musical emotions, a bidirectional recurrent
neural network (BRNN) was used for the classification process of temporal features.
The BRNN takes into account both preceding and following inputs, and the final output
of the network at each step is the sum of the forward and backward passes, as shown in Eq. (16).
In Eq. (16), $\overrightarrow{H_{i}}$ represents the forward hidden-layer state of the
bidirectional RNN. A convolutional bidirectional recurrent neural network (CBRNN)
was formed by combining a convolutional network based on the residual gated convolutional
structure with a bidirectional RNN, and a music emotion recognition model was established.
In this process, the sound spectrum is first learned by the convolutional layers to
obtain a feature map containing high-level abstract features; the feature map is then
unrolled over time to obtain a convolutional feature sequence, which is fed into the
BRNN to extract time-series features and perform the final classification.
The overall structure of the model is shown in Fig. 5.
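A compact sketch of a CBRNN of this kind is given below in PyTorch (illustrative layer sizes, with a bidirectional LSTM standing in for the BRNN; the bidirectional outputs are concatenated here rather than summed as in Eq. (16), and this is not the authors' exact implementation):

```python
import torch
import torch.nn as nn


class ResidualGatedConv1d(nn.Module):
    """One-dimensional gated convolution unit with a residual connection
    (sketch of Eqs. (10)-(12): two parallel Conv1d branches, sigmoid gate,
    identity shortcut)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_a = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.conv_b = nn.Conv1d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gated = self.conv_a(x) * torch.sigmoid(self.conv_b(x))
        return x + gated  # residual shortcut


class CBRNN(nn.Module):
    """Convolutional bidirectional recurrent network for 4-class music
    emotion recognition (illustrative sizes; input is a mel spectrogram
    of shape (batch, n_mels, n_frames))."""

    def __init__(self, n_mels: int = 128, channels: int = 64,
                 hidden: int = 64, n_classes: int = 4):
        super().__init__()
        self.front = nn.Conv1d(n_mels, channels, kernel_size=3, padding=1)
        self.gated = nn.Sequential(ResidualGatedConv1d(channels),
                                   ResidualGatedConv1d(channels))
        self.brnn = nn.LSTM(channels, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        h = self.gated(self.front(mel))   # (batch, channels, frames)
        h = h.transpose(1, 2)             # unroll the feature map over time
        out, _ = self.brnn(h)             # forward/backward states, concatenated
        return self.head(out[:, -1, :])   # class logits for 4 emotions


logits = CBRNN()(torch.randn(2, 128, 431))
print(logits.shape)  # torch.Size([2, 4])
```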
4. Analysis of the Effectiveness of the Application of the Music Emotion Recognition
Model
To validate the application of the proposed CBRNN method, it was compared with four
commonly used music emotion recognition models: K-nearest neighbor (KNN),
support vector machine (SVM), ensemble learning (EL), and acoustic emotion Gaussians
(AEG). Information on the experimental environment is shown in Table 2. The five methods were first used for experiments on two publicly available datasets:
the Sound-track dataset and Song's dataset. The random identification experiments
were repeated 10 times on each dataset, and the results obtained are shown in Fig. 6.
Table 2. Information of experimental environment.
Index | Performance parameter
Operating system version | Android 11
System architecture | 64-bit
Internal storage | 8.00 GB
Processor | Snapdragon 865 CPU, 2.84 GHz
Experimental platform | AI Benchmark v5.0.0
Time efficiency | 82%~85%
|
Fig. 6(a) shows the music emotion recognition results of SVM, KNN, EL, AEG, and the proposed
method on the Sound-track dataset, and Fig. 6(b) shows the emotion recognition results of the five methods on Song's dataset. The
results are presented in terms of accuracy. From Fig. 6(a), it can be seen that the results of the 10 random tests fluctuate somewhat
for all five methods on the Sound-track dataset. The SVM algorithm reached its highest
accuracy of 79% in the 8th run, and its lowest, 74%, occurred in the first
run. The KNN algorithm's accuracy ranged from 73% to 78%, the EL algorithm's highest
accuracy was 80%, and the AEG algorithm's accuracy was stable at around 84%. The accuracy
of the method proposed in this study remained above 88%, reaching a maximum of 90%.
Fig. 6. Experimental results of five methods in two open datasets.
As can be seen from Fig. 6(b), the accuracies of both SVM and KNN methods were relatively similar, both reaching
up to 79%. The EL algorithm and AEG algorithm reached 83% and 87%, respectively, while
the proposed CBRNN stabilized above 90% with the highest accuracy and showed the least
fluctuation. To fully evaluate the recognition performance of CBRNN, the precision,
recall, and F1 values were measured again on the Sound-track
dataset and Song's dataset and compared with those of the other four methods. The results are
shown in Fig. 7.
Fig. 7. Comparison results of precision, recall, and F1 values of five methods.
In Fig. 7, on the Sound-track dataset, the performance of the KNN and SVM algorithms is
similar, and the three metrics of EL and AEG differ little, while the precision
of CBRNN shows the largest improvement, up to 21.31%, compared with the other methods.
On Song's dataset, SVM had the worst performance, while CBRNN still achieved the highest
values of the five methods and the best overall performance. The five methods were then used
for experiments on the AMG1608 dataset, which contains 1608 music clips and is the largest
continuous-emotion music database, with emotion labels in the V-A
space and generalized features. The recognition error rates of the
five methods on the AMG1608 dataset are shown in Fig. 8.
Fig. 8. Error rate of music emotion recognition using five methods in the AMG1608 dataset.
Fig. 9. Accuracy of five methods for different music emotion recognition.
In Fig. 8, the error rate of the SVM algorithm fluctuates to different degrees as the number
of music data samples increases but eventually reaches a relatively stable value.
The error rate of the KNN algorithm fluctuates more than that of SVM, with a maximum
of 35% before the number of data samples reaches 800 and a minimum of 15-20% over the
whole experiment. The error rates of the EL and AEG methods are relatively similar and
eventually stabilize at around 15%.
The proposed CBRNN showed a more obvious decreasing trend in error rate before the
sample size reached 800. The error rate started to stabilize after 800 samples, basically
remaining around 10%, which was better than the other four methods. Finally, practical
validation was performed by selecting 4000 songs from a well-known music platform in
China, of which 79% were Chinese songs and 21% were English songs. The songs were
labeled with four emotion types (happy, sad, angry, and relaxed), and the five methods
were then used to identify their emotions. The results obtained are shown in Fig. 9.
Fig. 9 shows the emotion recognition results obtained by running each of the five methods
three times on the selected music dataset. Figs. 9(a)-(d) correspond to the recognition
accuracy results for the four emotion types of happy, sad, angry, and relaxed, respectively.
From Fig. 9(a), it can be seen that for the recognition of happy emotion, the accuracy of SVM and
KNN is below 90%, but both are higher than 84%. AEG and EL are slightly higher than
the first two methods, but CBRNN has the highest accuracy of 96%.
For the recognition of sad emotions, SVM showed the lowest value of 62%, while CBRNN
still had a high accuracy of up to 90%. In the recognition of anger emotions, the
difference in accuracy between the five methods for all three experiments was small,
with the proposed CBRNN obtaining the highest accuracy of 88%, a maximum improvement
of 13% compared to the other four methods. In the recognition of the relaxation category
of emotions, the highest accuracy of 95% was obtained by CBRNN, which is still higher
than the other four methods. In summary, the proposed CBRNN method has higher accuracy
and better performance in the recognition of the four emotion types and can be better
used for the recognition of musical emotions.
5. Conclusions
The development of information visualization has made the design of music visualization
on mobile terminals a current research hotspot in this field. To realize the visual
design of music emotion, it is crucial to establish a fast and accurate music emotion
recognition model. This research was based on the valence-arousal music emotion model
with a weighted combination of MFCC and RP for emotion feature extraction, and an
optimized convolutional neural network was combined with a recurrent neural network
and applied to emotion recognition. The experimental results show that the method
achieves an accuracy of up to 92% over 10 random recognition runs on the Sound-track dataset
and Song's dataset.
On the Sound-track dataset, the method achieved an improvement in precision of up to 21.31%.
On the AMG1608 music dataset, the error rate of the method started to plateau after
the sample size increased to 800 and remained around 10%. On the selected dataset
of 4000 songs, the method was able to effectively identify the four emotion
types of relaxation, sadness, happiness, and anger with an accuracy of up to 96%,
demonstrating superior performance. However, the improved convolutional neural
network did not incorporate an attention mechanism, which limited further performance
gains, so this aspect needs to be explored in future work.
REFERENCES
Ma J, Du K, Zheng F, et al. A recognition method for cucumber diseases using leaf
symptom images based on deep convolutional neural network. Computers and Electronics
in Agriculture, 2018(154): 154-158.
Satoh M. Cognitive and emotional processing in the brain of music. Japanese
Journal of Neuropsychology, 2018, 34(4): 274-288.
Liu G, Abolhasani M, Hang H. Disentangling effects of subjective and objective characteristics
of advertising music. European Journal of Marketing, 2022, 56(4): 1153-1183.
Ma Y. Research on the Arrangement and Visual Design of Aerobics under the New Situation.
International Core Journal of Engineering, 2019, 5(9): 170-173.
Liu S, Zhu C. Jamming Recognition Based on Feature Fusion and Convolutional Neural
Network. Journal of Beijing Institute of Technology, 2022, 31(2): 169-177.
Xing J, Wang S, et al. Fraudulent phone call recognition method based on convolutional
neural network. High Technology Letters, 2020, 26(4): 21-25.
Luo R, Zhang K. Research on Finger Vein Recognition Based on Improved Convolutional
Neural Network. International Journal of Social Science and Education Research, 2020,
3(4): 107-114.
Pustokhina I V, Pustokhin D A, Rodrigues J, Gupta D, Khanna A & Shankar K. Automatic
Vehicle License Plate Recognition Using Optimal K-Means with Convolutional Neural
Network for Intelligent Transportation Systems. IEEE Access, 2020, 8(12): 92907-92917.
Nandankar P V, Nalla A R, Gaddam R R, Gampala V, Kathiravan M & Karunakaran S. Early
prediction and analysis of corona pandemic outbreak using deep learning technique.
World Journal of Engineering, 2022, 19(4): 559-569.
Taeseung B, Yong-Gu L. Traffic control hand signal recognition using convolution and
recurrent neural networks. Journal of Computational Design and Engineering, 2022(2):
2-5.
Wu, Xing G, Yuxi Z, Qingfeng C & Liming. Text Recognition of Barcode Images under
Harsh Lighting Conditions. Wuhan University Journal of Natural Sciences, 2020,
25(6): 60-66.
Bah I, Yu X. Facial expression recognition using adapted residual based deep neural
network. Intelligence & Robotics, 2022, 2(1): 72-88.
Yang B, Cao J, Ni R & Zhang Y. Facial Expression Recognition Using Weighted Mixture
Deep Neural Network Based on Double-Channel Facial Images. IEEE Access, 2018, 6:4630-4640.
Wang S, Zhang Y, Zhang C & Yang M. Improved artificial neural network method for predicting
photovoltaic output performance. Global Energy Interconnection, 2021, 3(6): 553-561.
Yu H, Ji Y, Li Q. Student sentiment classification model based on GRU neural network
and TF-IDF algorithm. Journal of Intelligent & Fuzzy Systems: Applications in Engineering
and Technology, 2021(2): 40-45.
Malandrino D, Pirozzi D, Zaccagnino R. Visualization and music harmony: Design, implementation,
and evaluation[C]//2018 22nd International Conference Information Visualization (IV).
IEEE, 2018: 498-503.
Wu K, Rege M. Hibiki: A Graph Visualization of Asian Music[C]//2019 IEEE 20th International
Conference on Information Reuse and Integration for Data Science (IRI). IEEE, 2019:
291-294.
Alvarado K G. Accessibility of music festivals: a British perspective. International
Journal of Event and Festival Management, 2022, 13(2): 203-218.
Kim H R. Development of the Artwork Using Music Visualization Based on Sentiment Analysis
of Lyrics. The Journal of the Korea Contents Association, 2020, 20(10): 89-99.
Hizlisoy S, Yildirim S, Tufekci Z. Music emotion recognition using convolutional long
short term memory deep neural networks. Engineering Science and Technology, an International
Journal, 2021, 24(3): 760-767.
Mirzazadeh Z S, Hassan J B, Mansoori A. Assignment model with multi-objective linear
programming for allocating choice ranking using recurrent neural network. RAIRO -
Operations Research, 2021, 55(5): 3107-3119.
Chen T P, Lin C L, Fan K C, Lin W Y & Kao C W. Radar Automatic Target Recognition
Based on Real-Life HRRP of Ship Target by Using Convolutional Neural Network. Journal
of information science and engineering: JISE, 2021(4): 37-39.
Jindal N, Kaur H. Graphics Forgery Recognition using Deep Convolutional Neural Network
in Video for Trustworthiness. International journal of software innovation, 2020(4):
8-11.
Falqueto L E, Sá J A, et al. Oil Rig Recognition Using Convolutional Neural Network
on Sentinel-1 SAR Images. IEEE Geoscience and Remote Sensing Letters, 2019, 16(8):
1329-1333.
Xu J, Lv H, Zhuang Z, Lu Z, Zou D & Qin W. Control Chart Pattern Recognition Method
Based on Improved One-dimensional Convolutional Neural Network. IFAC-PapersOnLine,
2019, 52(13): 1537-1542.
Yihao Hou earned a Bachelor's Degree in Keyboard Performance from Guangxi Arts
Institute in 1992. She has worked at the institute since 1994 and is currently an
Associate Professor and Head of the Piano Department. She has published an academic
monograph, authored a core journal paper, and led five research projects. Her focus
is on piano performance and teaching.
Zongzhe Lin completed a Bachelor's degree at George Mason University in 2020 after
studying there from 2015 to 2020. He went on to earn a Master's degree in Computer
Science from the same institution in 2022. Currently, he is pursuing studies in Data
Science at the University of California, Los Angeles, as of October 2024. Zongzhe
has published two academic papers, with his research focusing on advanced predictive
analytics in public health.