Title Research on Vocal Information Processing using a Main Melody Extraction Algorithm
Authors Shengnan Liu; Xu Wang
DOI https://doi.org/10.5573/IEIESPC.2024.13.4.322
Page pp.322-327
ISSN 2287-5255
Keywords Accuracy; Conditional random field; Main melody extraction; Vocal information
Abstract Precise extraction of the main melody from polyphonic music is a critical challenge in vocal information processing. This paper first briefly introduces feature extraction for vocal music information. Two feature types were selected: Mel-frequency cepstral coefficients (MFCC) and chroma. A main melody extraction algorithm was then developed by combining a convolutional neural network (CNN) with a conditional random field (CRF). The performance of the algorithm was validated on the ADC2004 and MIREX05 datasets. Using MFCC and chroma as inputs to the CNN-CRF algorithm significantly improved main melody extraction. The algorithm achieved an overall accuracy (OA) of 86.72% and a voicing false alarm (VFA) of 6.84% on the ADC2004 dataset. On the MIREX05 dataset, it attained an OA of 85.21% and a VFA of 11.16%. The improvement was most pronounced on the MIREX05 dataset, and chroma played a notable role in raising the raw chroma accuracy. The algorithm also outperformed the SegNet and FTANet algorithms.
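
The abstract reports OA and VFA without defining them. The following is a minimal sketch of how these two metrics are conventionally computed in melody extraction evaluation, not the authors' evaluation code. It assumes frame-aligned pitch sequences in Hz, with 0 marking an unvoiced frame, and a 50-cent pitch tolerance. The function name `melody_metrics` and its parameters are hypothetical.

```python
import math


def melody_metrics(ref_hz, est_hz, tol_cents=50.0):
    """Return (overall accuracy, voicing false alarm) for two pitch tracks.

    Assumptions (not taken from the paper): both sequences are frame-aligned,
    a value <= 0 marks an unvoiced frame, and a voiced estimate counts as
    correct when it lies within tol_cents of the reference pitch.
    """
    if len(ref_hz) != len(est_hz) or len(ref_hz) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    correct = 0        # frames with correct voicing (and pitch, if voiced)
    false_alarms = 0   # unvoiced reference frames estimated as voiced
    ref_unvoiced = 0   # number of unvoiced reference frames

    for ref, est in zip(ref_hz, est_hz):
        if ref <= 0:
            ref_unvoiced += 1
            if est > 0:
                false_alarms += 1
            else:
                correct += 1
        elif est > 0:
            # Pitch distance in cents (1200 cents per octave).
            if abs(1200.0 * math.log2(est / ref)) <= tol_cents:
                correct += 1

    oa = correct / len(ref_hz)
    vfa = false_alarms / ref_unvoiced if ref_unvoiced else 0.0
    return oa, vfa


# Toy data for illustration only; these are not values from the paper.
ref = [0.0, 0.0, 220.0, 440.0, 440.0]
est = [0.0, 110.0, 220.0, 445.0, 0.0]
oa, vfa = melody_metrics(ref, est)
print(f"OA = {oa:.2%}, VFA = {vfa:.2%}")
```

Under these definitions a higher OA and a lower VFA are both better, which is the sense in which the figures above should be read.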