| Title |
Deep Learning Models for Automatic Animation Generation and Active Learning |
| DOI |
https://doi.org/10.5573/IEIESPC.2025.14.6.776 |
| Keywords |
Scene planning; Automatic generation of animations; Active learning; Deep learning models |
| Abstract |
To address the problem of automatic animation generation, this paper designs a deep learning network structure that avoids losing the original temporal information in animations, together with a temporally enhanced multi-channel feature fusion mechanism that improves the hybrid model's use of time-frequency-domain characteristics. Across acoustic tasks, the spectrogram of a sound signal is visually discriminative. Some existing methods convert the spectrogram into an image and apply image-processing techniques to that converted image, rather than learning from the original spectrogram matrix directly. Experiments show that models learned from the converted image underperform models learned directly from the spectrogram matrix: converting the matrix into an image directly discards frequency information. The CLDNN network structure improves the word error rate by 4% over LSTM-based network structures, and adding multi-scale features improves it by a further 5%. The same team later achieved a 45% performance improvement on a 2000-hour large-scale voice search task, demonstrating that the proposed CLDNN architecture learns well and remains robust across data scales and environments. This paper proposes an innovative deep learning model that improves the classification accuracy of automatic animation generation tasks.
For recognition in animation signals, recognition models can be divided into three categories: frame-based models, in which individual frames carry too little animation information and frames near connected animations are overly similar; animation-based models, which require additional start- and end-time information for each animation; and models that use convolutional networks to learn powerful animation representations together with a weight-sharing multi-objective classifier and its loss function. |
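The multi-channel time-frequency fusion described above can be illustrated with a minimal sketch: frame a signal, compute a frequency-domain channel (magnitude spectrum) and simple time-domain channels (per-frame energy and zero-crossing rate), and concatenate them so a downstream model sees both domains. All function names, frame sizes, and feature choices here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of multi-channel time-frequency feature fusion.
# Frame sizes and the specific time-domain features are assumptions for
# illustration; the paper's own fusion mechanism is not specified here.

def frame_signal(x, frame_len=256, hop=128):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def fuse_features(x, frame_len=256, hop=128):
    frames = frame_signal(x, frame_len, hop)            # (n_frames, frame_len)
    # Frequency-domain channel: magnitude spectrum per frame.
    spec = np.abs(np.fft.rfft(frames, axis=1))          # (n_frames, frame_len//2 + 1)
    # Time-domain channels: frame energy and zero-crossing rate.
    energy = frames.std(axis=1, keepdims=True)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1, keepdims=True)
    # Concatenate all channels along the feature axis.
    return np.concatenate([spec, energy, zcr], axis=1)

sig = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)  # 440 Hz test tone
feats = fuse_features(sig)
print(feats.shape)  # (31, 131): 31 frames, 129 spectral bins + 2 time-domain features
```

In a full model, a feature matrix like this would feed the convolutional front end, preserving frame-level temporal order for the recurrent layers that follow.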