Title |
Discrete Cosine Transformed Images Are Easy to Recognize in Vision Transformers |
Authors |
Jongho Lee; Hyun Kim |
DOI |
https://doi.org/10.5573/IEIESPC.2023.12.1.48 |
Keywords |
Computer vision; Image classification; Deep learning; Discrete cosine transform (DCT); Vision transformer |
Abstract |
Deep learning models for image classification with sufficient parameters achieve excellent classification performance because they can effectively extract features from input images. However, because an image is a signal with high spatial redundancy, a model that relies only on spatial information is limited in how well it can interpret images. Therefore, in this study, the discrete cosine transform (DCT) is applied to the input image in blocks of N×N pixels so that the model can exploit both frequency and spatial information. The proposed method was implemented and verified using a vision transformer with 16×16 non-overlapping patches as the baseline, trained from scratch (without pre-trained weights) on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. The experimental results show that top-1 accuracy improves by approximately 3-5% on every dataset with little increase in computational cost. |
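
The block-wise transform described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `dct_matrix` and `blockwise_dct`, the orthonormal DCT-II normalization, and the default block size of 8 are all assumptions made for illustration, since the abstract only states that the DCT is applied in N×N blocks.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the n x n orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index, one per row
    i = np.arange(n).reshape(1, -1)  # sample index, one per column
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)     # rescale the DC row so the matrix is orthonormal
    return mat


def blockwise_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply a 2-D DCT to each non-overlapping block x block tile.

    `image` has shape (H, W) or (H, W, C); H and W must be divisible by
    `block`. The default block size is an assumption, not a value taken
    from the paper. Coefficients are written back in place of their tile,
    so the output has the same shape as the input.
    """
    h, w = image.shape[:2]
    if h % block or w % block:
        raise ValueError("image height and width must be divisible by block")
    d = dct_matrix(block)
    # Split into tiles: axes are (tile_row, u, tile_col, v, channel).
    x = image.astype(np.float64).reshape(h // block, block, w // block, block, -1)
    # Per tile and channel: Y = D @ X @ D.T (transform rows, then columns).
    y = np.einsum("ku,aubvc,lv->akblc", d, x, d)
    return y.reshape(image.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))       # a CIFAR-sized dummy image
    coeffs = blockwise_dct(img, block=8)
    print(coeffs.shape)
```

Because each tile's coefficients stay at that tile's location, the output keeps the coarse spatial layout of the image while the values inside each tile carry frequency information, and it can be split into patches like any other image tensor. How the paper arranges the coefficients before patch embedding is not stated in the abstract.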