Title Discrete Cosine Transformed Images Are Easy to Recognize in Vision Transformers
Authors Jongho Lee; Hyun Kim
Page pp.48-54
ISSN 2287-5255
Keywords Computer vision; Image classification; Deep learning; Discrete cosine transform (DCT); Vision transformer
Abstract Deep learning models for image classification with a sufficient number of parameters achieve excellent classification performance because they can effectively extract the features of input images. However, an image is a signal with high spatial redundancy, which limits how well a deep learning model can interpret it using spatial information alone. Therefore, in this study, the discrete cosine transform (DCT) is applied to the input image in blocks of N×N pixels so that the deep learning model can use both frequency and spatial information. The proposed method was implemented and verified using a vision transformer with 16×16 non-overlapping patches as the baseline, trained from scratch (without pre-trained weights) on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. The experimental results show that top-1 accuracy improves by approximately 3-5% on every dataset with little increase in computational cost.
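
The block-wise transform described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `dct_matrix` and `blockwise_dct`, the orthonormal DCT-II normalization, the default block size `n=8`, and the requirement that the image dimensions be divisible by `n` are all assumptions made for illustration.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the n x n orthonormal DCT-II basis matrix (assumed normalization)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return c


def blockwise_dct(image: np.ndarray, n: int = 8) -> np.ndarray:
    """Apply a 2-D DCT independently to each non-overlapping n x n block.

    `image` has shape (H, W) or (H, W, C); H and W must be multiples of n
    (an assumption of this sketch, not a requirement stated in the paper).
    """
    squeeze = image.ndim == 2
    img = image.astype(np.float64)
    if squeeze:
        img = img[:, :, None]  # add a channel axis for uniform handling
    h, w, ch = img.shape
    if h % n or w % n:
        raise ValueError("image height and width must be multiples of n")

    c = dct_matrix(n)
    # Split into blocks: (block row, row in block, block col, col in block, channel)
    blocks = img.reshape(h // n, n, w // n, n, ch)
    # Transform rows and columns of every block: C @ block @ C.T
    coeffs = np.einsum("ui,aibjc,vj->aubvc", c, blocks, c)
    out = coeffs.reshape(h, w, ch)
    return out[:, :, 0] if squeeze else out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))  # CIFAR-sized dummy image
    freq = blockwise_dct(img, n=8)
    print(freq.shape)
```

The output keeps the spatial layout of the blocks while each block holds frequency coefficients, so it can be cut into patches exactly like an ordinary image before being fed to a vision transformer. How the paper arranges or normalizes the coefficients is not stated in the abstract.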