
REFERENCES

[1] A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[2] Z. Dai, H. Liu, Q. V. Le, and M. Tan, “CoAtNet: Marrying convolution and attention for all data sizes,” in Proc. Advances in Neural Information Processing Systems, 2021, vol. 34, pp. 3965-3977.
[3] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in Proc. Int. Conf. Machine Learning, 2021, pp. 10347-10357.
[4] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” arXiv preprint arXiv:2111.06377, 2021.
[5] Z. Liu et al., “Swin Transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2021, pp. 10012-10022.
[6] Z. Xia, X. Pan, S. Song, L. E. Li, and G. Huang, “Vision transformer with deformable attention,” arXiv preprint arXiv:2201.00520, 2022.
[7] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proc. IEEE Int. Conf. Computer Vision, 2017, pp. 2961-2969.
[8] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[9] Z. Liu et al., “Swin Transformer V2: Scaling up capacity and resolution,” arXiv preprint arXiv:2111.09883, 2021.
[10] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, 1974.
[11] J. Shin and H. Kim, “RL-SPIHT: Reinforcement learning based adaptive selection of compression ratio for 1-D SPIHT algorithm,” IEEE Access, vol. 9, pp. 82485-82496, 2021.
[12] H. Kim, A. No, and H.-J. Lee, “SPIHT algorithm with adaptive selection of compression ratio depending on DWT coefficients,” IEEE Transactions on Multimedia, vol. 20, no. 12, pp. 3200-3211, Dec. 2018.
[13] Y. Rao, W. Zhao, Z. Zhu, J. Lu, and J. Zhou, “Global filter networks for image classification,” in Proc. Advances in Neural Information Processing Systems, 2021, vol. 34.
[14] K. Xu, M. Qin, F. Sun, Y. Wang, Y.-K. Chen, and F. Ren, “Learning in the frequency domain,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2020, pp. 1740-1749.
[15] X. Shen et al., “DCT-Mask: Discrete cosine transform mask representation for instance segmentation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2021, pp. 8720-8729.
[16] C. Scribano, G. Franchini, M. Prato, and M. Bertogna, “DCT-Former: Efficient self-attention with discrete cosine transform,” arXiv preprint arXiv:2203.01178, 2022.
[17] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Technical Report, University of Toronto, 2009.
[18] Y. Le and X. S. Yang, “Tiny ImageNet visual recognition challenge,” 2015.
[19] J. Choi, D. Chun, H. Kim, and H.-J. Lee, “Gaussian YOLOv3: An accurate and fast object detector using localization uncertainty for autonomous driving,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2019, pp. 502-511.
[20] G. K. Wallace, “The JPEG still picture compression standard,” IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii-xxxiv, 1992.
[21] A. Vaswani et al., “Attention is all you need,” in Proc. Advances in Neural Information Processing Systems, 2017, vol. 30.
[22] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM Computing Surveys (CSUR), 2021.
[23] NVIDIA, P. Vingelmann, and F. H. P. Fitzek, CUDA, release 10.2.89, 2020. [Online].
[24] R. Wightman, PyTorch Image Models, GitHub, 2019, doi: 10.5281/zenodo.4414861.
[25] M. Ehrlich, L. Davis, S.-N. Lim, and A. Shrivastava, “Quantization guided JPEG artifact correction,” in Proc. European Conference on Computer Vision, 2020.
[26] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[27] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. Int. Conf. Machine Learning, PMLR, 2019, pp. 6105-6114.
[28] M. Tan and Q. Le, “EfficientNetV2: Smaller models and faster training,” in Proc. Int. Conf. Machine Learning, PMLR, 2021, pp. 10096-10106.
[29] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Advances in Neural Information Processing Systems, 2015, vol. 28.
[30] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2017, pp. 764-773.
[31] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.