Title
Siamese Feedback Network for Visual Object Tracking
Authors
Mi-Gyeong Gwon; Jinhee Kim; Gi-Mun Um; HeeKyung Lee; Jeongil Seo; Seong Yong Lim; Seung-Jun Yang; Wonjun Kim
DOI
https://doi.org/10.5573/IEIESPC.2021.11.1.24
Keywords
Visual object tracking; Siamese feedback network; Target-relevant features
Abstract
Visual object tracking, one of the main topics in computer vision, aims to follow a target object in every frame of a video sequence. In particular, Siamese-based network architectures have been widely adopted for visual object tracking due to their correlation-based nature. However, the features encoded from the target template and the search image in the Siamese branches still suffer from ambiguities caused by complicated real-world conditions, e.g., occlusions and rotations. This paper proposes a Siamese feedback network for robust object tracking. The key idea of the proposed method is to encode target-relevant features accurately via a feedback block, which is defined as a combination of attention and refinement modules. Specifically, interdependent features are extracted through self- and cross-attention operations. Subsequently, these re-calibrated features are refined in both a spatial and a channel-wise manner, and the refined features are fed back to the input of the feedback block via the feedback loop. This is desirable because the high-level semantic information guides the feedback block to learn more meaningful properties of the target object and its surroundings. The experimental results show that the proposed method outperforms state-of-the-art Siamese-based methods with gains of 0.72% and 1.69% in expected average overlap on the VOT2016 and VOT2018 datasets, respectively. Overall, the proposed method is effective for visual object tracking, even in complicated real-world scenarios.
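The sketch below illustrates the general structure suggested by the abstract: self- and cross-attention between template and search features, followed by channel-wise and spatial refinement, with the refined features fed back as input over several iterations. It is a minimal PyTorch sketch under assumed design choices (SE/CBAM-style refinement, module names, feature dimensions, and the number of feedback iterations are all illustrative), not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class ChannelSpatialRefine(nn.Module):
    """Refinement module (assumed SE/CBAM-style): channel re-weighting, then spatial re-weighting."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_mlp(x)  # channel-wise refinement
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.max(1, keepdim=True).values], dim=1)
        return x * self.spatial_conv(pooled)  # spatial refinement


class FeedbackBlock(nn.Module):
    """Attention (self + cross) followed by refinement, iterated via a feedback loop."""
    def __init__(self, channels=256, heads=4, iterations=2):
        super().__init__()
        self.iterations = iterations  # number of feedback passes (illustrative)
        self.self_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.refine = ChannelSpatialRefine(channels)

    def _attend(self, z, x):
        # z: template features, x: search features; both (B, C, H, W)
        b, c, h, w = x.shape
        zq = z.flatten(2).transpose(1, 2)        # (B, Hz*Wz, C)
        xq = x.flatten(2).transpose(1, 2)        # (B, Hx*Wx, C)
        xq = xq + self.self_attn(xq, xq, xq)[0]  # self-attention on search features
        xq = xq + self.cross_attn(xq, zq, zq)[0] # cross-attention with template features
        return xq.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, z, x):
        # Feedback loop: refined search features are fed back as the block's input again.
        for _ in range(self.iterations):
            x = self.refine(self._attend(z, x))
        return x


if __name__ == "__main__":
    block = FeedbackBlock()
    template = torch.randn(1, 256, 7, 7)    # encoded target template (toy size)
    search = torch.randn(1, 256, 31, 31)    # encoded search region (toy size)
    print(block(template, search).shape)    # torch.Size([1, 256, 31, 31])
```

In this reading, the feedback loop simply re-applies the same attention-plus-refinement block, so the higher-level features produced in one pass condition the features extracted in the next pass.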