Research on Tracking Athletes and Trajectory Analysis in Skiing with YOLO Algorithm and Kalman Filter
Xiaoguo Chang1
Wei Gao1*
(Harbin Institute of Finance, Harbin 150030, China)
Copyright © 2026 The Institute of Electronics and Information Engineers(IEIE)
Keywords
YOLO, Kalman filter, Halfpipe snowboard, Target tracking
1. Introduction
Moving object detection has long been an active field in computer vision, with wide use in autonomous driving, intelligent transportation and military applications. Target detection methods fall into two groups: traditional feature-based methods and deep learning methods, the latter being more popular today.
The task of object detection is to extract and classify features from the input image and output the bounding box and category of every labelled object. Since the 1990s, the field has gone through two major waves of development. After the first end-to-end object detection algorithms were proposed, and with the continuing progress of computing hardware, object detection based on convolutional neural networks developed rapidly. Traditional target detection methods select candidate regions of the target with a sliding window, extract features from them with a suitable operator (such as SIFT or HOG), and then classify them with a classifier such as SVM or AdaBoost. With the continuing development of convolutional neural networks, deep convolutional neural networks have been applied to target detection and have clear advantages over these traditional methods. As application scenarios keep changing, target detection algorithms based on dual-mode feature fusion have appeared, and these detectors have steadily achieved higher accuracy and efficiency.
At present, deep learning target detection algorithms fall into two main types. The first is the two-stage algorithm [1], including R-CNN, Fast R-CNN and Faster R-CNN. These algorithms generate a large number of candidate regions that may contain targets (by selective search in R-CNN and Fast R-CNN, and by a region proposal network in Faster R-CNN), and then use a convolutional neural network to extract features from the candidate regions for target classification and detection. The second is the one-stage algorithm, including YOLO and SSD. These algorithms need no candidate regions: an end-to-end convolutional network extracts the features and directly classifies and localizes the targets, so they detect faster than two-stage algorithms. In scenarios with real-time requirements, they are therefore faster while still achieving high accuracy.
In snowboard halfpipe (U-shaped field) events, athletes’ skills are divided into basic gliding movements, aerial tricks, and aerial grabs [2]. Using the YOLO algorithm to detect and track athletes, combined with Kalman filtering to recover their trajectories, is highly practical: it can reveal deficiencies in athletes’ technique during training and mistakes in competition so that they can be corrected in time, and it helps coaches train and guide athletes’ skills.
The innovations of this article in tracking skiers and analyzing their trajectories
mainly include:
(1) A hybrid tracking framework that combines the YOLO object detection algorithm
in deep learning with the classical Kalman filtering algorithm is proposed. It leverages
the advantages of the YOLO algorithm in quickly and accurately detecting athlete positions
in complex backgrounds and uses Kalman filtering to smooth the detection results and
predict future positions, reducing the tracking loss caused by occlusion,
rapid movement, and other factors.
(2) A dynamic adjustment strategy for Kalman filter parameters based on changes in
the motion state is proposed. It can automatically adjust key parameters such as the noise
covariance matrices during the filtering process based on real-time data such as the athlete’s
speed and acceleration, further improving the accuracy and robustness of trajectory
prediction.
(3) A multidimensional trajectory analysis and performance evaluation system is constructed.
Through in-depth analysis of athletes’ trajectory data, key performance indicators
such as speed, acceleration, turning radius, and stability can be evaluated, providing
a scientific basis for athletes’ technical improvement and training plan formulation.
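To make the indicators in point (3) concrete, the following is a minimal sketch, assuming the trajectory is an (N, 2) array of positions sampled at a fixed interval dt. It estimates speed and acceleration by finite differences and turning radius as the reciprocal of the planar curvature. The function name and the use of the coefficient of variation of speed as a "stability" proxy are hypothetical choices, not the paper's definitions.

```python
import numpy as np


def trajectory_metrics(xy: np.ndarray, dt: float) -> dict:
    """Kinematic indicators for an (N, 2) array of positions sampled every dt seconds."""
    xy = np.asarray(xy, dtype=float)
    vel = np.gradient(xy, dt, axis=0)            # (N, 2) velocity by central differences
    acc = np.gradient(vel, dt, axis=0)           # (N, 2) acceleration
    speed = np.linalg.norm(vel, axis=1)
    accel = np.linalg.norm(acc, axis=1)
    # Planar curvature |x'y'' - y'x''| / |v|^3; the turning radius is its reciprocal
    cross = vel[:, 0] * acc[:, 1] - vel[:, 1] * acc[:, 0]
    with np.errstate(divide="ignore", invalid="ignore"):
        curvature = np.abs(cross) / speed**3
        radius = np.where(curvature > 1e-12, 1.0 / curvature, np.inf)
    # Hypothetical stability proxy: coefficient of variation of speed (lower = steadier)
    speed_cv = float(np.std(speed) / np.mean(speed))
    return {"speed": speed, "acceleration": accel,
            "turning_radius": radius, "speed_cv": speed_cv}
```

On a circle of radius 5 traversed at 1 rad/s, the interior samples give a speed, turning radius and centripetal acceleration of about 5 each; the first and last samples are less accurate because `np.gradient` falls back to one-sided differences at the ends.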
2. Target Detection Based on YOLO Algorithm
2.1. YOLO Algorithm
Compared with earlier versions, YOLOv5 improves detection accuracy and speed and provides several network sizes to suit different application scenarios. YOLOv5 comes in five versions, YOLOv5-n, YOLOv5-s, YOLOv5-m, YOLOv5-l and YOLOv5-x, implemented in the PyTorch deep learning framework. YOLOv5-n is the shallowest network. In the subsequent versions the network becomes deeper and wider (more channels), the number of parameters grows, and inference becomes slower. YOLOv5-s has a relatively simple structure and very fast inference, but its accuracy is lower than that of the larger versions, so it is suitable for scenarios that require real-time detection.
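The depth and width differences between versions can be illustrated with the multipliers used in the public ultralytics/yolov5 configuration files (`depth_multiple`, `width_multiple`). The multiplier values and the backbone stage list below are taken from those files as an assumption; the paper itself does not state them. The sketch computes the output channels and C3 repeat count of each backbone stage for a given version.

```python
import math

# (depth_multiple, width_multiple) per version, as in the public yolov5*.yaml files (assumed)
SCALES = {"n": (0.33, 0.25), "s": (0.33, 0.50), "m": (0.67, 0.75),
          "l": (1.00, 1.00), "x": (1.33, 1.25)}

# Backbone stages of the reference (l) model: (output channels, C3 repeats)
BASE_STAGES = [(128, 3), (256, 6), (512, 9), (1024, 3)]


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round a channel count up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_backbone(variant: str) -> list:
    """Channels and C3 repeats of each backbone stage for one YOLOv5 version."""
    depth, width = SCALES[variant]
    return [(make_divisible(c * width), max(round(n * depth), 1)) for c, n in BASE_STAGES]
```

For YOLOv5-s this gives `[(64, 1), (128, 2), (256, 3), (512, 1)]`, and for YOLOv5-x `[(160, 4), (320, 8), (640, 12), (1280, 4)]`, which is the deepening and widening described above.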
2.1.1 CSP1 module
The CSP1 module is used in the backbone network to learn residual features. The module has two branches. One applies a series of operations such as convolution and residual units, in which the residual shortcut passes information across layers to extract higher-level abstract features; the other branch performs only a single Conv operation. Finally, the two branches are concatenated for subsequent operations, which increases the nonlinear expressive ability of the network.
2.1.2 CSP2 module
The CSP2 module is applied in the feature fusion part. Like CSP1, it is composed of a residual block and two convolution layers. The input feature map is first divided into two branches along the channel dimension: one is a plain convolution branch, and in the other the channels are halved by a CBL and then passed through n CBLs, after which the resulting feature map is joined with the original input through a cross-stage connection. The CSP2 module thus further optimizes the structure of the CSP1 module and improves the detection performance and computational efficiency of the model.
2.1.3 SPPF module
SPPF is an improved SPP module with a shorter running time. Instead of the parallel 5 × 5, 9 × 9 and 13 × 13 max pooling of SPP, SPPF applies three 5 × 5 max pooling operations in series: cascading these small pooling kernels enlarges the receptive field step by step, and the outputs of the successive stages are concatenated so that feature maps with different receptive fields are fused, improving the expressive ability of the features. Compared with SPP, SPPF produces equivalent multi-receptive-field features while running faster.
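The claim that cascaded 5 × 5 pooling covers larger receptive fields can be checked numerically. This NumPy sketch (requires NumPy 1.20 or later for `sliding_window_view`; the 1 × 1 convolutions before and after the pooling are omitted, and the function names are illustrative) applies stride-1, same-padded max pooling and shows that the SPPF outputs equal SPP's parallel 5 × 5, 9 × 9 and 13 × 13 poolings.

```python
import numpy as np


def maxpool_same(x: np.ndarray, k: int) -> np.ndarray:
    """k x k max pooling with stride 1 and -inf 'same' padding on a (C, H, W) array."""
    x = np.asarray(x, dtype=float)
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(1, 2))
    return windows.max(axis=(-2, -1))


def sppf(x: np.ndarray) -> np.ndarray:
    """SPPF core: three 5x5 poolings in series, all stages concatenated on channels."""
    y1 = maxpool_same(x, 5)
    y2 = maxpool_same(y1, 5)   # two 5x5 in series equal one 9x9 window
    y3 = maxpool_same(y2, 5)   # three 5x5 in series equal one 13x13 window
    return np.concatenate([x, y1, y2, y3], axis=0)


def spp(x: np.ndarray) -> np.ndarray:
    """SPP core: parallel 5x5, 9x9 and 13x13 poolings concatenated on channels."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in (5, 9, 13)], axis=0)
```

For a random float input `x`, `np.array_equal(sppf(x), spp(x))` holds, while SPPF only ever runs the cheaper 5 × 5 window.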
For each dataset, the YOLO algorithm starts from default anchor boxes of preset width and height, from which it predicts the bounding boxes used to mark the targets in the video to be detected. Each bounding box is described by 4 coordinates: the center of the box and its width and height. During network training, the anchor boxes are adapted to the training set [3, 4]. Fig. 1 shows the feature extraction of the ski-field detection model. A corresponding dataset was established in the replica U-shaped ski field, and the YOLO algorithm provides the best bounding box for the corresponding target. The model first captures images of the athlete’s movement trajectory during skiing and then constructs point cloud features. The obtained point cloud features are fused and fed into the Pill feature extraction network. The features are then filtered with principal component analysis, focusing on the region of interest (ROI). Finally, multimodal features are fused to obtain the detection results of the athlete’s movement trajectory.
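As an illustration of how the 4 predicted coordinates become a box, the following sketch assumes the YOLOv5 head parameterization (raw offsets squashed by a sigmoid, relative to the grid cell and the anchor size in pixels); the paper does not spell this out, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def decode_yolov5(t, grid_xy, anchor_wh, stride):
    """Decode raw outputs t = (tx, ty, tw, th) for one cell/anchor into (cx, cy, w, h) pixels.

    Assumed YOLOv5-style parameterization:
      cx = (2 * sigmoid(tx) - 0.5 + gx) * stride
      w  = (2 * sigmoid(tw)) ** 2 * anchor_w
    """
    s = sigmoid(np.asarray(t, dtype=float))
    cxcy = (2.0 * s[:2] - 0.5 + np.asarray(grid_xy, dtype=float)) * stride
    wh = (2.0 * s[2:]) ** 2 * np.asarray(anchor_wh, dtype=float)
    return np.concatenate([cxcy, wh])


def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
```

With all raw outputs equal to 0, a cell at grid position (3, 4) on a stride-8 map and an anchor of 10 × 13 pixels decode to a box centered at (28, 36) with exactly the anchor's size; training shifts the offsets away from zero to fit the targets.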
Fig. 1. Feature extraction of SKI resort underground detection model.
2.2. YOLOv5s Overall Structure
Global average pooling (GAP) is an effective baseline method for extracting richer feature information because it preserves global context; it has been applied to semantic segmentation for this purpose. A multi-perceptual feature extraction module can be constructed from deconvolution and GAP [5]. Global context information together with sub-region context helps to better distinguish target categories. Fig. 2 shows the overall structural design of the skiing YOLOv5s model, which connects a multi-scale pooling module UIE and a segmented attention module DNL in series. The model extracts features in parallel, fuses them in the DNL section, and then sends them separately to the Detect Head to output the results. This approach extracts global information from the features and combines global and local information.
Within the Multi-Path Module (MPM), the input is split into two paths. The first path applies a 3 × 3 depth-wise convolution (DWConv) to expand the receptive field and capture more contextual information [6, 7]. To address the limitations of plain GAP, the MPM uses three GAP operations over different sub-region grids (1 × 1, 3 × 3, 6 × 6), producing feature maps [X1, X2, X3] that capture different spatial characteristics. The pooling is performed channel-wise so that channel-specific information is retained. A fusion step then exploits the complementary nature of spatial and channel features: a 1 × 1 pointwise convolution (PWConv) integrates the spatial features, and the output passes through batch normalization (BN) and a ReLU activation for non-linearity and better discrimination. This spatial and channel-wise fusion enhances the representation of the input data and improves network performance [8]. The pooled feature maps of different sizes are resized to the input size with bilinear interpolation to minimize information loss, and their channel counts are aligned with the input channels to determine the significance of the global features. These refined feature maps are then concatenated with the other multi-scale features, preserving the original input while adding multi-scale information.
Attention mechanisms emphasize key features, suppress minor details, and highlight semantic regions of interest [9, 10]. We use the Split-and-Concat (SPC) module as the basis of the Scale-Attentive Module (SAM). The SPC module partitions the input feature map into N groups [X0, X1, ..., XN-1], each with a reduced number of channels C′ = C/N. Each group is processed with a convolution kernel of a different size, increasing across the groups as 1, 3, 5 and 7, so that the model captures features at several scales and can represent complex, nuanced information. Because their receptive fields differ, the groups extract different context information and produce feature maps at different resolutions. Shuffle attention (SA) is then used to weight the features, making full use of spatial and channel information while reducing the number of model parameters. Each group of features is first divided into sub-feature groups along the channel dimension, each with C′/G channels, and every sub-feature group is split into two branches, from which the unit extracts the channel attention X_k1 and the spatial attention X_k2, respectively; the calculation is given by Formulas (1)-(3).
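The sub-region pooling in the MPM can be sketched as follows, assuming PyTorch-style adaptive average pooling bins and half-pixel (align_corners=False) bilinear interpolation, neither of which the text specifies. The DWConv path and the PWConv/BN/ReLU fusion are omitted; only the 1 × 1, 3 × 3 and 6 × 6 pooling and the resize back to the input size are shown, and the function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(x: np.ndarray, out: int) -> np.ndarray:
    """Average-pool a (C, H, W) array into out x out bins (PyTorch-style bin edges)."""
    C, H, W = x.shape
    y = np.empty((C, out, out))
    for i in range(out):
        h0, h1 = (i * H) // out, -(-((i + 1) * H) // out)      # floor start, ceil end
        for j in range(out):
            w0, w1 = (j * W) // out, -(-((j + 1) * W) // out)
            y[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return y


def bilinear_resize(x: np.ndarray, H: int, W: int) -> np.ndarray:
    """Bilinear resize of (C, h, w) to (C, H, W) with half-pixel sample centres."""
    C, h, w = x.shape
    ys = np.clip((np.arange(H) + 0.5) * h / H - 0.5, 0, h - 1)
    xs = np.clip((np.arange(W) + 0.5) * w / W - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def subregion_context(x: np.ndarray, bins=(1, 3, 6)) -> list:
    """Pool over 1x1, 3x3 and 6x6 sub-region grids and resize each back to the input size."""
    _, H, W = x.shape
    return [bilinear_resize(adaptive_avg_pool(x, b), H, W) for b in bins]
```

The 1 × 1 map reduces to the plain global average broadcast over the image, while the 3 × 3 and 6 × 6 maps keep coarse spatial layout, which is the extra sub-region context the MPM adds over GAP.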
Fig. 2. Overall structure design of ski sports YOLOv5s.
The weighted features of the sub-groups are concatenated along the channel dimension. The feature information of all groups is then linked together to generate a feature map with pixel-level attention over the global range, and a weight vector is obtained with the weight module [11, 12]. The weights are recalibrated with SoftMax, giving the new weights in formula (4).
The weight of each group is multiplied by the feature Fi obtained after SA, and the output, denoted Yi, is given by Eq. (5).
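The grouping and SoftMax re-weighting described above can be sketched in NumPy. This assumes the SoftMax in formula (4) is taken across the N groups for each channel, as in the pyramid split attention (PSA) module of EPSANet, which the SPC/SoftMax description appears to resemble. The per-group convolutions, the SA unit and the weight module that produces the raw scores are omitted, and the function names are hypothetical.

```python
import numpy as np


def spc_split(x: np.ndarray, n_groups: int = 4) -> list:
    """Split a (C, H, W) feature map into n_groups chunks of C' = C / n_groups channels."""
    if x.shape[0] % n_groups:
        raise ValueError("channel count must be divisible by n_groups")
    return np.split(x, n_groups, axis=0)


def softmax(z: np.ndarray, axis: int = 0) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def reweight(features: list, scores: np.ndarray):
    """features: N arrays F_i of shape (C', H, W); scores: (N, C') raw weights per group.

    SoftMax across the N groups gives att_i; Y_i = F_i * att_i, channel by channel.
    """
    att = softmax(np.asarray(scores, dtype=float), axis=0)   # (N, C'); each column sums to 1
    outputs = [f * a[:, None, None] for f, a in zip(features, att)]
    return outputs, att
```

Normalizing across groups makes the scales compete for each channel, so a channel that responds strongly to, say, the 7 × 7 group is down-weighted in the others.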
The YOLOv5s network first slices the input with the Focus module and concatenates the slices, then extracts features with a CBL module followed by several further CBL modules and residual units. The feature maps are then up-sampled and concatenated to obtain prediction results at three different scales [13, 14]. A snowboarder is a small target relative to the U-shaped ski field, and the YOLO algorithm achieves good stability and high accuracy for small-target detection in the replica U-shaped ski field.
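The Focus slicing step can be illustrated with a short NumPy sketch. It mirrors the slicing used in the YOLOv5 Focus layer (every second pixel at four phase offsets, concatenated on the channel axis) but omits the convolution that follows; later YOLOv5 releases replaced Focus with a strided convolution, and the function name here is illustrative.

```python
import numpy as np


def focus_slice(img: np.ndarray) -> np.ndarray:
    """(C, H, W) -> (4C, H/2, W/2): space-to-depth without discarding any pixel."""
    return np.concatenate(
        [img[:, ::2, ::2],     # even rows, even columns
         img[:, 1::2, ::2],    # odd rows, even columns
         img[:, ::2, 1::2],    # even rows, odd columns
         img[:, 1::2, 1::2]],  # odd rows, odd columns
        axis=0,
    )
```

A 3 × 640 × 640 image therefore becomes a 12 × 320 × 320 tensor: the resolution is halved with no information lost, which is cheaper for the following convolution.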
3. Tracking Athletes and Trajectory Analysis in Skiing Based on YOLO Algorithm and Kalman Filter
3.1. Kalman Filter Principle
The Kalman filter computes the current state from the estimate of the previous state and the observation of the current state, yielding the optimal estimate of the state. The system state is the smallest set of parameters that summarizes the effects of all past inputs and disturbances on the system; knowing the state, together with future inputs and disturbances, determines the entire future behavior of the system [15, 16]. The Kalman filter proceeds as follows.
First, the current state is predicted from the previous state estimate, as shown in Eq. (6), and the prediction variance is computed, as shown in Eq. (7).
Next, the gain matrix is calculated from the predicted variance, as shown in Eq. (8), and the innovation sequence (the difference between the new observation and the predicted observation) is calculated, as shown in Eq. (9).
Finally, the state estimate at time k is obtained, as shown in Eq. (10), and the variance estimate is updated, as shown in Eq. (11).
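Because Eqs. (6)-(11) are not reproduced in this text, the standard discrete-time Kalman filter recursion is given here for reference, in the same order as the steps above. The notation (state transition matrix F, observation matrix H, process and measurement noise covariances Q and R) is assumed and may differ from the paper's own symbols.

```latex
\begin{aligned}
\hat{x}_{k|k-1} &= F\,\hat{x}_{k-1|k-1} && \text{state prediction}\\
P_{k|k-1} &= F\,P_{k-1|k-1}\,F^{\top} + Q && \text{variance prediction}\\
K_k &= P_{k|k-1} H^{\top}\left(H P_{k|k-1} H^{\top} + R\right)^{-1} && \text{gain}\\
\tilde{y}_k &= z_k - H\,\hat{x}_{k|k-1} && \text{innovation}\\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\,\tilde{y}_k && \text{state estimate}\\
P_{k|k} &= \left(I - K_k H\right) P_{k|k-1} && \text{variance update}
\end{aligned}
```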
3.2. Feature Fusion Module
Deeper features carry richer semantic information but have lower resolution. The feature maps are denoted C1, C2, C3, C4 and C5, with sizes 208 × 208, 104 × 104, 52 × 52, 26 × 26 and 13 × 13 [17]. The algorithm in this chapter takes C3′ as the main branch for predicting medium and small targets. On the basis of FPN, the C2 layer is introduced: after down-sampling, C2 is concatenated with C3 [18, 19]. On this basis, the C1 layer is added: C1 is down-sampled and concatenated with C2 to obtain a new feature map C2′, and C2′ is further down-sampled and concatenated with C3. Subsequent experiments show that introducing C1 and C2 gives the best infrared feature fusion effect.
Fig. 3 shows the distribution of semantic information of features. C5 is fed into GCIAM to obtain a feature map C5 with a size of 13 × 13 [20, 21], which is used as the branch for detecting large targets. C4′ is obtained after down-sampling and concatenation with C4, and is then concatenated with C3 after further down-sampling. The detection head consists of a 1 × 1 convolution and a 3 × 3 convolution: the 1 × 1 convolution adjusts the number of channels, and the 3 × 3 convolution integrates the features output by the FFM.
Weighted average fusion: given two feature vectors, their weighted average is computed as shown in Eq. (12).
Concatenation joins two or more feature vectors end to end to form a longer feature vector, as shown in Eq. (13).
Maximum/minimum fusion takes the maximum or minimum of the values at corresponding positions of the two feature vectors as the fused feature, as shown in Eqs. (14) and (15).
Dot-product fusion computes the dot product of two feature vectors, which can be treated as a single feature value, as shown in Eq. (16).
The bilinear model is a more complex fusion method that first applies a linear transformation to each feature vector and then computes the outer product of the transformed features, as shown in Eq. (17).
Attention mechanisms let the model determine the importance of different features dynamically; in fusion, an attention weight vector is used to weight the features, as shown in Eq. (18).
Fig. 3. Semantic information distribution of features.
A multilayer perceptron (MLP) can be seen as a general function approximator that learns how to fuse features, as shown in Eq. (19).
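Since Eqs. (12)-(19) are not reproduced here, the fusion operations can be sketched as plain NumPy functions following the text's descriptions. The concrete forms are assumptions: a scalar weight for the weighted average, a softmax over scores for the attention weights, and a single hidden ReLU layer for the MLP.

```python
import numpy as np


def weighted_average(a, b, w=0.5):        # weighted average fusion (Eq. 12)
    return w * a + (1.0 - w) * b


def concat(a, b):                          # concatenation (Eq. 13)
    return np.concatenate([a, b])


def max_fuse(a, b):                        # element-wise maximum (Eq. 14)
    return np.maximum(a, b)


def min_fuse(a, b):                        # element-wise minimum (Eq. 15)
    return np.minimum(a, b)


def dot_fuse(a, b):                        # dot product as a single value (Eq. 16)
    return float(a @ b)


def bilinear_fuse(a, b, Wa, Wb):           # outer product of linearly projected features (Eq. 17)
    return np.outer(Wa @ a, Wb @ b)


def attention_fuse(feats, scores):         # softmax-normalised attention weights (Eq. 18)
    w = np.exp(scores - np.max(scores))
    w /= w.sum()
    return sum(wi * f for wi, f in zip(w, feats))


def mlp_fuse(a, b, W1, b1, W2, b2):        # one hidden ReLU layer on [a; b] (Eq. 19)
    h = np.maximum(0.0, W1 @ np.concatenate([a, b]) + b1)
    return W2 @ h + b2
```

The first five operations have no trainable parameters, whereas the bilinear, attention and MLP variants learn how much each input contributes, at the cost of extra parameters.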
3.3. Establishment of Trajectory Detection Model
The YOLO algorithm treats skiers as targets in the images and detects them rapidly. The YOLO model is trained to recognize skier characteristics such as clothing colour and body shape [22, 23]. Athletes move very fast in ski races, so a method that responds quickly and updates detection results in real time is needed. The YOLO algorithm meets this requirement because it predicts multiple bounding boxes and class probabilities in a single forward pass, which greatly speeds up detection [24]. Skier detection accuracy can be improved by adjusting the structure and parameters of the YOLO model, for example by increasing the depth or width of the model, using more powerful feature extractors, or increasing the diversity of the training dataset.
The updated state estimate and updated covariance formulas are shown in Eqs. (20) and (21).
The Kalman filter is an optimal estimation method based on the state equation of a linear system. In skier tracking, the position and velocity of the skier can be taken as state variables and estimated optimally with the Kalman filter [25]. In practice, sensor noise, image noise and other factors introduce errors into the directly observed position of the athlete. By weighting and fusing the observed and predicted values, the Kalman filter reduces the influence of this noise. It provides not only an estimate of the athlete’s current position but also a prediction of the position at the next moment, and applying it continuously over the sequence yields a smooth trajectory.
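A minimal sketch of such a filter, assuming a constant-velocity model with state [x, y, vx, vy] and YOLO box centers as position measurements. The noise values q and r are illustrative, and `adapt_process_noise` is a hypothetical example of the motion-dependent parameter adjustment mentioned in the introduction (raising the process noise when the acceleration is large), not the paper's rule.

```python
import numpy as np


class ConstantVelocityKF:
    """2-D constant-velocity Kalman filter; state [x, y, vx, vy], measurement [x, y]."""

    def __init__(self, xy0, dt=1.0, q=1.0, r=4.0):
        self.dt, self.q = dt, q
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt                 # position += velocity * dt
        self.H = np.eye(2, 4)                            # only the position is observed
        self.R = r * np.eye(2)                           # measurement noise covariance
        self.x = np.array([xy0[0], xy0[1], 0.0, 0.0])
        self.P = np.diag([r, r, 100.0, 100.0])           # large initial velocity uncertainty

    def Q(self):
        """Process noise from a white-noise acceleration model, scaled by q."""
        dt = self.dt
        G = np.array([[dt**2 / 2, 0], [0, dt**2 / 2], [dt, 0], [0, dt]])
        return self.q * G @ G.T

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q()
        return self.x[:2].copy()

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x    # innovation
        S = self.H @ self.P @ self.H.T + self.R              # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()

    def adapt_process_noise(self, accel, base_q=1.0, gain=0.5):
        """Hypothetical rule: trust the motion model less when acceleration is large."""
        self.q = base_q * (1.0 + gain * float(accel))
```

Fed noise-free positions along a straight line, the velocity estimate converges to the true velocity within a few dozen frames, and `predict()` then extrapolates the next position; with noisy detections the same recursion smooths the jitter.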
A skier tracking and motion trajectory detection model is established by combining the YOLO algorithm with Kalman filtering, as shown in Fig. 4. The detection steps are as follows:
Initial detection: The YOLO algorithm performs an initial detection of the skier and obtains the athlete’s initial position.
Tracking initialization: The athlete position detected by the YOLO algorithm is used as the initial state of the Kalman filter.
Continuous tracking: At each time step, the athlete in the current frame is first detected with the YOLO algorithm, and the detection result is fused with the Kalman filter prediction. The state of the Kalman filter is then updated with the fused result, and the athlete’s position at the next step is predicted.
Trajectory construction: The athlete positions obtained from continuous tracking are connected to form a smooth motion trajectory.
Fig. 4. Combining YOLO algorithm and Kalman filter histogram.
Model optimization: The parameters of the YOLO algorithm and the Kalman filter can be tuned to the actual application scenario and requirements to improve the detection accuracy, tracking stability and real-time performance of the model.
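The detection and tracking steps above can be sketched as a single loop, assuming each frame yields either one detected center (x, y) from YOLO or None when the detector misses. When a detection is missing or falls outside a distance gate around the prediction, the loop keeps the Kalman prediction, which is how the combined model can bridge short occlusions. The gate and noise values are hypothetical, and this single-target version does no multi-object association.

```python
import numpy as np


def track(detections, dt=1.0, q=1.0, r=4.0, gate=50.0):
    """Fuse per-frame detections ((x, y) or None) with a constant-velocity Kalman filter.

    Returns one smoothed (x, y) per frame (None before the first detection).
    """
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    H = np.eye(2, 4)
    G = np.array([[dt**2 / 2, 0], [0, dt**2 / 2], [dt, 0], [0, dt]])
    Q, R = q * G @ G.T, r * np.eye(2)
    x = P = None
    trajectory = []
    for z in detections:
        if x is None:                                  # tracking initialization
            if z is not None:
                x = np.array([z[0], z[1], 0.0, 0.0])
                P = np.diag([r, r, 100.0, 100.0])
            trajectory.append(None if x is None else tuple(x[:2]))
            continue
        x = F @ x                                      # predict
        P = F @ P @ F.T + Q
        if z is not None:
            y = np.asarray(z, dtype=float) - H @ x     # innovation
            if np.linalg.norm(y) <= gate:              # accept detections near the prediction
                S = H @ P @ H.T + R
                K = P @ H.T @ np.linalg.inv(S)
                x = x + K @ y
                P = (np.eye(4) - K @ H) @ P
        trajectory.append(tuple(x[:2]))                # trajectory construction
    return trajectory
```

With two missed frames and one gross outlier inserted into an otherwise straight path, the returned trajectory still follows the straight line, because the missing frames are filled by prediction and the outlier is rejected by the gate.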
Combining the YOLO algorithm with Kalman filtering thus yields an efficient and accurate skier tracking and trajectory detection model that can meet high real-time requirements and is intended to adapt to athlete tracking in various complex scenes.
4. Experimental Results and Analysis
4.1. Experimental Model Establishment
The snowboard halfpipe competition requires athletes to ride a snowboard down the run-up slope of a specified U-shaped field, take off, perform difficult aerial movements, reach maximum height, and complete high-quality aerial rotations and other movements. Existing U-shaped fields are of two types: with and without a horizontal slope. Fig. 5 shows the polyline of sports performance on an actual standard competition field. In this experiment, the U-shaped field without a horizontal track was used for data collection, to detect athletes and to track and characterize their trajectories. Based on research and analysis of standard U-shaped snowboard fields in China and abroad, a physical replica model was built. By scaling down the dimensions of the standard competition field, a laboratory U-shaped field was obtained that reproduces, as far as possible, the sports performance of athletes on an actual standard competition field. The laboratory U-shaped field uses an adjustable-angle lifter as its main structure, as shown in Fig. 6; its slope can be adjusted for tests and analysis under different conditions. The field is 1.2 m long and 20 cm wide, and the slope is set to the international standard of 18° as the experimental condition.
Fig. 5. Athletic performance polyline under actual standard playing field.
Fig. 6. Structure distribution of adjustable angle lifter body.
4.2. Experimental Dataset
Based on actual training scenes and competition situations, the authors built their own dataset for U-shaped snowboard target detection under laboratory conditions. The parameter distribution of the dataset is shown in Fig. 7. The dataset was created by replacing the athlete with a sphere in the replica U-shaped field. A 12-megapixel CMOS wide-angle lens with an f/1.8 aperture was used to film the ball many times in different scenes, and one frame was taken from each video every 0.3 s as a dataset image. The test ball has a diameter of 40.00 mm and a mass of 2.53 g. The dataset consists of 1000 images, and the accuracy of the algorithm in the snowboard U-shaped field was evaluated with ten-fold cross-validation: in each round, nine folds were used as the training set and one fold as the test set.
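The sampling and cross-validation protocol can be sketched as below. The frame rate of the source videos and the way images were assigned to folds are not stated in the paper, so the `fps` argument and the random permutation are assumptions, and the function names are hypothetical.

```python
import numpy as np


def sample_frame_indices(n_frames: int, fps: float, interval_s: float = 0.3) -> np.ndarray:
    """Indices of frames taken every interval_s seconds from a video recorded at fps."""
    step = fps * interval_s                                   # frames per sample, may be fractional
    k = np.arange(int(np.floor((n_frames - 1) / step + 1e-9)) + 1)
    idx = np.round(k * step).astype(int)
    return idx[idx < n_frames]


def ten_fold_splits(n_images: int = 1000, k: int = 10, seed: int = 0):
    """Yield (train, test) index arrays: each fold is the test set once, the other k-1 train."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_images), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

At an assumed 30 frames/s, sampling every 0.3 s keeps every 9th frame; with 1000 images, each of the ten rounds trains on 900 images and tests on the remaining 100.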
Fig. 7. Parameter distribution of ski motion target detection scene dataset.
4.3. Training Settings
With this dataset, the YOLO network model was trained with cross-validation to detect the simulated athletes in the U-shaped field under laboratory conditions. The experiments ran on Ubuntu 16.04 with the open-source deep learning framework PyTorch, and the model was trained on a server equipped with an NVIDIA RTX 2080 Ti graphics card. To enlarge the dataset and improve the robustness of the network, the YOLO algorithm uses Mosaic data augmentation together with rotation, translation, scaling and other image transformations; the resulting feature distribution is shown in Fig. 8.
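A simplified Mosaic sketch, assuming four (H, W, 3) uint8 images: the images are placed around a random center of the canvas, each cropped to fit its quadrant. YOLOv5's own implementation additionally resizes the images, works on a larger canvas, applies a random affine transform and remaps the bounding-box labels; all of that is omitted here, and the function name is hypothetical.

```python
import numpy as np


def mosaic(images, out_size=640, rng=None):
    """Tile four images around a random centre; returns (canvas, (cx, cy))."""
    rng = np.random.default_rng() if rng is None else rng
    s = out_size
    canvas = np.full((s, s, 3), 114, dtype=np.uint8)             # grey background
    cx, cy = (int(v) for v in rng.integers(s // 4, 3 * s // 4, size=2))
    for k, img in enumerate(images[:4]):                          # 0 TL, 1 TR, 2 BL, 3 BR
        top, left = k < 2, k % 2 == 0
        qh = cy if top else s - cy                                # quadrant height
        qw = cx if left else s - cx                               # quadrant width
        h, w = min(qh, img.shape[0]), min(qw, img.shape[1])
        # Take the image corner that touches the mosaic centre
        rows = slice(img.shape[0] - h, img.shape[0]) if top else slice(0, h)
        cols = slice(img.shape[1] - w, img.shape[1]) if left else slice(0, w)
        y0 = cy - h if top else cy
        x0 = cx - w if left else cx
        canvas[y0:y0 + h, x0:x0 + w] = img[rows, cols]
    return canvas, (cx, cy)
```

Each training sample then shows four scenes at once at varying scales and crops, which is why Mosaic helps with small targets such as the ball in the replica field.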
Fig. 8. Mosaic data enhancement feature distribution.
4.4. Partial Test Result
In this paper, the YOLO network is used to detect spheres simulating athletes in multiple videos shot from different angles, yielding accurate bounding boxes. At the same time, the Kalman filter is used to draw the trajectory of the ball in the replica U-shaped field, which makes the tracking more accurate. After 300 training epochs of the YOLO model, part of the detection results are shown in Fig. 9.
Table 1 shows the target detection accuracy of the YOLO algorithm in five experiments with different numbers of frames. Accuracy is the percentage of correctly detected frames among all detected frames. After the ball is detected by the YOLO model in the laboratory U-shaped field, the Kalman filter gives the predicted position of the athlete in the video sequence, and the athlete’s trajectory is drawn, as shown in Fig. 10.
Fig. 9. YOLO model training results.
Table 1. YOLO algorithm target detection accuracy.
| Experiment no. | Detected frames | Correctly detected frames | Accuracy |
|---|---|---|---|
| 1 | 100 | 97 | 97% |
| 2 | 200 | 193 | 96.5% |
| 3 | 300 | 290 | 96.7% |
| 4 | 400 | 388 | 97% |
| 5 | 500 | 491 | 98.2% |
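The accuracy column follows directly from the definition above (correctly detected frames divided by detected frames). A short check, using only the values in Table 1, also gives the pooled accuracy over all 1500 frames:

```python
# Values transcribed from Table 1
detected = [100, 200, 300, 400, 500]
correct = [97, 193, 290, 388, 491]

per_run = [100.0 * c / d for c, d in zip(correct, detected)]  # accuracy = correct / detected
pooled = 100.0 * sum(correct) / sum(detected)                 # all 1500 frames taken together

for i, acc in enumerate(per_run, start=1):
    print(f"experiment {i}: {acc:.1f}%")
print(f"pooled: {pooled:.2f}%")
```

This prints 97.0%, 96.5%, 96.7%, 97.0% and 98.2% for the five experiments and a pooled accuracy of 97.27%.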
Fig. 10. Athlete trajectory analysis.
As Fig. 11 shows, the red curve is the movement curve of the sphere in the U-shaped ski field; the sphere in the video sequence simulates the movement of an athlete. Combined with the stability of the YOLO detector, the method draws the movement curve well. Because the subject in this experiment is a sphere, whereas real scenes involve skiers, a corresponding dataset must be re-established to train the network for actual use. Once YOLO has detected the target, i.e., the athlete, the Kalman filter predicts the athlete’s position, from which the athlete’s movement curve is obtained.
In addition, athletes change posture during training and competition, especially in difficult rotations, so human pose estimation based on the results of this model is needed as follow-up work. With the pose estimation results, movements can then be evaluated according to the competition rules of snowboard halfpipe and the corresponding difficulty evaluation and scoring methods.
Table 2 shows the Kalman filter trajectory tracking error.
Fig. 11. U-shaped field motion curve.
Table 2. Kalman filter trajectory tracking error.
| Experiment no. | Total frames | Average tracking error (cm) | Maximum tracking error (cm) |
|---|---|---|---|
| 1 | 100 | 2.3 | 6.5 |
| 2 | 200 | 2.1 | 7.2 |
| 3 | 300 | 2.0 | 8.1 |
| 4 | 400 | 1.9 | 8.9 |
| 5 | 500 | 1.8 | 9.3 |
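The error statistics in Table 2 can be computed from per-frame position estimates and reference positions as sketched below. The paper does not say how the reference positions or the pixel-to-centimetre calibration were obtained, so `cm_per_px` is a hypothetical parameter. The last lines compute the frame-weighted mean of the per-experiment averages in Table 2, which is about 1.94 cm.

```python
import numpy as np


def tracking_errors(estimated, reference, cm_per_px=1.0):
    """Per-frame Euclidean error between (N, 2) estimated and reference positions.

    Returns (mean, max, RMSE), scaled by a hypothetical pixel-to-cm calibration factor.
    """
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    d = np.linalg.norm(diff, axis=1) * cm_per_px
    return float(d.mean()), float(d.max()), float(np.sqrt(np.mean(d ** 2)))


# Frame-weighted mean of the per-experiment average errors in Table 2
frames = np.array([100, 200, 300, 400, 500])
avg_err_cm = np.array([2.3, 2.1, 2.0, 1.9, 1.8])
overall_avg_cm = float((frames * avg_err_cm).sum() / frames.sum())
```

Reporting the RMSE alongside the mean and maximum makes the Table 2 figures comparable with the RMSE indicator used in Table 3.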
Table 3. Performance comparison.
| Performance indicator | YOLO + KF | Baseline method |
|---|---|---|
| Detection accuracy (%) | 95 | 85 |
| Trajectory prediction accuracy (RMSE, pixels) | 10 | 12 |
| Tracking stability (target loss rate, %) | 5 | 7 |
| Processing speed (FPS) | 30 | 28 |
| Adaptability in complex scenarios (score, 1-10) | 9 | 7 |
This experiment reproduces the movement trajectory of a sphere simulating an athlete in the replica U-shaped snowboard model in order to analyze the feasibility of the model in practical application. The results show that the drawn trajectory of the sphere is reasonably accurate, as shown in Fig. 12, and the method has some reference value for analysis in actual training and competition.
Table 3 shows that the combination of the YOLO algorithm and the Kalman filter (YOLO+KF) outperforms the baseline method on every indicator in ski athlete tracking and trajectory analysis: higher detection accuracy (95% vs. 85%), lower trajectory prediction error (RMSE of 10 vs. 12 pixels) and a lower target loss rate (5% vs. 7%), while maintaining real-time processing speed (30 vs. 28 FPS).
Fig. 12. Sphere trajectory curve.
5. Conclusion
Based on the YOLO algorithm and the Kalman filter, this paper builds a laboratory model of a ski field in which spheres simulate athletes so that the athletes can be detected and tracked. The motion curve of the ball in the field is drawn with the aim of helping to train snowboarders and improve their movement skills. The experimental results show that the method is feasible under laboratory conditions and has a certain degree of stability and accuracy. Analysis of several sets of experimental data shows that the detection accuracy of the YOLO algorithm reaches 95%, so in most cases the algorithm identifies the position and size of the simulated athlete accurately, providing a reliable basis for subsequent trajectory tracking. For trajectory tracking, the Kalman filter shows good stability: it keeps track of the target continuously and gives an accurate trajectory prediction even when the simulated athlete moves faster or the trajectory is more complex. Comparing the actual and predicted trajectories shows that the error between them is small (an average tracking error of 1.8-2.3 cm in Table 2), which demonstrates the effectiveness of the Kalman filter for trajectory tracking. From the tracking data, the movement curve of the simulated athlete in the U-shaped field is drawn, and analyzing these curves reveals the characteristics of different stages: when entering the field, the athlete is faster and the trajectory is relatively smooth, whereas while completing movements the athlete slows down and the trajectory becomes more complex. These results provide a useful reference for athletes to improve their training and movement skills.
Funding
Heilongjiang Provincial Natural Science Foundation Project: Research on the Realization
Mechanism of Fintech Boosting the Promotion and Expansion of Ice-Snow Consumption
in Heilongjiang Province Project No.: PL2025G023
References
AlShami A., Boult T., Kalita J., 2023, Pose2Trajectory: Using transformers on body pose to predict tennis player’s trajectory, Journal of Visual Communication and Image Representation, Vol. 97, pp. 103954

Cao Z., Liao T., Song W., Chen Z., Li C., 2021, Detecting the shuttlecock for a badminton robot: A YOLO based approach, Expert Systems with Applications, Vol. 164, pp. 113833

Ciaparrone G., Sánchez F. L., Tabik S., Troiano L., Tagliaferri R., Herrera F., 2020, Deep learning in video multi-object tracking: A survey, Neurocomputing, Vol. 381, pp. 61-88

Dai Y., Hu Z., Zhang S., Liu L., 2022, A survey of detection-based video multi-object tracking, Displays, Vol. 75, pp. 102317

Dunnhofer M., Micheloni C., 2024, Visual tracking in camera-switching outdoor sport videos: Benchmark and baselines for skiing, Computer Vision and Image Understanding, Vol. 243, pp. 103978

Saada M., Kouppas C., Li B., Meng Q., 2022, A multi-object tracker using dynamic Bayesian networks and a residual neural network based similarity estimator, Computer Vision and Image Understanding, Vol. 225, pp. 103569

Wang T., 2024, Development of a multi-level feature fusion model for basketball player trajectory tracking, Systems and Soft Computing, Vol. 6, pp. 200119

Yazici I., Shayea I., Din J., 2023, A survey of applications of artificial intelligence and machine learning in future mobile networks-enabled systems, Engineering Science and Technology, an International Journal, Vol. 44, pp. 101455

Zhang J., Han D., Han S., Li H., Lam W.-K., Zhang M., 2024, ChatMatch: Exploring the potential of hybrid vision-language deep learning approach for the intelligent analysis and inference of racket sports, Computer Speech & Language, Vol. 89, pp. 101694

Li Z., Xu B., Wu D., Zhao K., Che S., Lu M., Cong J., 2023, A YOLO-GGCNN based grasping framework for mobile robots in unknown environments, Expert Systems with Applications, Vol. 225, pp. 119993

Liu C., Li X., Li Q., Xue Y., Liu H., Gao Y., 2021, Robot recognizing humans intention and interacting with humans based on a multi-task model combining ST-GCN-LSTM model and YOLO model, Neurocomputing, Vol. 430, pp. 174-184

Mohamed H. El-D., Fadl A., Anas O., Wageeh Y., ElMasry N., Nabil A., Atia A., 2020, MSR-YOLO: Method to enhance fish detection and tracking in fish farms, Procedia Computer Science, Vol. 170, pp. 539-546

Pinault L. J., Yano H., Okudaira K., Crawford I. A., 2024, YOLO-ET: A machine learning model for detecting, localising and classifying anthropogenic contaminants and extraterrestrial microparticles optimised for mobile processing systems, Astronomy and Computing, Vol. 47, pp. 100828

Souza B. J., Stefenon S. F., Singh G., Freire R. Z., 2023, Hybrid-YOLO for classification of insulators defects in transmission lines based on UAV, International Journal of Electrical Power & Energy Systems, Vol. 148, pp. 108982

Tsai T.-H., Wu P.-H., 2024, Design and implementation of deep learning-based object detection and tracking system, Integration, Vol. 99, pp. 102240

Wang X., Wang X., Li C., Zhao Y., Ren P., 2022, Data-attention-YOLO (DAY): A comprehensive framework for mesoscale eddy identification, Pattern Recognition, Vol. 131, pp. 108870

Xu X., Chen X., Wu B., Wang Z., Zhen J., 2022, Exploiting high-fidelity kinematic information from port surveillance videos via a YOLO-based framework, Ocean & Coastal Management, Vol. 222, pp. 106117

Yoshioka S., Fujita Z., Hay D. C., Ishige Y., 2018, Pose tracking with rate gyroscopes in alpine skiing, Sports Engineering, Vol. 21, pp. 177-188

Yuan Y., Wu Y., Zhao L., Chen H., Zhang Y., 2024, Multiple object detection and tracking from drone videos based on GM-YOLO and multi-tracker, Image and Vision Computing, Vol. 143, pp. 104951

Zheng Z., Li J., Qin L., 2023, YOLO-BYTE: An efficient multi-object tracking algorithm for automatic monitoring of dairy cows, Computers and Electronics in Agriculture, Vol. 209, pp. 107857

Zi N., Li X.-M., Gade M., Fu H., Min S., 2024, Ocean eddy detection based on YOLO deep learning algorithm by synthetic aperture radar data, Remote Sensing of Environment, Vol. 307, pp. 114139

Qi J., Li D., Zhang C., Wang Y., 2022, Alpine skiing tracking method based on deep learning and correlation filter, IEEE Access, Vol. 10, pp. 39248-39260

Huang J., Zhang H., Wang X., Qiu X., 2024, A novel adaptive trajectory tracking control for complex environments based on accelerated back-propagation neural network, Journal of the Franklin Institute, Vol. 361, No. 13, pp. 107024

Shao Y., Huang Q., Mei Y., Chu H., 2024, MOD-YOLO: Multispectral object detection based on transformer dual-stream YOLO, Pattern Recognition Letters, Vol. 183, pp. 26-34

Wan D., Lu R., Hu B., Yin J., Shen S., Xu T., Lang X., 2024, YOLO-MIF: Improved YOLOv8 with multi-information fusion for object detection in gray-scale images, Advanced Engineering Informatics, Vol. 62, pp. 102709

Xiaoguo Chang received his master of science degree in physical education and is a lecturer. He graduated from Suzhou University in 2009 and works at Harbin Institute of Finance. His research interests include skiing and tennis.
Wei Gao received her master of science degree in management and is an associate professor. She graduated from Harbin University of Commerce in 2009 and works at Harbin Institute of Finance. Her research interests include the ice economy and marketing.