  1. (Harbin Institute of Finance, Harbin 150030, China)



Keywords: YOLO, Kalman filter, Halfpipe snowboard, Target tracking

1. Introduction

Moving object detection has long been an active field in computer vision, with wide use in autonomous driving, intelligent transportation and military applications. Target detection methods fall into traditional feature-based methods and deep learning methods, the latter being more popular today.

The task of object detection is to extract and classify features of the input image and to output the bounding box and category of every labelled object. Since the 1990s the field has developed in two waves. Traditional methods select candidate regions with a sliding window, extract features with hand-crafted operators such as SIFT and HOG, and then classify each region with a classifier such as SVM or AdaBoost. After the first end-to-end detection algorithms were proposed, and with the steady progress of computing hardware, detection based on convolutional neural networks (CNNs) developed rapidly, and deep CNNs now hold a clear advantage over the traditional methods. As application scenarios keep changing, detectors based on dual-mode feature fusion have also appeared, and these detectors achieve steadily higher accuracy and efficiency.

At present, deep learning detectors fall into two main types. The first is the two-stage detector [1], including R-CNN, Fast R-CNN and Faster R-CNN. Such algorithms generate a large number of candidate regions that may contain targets, for example through selective search, and use a convolutional neural network to extract features in each candidate region for classification and detection. The second is the one-stage detector, including YOLO and SSD. These algorithms need no candidate regions: the convolutional neural network extracts feature information directly and the targets are classified and located end to end, so detection is faster than with two-stage algorithms. For applications with real-time requirements, they are faster while still achieving high accuracy.

In snowboard U-shaped field (halfpipe) sports, athletes’ skill movements are divided into basic gliding movements, trick movements in the air, and board-grab movements in the air [2]. Detecting and tracking athletes with the YOLO algorithm and recovering their trajectories with Kalman filtering is highly practical: it helps identify weaknesses in movement skills during training and mistakes during competition so that they can be corrected in time, and it lets coaches train and guide athletes more effectively.

The innovative points of this article in the tracking and trajectory analysis of skiers mainly include:

(1) A hybrid tracking framework that combines the YOLO object detection algorithm in deep learning with the classical Kalman filtering algorithm is proposed. It leverages the advantages of the YOLO algorithm in quickly and accurately detecting athlete positions in complex backgrounds and uses Kalman filtering to smooth the detection results and predict future positions, effectively solving tracking loss problems caused by occlusion, rapid movement, and other factors.

(2) A dynamic adjustment strategy for Kalman filter parameters based on changes in the motion state is proposed. It can automatically adjust key parameters such as noise covariance matrix during the filtering process based on real-time data such as athlete’s speed and acceleration, further improving the accuracy and robustness of trajectory prediction.

(3) A multidimensional trajectory analysis and performance evaluation system is constructed. Through in-depth analysis of athletes’ trajectory data, key performance indicators such as speed, acceleration, turning radius, and stability can be evaluated, providing a scientific basis for athletes’ technical improvement and training plan formulation.

2. Target Detection Based on YOLO Algorithm

2.1. YOLO Algorithm

Compared with previous versions, YOLOv5 improves detection accuracy and speed and provides several network variants to meet the needs of different application scenarios. There are five variants, YOLOv5-n, YOLOv5-s, YOLOv5-m, YOLOv5-l and YOLOv5-x, all implemented in the PyTorch deep learning framework. YOLOv5-n is the shallowest network; in the larger variants the network becomes deeper and wider, the number of parameters grows, and the inference speed drops. YOLOv5-s has a relatively simple structure and very fast inference, at the cost of lower accuracy than the larger variants, so it is suitable for scenarios that require real-time detection.

2.1.1 CSP1 module

The CSP1 module is used in the backbone network to learn residual features. It has two branches. One applies a series of convolution and residual operations, in which the residual features pass information through cross-layer shortcut connections to extract higher-level abstract features. The other performs only a single Conv operation. The two branches are finally concatenated for subsequent operations, which increases the nonlinear expressive power of the network.

2.1.2 CSP2 module

The CSP2 module is applied in the feature fusion part. Like CSP1, it is composed of a residual block and two convolution layers. The input feature map is first split into two branches along the channel dimension: one is a plain convolution branch, while the other has its channels halved by a CBL block and then passes through n further CBL blocks. The resulting feature map is joined with the original input through a residual connection. In this way the CSP2 module further optimizes the structure of the CSP1 module and improves the detection performance and computational efficiency of the model.

2.1.3 SPPF module

SPPF is an improved SPP module that reduces runtime. It applies three 5 × 5 max-pooling operations in series; cascading these small pooling kernels enlarges the receptive field step by step, and the feature maps with different receptive fields are then fused to improve the expressive power of the features. Compared with SPP, SPPF not only further enhances the expressive power of the feature maps but also runs faster.
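The cascading idea can be checked with a small sketch. It is a one-dimensional, pure-Python simplification rather than the network layer itself, and the parallel SPP kernel sizes 5, 9 and 13 are an assumption taken from the common YOLOv5 configuration, not from this paper. Under those assumptions, repeating a 5-wide stride-1 max pool reproduces the outputs of the wider parallel pools.

```python
def max_pool_1d(values, kernel):
    """Stride-1 max pooling; windows are clipped at the borders."""
    radius = kernel // 2
    n = len(values)
    pooled = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        pooled.append(max(values[lo:hi]))
    return pooled

# Hypothetical 1D "feature map" used only for illustration.
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

# SPPF style: the same 5-wide pool applied three times in series.
p1 = max_pool_1d(signal, 5)
p2 = max_pool_1d(p1, 5)
p3 = max_pool_1d(p2, 5)

# SPP style (assumed kernel sizes): 5-, 9- and 13-wide pools in parallel.
q1 = max_pool_1d(signal, 5)
q2 = max_pool_1d(signal, 9)
q3 = max_pool_1d(signal, 13)

same_features = (p1 == q1) and (p2 == q2) and (p3 == q3)
```

The serial version reuses each intermediate result, which is why it can produce the same multi-scale features with less pooling work.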

For different data sets, the YOLO algorithm starts from default anchor boxes with preset widths and heights to predict bounding boxes, and these bounding boxes mark the targets in the video to be detected. Each bounding box predicts four coordinates: the center coordinates of the box and its width and height. During network training, the bounding boxes are then adapted to the particular training set [3, 4]. Fig. 1 shows the feature extraction of the underground detection model of the ski resort. A corresponding dataset is established for the replica U-shaped ski field, and the YOLO algorithm provides the best bounding box for detecting the corresponding target. The model first takes photos and captures the athlete’s movement trajectory during skiing, and then constructs point cloud features. These point cloud features are fused and fed into the Pill feature extraction network. The features are then filtered using principal component analysis, focusing on the region of interest (ROI). Finally, multimodal features are fused to obtain the detection results for the athlete’s movement trajectory.

Fig. 1. Feature extraction of ski resort underground detection model.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig1.png

2.2. YOLOv5s Overall Structure

Global average pooling (GAP) is an effective baseline method for extracting richer feature information by preserving global context information, and it has been applied to semantic segmentation for this purpose. A multi-perceptual feature extraction module is constructed from deconvolution and GAP [5]. The global context information and the sub-region context information help to better distinguish the target categories. Fig. 2 shows the overall structural design of the skiing YOLOv5s. YOLOv5s consists of a multi-scale pooling module UIE and a segmented attention module DNL connected in series. The model extracts features separately in parallel, fuses them in the DNL section, and then sends them to the Detect Head to output the results. This approach extracts global information from the features and combines global and local information.

Within the Multi-Path Module (MPM), the input is bifurcated, with the first pathway leveraging a 3 × 3 depth-wise convolution (DWConv) to expand the receptive field and enhance contextual information capture [6, 7]. Addressing the limitations of traditional GAP, the MPM incorporates three tailored GAP operations, targeting distinct sub-regions (1 × 1, 3 × 3, 6 × 6) and generating feature maps [X1, X2, X3] that capture varied spatial characteristics. Channel-wise execution ensures specific information retention. A novel fusion strategy is then implemented to harness the complementary nature of spatial and channel features.

Specifically, a 1 × 1 pointwise convolution (PWConv) integrates the diverse spatial features, enhancing the module’s representational power. The output then passes through batch normalization (BN) and a ReLU activation for non-linearity and improved discrimination. This spatial and channel-wise feature fusion enhances the representation of the input data and boosts network performance [8]. Feature maps of different sizes are resized to match the input using bilinear interpolation, minimizing information loss. Channel counts in the sub-regions are aligned with the input channels to determine global feature significance. These refined feature maps are then concatenated with other multi-scale features, preserving the original inputs while incorporating multi-scale information.

Attention mechanisms accentuate key features, downplay minor details, and highlight semantic regions of interest [9, 10]. We introduce the Split-and-Concat (SPC) module to underpin the construction of the Scale-Attentive Module (SAM). The SPC module partitions the input feature map into N groups, denoted [X0, X1, ..., XN-1], where each group has a reduced number of channels, C′ = C/N. The groups are processed with multi-scale convolutional kernels whose sizes increase systematically: 1, 3, 5, and 7. This enables the model to capture features at various scales, enhancing its capacity to represent complex and nuanced information. The receptive fields and the extracted context information differ between the resulting feature maps, which have different resolutions.

Shuffle attention (SA) is introduced to weight the features, making full use of spatial and channel feature information while reducing the number of model parameters. Each group of features is first divided into G sub-feature groups along the channel dimension, each with C′/G channels, and every sub-feature group is split into two branches. Each unit extracts the channel attention $X'_{k1}$ and the spatial attention $X'_{k2}$ respectively, and the calculation process is shown in Eqs. (1)-(3):

(1)
$ X'_{k1} = Sigmoid(W_1 s + b_1) \cdot X_{k1}, $
(2)
$ s = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{k1}(i, j), $
(3)
$ X'_{k2} = Sigmoid[W_2 \cdot GN(X_{k2}) + b_2] \cdot X_{k2}. $
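A minimal sketch of the two attention branches for a single channel may make these formulas concrete. It assumes scalar parameters `w1`, `b1`, `w2`, `b2` per channel, assumes the bias of the spatial branch is added after normalization, and replaces group normalization (GN) with a per-channel zero-mean, unit-variance normalization; all function names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def global_average(channel):
    """Mean over all H x W positions of one channel (the statistic s)."""
    h, w = len(channel), len(channel[0])
    return sum(sum(row) for row in channel) / (h * w)

def channel_attention(channel, w1, b1):
    """Scale the whole channel by Sigmoid(w1 * s + b1)."""
    gate = sigmoid(w1 * global_average(channel) + b1)
    return [[gate * x for x in row] for row in channel]

def normalize(channel, eps=1e-5):
    """Assumed stand-in for GN: normalize one channel over its positions."""
    mean = global_average(channel)
    var = global_average([[(x - mean) ** 2 for x in row] for row in channel])
    return [[(x - mean) / math.sqrt(var + eps) for x in row] for row in channel]

def spatial_attention(channel, w2, b2):
    """Gate every position by Sigmoid(w2 * GN(x) + b2), then multiply by x."""
    normed = normalize(channel)
    return [[sigmoid(w2 * n + b2) * x for n, x in zip(nrow, row)]
            for nrow, row in zip(normed, channel)]

# Hypothetical 2 x 2 channel used only for illustration.
channel = [[1.0, 2.0], [3.0, 4.0]]
channel_out = channel_attention(channel, w1=1.0, b1=0.0)
spatial_out = spatial_attention(channel, w2=0.0, b2=0.0)
```

The channel branch applies one gate to the whole map, while the spatial branch applies a different gate at every position.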

Fig. 2. Overall structure design of ski sports YOLOv5s.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig2.png

The weighted features of the sub-groups are obtained by concatenation along the channel dimension. The feature information of each group is linked together to generate a feature map with pixel-level attention over the global range. The weight vector is then obtained using the weight module [11, 12]. The weights are recalculated using softmax, and the new weights are given by Eq. (4):

(4)
$ att_i = softmax(Z_i) = \frac{e^{Z_i}}{\sum_{j=0}^{N-1} e^{Z_j}}. $

The weight of each group is multiplied by the feature Fi obtained after SA, and the output, denoted Yi, is given by Eq. (5):

(5)
$ Y_i = F_i \odot att_i. $
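The re-weighting step can be sketched as follows. For simplicity the scores Z_i are assumed to be one scalar per group (in the network they may be channel-wise vectors), and the feature values are made up for illustration.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reweight(features, scores):
    """Multiply every group feature F_i by its softmax weight att_i."""
    weights = softmax(scores)
    return [[w * f for f in group] for w, group in zip(weights, features)]

# Hypothetical example: N = 4 groups with equal scores.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
weights = softmax([0.0, 0.0, 0.0, 0.0])
outputs = reweight(features, [0.0, 0.0, 0.0, 0.0])
```

With equal scores every group receives the weight 1/N; a larger score shifts weight toward that group.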

The YOLOv5s network first slices the feature map in the Focus module and concatenates the slices, and then extracts the features of the input image through a CBL module followed by multiple CBL modules and residual units. The feature map is then up-sampled and concatenated again to obtain prediction results at three different sizes [13, 14]. The snowboarder is a small target relative to the U-shaped ski field, and the YOLO algorithm achieves good stability and high accuracy for small-target detection in the replica U-shaped ski field.

3. Tracking Athletes and Trajectory Analysis in Skiing Based on YOLO Algorithm and Kalman Filter

3.1. Kalman Filter Principle

The Kalman filtering method calculates the current state from the estimate of the previous state and the observation of the current state, and thereby realizes an optimal estimate of the state. The system state is the minimal set of parameters that summarizes the effect of all past inputs and perturbations on the system; knowing the state, together with future inputs and perturbations, determines the entire behavior of the system [15, 16]. The principle of Kalman filtering is as follows:

The current state is predicted from the previous state estimate as shown in Eq. (6), and the covariance of this prediction is given by Eq. (7).

(6)
$ X_{k|k-1} = \phi_{k,k-1}X_{k-1|k-1}, $
(7)
$ P_{k|k-1} = \phi_{k,k-1}P_{k-1|k-1}\phi^T_{k,k-1}. $

The gain matrix is calculated from the predicted covariance as shown in Eq. (8), and the innovation, i.e., the difference between the new measurement and its predicted value, is calculated as shown in Eq. (9).

(8)
$ K_k = P_{k|k-1}H^T_k [H_kP_{k|k-1}H^T_k +R_k]^{-1}, $
(9)
$ Z_{k|k-1} = Z_k -H_kX_{k|k-1}. $

Thereafter, the state estimate at time k is obtained as shown in Eq. (10), and the updated covariance estimate as shown in Eq. (11).

(10)
$ X_{k|k} = X_{k|k-1} + K_k Z_{k|k-1}, $
(11)
$ P_{k|k} = [I - K_k H_k]P_{k|k-1}. $
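As a worked illustration, the sketch below runs the predict and update steps for a scalar state, using the standard form of the update. The values of `phi`, `h` and `r` are illustrative assumptions, and the optional process noise `q` does not appear in the equations of this section; it defaults to zero so that the covariance prediction matches Eq. (7).

```python
def kalman_step(x_prev, p_prev, z, phi=1.0, h=1.0, r=1.0, q=0.0):
    """One scalar Kalman filter cycle; returns (state, covariance, gain)."""
    x_pred = phi * x_prev                      # state prediction
    p_pred = phi * p_prev * phi + q            # covariance prediction (q is an assumption)
    gain = p_pred * h / (h * p_pred * h + r)   # gain
    innovation = z - h * x_pred                # measurement minus its prediction
    x_new = x_pred + gain * innovation         # state update
    p_new = (1.0 - gain * h) * p_pred          # covariance update
    return x_new, p_new, gain

# Two measurements of a constant position at 2.0, starting from x = 0, P = 1.
x1, p1, k1 = kalman_step(0.0, 1.0, z=2.0)
x2, p2, k2 = kalman_step(x1, p1, z=2.0)
```

The first step gives a gain of 0.5 and moves the estimate halfway to the measurement; the second step uses a smaller gain because the covariance has already shrunk.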

3.2. Feature Fusion Module

Deeper features carry richer semantic information, but their resolution is lower. The feature maps are recorded as C1, C2, C3, C4, C5, with sizes 208 × 208, 104 × 104, 52 × 52, 26 × 26, 13 × 13 [17]. The algorithm in this chapter regards C3′ as the main branch for predicting medium and small targets. On the basis of FPN, the C2 layer is introduced: after down-sampling, C2 is spliced with C3 [18, 19]. On this basis the C1 layer is added: C1 is down-sampled and spliced with C2 to get a new feature map C2′, and C2′ is further down-sampled and spliced with C3. Subsequent experiments show that the introduction of C1 and C2 gives the best infrared feature fusion effect.

Fig. 3 shows the distribution of semantic information of features. Input C5 into GCIAM to obtain a feature map C5 with a size of 13 × 13 [20, 21]. This feature map is used as a branch for detecting large targets. C4′ is obtained after down-sampling and splicing with C4, and then spliced with C3 after continuous down-sampling. The detection head consists of 1 × 1 convolution and 3 × 3 convolution. First, the number of channels is adjusted by 1 × 1 convolution, and the features after FFM output are integrated by 3 × 3 convolution.

In weighted average fusion, two feature vectors $X_1$ and $X_2$ are combined with a weight $\alpha$, as shown in Eq. (12).

(12)
$ X_{fused} = \alpha X_1 + (1-\alpha)X_2. $

Concatenation joins two or more feature vectors end to end to form a longer feature vector, as shown in Eq. (13).

(13)
$ X_k = [X_1, X_2]. $

Maximum/minimum value fusion selects the maximum or minimum value at the corresponding position of the two feature vectors as the fused feature, as shown in Eqs. (14) and (15).

(14)
$ X_{fused,i} = \max(X_1(i), X_2(i)), $
(15)
$ X_{fused,i} = \min(X_1(i), X_2(i)). $

Dot product fusion computes the dot product of two feature vectors and may treat it as a single feature value, as shown in Eq. (16).

(16)
$ X_{fused} = X_1 \cdot X_2. $

The bilinear model is a more complex fusion method, which first applies a linear transformation to each feature vector and then calculates the outer product of the transformed features, as shown in Eq. (17).

(17)
$ X_{fused} = (W_1X_1)\otimes(W_2X_2). $

Attention mechanisms allow the model to dynamically determine the importance of different features. In fusion, it uses an attention weight vector to weight the features, as shown in Eq. (18).

(18)
$ X_{fused} = \sum_i \alpha_iX_i. $

Fig. 3. Semantic information distribution of features.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig3.png

The multilayer perceptron can be seen as a general function approximator that can learn how to fuse features, as shown in Eq. (19).

(19)
$ X_{fused} = MLP([X_1, X_2]) $
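The simpler fusion operators above can be sketched on plain Python lists. The function names and example vectors are hypothetical, and the bilinear and MLP variants are left out because they require learned weights.

```python
def weighted_average(x1, x2, alpha):
    """Blend two vectors with weights alpha and 1 - alpha."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(x1, x2)]

def concatenate(x1, x2):
    """Join two vectors end to end."""
    return x1 + x2

def elementwise_max(x1, x2):
    return [max(a, b) for a, b in zip(x1, x2)]

def elementwise_min(x1, x2):
    return [min(a, b) for a, b in zip(x1, x2)]

def dot_product(x1, x2):
    """Reduce two vectors to a single value."""
    return sum(a * b for a, b in zip(x1, x2))

def attention_sum(vectors, weights):
    """Weighted sum of several vectors with attention weights alpha_i."""
    length = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(length)]

# Hypothetical feature vectors used only for illustration.
x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 0.0, 6.0]
fused_avg = weighted_average(x1, x2, alpha=0.5)
fused_cat = concatenate(x1, x2)
fused_max = elementwise_max(x1, x2)
fused_min = elementwise_min(x1, x2)
fused_dot = dot_product(x1, x2)
fused_att = attention_sum([x1, x2], [0.25, 0.75])
```

Concatenation keeps all information but doubles the dimension, whereas the other operators keep or reduce it; which trade-off is preferable depends on the detection head that follows.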

3.3. Establishment of Trajectory Detection Model

The YOLO algorithm can treat skiers as targets in images and detect them rapidly. The YOLO model is trained to identify skier characteristics such as clothing colour and body type [22, 23]. Athletes move very fast in ski races, so a method is needed that responds quickly and updates detection results in real time. The YOLO algorithm fits this requirement, as it predicts multiple bounding boxes and category probabilities simultaneously in a single forward pass, which greatly speeds up detection [24]. The accuracy of skier detection can be improved by adjusting the structure and parameters of the YOLO model. For example, the depth or width of the model can be increased, more complex feature extractors can be used, or the diversity of the training dataset can be increased.

The updated state estimate and the updated covariance are given in Eqs. (20) and (21).

(20)
$ \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k(z_k - H_k\hat{x}_{k|k-1}), $
(21)
$ P_{k|k} = (I - K_k H_k) P_{k|k-1}. $

The Kalman filter is an optimal estimation method based on the state equation of a linear system. In skier tracking, the position and speed of the skier can be taken as state variables, and these state variables can be estimated optimally by the Kalman filter algorithm [25]. In practice, the directly observed position of an athlete may contain errors due to sensor noise, image noise and other factors. Kalman filtering reduces the impact of this noise on the results through a weighted fusion of observed and predicted values. It provides not only an estimate of the athlete’s position at the current moment but also a prediction of the position at a future moment. Applying the Kalman filter algorithm continuously yields a smooth trajectory.

The YOLO algorithm and Kalman filtering are combined to establish a skier tracking and motion trajectory detection model, as shown in Fig. 4. The detection steps are as follows:

Initial detection: Using the YOLO algorithm to perform initial detection of skiers and obtain the initial position information of athletes.

Tracking initialization: The athlete position information detected by the YOLO algorithm is used as the initial state input of the Kalman filter.

Continuous tracking: In each time step, the athlete in the current frame is first detected using the YOLO algorithm, and the detection results are fused with the prediction results of the Kalman filter. Then, the state of the Kalman filter is updated using the fused results, and the athlete’s position for the next step is predicted.

Trajectory construction: The position information of athletes obtained from continuous tracking is connected to form a smooth motion trajectory.

Fig. 4. Combining YOLO algorithm and Kalman filter histogram.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig4.png

Model Optimization: According to actual application scenarios and requirements, the parameters of the YOLO algorithm and Kalman filter can be adjusted and optimized to improve the detection accuracy, tracking stability and real-time performance of the model.

Combining the YOLO algorithm and Kalman filter technology can construct an efficient and accurate skier tracking and trajectory detection model. The model can work well in situations with high real-time requirements and adapt to the requirements of athlete tracking in various complex scenes.
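The detection steps above can be sketched as a small tracking loop. This is an illustration under stated assumptions, not the implementation used in the experiments: the detector output is replaced by a hypothetical list of box centres (with `None` for a missed frame), each image axis gets its own constant-velocity filter, and the noise values `q` and `r` are arbitrary.

```python
class AxisFilter:
    """Constant-velocity Kalman filter for one axis: state = [position, velocity]."""

    def __init__(self, position, dt=1.0, q=1e-2, r=1.0):
        self.x = [float(position), 0.0]
        self.p = [[1.0, 0.0], [0.0, 1.0]]
        self.dt, self.q, self.r = dt, q, r   # q and r are assumed noise levels

    def predict(self):
        dt = self.dt
        pos, vel = self.x
        self.x = [pos + dt * vel, vel]
        (a, b), (c, d) = self.p
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.p = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z):
        (a, b), (c, d) = self.p
        s = a + self.r                    # innovation variance for H = [1, 0]
        k0, k1 = a / s, c / s             # Kalman gain
        residual = z - self.x[0]
        self.x = [self.x[0] + k0 * residual, self.x[1] + k1 * residual]
        # P <- (I - K H) P
        self.p = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def track(detections):
    """Build a trajectory from per-frame box centres (x, y) or None."""
    fx = fy = None
    trajectory = []
    for det in detections:
        if fx is None:
            if det is None:
                continue                  # wait for the initial detection
            fx, fy = AxisFilter(det[0]), AxisFilter(det[1])  # tracking initialization
            trajectory.append((float(det[0]), float(det[1])))
            continue
        px, py = fx.predict(), fy.predict()
        if det is not None:               # fuse detection with prediction
            fx.update(det[0])
            fy.update(det[1])
            px, py = fx.x[0], fy.x[0]
        trajectory.append((px, py))       # on a miss, keep the pure prediction
    return trajectory


# Hypothetical detections; the fourth frame simulates an occlusion.
trajectory = track([(0, 0), (1, 2), (2, 4), None, (4, 8)])
```

When a frame has no detection, the loop appends the predicted position, so the trajectory stays continuous through short occlusions.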

4. Experimental Results and Analysis

4.1. Experimental Model Establishment

The snowboard U-shaped field competition requires athletes to take off on a snowboard from the run-up slope of the specified U-shaped field, complete difficult movements in the air, achieve the maximum vertical distance, and perform high-quality aerial spins and other movements. Existing U-shaped fields are of two types: with a horizontal slope and without. Fig. 5 shows the line chart of sports performance on the actual standard competition field. In this experiment, a U-shaped field without a horizontal track is used for data collection in order to detect the athletes and to track and characterize their trajectories. Standard U-shaped snowboard fields, domestic and foreign, were studied and analyzed, and a physical replica model was then built. Scaling the size parameters of the standard competition field gives a snowboard U-shaped field for laboratory conditions, so that the laboratory model reproduces the athletes’ performance on the actual standard field as closely as possible. The U-shaped field in this experiment uses an adjustable-angle lifter as the main structure of the model, as shown in Fig. 6. Its slope is adjustable, which allows testing and analysis under different conditions. The field is 1.2 m long and 20 cm wide, and the slope is set to the international standard of 18° as the experimental condition.

Fig. 5. Athletic performance polyline under actual standard playing field.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig5.png

Fig. 6. Structure distribution of adjustable angle lifter body.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig6.png

4.2. Experimental Dataset

Based on actual training scenes and competition situations, the data set required for U-shaped snowboard target detection under laboratory conditions is established; it was built by the author. The parameter distribution of the data set is shown in Fig. 7. The data set is created by replacing the athlete with a sphere in the replica U-shaped field. A 12-megapixel CMOS wide-angle lens with an f/1.8 aperture is used to film the ball multiple times in different scenes, and one frame is extracted from the video every 0.3 s as a data set picture. The test ball has a diameter of 40.00 mm and a weight of 2.53 grams. The data set consists of 1000 pictures, and the accuracy of the algorithm in the snowboard U-shaped field is tested through ten-fold cross-validation: in each round, nine folds are used as the training set and the remaining fold is used as the test set.
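The ten-fold protocol can be sketched with image indices alone. The shuffle seed and the function name are assumptions made for illustration; only the data set size of 1000 and the number of folds come from the text.

```python
import random

def ten_fold_splits(num_images=1000, folds=10, seed=0):
    """Yield (train_indices, test_indices) for each cross-validation round."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)   # seed is an arbitrary assumption
    fold_size = num_images // folds
    for k in range(folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test

splits = list(ten_fold_splits())
```

Each round trains on 900 pictures and tests on the remaining 100, and every picture is used for testing exactly once.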

Fig. 7. Parameter distribution of ski motion target detection scene dataset.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig7.png

4.3. Training Settings

With the dataset established, cross-validation is used to train the YOLO network model to detect the athletes in the U-shaped ski field under laboratory conditions. The experiment runs on the Ubuntu 16.04 platform with the open-source deep learning framework PyTorch, and the model is trained on a server equipped with an NVIDIA RTX 2080 Ti graphics card. The YOLO algorithm uses the Mosaic data augmentation method, together with rotation, translation, scaling and other image transformations, to enlarge the data set and improve the robustness of the network. The feature distribution is shown in Fig. 8.

Fig. 8. Mosaic data enhancement feature distribution.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig8.png

4.4. Partial Test Result

In this paper, the YOLO network is used to detect the sphere that simulates the athlete in multiple videos with different shooting angles, and accurate bounding boxes are obtained. At the same time, a Kalman filter is used to draw the trajectory of the ball in the replica U-shaped field, which makes the tracking more accurate. After 300 epochs of training of the YOLO model, partial detection results are shown in Fig. 9.

Table 1 shows the target detection accuracy of the YOLO algorithm in multiple experiments with different frame rates. Accuracy is the percentage of correctly detected frames among the total detected frames. After the ball is detected by the YOLO model in the snowboard U-shaped field under laboratory conditions, the predicted position of the athlete in the video sequence is obtained by Kalman filtering, and the trajectory of the athlete is drawn, as shown in Fig. 10.

Fig. 9. YOLO model training results.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig9.png

Table 1. YOLO algorithm target detection accuracy.

| Experiment No. | Detected frames | Correctly detected frames | Accuracy |
| --- | --- | --- | --- |
| 1 | 100 | 97 | 97% |
| 2 | 200 | 193 | 96.5% |
| 3 | 300 | 290 | 96.7% |
| 4 | 400 | 388 | 97% |
| 5 | 500 | 491 | 98.2% |
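The accuracy column can be recomputed directly from the frame counts in Table 1. The pooled figure at the end is derived here from those counts and is not reported in the paper.

```python
# (detected frames, correctly detected frames) for experiments 1 to 5
table_1 = [(100, 97), (200, 193), (300, 290), (400, 388), (500, 491)]

# Accuracy = correctly detected frames / detected frames, in percent.
accuracies = [round(100.0 * correct / detected, 1) for detected, correct in table_1]

# Pooled accuracy over all 1500 frames (derived value, not from the paper).
total_detected = sum(detected for detected, _ in table_1)
total_correct = sum(correct for _, correct in table_1)
pooled = round(100.0 * total_correct / total_detected, 1)
```

The recomputed values agree with the table after rounding to one decimal place.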

Fig. 10. Athlete trajectory analysis.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig10.png

In Fig. 11, the red curve is the movement curve of the sphere in the U-shaped ski field; the sphere in the video sequence simulates the movement of the athlete. Combined with the stability of the YOLO algorithm, this method draws the movement curve well. Since the subject in this experiment is a sphere, whereas the subjects in the actual scene are skiers, the corresponding data set must be re-established for network training. After the target, that is, the athlete, is detected by YOLO, the athlete’s position is predicted through Kalman filtering to obtain the movement curve of the athlete.

At the same time, athletes have movement posture changes in training and competition, especially for difficult rotation movements, so it is necessary to carry out subsequent human posture estimation based on the results of the model. Through the posture estimation results, the expected goals can be achieved according to the competition rules of snowboard U-shaped skiing and the corresponding movement difficulty evaluation and scoring methods.

Table 2 shows the Kalman filter trajectory tracking error.

Fig. 11. U-shaped field motion curve.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig11.png

Table 2. Kalman filter trajectory tracking error.

| Experiment No. | Total frames | Average tracking error (cm) | Maximum tracking error (cm) |
| --- | --- | --- | --- |
| 1 | 100 | 2.3 | 6.5 |
| 2 | 200 | 2.1 | 7.2 |
| 3 | 300 | 2.0 | 8.1 |
| 4 | 400 | 1.9 | 8.9 |
| 5 | 500 | 1.8 | 9.3 |
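The paper does not spell out how the tracking error is computed. One plausible reading, sketched below as an assumption, takes the Euclidean distance between the reference and estimated positions in each frame and reports its mean, maximum and root mean square; the sample points are hypothetical.

```python
import math

def tracking_errors(reference_path, estimated_path):
    """Return (mean error, maximum error, RMSE) over paired 2D positions."""
    distances = [math.hypot(p[0] - q[0], p[1] - q[1])
                 for p, q in zip(reference_path, estimated_path)]
    mean_error = sum(distances) / len(distances)
    max_error = max(distances)
    rmse = math.sqrt(sum(d * d for d in distances) / len(distances))
    return mean_error, max_error, rmse

# Hypothetical positions in centimetres; only the second frame is off (by 5 cm).
reference = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
estimated = [(0.0, 0.0), (13.0, 4.0), (20.0, 0.0)]
mean_error, max_error, rmse = tracking_errors(reference, estimated)
```

A single large deviation raises the maximum and the RMSE more than the mean, which is consistent with Table 2 showing the maximum error growing with sequence length while the average error falls.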

Table 3. Performance comparison.

| Performance indicator | YOLO + KF | Baseline method |
| --- | --- | --- |
| Detection accuracy (%) | 95 | 85 |
| Trajectory prediction accuracy (RMSE, pixels) | 10 | 12 |
| Tracking stability (target loss rate, %) | 5 | 7 |
| Processing speed (FPS) | 30 | 28 |
| Adaptability in complex scenarios (score, 1-10) | 9 | 7 |

This experiment reproduces the movement trajectory of the sphere that simulates the athlete in the replica U-shaped snowboard model in order to analyze the feasibility of the model in practical application. The results show that the movement trajectory of the sphere is drawn with reasonable accuracy, as shown in Fig. 12, so the method has reference value for the analysis of actual training and competition.

Table 3 shows that the YOLO algorithm and Kalman filter (YOLO+KF) combination performs significantly better than the benchmark method in ski athlete tracking and trajectory analysis. YOLO+KF has demonstrated higher performance in detection accuracy, trajectory prediction accuracy, and tracking stability while maintaining good real-time processing speed.

Fig. 12. Sphere trajectory curve.

../../Resources/ieie/IEIESPC.2026.15.2.176/fig12.png

5. Conclusion

Based on the YOLO algorithm and the Kalman filter, this paper proposes a model that detects and tracks athletes, simulated by spheres, on a ski field under laboratory conditions. It draws the motion curve of the ball in the field with the aim of better training snowboarders and improving their movement skills. The experimental results show that the method is feasible in the actual field and has a certain degree of stability and accuracy. Analysis of several sets of experimental data shows that the accuracy of the YOLO algorithm in target detection reaches 95%. This means that in most cases the algorithm accurately identifies the simulated athlete’s position and size, providing a reliable basis for subsequent trajectory tracking.

Regarding trajectory tracking, the Kalman filter technique shows good stability. Kalman filter can keep track of the target continuously and give a more accurate trajectory prediction even when the simulated athlete moves faster, or the trajectory is more complex. By comparing the actual trajectory with the predicted trajectory, we find that the error between the two is small, proving the Kalman filter’s effectiveness in trajectory tracking. According to the tracking data, we draw the movement curve of the simulated athletes in the U-shaped field. By analyzing these curves, we can find the characteristics and laws of athletes in different stages. In the initial stage of entering the field, the athletes’ speed is faster, and the trajectory is relatively smooth. While completing the movement, the athlete’s speed will slow down, and the trajectory will become more complicated. These results provide a valuable reference for athletes to improve their training and movement skills.

Funding

Heilongjiang Provincial Natural Science Foundation Project: Research on the Realization Mechanism of Fintech Boosting the Promotion and Expansion of Ice-Snow Consumption in Heilongjiang Province Project No.: PL2025G023

References

[1] AlShami A., Boult T., Kalita J., 2023, Pose2Trajectory: Using transformers on body pose to predict tennis player's trajectory, Journal of Visual Communication and Image Representation, Vol. 97, pp. 103954
[2] Cao Z., Liao T., Song W., Chen Z., Li C., 2021, Detecting the shuttlecock for a badminton robot: A YOLO based approach, Expert Systems with Applications, Vol. 164, pp. 113833
[3] Ciaparrone G., Sánchez F. L., Tabik S., Troiano L., Tagliaferri R., Herrera F., 2020, Deep learning in video multi-object tracking: A survey, Neurocomputing, Vol. 381, pp. 61-88
[4] Dai Y., Hu Z., Zhang S., Liu L., 2022, A survey of detection-based video multi-object tracking, Displays, Vol. 75, pp. 102317
[5] Dunnhofer M., Micheloni C., 2024, Visual tracking in camera-switching outdoor sport videos: Benchmark and baselines for skiing, Computer Vision and Image Understanding, Vol. 243, pp. 103978
[6] Saada M., Kouppas C., Li B., Meng Q., 2022, A multi-object tracker using dynamic Bayesian networks and a residual neural network based similarity estimator, Computer Vision and Image Understanding, Vol. 225, pp. 103569
[7] Wang T., 2024, Development of a multi-level feature fusion model for basketball player trajectory tracking, Systems and Soft Computing, Vol. 6, pp. 200119
[8] Yazici I., Shayea I., Din J., 2023, A survey of applications of artificial intelligence and machine learning in future mobile networks-enabled systems, Engineering Science and Technology, an International Journal, Vol. 44, pp. 101455
[9] Zhang J., Han D., Han S., Li H., Lam W.-K., Zhang M., 2024, ChatMatch: Exploring the potential of hybrid vision-language deep learning approach for the intelligent analysis and inference of racket sports, Computer Speech & Language, Vol. 89, pp. 101694
[10] Li Z., Xu B., Wu D., Zhao K., Che S., Lu M., Cong J., 2023, A YOLO-GGCNN based grasping framework for mobile robots in unknown environments, Expert Systems with Applications, Vol. 225, pp. 119993
[11] Liu C., Li X., Li Q., Xue Y., Liu H., Gao Y., 2021, Robot recognizing humans intention and interacting with humans based on a multi-task model combining ST-GCN-LSTM model and YOLO model, Neurocomputing, Vol. 430, pp. 174-184
[12] Mohamed H. El-D., Fadl A., Anas O., Wageeh Y., ElMasry N., Nabil A., Atia A., 2020, MSR-YOLO: Method to enhance fish detection and tracking in fish farms, Procedia Computer Science, Vol. 170, pp. 539-546
[13] Pinault L. J., Yano H., Okudaira K., Crawford I. A., 2024, YOLO-ET: A machine learning model for detecting, localising and classifying anthropogenic contaminants and extraterrestrial microparticles optimised for mobile processing systems, Astronomy and Computing, Vol. 47, pp. 100828
[14] Souza B. J., Stefenon S. F., Singh G., Freire R. Z., 2023, Hybrid-YOLO for classification of insulators defects in transmission lines based on UAV, International Journal of Electrical Power & Energy Systems, Vol. 148, pp. 108982
[15] Tsai T.-H., Wu P.-H., 2024, Design and implementation of deep learning-based object detection and tracking system, Integration, Vol. 99, pp. 102240
[16] Wang X., Wang X., Li C., Zhao Y., Ren P., 2022, Data-attention-YOLO (DAY): A comprehensive framework for mesoscale eddy identification, Pattern Recognition, Vol. 131, pp. 108870
[17] Xu X., Chen X., Wu B., Wang Z., Zhen J., 2022, Exploiting high-fidelity kinematic information from port surveillance videos via a YOLO-based framework, Ocean & Coastal Management, Vol. 222, pp. 106117
[18] Yoshioka S., Fujita Z., Hay D. C., Ishige Y., 2018, Pose tracking with rate gyroscopes in alpine skiing, Sports Engineering, Vol. 21, pp. 177-188
[19] Yuan Y., Wu Y., Zhao L., Chen H., Zhang Y., 2024, Multiple object detection and tracking from drone videos based on GM-YOLO and multi-tracker, Image and Vision Computing, Vol. 143, pp. 104951
[20] Zheng Z., Li J., Qin L., 2023, YOLO-BYTE: An efficient multi-object tracking algorithm for automatic monitoring of dairy cows, Computers and Electronics in Agriculture, Vol. 209, pp. 107857
[21] Zi N., Li X.-M., Gade M., Fu H., Min S., 2024, Ocean eddy detection based on YOLO deep learning algorithm by synthetic aperture radar data, Remote Sensing of Environment, Vol. 307, pp. 114139
[22] Qi J., Li D., Zhang C., Wang Y., 2022, Alpine skiing tracking method based on deep learning and correlation filter, IEEE Access, Vol. 10, pp. 39248-39260
[23] Huang J., Zhang H., Wang X., Qiu X., 2024, A novel adaptive trajectory tracking control for complex environments based on accelerated back-propagation neural network, Journal of the Franklin Institute, Vol. 361, No. 13, pp. 107024
[24] Shao Y., Huang Q., Mei Y., Chu H., 2024, MOD-YOLO: Multispectral object detection based on transformer dual-stream YOLO, Pattern Recognition Letters, Vol. 183, pp. 26-34
[25] Wan D., Lu R., Hu B., Yin J., Shen S., Xu T., Lang X., 2024, YOLO-MIF: Improved YOLOv8 with multi-information fusion for object detection in gray-scale images, Advanced Engineering Informatics, Vol. 62, pp. 102709

Xiaoguo Chang

Xiaoguo Chang received his Master of Science degree in physical education and is a lecturer. He graduated from Suzhou University in 2009 and works at the Harbin Institute of Finance. His research interests include skiing and tennis.

Wei Gao

Wei Gao received her Master of Science degree in management and is an associate professor. She graduated from Harbin University of Commerce in 2009 and works at the Harbin Institute of Finance. Her research interests include the ice economy and marketing.