
Physical Education Department, North China Electric Power University, Baoding 071003, China (liuchen199207@126.com)



Keywords: Basketball match, video detail enhancement, three-frame difference method, background subtraction, HMM, CamShift tracking

1. Introduction

In competition, whether an athlete's action is a foul is mainly judged by the referee's visual observation, but because observations from different angles yield different results, the judgment is easily affected by subjectivity and prone to error [1]. The use of machine vision technology to detect illegal movements in basketball games has therefore become an important research direction for game officiating [2]. However, many problems remain when video is used for detection, such as handling the uncertainty of a target in motion and detection under line-of-sight occlusion; the impact of lighting changes must also be considered [3]. To enhance the fairness of basketball games and reduce the negative impact of misjudgments, this study investigates intelligent officiating methods based on machine vision. The research objective is to improve the accuracy of detecting and refereeing fouls in basketball games with a basketball foul tracking model enhanced by video details. A fusion algorithm is used to enhance target detection in the image, after which the continuously adaptive mean shift tracking algorithm (CamShift) and a hidden Markov model (HMM) are used to achieve object tracking [4]. On the basketball court, the tracked target may be blocked by other athletes' actions, such as movement and screening, so Kalman filtering is introduced to improve CamShift and eliminate the line-of-sight occlusion problem [5]. The innovation lies in the significant improvement of the detection and recognition of athletes' movements achieved by combining the three-frame difference (TFD), background subtraction (BS), and wavelet transform techniques. The established model provides a new research direction for the refereeing of basketball games and further safeguards the fairness of sporting events.

2. Related Works

The image and video acquisition process is susceptible to a number of variables that can impact the visual quality of the final output, including the lighting environment, the equipment used, and background noise. To address this, numerous scholars have conducted experiments aimed at enhancing the visual quality of images. Wang et al. proposed a transformer-based low-light enhancement method to address insufficient performance when processing large-scale images from optical sensors. The core component of the method was an axis-based multi-head self-attention and cross-layer attention fusion block, which significantly reduced complexity to linear. The experimental results showed that the proposed method was superior to state-of-the-art methods [6]. Guo and Hu proposed a new framework inspired by the divide-and-conquer principle, which greatly alleviated the complex degradation of images captured in low-light environments; the experimental results indicated that the method was superior to state-of-the-art alternatives [7]. Wang et al. proposed a super-resolution (SR) network structure that can super-resolve both optical flow and images. The video super-resolution network effectively exploited the correlation between consecutive frames and used deep learning to capture the temporal dependence provided by optical flow. In the relevant experiments, the SR network achieved state-of-the-art performance on the Vid4 dataset [8]. Zheng and Zhang converted magnetic flux leakage (MFL) information into image representations through a pseudo-color imaging protocol. The maximum-modulus method was applied to extract image features at the point of steel-wire fracture, and an MFL measuring device with unsaturated magnetic excitation was then constructed [9]. Tang et al. used feature enhancement network technology and combined deep learning target detection with image processing methods. In the experiments, the model had good detection performance, solving the problem that the SAR imaging mechanism produces extensive ship-like noise in the image [10].

In intelligent video analysis systems, moving object tracking is broadly applied in smart monitoring, human-computer interaction, autonomous driving, and other fields. To track and recognize targets accurately under environmental changes, occlusion and deformation of tracked objects, and scale changes, increasing research has been devoted to motion tracking and recognition. Kim and Chi proposed a vision-based action recognition framework that used pre-trained detectors to locate excavators and realized dynamic tracking of excavators through a tracking-learning-detection algorithm. Analysis of the associated detection results showed that the accuracy of the framework reached 93.8%, a good recognition effect [11]. Jaouedi et al. constructed a human action recognition model through hybrid deep learning. The model extracted feature data with an estimation algorithm and classified sequence data and video with a gated recurrent neural network. Tested on the KTH dataset, its average accuracy reached 96.3% [12]. Angelini et al. proposed a new pose-level HAR method based on 2D poses, which used the human poses provided by OpenPose. In experiments on the relevant datasets, the method was robust to occlusion and missing data and achieved practical application effects [13]. Zhang et al. enhanced spatiotemporal attention by using two LSTM structures to recognize human actions, designing a spatiotemporal dual-attention network to handle the temporal characteristics of contextual information. In simulation experiments, the model showed good recognition ability [14]. Ge et al. proposed a new attention-based convolutional LSTM action recognition algorithm, which used a spatial transformer network with GoogleNet to extract features from video frames and finally used a convolutional LSTM for modeling. In the experiments, the method effectively represented the dynamic and static space of videos [15].

In summary, existing research has made significant progress in motion detection and recognition technology. However, in basketball games, relying solely on the referee's visual judgment to determine fouls easily leads to misjudgments, so the accuracy of detecting and recognizing athletes' movements still needs improvement. The basketball foul tracking model proposed in this article based on video detail enhancement not only comprehensively utilizes existing methods (the three-frame difference method, background subtraction method, and wavelet transform), but also incorporates an improved CamShift algorithm and a hidden Markov model to enhance the detection of foul behavior in complex scenes. The innovation of the study is that the TFD method, wavelet transform (WT), and background subtraction (BS) method are organically combined for target detection, greatly enhancing the recognition of moving targets, and the tracking algorithm (TA) is improved with a Kalman filter so that athletes' foul actions can be tracked and recognized quickly and accurately.

3. Construction of a Model for Tracking and Identifying Foul Actions in Basketball Matches

3.1 Image Data Preprocessing and Target Detection

Common mistakes and wrong decisions have always been a cause of conflict on the court, and the action standards in basketball matches are very strict. To reduce mistakes and wrong decisions, players' foul behaviors must be detected accurately. The steps for tracking and identifying violations in sports games comprise image data preprocessing, moving target detection, object tracking, and foul behavior recognition. Video detail enhancement (VDE) technology is utilized to process and analyze the game video; before foul actions in the game can be tracked and identified, this technology must preprocess the motion video image data [16]. First, the image is transformed into a grayscale image from the red, green, and blue (RGB) color mode. The processing function is denoted in Eq. (1) below.

(1)
$ I_{i}\left(x,y\right)=\alpha r\left(x,y\right)+\beta g\left(x,y\right)+\gamma b\left(x,y\right) $

In Eq. (1), $\alpha $, $\beta $ and $\gamma $ are the weighting values applied to the basketball players' action images, and $I_{i}(x,y)$ represents the value of the processed grayscale image at point $(x,y)$. When the athlete's action image is captured in natural light and under a single light source, the image weights are set to 1 and 0, respectively. Following the grayscale processing and weighting of the image, the WT algorithm is employed to denoise the data signal. The WT equation is denoted in the following Eq. (2).
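To make Eq. (1) concrete, the following minimal Python sketch applies the weighted conversion to an RGB frame. The default coefficients are the common luminance weights and are an illustrative assumption only; the paper instead sets the weights according to the lighting condition (e.g., 1 and 0).

```python
import numpy as np

def to_grayscale(rgb, alpha=0.299, beta=0.587, gamma=0.114):
    """Weighted RGB-to-gray conversion of Eq. (1).

    rgb: (H, W, 3) uint8 array. The default weights are the common
    luminance coefficients (an assumption, not the paper's
    lighting-dependent settings).
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = alpha * r + beta * g + gamma * b  # I_i(x, y) of Eq. (1)
    return np.clip(gray, 0, 255).astype(np.uint8)
```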

(2)
$ WT\left(a,\tau \right)=\frac{1}{\sqrt{a}}\int _{-\infty }^{\infty }f\left(t\right)\cdot \varphi \left(\frac{t-\tau }{a}\right)dt $

In Eq. (2), $a$ represents the scale variable and $\tau $ the translation variable. After multiple WTs, the obtained signal components are denoted in Eq. (3).

(3)
$ S=A_{n}+D_{1}+D_{2}+\cdot \cdot \cdot +D_{n} $

In Eq. (3), $S$ represents the initial signal, $D_{n}$ represents the noise signal obtained after $n$ WTs, and $A_{n}$ represents the effective signal obtained after $n$ WTs. Experiments show that a 3-layer WT has the optimum denoising effect. The preceding content covers the denoising and graying of basketball game video action data. Moving objects must then be detected, i.e., the color, shape, position, size, and other information of the object must be extracted from each frame of the video stream; the video sequence is essentially 3D data with a time dimension. The TFD method is used to extract the moving target, as shown in Fig. 1.
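A sketch of the 3-level denoising of Eqs. (2)-(3) using the PyWavelets library is given below; the db4 mother wavelet and the universal soft threshold are assumptions, since the paper does not name them.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(signal, wavelet="db4", level=3):
    """3-level wavelet denoising matching Eqs. (2)-(3).

    The detail coefficients D1..D3 are soft-thresholded and the signal
    is rebuilt from the approximation A3 plus the cleaned details.
    """
    coeffs = pywt.wavedec(signal, wavelet, level=level)  # [A3, D3, D2, D1]
    # Universal threshold estimated from the finest detail band (assumption).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(d, thr, mode="soft") for d in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]
```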

Fig. 1. Diagram of TFD method.

First, the gray values $I_{k-2}$, $I_{k-1}$ and $I_{k}$ of three adjacent frames are collected, and the absolute values of the two adjacent differences give the difference-image equations shown in Eq. (4).

(4)
$ \left\{\begin{array}{l} D_{1}\left(x,y\right)=\left| I_{k-1}-I_{k-2}\right| \\ D_{2}\left(x,y\right)=\left| I_{k}-I_{k-1}\right| \end{array}\right. $

The two adjacent difference images are then binarized, with the expression shown in Eq. (5).

(5)
$ \left\{\begin{array}{l} T_{1}=d_{1}+\beta \delta _{1}\\ T_{2}=d_{2}+\beta \delta _{2}\\ D_{1}\left(x,y\right)=\left\{\begin{array}{ll} 255 & d_{1}\geq T_{1}\\ 0 & d_{1}<T_{1} \end{array}\right.\\ D_{2}\left(x,y\right)=\left\{\begin{array}{ll} 255 & d_{2}\geq T_{2}\\ 0 & d_{2}<T_{2} \end{array}\right. \end{array}\right. $

In Eq. (5), $d$ and $\delta $ are the mean and standard deviation of the difference image, and $T$ is the threshold value. The moving-target information is obtained by performing an OR operation on the two binary images, as denoted in Eq. (6).

(6)
$ DI\left(x,y\right)=D_{1}\left(x,y\right)\oplus D_{2}\left(x,y\right) $
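A NumPy sketch of the three-frame difference of Eqs. (4)-(6) is given below; the coefficient $\beta $ of Eq. (5) is kept as a parameter rather than fixed, since the paper does not specify its value.

```python
import numpy as np

def three_frame_difference(f1, f2, f3, beta=1.0):
    """Three-frame difference of Eqs. (4)-(6) on grayscale frames.

    Each difference image is thresholded at its mean plus beta times
    its standard deviation, then the two masks are OR-combined.
    """
    d1 = np.abs(f2.astype(np.int16) - f1.astype(np.int16))  # Eq. (4)
    d2 = np.abs(f3.astype(np.int16) - f2.astype(np.int16))
    t1 = d1.mean() + beta * d1.std()                        # Eq. (5)
    t2 = d2.mean() + beta * d2.std()
    m1 = np.where(d1 >= t1, 255, 0).astype(np.uint8)
    m2 = np.where(d2 >= t2, 255, 0).astype(np.uint8)
    return np.bitwise_or(m1, m2)                            # DI(x, y), Eq. (6)
```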

Eq. (6) is applied to process the image, but a limitation remains: holes appear when moving objects are extracted. The research therefore combines the BS method to avoid this phenomenon. Eq. (7) gives the difference of the BS method.

(7)
$ DB\left(x,y\right)=\left| I_{k}\left(x,y\right)-B\left(x,y\right)\right| $

In Eq. (7), $I_{k}(x,y)$ represents the gray value of the current frame and $B\left(x,y\right)$ represents the background gray value. The mean $\overline{d}$ and standard deviation $\delta $ of the pixels $d\left(x,y\right)$ of the absolute-difference image $DB\left(x,y\right)$ are computed, the threshold is set to $T$, and binarization then yields Eq. (8).

(8)
$\left\{\begin{array}{l} \overline{d}=\frac{\sum _{x=0}^{x<w}\sum _{y=0}^{y<h}d\left(x,y\right)}{wh}\\ \delta =\sqrt{\frac{\sum _{x=0}^{x<w}\sum _{y=0}^{y<h}\left[d\left(x,y\right)-\overline{d}\right]^{2}}{wh}}\\ T=\overline{d}+\alpha \delta \\ DB\left(x,y\right)=\left\{\begin{array}{ll} 255 & d\geq T\\ 0 & d<T \end{array}\right. \end{array}\right.$

In Eq. (8), $\alpha $ is a weighting coefficient, $w$ is the image width, and $h$ is the image height. Eq. (9) is obtained by an AND operation on the moving-target information obtained by the two methods.

(9)
$ D\left(x,y\right)=DI\left(x,y\right)\otimes DB\left(x,y\right) $

Once the requisite data has been gathered, the background is updated adaptively. The update expression is shown in Eq. (10).

(10)
$ B\left(x,y\right)=\left\{\begin{array}{ll} B\left(x,y\right) & D\left(x,y\right)=0\\ uB\left(x,y\right)+\left(1-u\right)I_{k}\left(x,y\right) & D\left(x,y\right)\neq 0 \end{array}\right. $

In Eq. (10), $u$ is a fixed value; research and calculation give 0.997 as the best value. The fusion algorithm flow chart is shown in Fig. 2.
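The following sketch strings together the BS difference, thresholding, AND-fusion, and background update of Eqs. (7)-(10); $\alpha $ is left as a parameter and $u$ defaults to the paper's 0.997.

```python
import numpy as np

def fuse_and_update(frame, background, tfd_mask, alpha=1.0, u=0.997):
    """Background subtraction, fusion, and update of Eqs. (7)-(10).

    frame, background: grayscale float arrays; tfd_mask: 0/255 mask
    from the three-frame difference. alpha is an assumed coefficient.
    """
    db = np.abs(frame - background)                       # Eq. (7)
    t = db.mean() + alpha * db.std()                      # Eq. (8)
    bs_mask = np.where(db >= t, 255, 0).astype(np.uint8)
    fused = np.bitwise_and(tfd_mask, bs_mask)             # D(x, y), Eq. (9)
    # Eq. (10): keep B where no motion; blend moving pixels toward the frame.
    updated = np.where(fused == 0, background,
                       u * background + (1 - u) * frame)
    return fused, updated
```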

Based on the aforementioned contents, this paper studies and realizes the preprocessing of basketball game action video images, and applies the integrated algorithm of TFD and BS methods to detect and extract image objects.

Fig. 2. Flow chart of fusion algorithm for object detection.

3.2 Object Tracking based on Improved CamShift

Once the object has been identified and extracted, it must be tracked so that subsequent violations can be identified in real time. To save running time and achieve close tracking, the improved CamShift is utilized to realize object tracking in basketball game video. The CamShift algorithm is a continuously adaptive mean shift (MS) algorithm, mainly used for iterative processing of video sequences [17,18]. The MS algorithm is a kernel density estimation method that describes both the target model and the candidate model through the probability of the pixel values within the specified and candidate regions. Suppose there is a high-dimensional space containing $n$ sample points; the MS vector of the sample points is then shown in Eq. (11).

(11)
$ M_{h}\left(x\right)=\frac{1}{k}\sum _{x_{i}\in S_{h}}\left(x_{i}-x\right) $

In Eq. (11), $x_{i}$ refers to a sample point, $k$ refers to the number of samples falling into the region $S_{h}$, and $S_{h}$ refers to the high-dimensional sphere of radius $h$, i.e., the set of points $y$ satisfying the relationship in Eq. (12).

(12)
$ S_{h}\left(x\right)=\left\{y\colon \left(y-x\right)^{T}\left(y-x\right)\leq h^{2}\right\} $

Considering the influence of the distance of each pixel, the basic MS form is extended to Eq. (13) by introducing kernel function.

(13)
$ M_{h}\left(x\right)=\frac{\sum _{i=1}^{n}G\left(\frac{x_{i}-x}{h}\right)\omega \left(x_{i}\right)\left(x_{i}-x\right)}{\sum _{i=1}^{n}G\left(\frac{x_{i}-x}{h}\right)\omega \left(x_{i}\right)} $

In Eq. (13), $G\left(x\right)$ represents a unit kernel function and $\omega $ the weight assigned to each sampling point. Iterating with Eq. (13) gives the following Eq. (14).

(14)
$ m_{h}\left(x\right)=M_{h}\left(x\right)+x $

The value of $m_{h}\left(x\right)$ is calculated and assigned to $x$, after which $M_{h}\left(x\right)$ is calculated again. If the absolute value of $M_{h}\left(x\right)$ is lower than the fault-tolerance error, the cycle ends and the final object location is obtained; otherwise, the calculation continues. The diagram of the MS TA is denoted in Fig. 3.
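A minimal NumPy sketch of the kernelized MS iteration of Eqs. (13)-(14) follows, with a Gaussian kernel standing in for $G\left(x\right)$ (an assumption; the paper does not fix the kernel).

```python
import numpy as np

def mean_shift(samples, x, h, weights=None, tol=1e-3, max_iter=100):
    """Kernelized mean-shift iteration of Eqs. (13)-(14).

    samples: (n, d) array; x: starting point (d,); h: bandwidth.
    Returns the converged mode location.
    """
    if weights is None:
        weights = np.ones(len(samples))
    for _ in range(max_iter):
        g = np.exp(-0.5 * np.sum(((samples - x) / h) ** 2, axis=1))
        w = g * weights
        m_h = np.sum(w[:, None] * (samples - x), axis=0) / np.sum(w)  # Eq. (13)
        x = x + m_h                                                   # Eq. (14)
        if np.linalg.norm(m_h) < tol:  # stop once the shift is tiny
            break
    return x
```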

Fig. 3. Schematic diagram of MS TA.

The CamShift algorithm employs the mode of the probability distribution (PD) image identified by the MS algorithm. In the context of subsequent video detection, a feedback loop is introduced, whereby the outcome of the previous detection process is employed as the input for the subsequent detection. Furthermore, the search region is constrained to a region surrounding the most recent known target position. Once the tracking probability model has been established, the tracking object should be positioned at the center of the tracking window, and its expression is indicated in Eq. (15).

(15)
$ \left\{\begin{array}{l} \widehat{\overline{p}}_{k}\left(W\right)=\frac{1}{\left| W\right| }\sum _{j\in W}p_{j}\\ \widehat{\overline{p}}_{k}\left(W\right)-p_{k}\approx \frac{f'\left(p_{k}\right)}{f\left(p_{k}\right)} \end{array}\right. $

In Eq. (15), $W$ represents a search window, $p_{k}$ is the initial center point of the search window, and $f\left(p\right)$ is the MS climbing-gradient equation. A new center point $\widehat{\overline{p}}_{k}$ is found through dynamic iteration. During the research, it was discovered that the target may be lost because of occlusion of the tracked object. To address this, a Kalman filter is employed to forecast the motion parameters, enabling adjustment of the search-window position and compensation for the lost target [19].
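A sketch of one tracking step built on OpenCV's cv2.CamShift and cv2.KalmanFilter is shown below; the back-projection input, the occlusion flag, and the window re-centering rule are illustrative assumptions rather than the paper's exact procedure.

```python
import cv2
import numpy as np

criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

kf = cv2.KalmanFilter(4, 2)  # state (x, y, vx, vy), measurement (x, y)
kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
kf.processNoiseCov = 1e-4 * np.eye(4, dtype=np.float32)

def track_step(back_proj, track_window, occluded):
    """One CamShift step; on occlusion, re-center on the Kalman prediction."""
    pred = kf.predict()
    if occluded:
        x, y = int(pred[0, 0]), int(pred[1, 0])
        _, _, w, h = track_window
        track_window = (x - w // 2, y - h // 2, w, h)
    rot_rect, track_window = cv2.CamShift(back_proj, track_window, criteria)
    (cx, cy), _, _ = rot_rect
    kf.correct(np.array([[cx], [cy]], np.float32))  # feed the found center back
    return rot_rect, track_window
```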

3.3 Analysis and Recognition of Foul Action based on HMM

In a basketball game, mistakes usually occur when foul and near-foul actions are judged only by the referee's eyes. In the preceding sections, the fusion algorithm combining the TFD and BS methods was used to detect and extract moving objects, and the improved CamShift algorithm was applied to track moving objects in the ball game video. The athletes' foul actions must then be recognized, and the research uses the HMM algorithm to extract and recognize the foul actions of the video target objects. The HMM identification is indicated in Fig. 4.

Fig. 4. HMM foul action identification.

There are 6 types of foul actions in the basketball player data set: illegal use of hands, blocking, excessive elbow swinging, pulling, pushing, and hitting with the ball. First, each foul action is modeled individually, with 120 data samples per action, and each action is decomposed into $n$ meta-actions. Since these meta-actions occur in temporal order during play, each foul action is regarded as an observation sequence of length $n$ and is trained to explore the best HMM parameters. Once the optimal parameters have been identified, the extracted observation sequence data serve as input to the HMM, with the Viterbi algorithm subsequently employed to determine the probability of each action within the video. The action whose model yields the highest output probability is the identification result of the current observation sequence. The expression of the HMM is shown in Eq. (16).

(16)
$\lambda =\left(A,B,\pi \right)$

In Eq. (16), $A$ represents the state transition PD, $B$ represents the observation PD, and $\pi $ represents the initial PD. To obtain proper HMM parameters, the Baum-Welch algorithm is used for training, giving Eq. (17).

(17)
$ P\left(O\left| \lambda \right.\right)=\sum _{I}P\left(O\left| I,\lambda \right.\right)P\left(I\left| \lambda \right.\right) $

In Eq. (17), $I$ represents the unobservable hidden data and $O$ the observation sequence data. The expectation-maximization algorithm is applied to realize the parameter learning of the HMM; the $Q$ function is denoted in Eq. (18).

(18)
$ Q\left(\lambda ,\overline{\lambda }\right)=\sum _{I}\log P\left(O,I\left| \lambda \right.\right)P\left(O,I\left| \overline{\lambda }\right.\right) $

In Eq. (18), $\overline{\lambda }$ represents the current estimate of the model parameters, and $\lambda $ represents the model parameters to be maximized. After the value of the $Q$ function is obtained, the HMM's parameters are obtained by maximizing the $Q$ function in combination with the Lagrange multiplier method. For identifying illegal actions in basketball games, after the foul-action models are trained, the research uses the Viterbi algorithm to obtain the best solution of the HMM: for a given HMM model and observation sequence, the optimal path $I^{*}=\left(i_{1}^{*},i_{2}^{*},\ldots ,i_{T}^{*}\right)$ is found, where $T$ is the length of sequence $I$. These operations are employed to identify and analyze any foul actions occurring within the game. The Viterbi algorithm principle is denoted in Fig. 5.
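For reference, a self-contained log-space Viterbi decoder for $\lambda =\left(A,B,\pi \right)$ might look as follows (a sketch; the observation symbols are assumed to be integer indices).

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi decoding for lambda = (A, B, pi) in log space.

    A: (N, N) transition matrix, B: (N, M) emission matrix,
    pi: (N,) initial distribution, obs: observation index sequence.
    Returns the optimal state path I* and its log probability.
    """
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: i -> j
        psi[t] = np.argmax(scores, axis=0)     # best predecessor per state
        delta[t] = scores[psi[t], np.arange(N)] + logB[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):              # backtrack through psi
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(np.max(delta[-1]))
```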

The research employs VDE to conduct object detection and tracking in the game video images. The action model is then utilized for identification, and a basketball foul tracking (BFT) model based on VDE technology is constructed. This model effectively judges each illegal action in the basketball game.
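To make the full recognition step concrete, the sketch below trains one discrete HMM per foul class with the hmmlearn library's Baum-Welch implementation and labels a new sequence by the best-scoring model; the class names and state count are hypothetical, and hmmlearn's score() uses the forward algorithm in place of the paper's Viterbi scoring.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

# Hypothetical identifiers for the six foul types described above.
FOUL_CLASSES = ["hand_check", "blocking", "elbow_swing",
                "pulling", "pushing", "hit_with_ball"]

def train_foul_models(sequences_per_class, n_states=5):
    """Train one discrete HMM per foul class via Baum-Welch (Eqs. 16-18)."""
    models = {}
    for name, seqs in zip(FOUL_CLASSES, sequences_per_class):
        X = np.concatenate(seqs).reshape(-1, 1)  # stacked meta-action symbols
        lengths = [len(s) for s in seqs]         # one length per sequence
        m = hmm.CategoricalHMM(n_components=n_states, n_iter=100)
        m.fit(X, lengths)
        models[name] = m
    return models

def classify(models, seq):
    """Label a sequence by the model with the highest log-likelihood."""
    seq = np.asarray(seq).reshape(-1, 1)
    return max(models, key=lambda name: models[name].score(seq))
```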

Fig. 5. Principle of Viterbi algorithm.

4. Performance Analysis of Basketball Foul Tracking Method based on Video Detail Enhancement

4.1 Performance Analysis of Improved CamShift in Different Models

The sample used in the study consisted of video data from multiple basketball games, covering game scenes on different courts, with different teams, and under different lighting conditions. Sample selection required that each game contain at least three different types of fouls and that the video be of high-definition quality so that the images are clear and visible. The research hardware configuration was an Intel Core i7 CPU, an NVIDIA GeForce RTX 3060 GPU, 16 GB of RAM, and a high-definition camera supporting 1080p resolution. Because the WT method is applied to denoise the model's input, the study first analyzes the denoising effect and then tests the performance of the different recognition models. To verify the denoising effect, the x-axis acceleration signal of the same group of foul actions (hitting a player with the ball) was compared before and after WT, as shown in Fig. 6.

Fig. 6. Noise reduction effect before and after wavelet change.

In Fig. 6, before WT denoising the original signal fluctuated over a wider amplitude. The peaks represent redundant data and noise generated in the action image of hitting a player with the ball, which strongly affect subsequent extraction and recognition. After WT, the peak values were significantly reduced, indicating that the redundant data and noise were effectively eliminated. The results showed that incorporating the WT method into the model achieves denoising while preserving the trend of the original curve; the resulting signal is cleaner, which improves the tracking accuracy of subsequent video images [20].

Fig. 7. Recognition effects comparison of four models.

To analyze the performance of the proposed action recognition model (Model 1), basketball game videos from video websites were used to identify foul actions, and the recognition performance in training and testing was compared against models built with a convolutional neural network (CNN, Model 2), a support vector machine (SVM, Model 3), and a BP neural network (Model 4). The outcomes are denoted in Fig. 7. In Fig. 7, the recognition rate of violations in Models 1-4 decreased as the sample size grew. When the sample size was about 500, the recognition accuracy of the proposed model was 99.76%, while that of the SVM model was 98.68%, 0.87% lower than the proposed model. When the sample size was 4000, the accuracy of all models decreased, but Model 1 had the smallest decline, with a recognition accuracy of 99.52%; the SVM model reached 98.39% (1.13% lower than Model 1), the CNN model 98.49% (1.03% lower), and the BP model 98.32% (1.20% lower). Fig. 7 shows that Model 1 outperforms the others in stability and recognition accuracy.

To further verify the performance of the recognition models, the identification errors of the four models were compared, as illustrated in Fig. 8. In Fig. 8, the maximum, minimum, and average values of Model 1's identification error curve were 0.009, 0.001, and 0.003, respectively, lower overall than those of the other three models. The maximum identification errors of Model 3, Model 4, and Model 2 were 0.012, 0.018, and 0.016, respectively. Based on the above analysis, the BFT model can track and identify foul actions in the game and provide a strong guarantee of fair play.

Fig. 8. Identification error of four models.

4.2 Performance Comparison of Tracking Algorithms Under Different Noises

The preceding experiments verified the good performance of the constructed action recognition model. Next, the effectiveness of the improved CamShift algorithm used in the model was verified, including accuracy tests in different noise environments and performance tests in different scenes.

To assess the algorithm's tracking effect, the improved CamShift was compared with currently popular, high-performing tracking algorithms (CT, TLD, IVT, and L1PAG) on subsets with target deformation, illumination change, background interference, and all three kinds of interference combined. The precision plots under the different conditions are compared in Fig. 9. In Fig. 9(a), with the threshold error set at 50 in the target deformation environment, the accuracy of the improved CamShift was about 0.7, while that of the TLD algorithm was about 0.5; when the threshold error was small, the improved CamShift algorithm's advantage in recognition accuracy was more obvious. In Fig. 9(b), with a threshold error of 50 in the illumination-change environment, the recognition accuracy of the improved CamShift and the TLD algorithm was almost the same, while that of the CT algorithm was about 0.52; with a threshold error of 8, the improved CamShift was 0.1-0.3 more accurate than the other algorithms. In Fig. 9(c), in the background interference environment with a threshold error of 50, the recognition accuracy of the improved CamShift was about 0.72, the TLD algorithm about 0.6, the CT algorithm about 0.5, the IVT algorithm about 0.47, and the L1PAG algorithm about 0.42; the latter four were far less accurate than the improved CamShift. Fig. 9(d) shows the average accuracy of each algorithm: the improved CamShift was the highest, at about 0.78, and remained higher than the others even when the error threshold was small. Therefore, in complex scenes with object deformation, illumination change, and background interference, the improved CamShift had the better tracking effect.

Fig. 9. Comparison of accuracy curves.

To further verify the effectiveness of the tracking algorithms, recall (Re), precision (Pr), and F-measure were introduced to compare quantitative performance indicators of the five tracking algorithms in the following scenarios: multi-modal background, light changes, and bad weather, as denoted in Table 1. In Table 1, the Kalman-filter-improved CamShift greatly raised the algorithm's ability to handle complex backgrounds, and its performance index values were the highest in all scenes. The average Re, Pr, and F-measure of the improved CamShift across the three scenarios were 0.90, 0.89, and 0.89, respectively, slightly higher than the TLD tracker, whose averages were 0.81, 0.80, and 0.80; the CT tracker's averages were 0.81, 0.81, and 0.81. The other two algorithms had lower quantitative index values. The results showed that the improved CamShift had higher effectiveness and better robustness. Meanwhile, algorithms such as TLD and CT often have high time complexity due to complex feature matching and model updates, producing performance bottlenecks in real-time applications. Data comparison showed that the proposed tracking algorithm had lower complexity and more advantages in processing efficiency and resource utilization than the comparison algorithms. To verify the proposed algorithm's running time and memory overhead, the results compared with the comparison algorithms are shown in Table 2.

According to Table 2, the proposed algorithm had a running time of 0.24 s, the shortest among the compared algorithms, and a system memory overhead of 22%, a low-load state. Compared with the comparison algorithms, the proposed algorithm had a lower memory usage ratio. The experimental data showed that the proposed algorithm outperformed the compared algorithms in both running time and memory overhead, and could respond quickly while maintaining system stability when processing large-scale image data.

Table 1. Comparison of Average Performance of Tracking Algorithms.

Algorithm          | Scene    | Re   | Pr   | F-measure
-------------------|----------|------|------|----------
Improved CamShift  | Highway  | 0.90 | 0.88 | 0.89
Improved CamShift  | Fountain | 0.88 | 0.87 | 0.87
Improved CamShift  | Wet Snow | 0.92 | 0.91 | 0.91
TLD                | Highway  | 0.82 | 0.81 | 0.81
TLD                | Fountain | 0.84 | 0.80 | 0.82
TLD                | Wet Snow | 0.76 | 0.79 | 0.78
CT                 | Highway  | 0.82 | 0.83 | 0.82
CT                 | Fountain | 0.79 | 0.80 | 0.79
CT                 | Wet Snow | 0.80 | 0.81 | 0.80
IVT                | Highway  | 0.73 | 0.75 | 0.74
IVT                | Fountain | 0.69 | 0.70 | 0.69
IVT                | Wet Snow | 0.77 | 0.79 | 0.78
L1PAG              | Highway  | 0.74 | 0.72 | 0.73
L1PAG              | Fountain | 0.71 | 0.74 | 0.72
L1PAG              | Wet Snow | 0.68 | 0.69 | 0.68

Table 2. Comparison Results of Running Time and Memory Overhead Performance of Different Algorithms.

Algorithm          | Run time (s) | Memory overhead (%)
-------------------|--------------|--------------------
Improved CamShift  | 0.24         | 22
TLD                | 0.68         | 36
CT                 | 0.74         | 38
IVT                | 0.82         | 42
L1PAG              | 0.88         | 44

5. Conclusion

In basketball matches, referees are prone to subjective judgment errors because observations from different angles yield different results during the game. The research purpose was to achieve objective judgment of athletes' movements in order to reduce misjudgments and disputes caused by human factors. RGB weighting was therefore used to grayscale the images, and the WT algorithm was used to denoise the video images. The combined TFD and BS algorithm was then used to detect and extract moving objects in the images, and the extracted set of foul-action images was input into the HMM recognition model to track and identify foul actions in the motion video. On this basis, a BFT model based on VDE was built. Experimental verification showed that the detection accuracy of the constructed basketball game action recognition model was 0.9976, with an average error of only about 0.3%. The improved CamShift algorithm used in the model achieved a recall of 0.92, a precision of 0.91, and a comprehensive performance score of 0.91 in the Wet Snow scenario. The proposed model achieved good results in the experiments; however, shortcomings remain. The study processed only the color information of the target, without combining texture and shape information, which introduces errors when distinguishing targets of the same color. Future work will consider using a CNN to automatically extract features for multi-level learning of color, texture, and shape information; a trained CNN can more effectively distinguish the actual features of similarly colored targets, thereby improving accuracy.

REFERENCES

[1] B. Hessert, "The protection of minor athletes in sports investigation proceedings," The International Sports Law Journal, Vol. 21, No. 1, pp. 62-73, Oct. 2021.
[2] W. Ma and Y. Lv, "Feature extraction method of football fouls based on deep learning algorithm," International Journal of Information and Communication Technology, Vol. 22, pp. 404-421, Aug. 2023.
[3] R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, "Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, No. 12, pp. 4861-4875, Dec. 2020.
[4] L. Lai and Y. Fang, "Automatic analysis and event detection technology of sports competition video based on deep learning," Journal of Electrical Systems, Vol. 20, No. 6s, pp. 2025-2036, Dec. 2024.
[5] R. Hou, D. Zhou, R. Nie, D. Liu, L. Xiong, Y. B. Guo, and C. B. Yu, "VIF-Net: An unsupervised framework for infrared and visible image fusion," IEEE Transactions on Computational Imaging, Vol. 6, pp. 640-651, Jan. 2020.
[6] T. Wang, K. Zhang, T. Shen, W. Luo, B. Stenger, and T. Lu, "Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method," Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, pp. 2654-2662, Jan. 2023.
[7] X. Guo and Q. Hu, "Low-light image enhancement via breaking down the darkness," International Journal of Computer Vision, Vol. 131, pp. 48-66, Mar. 2023.
[8] L. Wang, Y. Guo, L. Liu, Z. Lin, and W. An, "Deep video super-resolution using HR optical flow estimation," IEEE Transactions on Image Processing, Vol. 29, pp. 4323-4336, Jan. 2020.
[9] P. Zheng and J. Zhang, "Quantitative nondestructive testing of wire rope based on pseudo-color image enhancement technology," Nondestructive Testing and Evaluation, Vol. 34, No. 3, pp. 221-242, Mar. 2019.
[10] G. Tang, H. Zhao, C. Claramunt, and S. Men, "FLNet: A near-shore ship detection method based on image enhancement technology," Remote Sensing, Vol. 14, No. 19, pp. 4857, 2022.
[11] J. Kim and S. Chi, "Action recognition of earthmoving excavators based on sequential pattern analysis of visual features and operation cycles," Automation in Construction, Vol. 104, pp. 255-264, Aug. 2019.
[12] N. Jaouedi, N. Boujnah, and M. S. Bouhlel, "A new hybrid deep learning model for human action recognition," Journal of King Saud University - Computer and Information Sciences, Vol. 32, No. 4, pp. 447-453, May 2020.
[13] F. Angelini, Z. Fu, Y. Long, L. Shao, and S. Naqvi, "2D pose-based real-time human action recognition with occlusion-handling," IEEE Transactions on Multimedia, Vol. 22, No. 6, pp. 1433-1446, 2019.
[14] Z. Zhang, Z. Lv, C. Gan, and Q. Zhu, "Human action recognition using convolutional LSTM and fully-connected LSTM with different attentions," Neurocomputing, Vol. 410, pp. 304-316, Oct. 2020.
[15] H. Ge, Z. Yan, W. Yu, and L. Sun, "An attention mechanism based convolutional LSTM network for video action recognition," Multimedia Tools and Applications, Vol. 78, No. 14, pp. 20533-20556, 2019.
[16] L. Duan, J. Liu, W. Yang, T. Huang, and W. Gao, "Video coding for machines: A paradigm of collaborative compression and intelligent analytics," IEEE Transactions on Image Processing, Vol. 29, pp. 8680-8695, Aug. 2020.
[17] W. Zhang, S. Tang, Y. Cao, S. Pu, F. Wu, and Y. Zhuang, "Frame augmented alternating attention network for video question answering," IEEE Transactions on Multimedia, Vol. 22, No. 4, pp. 1032-1041, 2019.
[18] M. Lu, Z. N. Li, Y. Wang, and G. Pan, "Deep attention network for egocentric action recognition," IEEE Transactions on Image Processing, Vol. 28, No. 8, pp. 3703-3713, Aug. 2019.
[19] P. Antonik, N. Marsal, D. Brunner, and D. Rontani, "Human action recognition with a large-scale brain-inspired photonic computer," Nature Machine Intelligence, Vol. 1, No. 11, pp. 530-537, 2019.
[20] M. Li, S. Chen, X. Chen, Y. Zhang, Y. Wang, and Q. Tian, "Symbiotic graph neural networks for 3d skeleton-based human action recognition and motion prediction," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 6, pp. 3316-3333, Jun. 2021.
Chen Liu

Chen Liu was born in July 1992 in Siping City, Jilin Province; he is male and of Han ethnicity. He obtained a Bachelor's degree in Sports Training from Beijing Sport University in 2016 and a Master's degree in Sports Training from Beijing Sport University in 2019. His research focuses on basketball teaching and training. From 2019 to 2024 he worked as a physical education teacher at North China Electric Power University. He has published 5 academic papers and 2 textbooks.