3.1 Image Data Preprocessing and Target Detection
Mistaken and disputed calls have long been a source of conflict on the court. The
action standards in basketball matches are strict, so reducing mistaken calls requires
accurate detection of players' foul behaviors. Tracking and identifying violations
in sports games involves four steps: image data preprocessing, moving target detection,
object tracking, and foul behavior recognition. VDE technology is used to process and
analyze the game video. Before foul actions can be tracked and identified, the motion
video image data must be preprocessed; tracking and recognition of foul actions in
the ball game then follow [16].
First, each image is converted from the red, green, and blue (RGB) color mode into
a grayscale image. The processing function is given in Eq. (1).
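As a rough illustration of this graying step, the sketch below computes a weighted sum of the three color channels for every pixel. The function name `to_grayscale` and the default weights (the common BT.601 luma coefficients) are assumptions made for the example; the weights actually used in this work are not restated here.

```python
# Weighted RGB-to-grayscale conversion: gray = alpha*R + beta*G + gamma*B.
# The default weights are the widely used BT.601 luma coefficients. They are
# an assumption for this sketch, not values taken from the paper.

def to_grayscale(rgb_image, alpha=0.299, beta=0.587, gamma=0.114):
    """Convert rows of (R, G, B) tuples into rows of gray values."""
    return [
        [alpha * r + beta * g + gamma * b for (r, g, b) in row]
        for row in rgb_image
    ]


# A hypothetical 2x2 frame: red, green, blue, and white pixels.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(frame)
```

Because the three default weights sum to 1, a white pixel stays at full intensity, which keeps the gray image in the same value range as the input channels.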
In Eq. (1), $\alpha $, $\beta $ and $\gamma $ are the weights applied to the basketball
players' action images. $I_{i}(x,y)$ represents the value of the processed grayscale
image at point $(x,y)$. When the athlete's action image is captured in natural light
and in a single light, the image weight is set to 1 and 0, respectively. After the
image has been grayed and weighted, the WT algorithm is employed to denoise the
data signal. The WT equation is given in Eq. (2).
In Eq. (2), $a$ represents the scale variable, and $\tau $ represents the translation
variable. After multiple WTs, the obtained signal components are given in Eq. (3).
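The multi-level decomposition and denoising idea can be sketched as follows. This is a minimal illustration only: the Haar wavelet, soft thresholding of the detail coefficients, the threshold value, and the function names are all assumptions, since the text does not specify the wavelet basis or the thresholding rule. Three levels are used purely as an example setting.

```python
import math

_S = 1 / math.sqrt(2)  # Haar normalization factor


def haar_step(signal):
    """One analysis level: return (approximation, detail) coefficients."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * _S for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * _S for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One synthesis level: rebuild the signal from approximation and detail."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def soft_threshold(values, t):
    """Shrink coefficients toward zero; small (noise-like) ones become zero."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]


def wavelet_denoise(signal, levels=3, threshold=0.5):
    """Decompose `levels` times, threshold the details, then reconstruct.

    The signal length must be divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    for detail in reversed(details):
        approx = haar_inverse_step(approx, soft_threshold(detail, threshold))
    return approx


# A constant signal of 10 with small alternating noise of +/-0.2.
noisy = [10.2, 9.8] * 4
clean = wavelet_denoise(noisy)
```

In this toy case the noise lives entirely in the first-level detail coefficients, which fall below the threshold, so the reconstruction returns the constant signal.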
In Eq. (3), $S$ represents the initial signal, $D_{n}$ represents the noise signal obtained
after $n$ WTs, and $A_{n}$ represents the effective signal obtained after $n$ WTs. Experiments
show that the 3-layer WT gives the best denoising effect. The steps above complete
the graying and denoising of the basketball game video action data. Next, moving objects
must be detected, that is, the color, shape, position, size, and other information of
the object must be obtained in each frame of the video stream. A video sequence is
essentially 3D data that includes a time dimension. The TFD method is used to
extract the moving target, as shown in Fig. 1.
Fig. 1. Diagram of TFD method.
First, the gray values of three adjacent frames, $I_{k-2}$, $I_{k-1}$ and $I_{k}$, are
collected, and the absolute difference of each adjacent pair is calculated to obtain
the difference images, as shown in Eq. (4).
The two adjacent difference images are then binarized, as shown in Eq.
(5).
In Eq. (5), $d$ and $\delta $ are the mean and standard deviation of the difference
image, and $T$ is the threshold. The moving target information is obtained by
performing an OR operation on the two binary images, as given in Eq. (6).
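The three steps just described (differencing, binarization, OR) can be sketched on small nested-list frames. The threshold rule used here, the mean plus one standard deviation of the difference image, is an assumption for illustration; the text states only that the threshold depends on these two statistics. All function names are hypothetical.

```python
import statistics


def abs_diff(frame_a, frame_b):
    """Pixel-wise absolute difference of two gray frames."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def binarize(diff):
    """Binarize a difference image with T = mean + std (assumed rule)."""
    pixels = [p for row in diff for p in row]
    threshold = statistics.mean(pixels) + statistics.pstdev(pixels)
    return [[1 if p > threshold else 0 for p in row] for row in diff]


def three_frame_difference(f_km2, f_km1, f_k):
    """OR together the two binarized adjacent-frame differences."""
    b1 = binarize(abs_diff(f_km1, f_km2))
    b2 = binarize(abs_diff(f_k, f_km1))
    return [[p | q for p, q in zip(r1, r2)] for r1, r2 in zip(b1, b2)]


# A bright pixel moving one position to the right per frame.
frame_km2 = [[0, 200, 0, 0, 0, 0]]
frame_km1 = [[0, 0, 200, 0, 0, 0]]
frame_k = [[0, 0, 0, 200, 0, 0]]
motion = three_frame_difference(frame_km2, frame_km1, frame_k)
# motion == [[0, 1, 1, 1, 0, 0]]
```

The mask marks every position the object occupied across the three frames, which illustrates why frame differencing outlines motion but does not necessarily fill the interior of a larger object.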
Eq. (6) is applied to process the image. This approach still has a limitation: when
moving objects are extracted, holes appear inside them. Therefore, the research combines
TFD with the BS method to avoid this phenomenon. Eq. (7) gives the difference computed
by the BS method.
In Eq. (7), $I_{k}(x,y)$ represents the current gray value and $B\left(x,y\right)$ represents
the background gray value. The mean $\overline{d}$ and standard deviation $\delta $ of the
pixels of the absolute difference image $DB\left(x,y\right)$ are computed and used to set
the threshold $T$, and Eq. (8) is then obtained by binarization.
In Eq. (8), $\alpha $ represents a number; $w$ indicates degree; $h$ represents a figure. Eq.
(9) is obtained by an "AND" operation on the moving target information obtained by the
combined methods.
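A minimal sketch of the BS step and the AND fusion is given below. As before, the threshold rule (mean plus one standard deviation of the difference image) and the function names are assumptions for illustration, and the TFD mask is supplied as a literal so that the example stands alone.

```python
import statistics


def background_subtraction_mask(frame, background):
    """Binarize |I_k - B| with T = mean + std of the difference (assumed rule)."""
    diff = [[abs(p - b) for p, b in zip(row_f, row_b)]
            for row_f, row_b in zip(frame, background)]
    pixels = [p for row in diff for p in row]
    threshold = statistics.mean(pixels) + statistics.pstdev(pixels)
    return [[1 if p > threshold else 0 for p in row] for row in diff]


def fuse_and(mask_a, mask_b):
    """Pixel-wise AND of two binary masks."""
    return [[p & q for p, q in zip(r1, r2)] for r1, r2 in zip(mask_a, mask_b)]


background = [[10, 10, 10, 10, 10, 10]]
current = [[10, 10, 10, 200, 200, 10]]        # object covers two pixels
bs_mask = background_subtraction_mask(current, background)

tfd_mask = [[0, 1, 1, 1, 0, 0]]               # hypothetical TFD result
fused = fuse_and(tfd_mask, bs_mask)
# bs_mask == [[0, 0, 0, 1, 1, 0]], fused == [[0, 0, 0, 1, 0, 0]]
```

The AND keeps only pixels that both detectors agree on, which suppresses the trailing positions that frame differencing reports for a moving object.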
Once the required data have been gathered, the background is updated adaptively
according to the current conditions. The update expression is given in Eq.
(10).
In Eq. (10), $u$ is a fixed value; research and calculation show that the best value is 0.997.
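One common form of such an update is a running average in which the background keeps a fraction $u$ of its old value and takes the remainder from the current frame. The sketch below assumes that form, and it also assumes that pixels flagged as moving are left unchanged; neither detail is confirmed by the text, and the function name is hypothetical.

```python
def update_background(background, frame, u=0.997, motion_mask=None):
    """Running-average update: B_new = u * B + (1 - u) * I (assumed form).

    Pixels where motion_mask is 1 keep their old background value, which is
    an assumption made so that moving objects do not leak into the background.
    """
    updated = []
    for y, (row_b, row_f) in enumerate(zip(background, frame)):
        new_row = []
        for x, (b, p) in enumerate(zip(row_b, row_f)):
            moving = motion_mask is not None and motion_mask[y][x] == 1
            new_row.append(b if moving else u * b + (1 - u) * p)
        updated.append(new_row)
    return updated


background = [[100.0, 100.0]]
frame = [[200.0, 200.0]]
mask = [[0, 1]]  # second pixel belongs to a moving object
background = update_background(background, frame, motion_mask=mask)
# First pixel moves slightly toward the frame value; second stays at 100.0.
```

With $u$ close to 1 the background changes slowly, so brief disturbances have little effect while gradual lighting changes are still absorbed over many frames.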
The flow chart of the fusion algorithm is shown in Fig. 2.
Based on the above, this paper implements the preprocessing of basketball game action
video images and applies the fused TFD and BS algorithm to detect and extract image
objects.
Fig. 2. Flow chart of fusion algorithm for object detection.
3.2 Object Tracking based on Improved CamShift
Once the object has been identified and extracted, it must be tracked so that
subsequent violations can be identified in real time. To reduce running time and
achieve close tracking, the improved CamShift is used for object tracking in
basketball game video. CamShift is the continuously adaptive mean shift (MS)
algorithm, and it is mainly used for iterative processing of video sequences [17,18].
The MS algorithm is a kernel density estimation method that describes both the target
model and the candidate model through the probability of the pixel values within the
target and candidate regions. Suppose that $n$ sample points exist in a
high-dimensional space; the MS vector of the sample points is then given in
Eq. (11).
In Eq. (11), $x_{i}$ refers to a sample point, $k$ refers to the number of samples falling
into the region $S_{h}$, and $S_{h}$ refers to the high-dimensional sphere region with radius
$h$, that is, the set of points $y$ satisfying Eq. (12).
Considering the influence of the distance of each pixel, the basic MS form is extended
to Eq. (13) by introducing a kernel function.
In Eq. (13), $G\left(x\right)$ represents a unit kernel function, and $\omega $ is the weight
assigned to the sampling point. Iterating Eq. (13) yields Eq. (14).
The value of $m_{h}\left(x\right)$ is calculated and assigned to $x$. After that,
$M_{h}\left(x\right)$ is calculated again. If the absolute value of
$M_{h}\left(x\right)$ is lower than the error tolerance, the loop ends and the final
object location is obtained. Otherwise, the calculation continues. The diagram of the
MS TA is shown in Fig. 3.
Fig. 3. Schematic diagram of MS TA.
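The iteration described above (compute the weighted mean, move to it, stop when the shift falls below a tolerance) can be sketched for 2D points. The Gaussian kernel, the bandwidth, the tolerance, and the function name are assumptions chosen for the example; the text specifies only a unit kernel function and per-sample weights.

```python
import math


def mean_shift(points, start, bandwidth=2.0, weights=None,
               tol=1e-6, max_iter=100):
    """Move `start` to a local density mode of weighted 2D points.

    Uses a Gaussian kernel (an assumption for this sketch). The loop stops
    when the shift length drops below `tol` or after `max_iter` iterations.
    """
    weights = weights or [1.0] * len(points)
    x, y = start
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for (px, py), w in zip(points, weights):
            d2 = ((px - x) ** 2 + (py - y) ** 2) / bandwidth ** 2
            g = w * math.exp(-0.5 * d2)
            num_x += g * px
            num_y += g * py
            den += g
        new_x, new_y = num_x / den, num_y / den   # kernel-weighted mean
        shift = math.hypot(new_x - x, new_y - y)  # length of the MS vector
        x, y = new_x, new_y
        if shift < tol:
            break
    return x, y


# Five samples arranged symmetrically around (5, 5).
samples = [(4, 5), (6, 5), (5, 4), (5, 6), (5, 5)]
mode = mean_shift(samples, start=(4.0, 4.5))
```

In a tracker the samples would be pixel positions and the weights would come from the back-projected color probability, so the converged point is the new window center.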
The CamShift algorithm employs the mode of the probability distribution (PD) image
identified by the MS algorithm. In the context of subsequent video detection, a feedback
loop is introduced, whereby the outcome of the previous detection process is employed
as the input for the subsequent detection. Furthermore, the search region is constrained
to a region surrounding the most recent known target position. Once the tracking probability
model has been established, the tracking object should be positioned at the center
of the tracking window, and its expression is indicated in Eq. (15).
In Eq. (15), $W$ represents the search window, $p_{k}$ is the initial center point of the search
window, and $f\left(p\right)$ is the MS gradient-climbing equation. A new center point
$\widehat{\overline{p}}_{k}$ is found through dynamic iteration. During the research,
it was found that the target may be lost when the tracked object is occluded.
To address this issue, a Kalman filter is employed to predict
the motion parameters, so that the position of the search window can be adjusted
and the lost target recovered [19].
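The role of the Kalman filter during occlusion can be illustrated with a small constant-velocity filter applied to each coordinate of the window center. The motion model, the noise variances, and the class and function names are assumptions for this sketch; the text does not state which state model is used. A missing measurement is represented by `None`.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (1 frame per step)."""

    def __init__(self, process_var=1e-3, measure_var=1.0, initial_var=1000.0):
        self.pos, self.vel = 0.0, 0.0
        self.p = [[initial_var, 0.0], [0.0, initial_var]]  # state covariance
        self.q, self.r = process_var, measure_var

    def predict(self):
        """Propagate state and covariance one frame ahead."""
        (p00, p01), (p10, p11) = self.p
        self.pos += self.vel
        self.p = [[p00 + p01 + p10 + p11 + self.q, p01 + p11],
                  [p10 + p11, p11 + self.q]]
        return self.pos

    def update(self, z):
        """Correct the prediction with a measured position z."""
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.r                  # innovation variance
        k0, k1 = p00 / s, p10 / s         # Kalman gains for pos and vel
        residual = z - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


def track_center(measurements):
    """Return window centers; predict only when a measurement is None."""
    kx, ky = AxisKalman(), AxisKalman()
    centers = []
    for m in measurements:
        x, y = kx.predict(), ky.predict()
        if m is not None:                 # target visible: correct the state
            kx.update(m[0])
            ky.update(m[1])
            x, y = kx.pos, ky.pos
        centers.append((x, y))
    return centers


# Target moves 2 px per frame in x, then is occluded for 3 frames.
observed = [(2.0 * t, 5.0) for t in range(10)] + [None] * 3
centers = track_center(observed)
```

During the occluded frames the search window keeps moving along the estimated velocity, so the tracker has a plausible place to look when the target reappears.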
3.3 Analysis and Recognition of Foul Action based on HMM
In a basketball game, mistakes usually occur when fouls and near-fouls are judged
by the referee's eyes alone. In the preceding sections, the fusion algorithm
combining the TFD and BS methods was used to detect and extract moving objects,
and the improved CamShift algorithm was applied to track moving
objects in the sports video of ball games. The next step is to recognize
the foul actions of athletes; the research uses the HMM algorithm to extract and recognize
the foul actions of the video target objects. The HMM identification is shown in Fig. 4.
Fig. 4. HMM foul action identification.
The data set covers 6 types of foul actions by basketball players: illegal contact
with the hands, blocking, excessive elbow swinging, pulling, pushing, and hitting
with the ball. First, each violation action is modeled separately; the data volume
for each violation action is set to 120, and each action is decomposed
into $n$ meta actions. Because these meta actions follow a time sequence during play,
each violation action is regarded as an observation sequence of length
$n$, which is used in training to find the best HMM parameters. Once the
optimal parameters have been identified, the extracted observation sequence data are
used as the input to the HMM, and the Viterbi algorithm is then
employed to determine the probability of each action within the video. The action
corresponding to the model with the highest output probability is the recognition
result for the current observation sequence. The expression of the HMM is given in Eq.
(16).
In Eq. (16), $A$ represents the state transition PD, $B$ represents the observation PD, and
$\pi $ represents the initial PD. To obtain suitable HMM parameters, the Baum-Welch algorithm
is used for training, and Eq. (17) is obtained.
In Eq. (17), $I$ represents the unobservable hidden data, and $O$ represents the observation
sequence data. The expectation-maximization algorithm is applied to learn the parameters
of the HMM. The $Q$ function is given in Eq. (18).
In Eq. (18), $\overline{\lambda }$ represents the current estimate of the model parameters, and
$\lambda $ represents the model parameters to be maximized. After the value of the $Q$ function is
obtained, the HMM parameters are found by maximizing the $Q$ function in combination
with the Lagrange multiplier method. To identify illegal actions in basketball
games, after the illegal action models have been trained, the research uses the Viterbi
algorithm to obtain the best solution of the HMM. For a given HMM and observation sequence,
the optimal path is $I^{*}=\left(i_{1}^{*},i_{2}^{*},\ldots ,i_{T}^{*}\right)$, where $T$ represents
the length of sequence $I$. These operations are used to identify and analyze
the illegal actions that occur within the game.
The principle of the Viterbi algorithm is shown in Fig. 5.
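The decoding and recognition steps can be sketched with a small discrete HMM. The function names, the two action labels used as dictionary keys, and all probability values below are toy assumptions for illustration; they are not parameters trained in this work. Each model is a tuple $(\pi, A, B)$, and the recognized action is the one whose model gives the highest best-path probability.

```python
def viterbi(obs, pi, A, B):
    """Return (best_state_path, probability_of_that_path).

    obs: list of observation indices; pi: initial probabilities;
    A[i][j]: transition probability i -> j; B[j][o]: emission probability.
    """
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            ptr.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(ptr)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backpointers):    # trace the best path backwards
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


def recognize(obs, models):
    """Pick the action whose HMM yields the highest Viterbi probability."""
    return max(models, key=lambda name: viterbi(obs, *models[name])[1])


# Toy parameters (3 hidden states, 2 observation symbols), for illustration.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
path, prob = viterbi([0, 1, 0], pi, A, B)
# path == [2, 2, 2], prob is approximately 0.0147

models = {
    "pushing": (pi, A, B),
    "blocking": (pi, A, [[0.1, 0.9]] * 3),
}
label = recognize([0, 1, 0], models)  # "pushing"
```

For long observation sequences the products of probabilities become very small, so a practical version would usually work with log probabilities instead.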
The research employs VDE to detect and track objects in the game video
images. The action recognition model is then used to construct a BFT
model based on VDE technology. This model effectively judges each illegal
action in the basketball game.
Fig. 5. Principle of Viterbi algorithm.