



Target detection, Trajectory prediction, Neural network

1. Introduction

Information of every kind surrounds us in daily life, and images are one of the main tools people use to capture, record, and process it. With the development of information and computer technology, people are no longer satisfied with simple applications of images [1]. Cameras and computers are increasingly used in place of the human eye and brain to detect and track objects and to process the captured images, so that the results can be used on more occasions. This effort has given rise to computer stereo vision, a main research topic in computer vision that can be divided into binocular stereo vision and multi-ocular stereo vision; a multi-ocular stereo vision system is a combination of binocular stereo vision systems [2]. In recent years, with continuous progress in science and technology, computer stereo vision has become a research focus. Meanwhile, with rapid economic development and rising living standards, people have begun to pursue a healthy lifestyle, and sports have gradually become fashionable. Among the many sports, basketball has become the most popular in China's sports industry: both the number of basketball courts and the number of basketball fans are far ahead of those of other ball games.

As a team sport, basketball can improve people's physical fitness, especially among young people, and helps them develop teamwork. Basketball is also a competitive sport with winners and losers, so winning requires raising the competitive level of the athletes [3]. Core strength training can effectively improve the hit rate of jump shots, and physical training can help prevent basketball injuries. Footwork training is just as significant but is frequently overlooked. The NBA's adoption of small ball has pushed the game toward a faster rhythm, a more adaptable style, and a more complete system. Players need outstanding footwork and mobility because they must switch quickly between attack and defence [4]. Footwork training is crucial because it helps players develop body control and coordination, stay stable in physical confrontation, finish each round of attack and defence, and build a sound tactical system. In addition, if the running tracks of players on the court can be obtained and followed, we can observe each player's coverage and the positions where the player most often appears [5]. A tracking chart of the players' runs helps the coaching team reconstruct the offensive and defensive paths on the court and see how tactics were actually carried out. Offensive and defensive positions can then be assigned in a targeted way so that each player contributes the most. Footwork training and player trajectory tracking complement each other and provide a theoretical basis for analysing basketball games: together they help prevent injuries, support better defensive and attacking strategies, and improve the tactical system [6].
The motion recognition system based on video images is shown in Fig. 1.

In conventional training, the coach watches the players train or compete from the sidelines, records and assesses what happens, directs the players on the basis of years of teaching and training experience, and creates suitable training schedules for them. This approach has the following drawbacks [7]. First, there are not enough coaches for the number of players, which limits how effectively basketball players train. Second, observation with the human eye cannot precisely determine data such as the acceleration and angular velocity of the players' steps, so the underlying information that influences footwork is hard to understand [8]. Finally, modern basketball is developing towards rapid movement, rapid passing, and changing tactics. Coaches cannot track every player all the time, so some omissions are inevitable. In short, the traditional scheme can no longer keep up with the training and competition of modern basketball, and a more intelligent, comprehensive solution is urgently needed [9]. This paper studies basketball target detection and rotating-ball trajectory prediction based on deep learning. By stacking neural network layers, the trajectory prediction task can be accomplished in real time with acceptable accuracy.

In the current development of AI technology, combining artificial intelligence with robots is a trend that brings broader development space and research value to robotics, and more research possibilities to basketball robots in particular. Deep learning methods can improve on each of the problems of the traditional methods mentioned above. However, deep learning detection networks have a large computational load and require a large amount of training data, which raises new problems. This article therefore aims to address the shortcomings of traditional methods in basketball target detection and trajectory prediction. It combines deep learning methods, which have stronger generalization and anti-interference abilities, with the basketball visual system, and takes the rotation of the basketball into account when predicting the trajectory. On this basis, deep-learning-based basketball target detection and trajectory prediction of rotating balls are studied.

The innovation of this article is to improve detection accuracy over traditional methods while keeping sufficient detection speed, by combining intelligent vision technology with motion capture technology, which provides support for training in competitive sports.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the methods of the proposed work. Section 4 discusses the experiments and results. Finally, Section 5 concludes the research work.

Fig. 1. The motion recognition system based on the video image.
../../Resources/ieie/IEIESPC.2024.13.6.553/fig1.png

2. Related Work

2.1 Research Status of Binocular Stereo Vision

With the development of computer science, optics, and image processing technology, binocular stereo vision technology is constantly developing and is becoming increasingly practical in everyday life. Early studies on binocular stereo vision were conducted outside China; the United States and Japan, for example, have led the world in this research. In China the field is still at the research and development stage, and advancing it requires sustained study and improvement by many researchers [10].

Research on binocular stereo vision began in the 1960s. Roberts of the Massachusetts Institute of Technology (MIT) built a building-block world from multiple polyhedrons as the detection environment and obtained three-dimensional information from two-dimensional images, which marked the birth of stereo vision. In 1977, Marr of MIT expanded the theoretical framework of computer stereo vision by proposing a technique for obtaining stereo images from a parallax map, and computer vision has advanced steadily since then [11]. Outside China, binocular stereo vision is currently most often applied to intelligent transportation, 3D ranging, robot navigation, virtual reality, and real-time positioning and tracking. MIT developed an intelligent traffic sensor that calculates the depth of the target by combining radar positioning with binocular vision positioning and uses image segmentation to locate the target [12]. This method overcomes the inability of traditional image segmentation to handle targets moving at high speed in real time. Osaka University in Japan developed a binocular vision adaptive servo device, which uses the binocular stereo ranging principle and takes three relatively static objects in the two images as reference objects [13]. The movement direction of the target is predicted using the Jacobian determinant of the video image. The method achieves adaptive servo tracking of moving targets without the camera parameters that a standard servo tracking system requires. The University of Tokyo created a dynamic path planning and navigation system for a simulation robot by fusing real-time binocular vision with the robot's overall attitude data [14].
In 2005, the computer vision laboratory of the University of Central Florida in the United States developed the COCOA system, which is based on the MATLAB platform and performs target detection and tracking against the dynamic background of UAV flight [15]. The system first places calibrated sensors suitable for computer stereo vision in the experimental environment, then analyzes the image of one camera, the difference between the images taken by two cameras, and the coordinates of spatial points, and finally calculates the three-dimensional coordinates of an object in space. The research status of binocular stereo vision is shown in Fig. 2.

Although binocular stereo vision research started relatively late in China, many achievements have been made in recent years. Most domestic research, however, concerns robot vision and industrial ranging, and applications of binocular stereo vision in sports competition are still relatively scarce. In such applications, several high-speed cameras are installed in the stadium, and a three-dimensional ranging system performs real-time tracking and three-dimensional trajectory measurement of the target ball [16]. This greatly helps athlete training and assists referees' judgment in sports such as table tennis.

Fig. 2. The research status of binocular stereo vision.
../../Resources/ieie/IEIESPC.2024.13.6.553/fig2.png

2.2 Current Situation of Basketball Game Video Analysis

In recent years, scholars at home and abroad have studied basketball game videos, focusing mainly on low-level visual feature extraction, scene classification, video content analysis based on multi-feature fusion, specific semantic shot detection, and so on [17]. Many analysis methods are in use, but they generally fall into three categories: (1) analysis based on the low-level features of the video; (2) analysis based on the fusion of auxiliary information and low-level video features; and (3) semantic model analysis based on multi-feature fusion.

Researchers have used statistics of the dominant color distribution of the court in basketball video images to segment the video into game and pause clips, and have combined this with shot duration to divide the shots into game shots and non-game shots [18]. Camera motion and basketball prior knowledge are extracted from the video for high-level semantic analysis, and fast-break shots are recognized from the camera motion parameters of the game shots. Other scholars divide video into wide-angle shots and close-up shots by extracting motion vectors from MPEG compressed information [19]. For wide-angle shots, the camera movement is analyzed further to mark specific video content such as steals, fast breaks, and possible shots. Object motion is distinguished from the video information so that the basketball can be tracked, and color and edge features are used to locate the backboard automatically. The Hough transform is used to detect the backboard and basket, and the relationship between the basketball position and the backboard and basket is used to judge whether a shot is a dunk or a long-distance shot [20]. The motion information of the basketball in the game video is extracted, and the semantic characteristics of its trajectory are analyzed.

2.3 Action Recognition Classification

In recent years, with the rapid development of computer vision and wearable sensor technology, recognition of daily human activities and actions has been widely applied in different scenes and fields and has become a hot research direction at home and abroad [21]. Many schemes have been used for motion recognition, of which the two mainstream solutions are based on video images and on wearable sensors. As shown in Fig. 3, they are introduced in turn below.

Generally, a recognition scheme based on video images segments the target object from each frame, extracts the pose, action, and position of the target, and then applies a classification algorithm for action recognition. Researchers proposed a multimodal basketball event detection method [22]: audio and visual features were extracted from basketball videos, a feature mapping was established with audio keywords, and the classifier recognized 9 basketball events. Also for basketball event detection, researchers used a group of filters that respond to the motion pattern of each video frame, created an energy redistribution function, established a video analysis framework based on the Hidden Markov model, and classified 16 basketball events [23]. In another approach, the key point trajectories of an action are extracted within divided space-time sub-regions, and a two-stage SVM is used to build a high-dimensional training framework [24]. This method has high recognition accuracy for typical space-time actions such as jump shots, standing shots, and layups. Different event recognition modules are set up to analyze the action and generate the status, position, and mode of each object. With sufficient light and no occlusion, the scheme recognizes actions well. In addition, deep learning frameworks are also used in various motion recognition systems based on video recording [25]. The researchers use an attention mechanism to continuously locate and track the target players, then use a recurrent neural network (RNN) to extract the action features, and finally add a further RNN to perform the action classification.

Reference [26] proposes a basketball trajectory tracking algorithm based on correlation filtering and the fusion of multiple features. This method extracts relevant features from the background and target area of the basketball image and tracks the basketball trajectory through feature response maps. However, it does not denoise the basketball image and cannot accurately track the center coordinates of the basketball target, resulting in low tracking accuracy. Reference [27] proposes a dribbling-oriented shooting trajectory tracking method based on a symmetric algorithm. The maximum variance threshold method is used to partition the motion area of the basketball orientation, and the Camshift method is used to mark the trajectory contour of the dribbling orientation. Based on the contour extraction results, the grayscale pixel value feature points of the video sequence are fitted, and a dribbling-oriented trajectory tracking model is established to track the dribbling trajectory. However, this method is affected by Gibbs artifacts and cannot accurately track the flight trajectory of the basketball within the blind spot range.

In order to obtain the best performance of motion recognition, multi-sensor and multi-algorithm fusion have become a new development trend. The scheme based on wearable sensors collects data through various sensors, processes and analyzes the data accordingly, and then uses various classification algorithms to realize action recognition. Researchers have proposed a wrist strap-based sensor system to identify the actions in basketball games [28].

The traditional way to obtain technical parameters is to attach sensors to basketball players, but this may affect their game performance. Recorded images of basketball games, in contrast, usually follow a unified shooting mode, and their fidelity and real-time interactivity provide strong support for obtaining the players' technical parameters. This enables intuitive teaching and fast feedback for players and coaches and can greatly reduce the likelihood of athlete injuries. Therefore, various sports action recognition and tracking technologies have been used to extract technical actions from these images, enabling human-computer interaction and contributing substantially to improving athletes' skills and protecting them from sports injuries.

Fig. 3. Classification of action recognition methods.
../../Resources/ieie/IEIESPC.2024.13.6.553/fig3.png

3. Design of Application Model

In order to solve the problem that the ability to detect small objects decreases as network depth increases, a feature fusion network is constructed. The upper feature layers, rich in semantic information, are fused with the lower feature maps, rich in object position information, so that the position information of objects can still be learned as the network deepens. This improves the network's ability to detect small targets. The network in this article adapts to different lighting conditions, environmental interference, and other factors, with higher detection accuracy than traditional object detection algorithms and a detection speed that meets the requirements of basketball vision systems.

3.1 Force Analysis and Motion Modeling

In this section, building on existing research, a motion model that considers rotation is proposed and its discrete form is derived. The world coordinate system is defined as follows: the Z-axis is vertical to the ground, the Y-axis is parallel to the long side of the court, the X-axis is parallel to the short side of the court, and the coordinate origin is the center point of the court. Gravity, air resistance, and the Magnus force can then be described as [29]:

(1)
$ F_{g}=-\mathrm{m}\left[\begin{array}{lll} 0 & 0 & g \end{array}\right]^{\mathrm{T}} $
(2)
$ F_{d}=-\frac{1}{2}C_{d}\rho A\parallel V\left(t\right)\parallel V\left(t\right) $
(3)
$ F_{m}=\frac{1}{2}C_{m}\rho rA\left(\Omega \times V\left(t\right)\right) $

When the basketball flies through the air, its rotation speed decays very little, so the rotation speed is treated as a constant in this section. The magnitude of air resistance is proportional to the square of the flight speed, and the proportionality coefficient is determined by the air resistance coefficient, the air density, and the cross-sectional area of the basketball. The direction of air drag is opposite to the direction of flight. Given the resultant force, the motion model of the rotating basketball can be derived from Newtonian mechanics as follows.

(4)
$ \begin{array}{l} \dot{V}\left(t\right)=-\frac{1}{2m}C_{d}\rho A\parallel V\left(t\right)\parallel V\left(t\right)\\ +\frac{1}{2m}C_{m}\rho rA\left(\Omega \times V\left(t\right)\right)-\left[\begin{array}{lll} 0 & 0 & g \end{array}\right]^{\mathrm{T}} \end{array} $
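As a rough illustration of Eq. (4), the following Python sketch integrates the model with a forward-Euler step. The lumped coefficients k_d = C_d ρ A / (2m) and k_m = C_m ρ r A / (2m), the time step, and the function names are assumptions made for this example; the discretization actually used in this paper may differ.

```python
import numpy as np

def acceleration(v, omega, k_d, k_m, g=9.81):
    """Right-hand side of Eq. (4): drag + Magnus + gravity (per unit mass)."""
    drag = -k_d * np.linalg.norm(v) * v      # opposes the flight direction
    magnus = k_m * np.cross(omega, v)        # perpendicular to omega and v
    gravity = np.array([0.0, 0.0, -g])       # Z-axis points up
    return drag + magnus + gravity

def simulate(p0, v0, omega, k_d, k_m, dt=0.001, steps=1000):
    """Forward-Euler integration; returns positions with shape (steps + 1, 3)."""
    p = np.asarray(p0, dtype=float)
    v = np.asarray(v0, dtype=float)
    omega = np.asarray(omega, dtype=float)   # rotation speed, held constant
    traj = [p.copy()]
    for _ in range(steps):
        a = acceleration(v, omega, k_d, k_m)
        p = p + dt * v                       # position update with old velocity
        v = v + dt * a                       # velocity update
        traj.append(p.copy())
    return np.array(traj)
```

Setting `k_d = k_m = 0` reduces the sketch to a drag-free parabola, which is a convenient sanity check before plugging in measured coefficients.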

Given the initial motion state, the continuous motion model can directly calculate the motion state of the basketball at any time without iteration. This eliminates iteration error and truncation error and directly describes the relationship between the time series of trajectory positions and the initial motion state, which provides the basis for using optimization methods to estimate the motion state of the rotating basketball. The formula is as follows [30].

(5)
$ \dot{V}\left(t\right)=\left[\begin{array}{lll} -k_{d}\parallel V\left(t\right)\parallel & -k_{m}\omega _{z} & k_{m}\omega _{y}\\ k_{m}\omega _{z} & -k_{d}\parallel V\left(t\right)\parallel & -k_{m}\omega _{x}\\ -k_{m}\omega _{y} & k_{m}\omega _{x} & -k_{d}\parallel V\left(t\right)\parallel \end{array}\right]V\left(t\right)+\left[\begin{array}{l} 0\\ 0\\ -g \end{array}\right] $

In this paper, the Fourier series is used to fit the attenuation law of flight speed with time. The formula is as follows.

(6)
../../Resources/ieie/IEIESPC.2024.13.6.553/eq6.png

The expression of the flight speed in the motion model relative to the initial motion state can be solved [31].

(7)
$ V\left(t\right)=h\left(t\right)V\left(0\right)-g\cdot d\left(t\right) $

By the linearity of integration, the terms in the above formula can be integrated separately to obtain the following formula.

(8)
../../Resources/ieie/IEIESPC.2024.13.6.553/eq8.png

In order to ensure a high goal success rate, we must accurately predict the position of the rotating basketball reaching the basket. We can deduce the constraint relationship between the rotation speed and the flight speed of two adjacent frames as follows [32].

(9)
$ \left[\begin{array}{l} v_{x}^{k+1}+k_{v}v_{x}^{k}\\ v_{y}^{k+1}+k_{v}v_{y}^{k}\\ v_{z}^{k+1}+k_{v}v_{z}^{k}+gT_{s} \end{array}\right]=\left[\begin{array}{lll} 0 & k_{m}v_{z}^{k}T_{s} & -k_{m}v_{y}^{k}T_{s}\\ -k_{m}v_{z}^{k}T_{s} & 0 & k_{m}v_{x}^{k}T_{s}\\ k_{m}v_{y}^{k}T_{s} & -k_{m}v_{x}^{k}T_{s} & 0 \end{array}\right]\left[\begin{array}{l} \omega _{x}\\ \omega _{y}\\ \omega _{z} \end{array}\right] $

The left part of the formula represents the acceleration caused by the Magnus force, calculated according to the motion model from the flight speeds of two adjacent frames. The right part of the formula represents the constraint between the current flight speed and the rotation speed given by the definition of the Magnus force. The Magnus force is perpendicular to the rotation speed, so we also get:

(10)
$ \left[\begin{array}{l} v_{x}^{k+1}+k_{v}v_{x}^{k}\\ v_{y}^{k+1}+k_{v}v_{y}^{k}\\ v_{z}^{k+1}+k_{v}v_{z}^{k}+gT_{s} \end{array}\right]^{\mathrm{T}}\left[\begin{array}{l} \omega _{x}\\ \omega _{y}\\ \omega _{z} \end{array}\right]=0 $
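The matrix on the right of Eq. (9) is skew-symmetric and therefore has rank 2, so a single pair of frames cannot determine all three components of the rotation speed. One possible way around this, sketched below in Python, is to stack the constraint over several consecutive frames and solve for the rotation speed by least squares. The function names and the use of `numpy.linalg.lstsq` are assumptions for illustration, not necessarily the estimator used in this paper; `k_v`, `k_m`, and `T_s` are the symbols of Eq. (9).

```python
import numpy as np

def rhs_matrix(v, k_m, Ts):
    """Matrix on the right-hand side of Eq. (9) for one frame velocity v."""
    vx, vy, vz = v
    return k_m * Ts * np.array([[0.0,  vz, -vy],
                                [-vz, 0.0,  vx],
                                [ vy, -vx, 0.0]])

def estimate_omega(velocities, k_v, k_m, Ts, g=9.81):
    """Least-squares rotation speed from a sequence of frame velocities."""
    V = np.asarray(velocities, dtype=float)
    blocks, targets = [], []
    for k in range(len(V) - 1):
        # Left-hand side of Eq. (9) for frames k and k + 1.
        b = V[k + 1] + k_v * V[k] + np.array([0.0, 0.0, g * Ts])
        blocks.append(rhs_matrix(V[k], k_m, Ts))
        targets.append(b)
    A = np.vstack(blocks)            # shape (3 * (N - 1), 3)
    y = np.concatenate(targets)      # shape (3 * (N - 1),)
    omega, *_ = np.linalg.lstsq(A, y, rcond=None)
    return omega
```

The stacked system becomes full rank as soon as the velocity direction changes between frames, which gravity alone guarantees for a ball in flight.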

It can be seen that the state estimation of rotation speed is essentially equivalent to the state estimation of flight speed. The continuous motion model derived in this paper essentially describes the motion law of basketball in a period of time under the current motion state, which makes it possible to use the constraints of the motion model to optimally estimate the motion state according to the observation information of the continuous multi-frame trajectory position time series. Firstly, the curve coincidence degree is defined as the sum of the Euclidean distance between the trajectory prediction value and the trajectory observation value of consecutive multiple frames.

(11)
../../Resources/ieie/IEIESPC.2024.13.6.553/eq11.png
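Following the verbal definition above, the curve coincidence degree can be computed as in this short Python sketch. The function name and the array layout (one row per frame, columns x, y, z) are assumptions for illustration.

```python
import numpy as np

def curve_coincidence(predicted, observed):
    """Sum over frames of the Euclidean distance between the predicted
    and observed trajectory positions; smaller means a better match."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sum(np.linalg.norm(predicted - observed, axis=1)))
```

A state estimator would then search for the motion state whose predicted trajectory minimizes this value over the observed frames.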

It can be seen that the smaller the sum of Euclidean distances, the higher the curve coincidence between the trajectory prediction value and the trajectory observation value. The sine and cosine functions, which are high-order nonlinear functions about the motion state, are abundant in the motion model. The steps of dynamic state estimation and trajectory prediction are shown in Fig. 4.

Fig. 4. The steps of dynamic state estimation and trajectory prediction.
../../Resources/ieie/IEIESPC.2024.13.6.553/fig4.png

3.2 Target Detection based on Feature Fusion Network

Basketball target detection requires a large amount of training data. Each image must be manually annotated with the category and position of the basketball, and the trajectory prediction task additionally requires basketball trajectory data. Creating these data takes significant time and effort, and no large public basketball dataset is currently available. Therefore, to complete the research in this article, a large number of basketball images were manually collected under different environments, colors, and lighting conditions and used to construct the dataset required for network training.

The main responsibility of the vision system in the basketball robot system is to recognise the basketball and determine its position quickly and precisely. Mixup combines two randomly chosen images in a specific ratio, and the corresponding labels are mixed in the same ratio. The implementation method is as follows.

(12)
$ \tilde{x}=\lambda x_{i}+\left(1-\lambda \right)x_{j} $
(13)
$ \tilde{y}=\lambda y_{i}+\left(1-\lambda \right)y_{j} $
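A minimal Python sketch of Eqs. (12) and (13) is given below. The function names are hypothetical, and drawing λ from a Beta(α, α) distribution follows the original Mixup proposal; how λ is chosen in this paper is not stated, so that part is an assumption.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, lam):
    """Blend two samples and their labels with mixing ratio lam in [0, 1]."""
    x = lam * x_i + (1.0 - lam) * x_j   # Eq. (12)
    y = lam * y_i + (1.0 - lam) * y_j   # Eq. (13)
    return x, y

def sample_lambda(alpha=1.0, rng=None):
    """Draw a mixing ratio from Beta(alpha, alpha) (assumed choice)."""
    rng = np.random.default_rng() if rng is None else rng
    return float(rng.beta(alpha, alpha))
```

For example, with `lam = 0.7` a pair of one-hot labels `[1, 0]` and `[0, 1]` becomes the soft label `[0.7, 0.3]`.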

Mixup is simple to implement and does not noticeably increase the amount of computation, which makes it an effective data augmentation method. To enhance the network's ability to detect and locate small targets, the low-level feature information produced during convolutional down sampling must be fused. The output of each layer can be used to detect the object category and position. The feature fusion network (FFN) consists of two pathways, a bottom-up pathway and a top-down pathway, joined by lateral connections. This exploits the low-level positioning details so that the network learns more accurate location information while learning the target features, which is especially useful for the detection of small objects.

The structure diagram of the feature fusion network is shown in Fig. 5.

Fig. 5. Structure diagram of feature fusion network.
../../Resources/ieie/IEIESPC.2024.13.6.553/fig5.png
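The top-down fusion with lateral links can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: all feature maps are taken to have the same number of channels and each level is half the spatial size of the previous one, so the lateral connection is a plain element-wise addition; the convolutions that a real network would apply before and after the addition are omitted, and the function names are hypothetical.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x up sampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_top_down(bottom_up):
    """Fuse bottom-up feature maps along a top-down pathway.

    bottom_up: list of (C, H, W) arrays ordered from fine (low-level)
    to coarse (high-level). Returns fused maps in the same order.
    """
    fused = [bottom_up[-1]]                 # start from the coarsest map
    for lateral in reversed(bottom_up[:-1]):
        # Lateral link: add up-sampled semantics to low-level position detail.
        fused.append(lateral + upsample2x(fused[-1]))
    return fused[::-1]
```

Each fused map keeps the resolution of its low-level input while receiving semantic information from every coarser level, which is the property that helps with small objects.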

Increasing the depth of the model also loses the position information of many objects in the original image, which makes the detection of small objects very difficult. During training, the error term of each layer can be regarded as the sensitivity of the error to the bias of each neuron, that is, how much the error changes when the bias changes. It is propagated backward from layer l+1 to layer l as follows.

(14)
$ \delta ^{l}=\left(W^{l+1}\right)^{T}\delta ^{l+1}\circ f'\left(u^{l}\right) $
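Eq. (14) can be checked numerically with the Python sketch below. The sigmoid activation and the function names are assumptions for illustration; the formula itself holds for any differentiable activation f.

```python
import numpy as np

def sigmoid_prime(u):
    """Derivative of the sigmoid activation (assumed choice of f)."""
    s = 1.0 / (1.0 + np.exp(-u))
    return s * (1.0 - s)

def backprop_delta(W_next, delta_next, u, f_prime=sigmoid_prime):
    """Eq. (14): sensitivity of layer l from that of layer l + 1.

    W_next: weights of layer l + 1, shape (n_{l+1}, n_l)
    delta_next: sensitivity of layer l + 1, shape (n_{l+1},)
    u: pre-activation input of layer l, shape (n_l,)
    """
    return (W_next.T @ delta_next) * f_prime(u)   # '*' is the element-wise product
```

Because the sensitivity equals the derivative of the error with respect to the bias, the returned vector can be used directly as the bias gradient of layer l.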

In contrast to down sampling, up sampling enlarges the resolution of the image or feature map so that it can be fused with higher-resolution layers. In a traditional feed-forward neural network, the model does not take into account the influence of information processed at previous time steps on the current time step. Therefore, this paper uses an LSTM network to predict the trajectory of the basketball.
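To make the role of past information concrete, the Python sketch below implements one step of a standard LSTM cell and runs it over a sequence of trajectory points. The gate ordering, the weight layout, and the function names are assumptions for this example; the exact architecture and hyperparameters of the network used in this paper are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate order i, f, o, g."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c = f * c_prev + i * g         # cell state carries earlier frames forward
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c

def encode_trajectory(points, W, U, b):
    """Run the cell over a (T, D) sequence of positions; return the last hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in np.asarray(points, dtype=float):
        h, c = lstm_step(x, h, c, W, U, b)
    return h
```

In a full predictor, the final hidden state would be mapped by a trained output layer to the positions of the following frames.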

The partial code of this article is as follows:

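import cv2

cap = cv2.VideoCapture("game.mp4")  # hypothetical input video path
frame_nums = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # total video frames
frame_stride = max(frame_nums // 16, 1)  # step size so that about 16 frames are kept (assumed intent)
frame_count = 1  # index of the current frame, starting at 1
count = 1  # index of the next frame to keep
while True:
    ok, frame = cap.read()
    if not ok:  # no more frames to read
        break
    if frame_count == count:
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # grayscale processing of the frame
        cv2.imwrite(f"frame_{frame_count:06d}.png", gray_frame)  # save this frame
        count += frame_stride
    frame_count += 1
cap.release()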

4. Results and Analysis

The data in this paper consist of 400 basketball trajectories collected in the laboratory and 1000 basketball trajectories obtained from the Internet, of which 1200 are used as the training set and 200 as the test set. For the prediction step size, networks with step sizes of 5, 10, and 20 are each trained for 1000 epochs, and the experimental results are analyzed. The experimental error results are shown in Table 1.

As Table 1 shows, the prediction error increases as the prediction step size increases. A possible reason is that errors accumulate during training and are amplified as the step size grows, although the error remains within an acceptable range. For each step size, the trajectory data of the first 15 frames are then used as input, and the network error is measured when predicting frames 30, 40, 50, and 60 in order to analyze how the cumulative prediction error grows with the number of frames. The experimental error results are shown in Table 2 and Fig. 6.

The image set in this article mainly comes from video frames of regular competitions, which are filtered and included in the database of this article.

In terms of real-time performance, this paper measures the time needed to predict the 40th and 60th frames under the three step sizes to verify the feasibility of the network, and compares it with the physical model method. The physical model has a clear advantage in prediction time, and the time required for network prediction increases with the step size. Predicting 60 frames at a step size of 20 takes 87.96 ms, which is considerably slower than the network at the two smaller step sizes but still satisfies the real-time requirement. Table 3 shows the time taken under the three step sizes to predict 40 and 60 frames.

The purpose of trajectory prediction is to get an accurate hitting point which is helpful to increase the success rate of returning the ball. In the prediction experiment of hitting point, this paper compares the flight model and rebound model of basketball. The predicted time-consuming results are shown in Fig. 7.

The three groups of straight-line trajectories in this experiment run 14 meters from the bottom line to the center line of the basketball court. The estimated distance of each of the three trajectories is compared with the real distance, and the errors are calculated. The basketball vision system needs to predict the trajectory of a rotating basketball accurately and to judge its rotation type. Although the traditional trajectory prediction method based on a physical model and motion modeling can meet the requirements of the vision system, its prediction accuracy, especially in judging a rotating ball, is not ideal. Experiments show that the predicted points of the proposed network are closer to the real values and that its accuracy is greatly improved compared with the traditional algorithm. The prediction speed meets the requirements of the basketball vision system.

The method proposed in this article is compared with the method of reference [7] in terms of the accuracy with which the two models capture basketball motion trajectories. The experimental results are shown in Table 4.

The comparison of running time between the two models is shown in Table 5.

According to the data in Tables 4 and 5, the proposed method achieves higher accuracy than the method of reference [7] in 8 of the 10 trials and a shorter running time at all three frame counts.

Fig. 6. Error of three-step sizes in 30, 40, 50, and 60 frame prediction.
../../Resources/ieie/IEIESPC.2024.13.6.553/fig6.png
Fig. 7. The predicted time-consuming results.
../../Resources/ieie/IEIESPC.2024.13.6.553/fig7.png
Table 1. Error under three steps.

| Step size | 5 | 10 | 20 |
|---|---|---|---|
| Training error | 5.915 | 6.268 | 7.217 |
| Test error | 6.013 | 7.121 | 8.327 |

Table 2. Error of three-step sizes in 30, 40, 50, and 60 frame prediction.

| Prediction frame | 30 | 40 | 50 | 60 |
|---|---|---|---|---|
| Step 5*10^3 | 7.30 | 8.40 | 9.70 | 11.20 |
| Step 10*10^3 | 7.70 | 8.20 | 9.10 | 10.60 |
| Step 20*10^3 | 8.20 | 9.10 | 10.20 | 10.80 |
| Comparison method 1 | 11.30 | 15.60 | 21.30 | 28.90 |
| Comparison method 2 | 12.50 | 17.30 | 24.70 | 31.50 |

Table 3. Time of the three steps in predicting 40 and 60 frames.

| Prediction frame | 40 frame time (ms) | 60 frame time (ms) |
|---|---|---|
| Step size 5 | 21.30 | 44.80 |
| Step size 10 | 33.60 | 69.30 |
| Step size 20 | 45.20 | 87.90 |
| Comparison method 1 | 17.60 | 35.30 |
| Comparison method 2 | 16.30 | 33.80 |

Table 4. Statistics on the accuracy of basketball rotation trajectory recognition.

| No. | The method of this article | The method of reference [7] |
| --- | --- | --- |
| 1 | 83.96 | 84.87 |
| 2 | 89.84 | 83.15 |
| 3 | 90.18 | 83.32 |
| 4 | 88.29 | 79.80 |
| 5 | 87.22 | 80.03 |
| 6 | 88.88 | 85.45 |
| 7 | 87.35 | 82.79 |
| 8 | 83.17 | 84.96 |
| 9 | 90.83 | 78.68 |
| 10 | 85.86 | 80.39 |

Table 5. Comparison of model running times.

| Number of frames | The method of this article (ms) | The method of reference [7] (ms) |
| --- | --- | --- |
| 100 | 102 | 211 |
| 500 | 350 | 521 |
| 1000 | 622 | 754 |

5. Conclusion

The vast expansion of sports video data has generated a great deal of interest in content-based sports video analysis. As an important category of sports video, basketball has become one of the most active topics in content-based video retrieval and analysis. An efficient description technique is designed to extract the semantic elements of video in order to reduce storage requirements and meet the individualized needs of consumers. In this work, basketball video is the object of analysis: the basketball is detected and tracked, and its motion trajectory is extracted. A region enhancement method based on the resolution feature map is proposed. First, the feature map of the image is extracted according to the selective attention mechanism of the human eye to locate the object, and the map is then weighted with the image gray levels to enhance the basketball candidate region. A basketball template is established from the detection results, and a two-level scaling algorithm is used to search the whole search area in subsequent frames. Furthermore, combined with prior knowledge, the three-dimensional coordinates of the basketball are estimated, and its trajectory in three-dimensional space is reconstructed.

The shortcomings of this paper and the potential areas for improvement are as follows. The environment has a greater influence in real competition, and the data set used in this work is still not rich enough. In addition, the network structure in this paper is suitable not only for basketball but also for other ball sports; it can be applied to more ball game scenes as long as sufficiently rich data sets are created and the network is retrained on them. In a basketball recognition system, the decision-making system and the control system also play a vital role. Because the basketball moves noticeably relative to the camera, reconstructing its motion from a single-view image sequence alone is inaccurate. The next step is to collect video data from multiple perspectives for reconstruction.

The proposed network has significant real-time advantages in prediction time, and its predicted points are closer to the true values. Compared with traditional algorithms, the accuracy is greatly improved. The system can support basketball players in daily practice, saving training costs and improving training efficiency, and it can be used for auxiliary training in subsequent basketball training.

For tracking the basketball trajectory, the target tracking algorithm proposed in this article improves tracking accuracy, but certain errors remain. Further research is therefore needed on tracking algorithms for fast-moving small targets.

The algorithm architecture proposed in this article is aimed at basketball technical actions. How to apply it to short-video platforms, how to present it on the web and in mini programs, and how to recognize and label technical actions in videos and push them to target users are directions that are also worth further research.

REFERENCES

[1] Liu L, Hodgins J. Learning basketball dribbling skills using trajectory optimization and deep reinforcement learning. ACM Transactions on Graphics (TOG), 2018, 37(4): 1-14.
[2] Li B, Xu X. Application of artificial intelligence in basketball sport. Journal of Education, Health and Sport, 2021, 11(7): 54-67.
[3] Liu N, Liu P. Goaling recognition based on intelligent analysis of real-time basketball image of Internet of Things. The Journal of Supercomputing, 2022, 78(1): 123-143.
[4] Tsai W L, Pan T Y, Hu M C. Feasibility study on virtual reality based basketball tactic training. IEEE Transactions on Visualization and Computer Graphics, 2020.
[5] Wen P C, Cheng W C, Wang Y S, et al. Court reconstruction for camera calibration in broadcast basketball videos. IEEE Transactions on Visualization and Computer Graphics, 2015, 22(5): 1517-1526.
[6] Kang S Y, Park K H. Design and implementation of basketball game content based on Unity game engine and Android platform. The Journal of Digital Contents Society, 2020, 21(9): 1567-1573.
[7] Min B J. Application of Monte Carlo simulations to improve basketball shooting strategy. Journal of the Korean Physical Society, 2016, 69(7): 1139-1143.
[8] Liu W, Yan C C, Liu J, et al. Deep learning based basketball video analysis for intelligent arena application. Multimedia Tools and Applications, 2017, 76(23): 24983-25001.
[9] Chen W, Lao T, Xia J, et al. Gameflow: narrative visualization of NBA basketball games. IEEE Transactions on Multimedia, 2016, 18(11): 2247-2256.
[10] Yang L, Wang B, Zhang R, et al. Analysis on location accuracy for the binocular stereo vision system. IEEE Photonics Journal, 2017, 10(1): 1-16.
[11] Jia Z, Yang J, Liu W, et al. Improved camera calibration method based on perpendicularity compensation for binocular stereo vision measurement system. Optics Express, 2015, 23(12): 15205-15223.
[12] Luo Z, Zhang K, Wang Z, et al. 3D pose estimation of large and complicated workpieces based on binocular stereo vision. Applied Optics, 2017, 56(24): 6822-6836.
[13] Xu P, Ding X, Wang R, et al. Feature-based 3D reconstruction of fabric by binocular stereo-vision. The Journal of The Textile Institute, 2016, 107(1): 12-22.
[14] Wang Y, Wang X. On-line three-dimensional coordinate measurement of dynamic binocular stereo vision based on rotating camera in large FOV. Optics Express, 2021, 29(4): 4986-5005.
[15] Xue T, Xu L, Zhang S. Bubble behavior characteristics based on virtual binocular stereo vision. Optoelectronics Letters, 2018, 14(1): 44-47.
[16] Read J C A. Stereo vision and strabismus. Eye, 2015, 29(2): 214-224.
[17] Lemme N J, Li N Y, Kleiner J E, et al. Epidemiology and video analysis of Achilles tendon ruptures in the National Basketball Association. The American Journal of Sports Medicine, 2019, 47(10): 2360-2366.
[18] Losada A G, Therón R, Benito A. Bkviz: A basketball visual analysis tool. IEEE Computer Graphics and Applications, 2016, 36(6): 58-68.
[19] Panagiotakis E, Mok K M, Fong D T P, et al. Biomechanical analysis of ankle ligamentous sprain injury cases from televised basketball games: understanding when, how and why ligament failure occurs. Journal of Science and Medicine in Sport, 2017, 20(12): 1057-1061.
[20] Shih H C. A survey of content-aware video analysis for sports. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 28(5): 1212-1231.
[21] Wang L, Xiong Y, Wang Z, et al. Temporal segment networks for action recognition in videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(11): 2740-2755.
[22] Chen C, Jafari R, Kehtarnavaz N. A real-time human action recognition system using depth and inertial sensor fusion. IEEE Sensors Journal, 2015, 16(3): 773-781.
[23] Shahroudy A, Ng T T, Gong Y, et al. Deep multimodal feature analysis for action recognition in RGB+D videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(5): 1045-1058.
[24] Ding S, Qu S, Xi Y, et al. Stimulus-driven and concept-driven analysis for image caption generation. Neurocomputing, 2020, 398: 520-530.
[25] Yu N, Zhai Y, Yuan Y, et al. A bionic robot navigation algorithm based on cognitive mechanism of hippocampus. IEEE Transactions on Automation Science and Engineering, 2019, 16(4): 1640-1652.
[26] Amor B B, Su J, Srivastava A. Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 1-13.
[27] Yao Q. Adaptive finite-time sliding mode control design for finite-time fault-tolerant trajectory tracking of marine vehicles with input saturation. Journal of the Franklin Institute, 2020, 357(18): 13593-13619.
[28] Demir B E, Bayir R, Duran F. Real-time trajectory tracking of an unmanned aerial vehicle using a self-tuning fuzzy proportional integral derivative controller. International Journal of Micro Air Vehicles, 2016, 8(4): 252-268.
[29] Zhang Q, Mills J K, Cleghorn W L, et al. Trajectory tracking and vibration suppression of a 3-PRR parallel manipulator with flexible links. Multibody System Dynamics, 2015, 33(1): 27-60.
[30] Bouakrif F, Zasadzinski M. High order iterative learning control to solve the trajectory tracking problem for robot manipulators using Lyapunov theory. Transactions of the Institute of Measurement and Control, 2018, 40(15): 4105-4114.
[31] Podder P, Das S R, Mondal M R H, et al. LDDNet: A deep learning framework for the diagnosis of infectious lung diseases. Sensors, 2023, 23: 480.
[32] Hasan M J, Sohaib M, Kim J M. An explainable AI-based fault diagnosis model for bearings. Sensors, 2021, 21(12): 4070.
Hong Liu

Hong Liu is a teacher at Henan Light Industry Vocational College. She graduated from Henan University with a major in physical education. During her teaching career, she has written and published nearly ten papers and participated in the compilation of four textbooks. She has participated in department-level projects and presided over provincial and school-level projects. She is a mainstay of the college in teaching and scientific research.