3.1 Construction of the V-DBN Model
                  Children’s physical and mental development differs from that of adults, so football training for children must take the characteristics of their motor development into account in order to avoid injuries and support the development of children’s football education. The features of children’s football movements are therefore collected using 3D skeleton recognition, and the movements are then recognized by the proposed V-DBN model. 3D skeleton recognition extracts motion features from aggregated skeleton position data and passes them to the human motion recognition model [9]. The motion data are captured with a Kinect device, and the layout of the collected skeleton points is shown in Fig. 1.
                  
                  Position information for key skeleton points is collected. Each point carries position and time data, and each bone point in each frame is stored as one row. Two skeleton division schemes are considered in this research. Scheme A groups bone points by a selected frame range to compute time series features, as shown in Fig. 2(a). Scheme B divides the skeleton into three layers from the outside to the inside, as shown in Fig. 2(b) [10]. The division used in Scheme A better distinguishes changes of the human body as a whole but ignores changes within the bones, whereas Scheme B fully accounts for the effect of each joint but ignores changes in specific body parts. Because Scheme B considers the positional relationship of every bone point, it was used to calculate the time series features of human actions.
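The row-per-point layout described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes 15 skeleton points per frame (matching the joint indices that appear in Table 2), numbers joints from 1, and uses hypothetical function names.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Row = Tuple[int, int, float, float, float]  # (frame f, joint i, x, y, z)

NUM_JOINTS = 15  # assumption: 15 points, matching the joint indices in Table 2


def frames_to_rows(frames: List[List[Vec3]]) -> List[Row]:
    """Flatten a skeleton sequence so that each bone point in each frame is one row."""
    rows: List[Row] = []
    for f, joints in enumerate(frames):
        if len(joints) != NUM_JOINTS:
            raise ValueError(f"frame {f} has {len(joints)} joints, expected {NUM_JOINTS}")
        for i, (x, y, z) in enumerate(joints, start=1):  # joints numbered 1..15
            rows.append((f, i, x, y, z))
    return rows


def rows_to_frames(rows: List[Row]) -> List[List[Vec3]]:
    """Regroup rows into per-frame joint lists (inverse of frames_to_rows)."""
    num_frames = max(r[0] for r in rows) + 1
    frames: List[List[Vec3]] = [[(0.0, 0.0, 0.0)] * NUM_JOINTS for _ in range(num_frames)]
    for f, i, x, y, z in rows:
        frames[f][i - 1] = (x, y, z)
    return frames
```

Keeping the frame index in every row lets later steps slice the data by time (for the temporal features) or by joint (for the spatial features) without reshaping.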
                  
                  Human motion is characterized by both temporal and spatial features. In the temporal description, each bone point is represented by its displacement $x$, velocity $v$, and acceleration $a$. The displacement is given by formula (1) [11]:
                  
                  
                  In formula (1), $p$ is the three-dimensional position $(x,y,z)$ of a bone point, $i$ indexes the bone point, and $f$ denotes the current frame. The velocity is given by formula (2):
                  
                  
                  In formula (2), $\Delta t$ denotes the time interval $\left[f-1,f+1\right]$, i.e., the span from the frame before the current frame to the frame after it. The acceleration is given by formula (3):
                  
                  
                  
                  To describe the positional changes in human motion more conveniently, three time series, one each for displacement, velocity, and acceleration, are defined as shown in formula (4):
                  
                  
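Because formulas (1) to (4) are not reproduced here, the sketch below assumes common finite-difference forms: displacement as the change from the previous frame, velocity as a central difference over $\left[f-1,f+1\right]$, and acceleration as the central difference of velocity. It also assumes a frame rate of 30 fps (typical for Kinect skeleton streams); the function and parameter names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def temporal_features(track: List[Vec3], fps: float = 30.0) -> List[List[Vec3]]:
    """Displacement, velocity and acceleration series for one bone point.

    Assumed forms (the paper's formulas (1)-(3) may differ):
      x^f = p^f - p^(f-1)
      v^f = (p^(f+1) - p^(f-1)) / dt,   dt spans frames f-1 .. f+1
      a^f = (v^(f+1) - v^(f-1)) / dt
    """
    dt = 2.0 / fps  # two frame intervals separate f-1 and f+1
    disp = [sub(track[f], track[f - 1]) for f in range(1, len(track))]
    vel = [scale(sub(track[f + 1], track[f - 1]), 1.0 / dt)
           for f in range(1, len(track) - 1)]
    acc = [scale(sub(vel[k + 1], vel[k - 1]), 1.0 / dt)
           for k in range(1, len(vel) - 1)]
    # Returned in the order of the three time series of formula (4) (assumed).
    return [disp, vel, acc]
```

The central differences shorten each series by one or two frames at the ends; how the paper handles the boundary frames is not stated.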
                  Human action features have both spatial and temporal characteristics. The time series describe the movement of bone points through the three motion features above, and the relative importance of these features differs from one action to another [12]. In particular, when a person moves slowly, velocity and acceleration change little, so the movement is described mainly by displacement. For the spatial description, the positional relationship between a reference point and a bone point is given by formula (5):
                  
                  
                  In formula (5), $p_{i}^{f}$ and $p_{j}^{f}$ are the positions of skeleton point $i$ and reference point $j$ at frame $f$, respectively. Table 1 lists the skeletal point subsets and reference points that define the four relative spatial positions.
                  
                  Like the time series, the spatial position features are combined by concatenation, as shown in formula (6):
                  
                  
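The groupings in Table 1 can be turned into per-frame spatial features as sketched below. This assumes formula (5) is the difference $p_{i}^{f} - p_{j}^{f}$ and that formula (6) concatenates these differences within each space; the joint names are hypothetical dictionary keys, since Table 1 supplies only the groupings.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Skeletal point subsets and reference points taken from Table 1.
SPACES: List[Tuple[List[str], str]] = [
    (["right_hand", "left_hand"], "head"),               # Space 1
    (["right_foot", "right_hand", "head"], "left_hip"),  # Space 2
    (["left_hand", "left_foot", "head"], "right_hip"),   # Space 3
    (["right_foot", "left_foot", "head"], "spine"),      # Space 4
]


def spatial_features(frame: Dict[str, Vec3]) -> List[List[float]]:
    """One vector per space: relative positions p_i - p_j (assumed formula (5)),
    concatenated over the points of the subset (assumed formula (6))."""
    out: List[List[float]] = []
    for subset, ref in SPACES:
        rx, ry, rz = frame[ref]
        vec: List[float] = []
        for name in subset:
            x, y, z = frame[name]
            vec.extend([x - rx, y - ry, z - rz])
        out.append(vec)
    return out
```

Applying `spatial_features` to every frame yields the four spatial sequences ($m = 1, \dots, 4$) that are later encoded together with the three time series.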
                  When a single feature is used, similar actions can receive nearly identical descriptions, and when the distance between classes is small the actions cannot be distinguished [13]. Running and walking, for example, have similar descriptions, which degrades recognition. Combining multiple features enlarges the distance between different actions and therefore improves recognition. For this reason, the V-DBN action recognition model, which combines the VLAD model with a DBN, is adopted. The VLAD model encodes the action features into vectors of uniform length so that they are described more completely [14]. An arbitrary time series is expressed as formula (7):
                  
                  
                  In (7), $n \in \left\{1,2,3\right\}$ indexes the three time series, $t_{n}^{l}$ denotes the $l$-th feature subsequence in the time dimension, and $L$ is the total number of subsequences. Any spatial position can similarly be expressed by (8):
                  
                  
                  In formula (8), $S_{m}^{l}$ denotes the $l$-th feature subsequence at the $m$-th spatial position, with $m \in \left\{1,2,3,4\right\}$. The temporal and spatial feature sequences are split into frames, and a clustering algorithm yields 24 cluster centers [15]. The Euclidean distance between each feature vector and each cluster center is used to assign the data to the 24 clusters. The temporal cluster expression is then given by formula (9):
                  
                  
                  In formula (9), $N_{u,n,qn}$ denotes the temporal cluster center, $t_{n,f}$ the unit time frame of the sequence, and $\mu _{u,np}$ the distance coefficient for the temporal features. The spatial cluster expression is given by (10):
                  
                  
                  In (10), $NN_{u,m,qm}$ denotes the spatial cluster center, $s_{n,f}$ the unit spatial frame of the sequence, and $\mu _{u,mp}$ the distance coefficient for the spatial features. Each human action is thus described by a vector of the same dimension, $U\times 23$, containing four spatial sequence features and three time series features [16]. Because this volume of data would hamper recognition, the contrastive divergence method is used to reduce the processing load. The principle of children’s football action recognition with V-DBN is shown in Fig. 3.
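The clustering and encoding steps can be sketched with plain k-means followed by standard VLAD aggregation. Only the 24 centers and the Euclidean distance come from the text; the choice of Lloyd's k-means is an assumption, and formulas (9) and (10) weight frames by distance coefficients $\mu$, whereas this sketch uses hard nearest-center assignment. Function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def nearest(x: Point, centers: List[List[float]]) -> int:
    """Index of the closest center under Euclidean distance."""
    return min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))


def kmeans(data: List[Point], k: int = 24, iters: int = 50,
           seed: int = 0) -> List[List[float]]:
    """Lloyd's k-means (assumed; the text only states that 24 centers are used).
    Requires at least k data points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        labels = [nearest(x, centers) for x in data]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # move the center to the mean of its members
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:        # leave an empty cluster's center in place
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def vlad_encode(frames: List[Point], centers: List[List[float]]) -> List[float]:
    """Standard VLAD: per-center sum of residuals (x - c_k), concatenated and
    L2-normalised. The length is k * d however many frames the action has."""
    d = len(centers[0])
    acc = [[0.0] * d for _ in centers]
    for x in frames:
        c = nearest(x, centers)  # hard assignment
        for t in range(d):
            acc[c][t] += x[t] - centers[c][t]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

The fixed output length is what lets actions of different durations be fed to the DBN as vectors of the same dimension.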
                  
                  In Fig. 3, the VLAD model unifies the feature data. In the RBM layers, the contrastive divergence algorithm is used to model and reconstruct the initial feature data and to optimize it, which improves feature extraction.
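The contrastive divergence step in the RBM layers can be illustrated with a single-step (CD-1) update on a Bernoulli RBM. This is a generic textbook sketch under assumed settings (binary units, learning rate, layer sizes), not the V-DBN configuration, which the text does not specify.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    """Bernoulli-Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_step(self, v0: List[float], lr: float = 0.1) -> float:
        """One CD-1 update on a single visible vector; returns squared reconstruction error."""
        ph0 = self.hidden_probs(v0)                                # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden units
        v1 = self.visible_probs(h0)                                # reconstruction
        ph1 = self.hidden_probs(v1)                                # negative phase
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for i in range(len(v0)):
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((x - r) ** 2 for x, r in zip(v0, v1))
```

In a DBN, such RBMs are usually trained greedily layer by layer, with the hidden probabilities of one layer serving as the visible input of the next.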
                  
                  
                        Fig. 1. Distributions of the different skeleton points.
 
                  
                        Fig. 2. Divisions for two kinds of sports bones.
 
                  
                        Fig. 3. Schematic diagram of the V-DBN action recognition model.
 
                  
                        Table 1. Corresponding results of spatial positions.

| Spatial description operator | Space 1 | Space 2 | Space 3 | Space 4 |
|---|---|---|---|---|
| Skeletal point subset | Right hand, left hand | Right foot, right hand, head | Left hand, left foot, head | Right foot, left foot, head |
| Reference point | Head | Left hip | Right hip | Spine |

                
               
                     3.2 Construction of the LSTM Model 
                  In the V-DBN action recognition model, the VLAD model unifies the length of the action feature data, but in doing so it breaks the relationship between consecutive frames of a continuous skeletal action, which hinders recognition. To overcome this problem, an LSTM model is used to optimize the recognition process [17]. Because V-DBN operates on unit frames through clustering, grouping frames with similar action structures into one category, the temporal order of the skeleton sequence is lost; the LSTM therefore processes the continuous feature data at each moment. After this processing, the human body feature data can be trained on and learned, as shown in Fig. 4 [18].
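How an LSTM consumes the per-frame features in order can be sketched with a standard LSTM cell. The text does not give the LSTM equations, hidden size, or initialization, so everything here (gate layout, the forget-gate bias of 1.0, random weights) is an assumption, and the weights are untrained. The sketch only shows how the hidden state carries information from earlier frames into later ones, which is the ordering that the clustering step discards.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LSTMCell:
    """Standard LSTM cell applied to [x_t, h_{t-1}] (untrained, for illustration)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        n = n_in + n_hidden
        self.n_hidden = n_hidden
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n_hidden)]
                  for g in GATES}
        self.b = {g: [0.0] * n_hidden for g in GATES}
        self.b["f"] = [1.0] * n_hidden  # common heuristic: start with the forget gate open

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate current input and previous hidden state
        pre = {g: [s + b for s, b in zip(matvec(self.W[g], z), self.b[g])] for g in GATES}
        i_t = [sigmoid(v) for v in pre["i"]]
        f_t = [sigmoid(v) for v in pre["f"]]
        o_t = [sigmoid(v) for v in pre["o"]]
        g_t = [math.tanh(v) for v in pre["g"]]
        c_new = [f * cp + i * g for f, cp, i, g in zip(f_t, c, i_t, g_t)]
        h_new = [o * math.tanh(cn) for o, cn in zip(o_t, c_new)]
        return h_new, c_new

    def run(self, sequence: List[Vector]) -> List[Vector]:
        """Feed per-frame feature vectors in frame order; return the hidden state per frame."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in sequence:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs
```

In practice a trained LSTM layer from a deep learning framework would replace this hand-written cell; the loop in `run` is the part that mirrors feeding frames "in a time relationship".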
                  
                  In Fig. 4, the sequence feature data are arranged in frame order and processed by the LSTM model. Because the feature data are fed to the LSTM in temporal order, the distribution of the action sequence features along the time dimension can be obtained. To describe the bone distribution within a single frame in more detail, the spatial angle feature is refined using quaternions, in which three imaginary parts and one real part describe the rotation of the bones. The corresponding coordinate system is shown in Fig. 5 [19].
                  
                  As shown in Fig. 5, the skeleton is described by vectors, and the angle between bones is computed for each frame. First, the bone data are collected to obtain two vectors, $v_{1}$ and $v_{2}$, each with three coordinate components. The bone vector is expressed as (11):
                  
                  
                  In (11), $u_{x},u_{y},u_{z}$ are the coordinate components of the bone vector. The rotation angle between the two vectors is then calculated as in (12):
                  
                  
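Since (11) and (12) are not reproduced here, the sketch below assumes the usual forms: a bone vector as the difference between two joint positions, and the rotation angle as the arccosine of the normalized dot product. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bone_vector(p_start: Sequence[float], p_end: Sequence[float]) -> Vec3:
    """Vector (u_x, u_y, u_z) from one skeleton point to another (assumed form of (11))."""
    return (p_end[0] - p_start[0], p_end[1] - p_start[1], p_end[2] - p_start[2])


def angle_between(v1: Vec3, v2: Vec3) -> float:
    """Angle in radians between two bone vectors (assumed form of (12)).
    Raises ZeroDivisionError if either vector has zero length."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cos_theta)
```

Clamping the cosine guards against values such as 1.0000000000000002 that floating-point rounding can produce for nearly parallel bones.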
                  From the rotation angle between the bone vectors, (11) and (12) give the quaternion coefficients. The real coefficient is given by (13) [20]:
                  
                  
                  In (13), $q_{0}$ is the real coefficient. The first imaginary coefficient, $q_{1}$, is given by (14):
                  
                  
                  In (14), $u_{x}$ is the $x$-axis component of the bone vector. The second imaginary coefficient, $q_{2}$, is given by (15):
                  
                  
                  In (15), $u_{y}$ is the $y$-axis component of the bone vector. The third imaginary coefficient, $q_{3}$, is given by (16):
                  
                  
                  In (16), $u_{z}$ is the $z$-axis component of the bone vector. The quaternion coefficients are thus obtained from the angle between the vectors. Using the bone vector pairs listed in Table 2, human action features can be recognized effectively.
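The coefficients described by (13) to (16) appear to follow the standard axis-angle construction of a unit quaternion, $q = (\cos\frac{\theta}{2}, u_x\sin\frac{\theta}{2}, u_y\sin\frac{\theta}{2}, u_z\sin\frac{\theta}{2})$. The sketch below assumes this form and takes the rotation axis $u$ as the normalized cross product of $v_1$ and $v_2$; that choice of axis is an assumption. It also shows how a bone pair from Table 2 (e.g., $V_1 = (1,3)$ and $V_2 = (4,6)$ in column B1) could be turned into a feature; the coordinates in the test are made up.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotation_quaternion(v1: Vec3, v2: Vec3) -> Quat:
    """Unit quaternion (q0, q1, q2, q3) rotating the direction of v1 onto v2."""
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    dot = sum(a * b for a, b in zip(v1, v2))
    theta = math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))
    axis = cross(v1, v2)
    n_axis = math.hypot(*axis)
    if n_axis < 1e-12:  # parallel vectors: the axis is undefined
        if dot > 0:
            return (1.0, 0.0, 0.0, 0.0)  # same direction, no rotation
        # opposite directions: rotate by pi about any axis perpendicular to v1
        helper = (1.0, 0.0, 0.0) if abs(v1[0]) < 0.9 * n1 else (0.0, 1.0, 0.0)
        axis = cross(v1, helper)
        n_axis = math.hypot(*axis)
    ux, uy, uz = (a / n_axis for a in axis)
    s = math.sin(theta / 2.0)
    return (math.cos(theta / 2.0), ux * s, uy * s, uz * s)  # q0 real, q1..q3 imaginary


def pair_feature(joints: Dict[int, Vec3], bone1: Tuple[int, int],
                 bone2: Tuple[int, int]) -> Quat:
    """Quaternion feature for two bones given as joint-index pairs, as in Table 2."""
    (a, b), (c, d) = bone1, bone2
    v1 = tuple(joints[b][k] - joints[a][k] for k in range(3))
    v2 = tuple(joints[d][k] - joints[c][k] for k in range(3))
    return rotation_quaternion(v1, v2)
```

Unlike a bare angle, the quaternion also records the direction of rotation through its imaginary part, which is presumably why the text prefers it for describing bone distributions within a frame.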
                  
                  The LSTM model processes the continuous skeleton feature data at each moment, which addresses the broken relationship between consecutive frames of continuous skeletal movements. The complete recognition process for young children’s football movements is shown in Fig. 6.
                  
                  First, Kinect devices are used to collect young children’s football movement data and build the original motion recognition library for the model. Next, the bone feature data are extracted, the length of the motion features is unified by the VLAD model, the feature data are optimized by the DBN model, and the continuous motion feature data are optimized by the LSTM model. Finally, the children’s football movements are recognized and analyzed.
                  
                  
                        Fig. 4. Schematic diagram of the LSTM model processing continuous feature data.
 
                  
                        Fig. 5. Skeleton origin coordinates.
 
                  
                        Fig. 6. The entire toddler soccer movement recognition process.
 
                  
                        Fig. 6. Action recognition results for different iterations.
 
                  
                        Table 2. Skeleton vector angle feature selection table.

| Angular feature | B1 | B2 | B3 | B4 | B5 |
|---|---|---|---|---|---|
| Skeletal vector V1 | (1,3), (7,9), (10,12), (13,15) | (1,3), (4,6), (13,15), (11,12), (13,15) | (1,3), (4,6), (13,15), (8,9), (8,9) | (7,9), (10,12), (1,3), (5,6) | (12,6), (6,1), (1,9), (9,15) |
| Skeletal vector V2 | (4,6) | (7,9) | (10,12) | (13,15) | (9,15), (15,12), (6,1), (1,9) |