5.1. Experimental Data
In this paper, training and evaluation are performed on the FLIC and MPII Human Pose
datasets. Images often contain more than one person, so the model is trained only on
the person closest to the center of the image. The target person was cropped about the
image center, the input image was resized to 256x256, and the image was randomly
rotated (±30 degrees) and scaled (0.75-1.25) for data augmentation.
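The augmentation step described above can be sketched as follows. This is a minimal, library-free sketch: the helper names (`center_crop_box`, `sample_augmentation`) are ours, not the paper's, and a real pipeline would apply the sampled parameters with an image library.

```python
import random

def center_crop_box(img_w, img_h, crop):
    """Axis-aligned box of side `crop` centred on the image."""
    left = (img_w - crop) // 2
    top = (img_h - crop) // 2
    return (left, top, left + crop, top + crop)

def sample_augmentation(rng):
    """Draw one random rotation/scale pair in the ranges from the text."""
    angle = rng.uniform(-30.0, 30.0)   # degrees
    scale = rng.uniform(0.75, 1.25)
    return angle, scale

rng = random.Random(0)
box = center_crop_box(640, 480, 256)       # crop box before resizing to 256x256
angle, scale = sample_augmentation(rng)    # per-image random parameters
```

The crop box is computed first so that the rotation and scaling act on the person-centred patch rather than the full frame.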
For video frame processing, we first use the AlphaPose algorithm to detect the skeleton
key points of each person in every frame and save this key-point information.
We then filter the entire image set for accurate and consistent detections,
checking whether each frame contains the number of key points corresponding to the
number of people actually present during filming. If that number is exceeded, the
detection result for the frame is considered erroneous and the frame is deleted. The
test set consists of 300 video clips, divided evenly into two categories, normal motion
and conflict behavior (150 clips each); each clip is about 5 seconds long at a frame
rate of 20 frames/second.
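The consistency filter described above can be sketched as follows, assuming COCO-style 17 key points per person (AlphaPose's default output format); the helper name and data layout are ours.

```python
def filter_frames(frames, people_per_frame, kpts_per_person=17):
    """Keep frames whose detected key-point count does not exceed the
    count expected for the people actually present; drop the rest.

    frames: list of per-frame key-point lists (e.g. flattened AlphaPose output)
    people_per_frame: expected person count for each frame
    """
    kept = []
    for kpts, n_people in zip(frames, people_per_frame):
        if len(kpts) <= n_people * kpts_per_person:
            kept.append(kpts)
    return kept
```

A frame with more key points than people can account for usually indicates a spurious extra detection, which is why exceeding the expected count marks the frame as erroneous.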
5.2. Results and Analysis
In this experiment, we used 1080 images for model testing, 360 images for each of
three different poses. Fig. 5 shows the vulnerability comparison. Analysis of the
test results shows that the proposed recognition model performs well overall, with an
average accuracy of 95.56%. Every movement behavior is recognized with an accuracy
above 94%, demonstrating high consistency and accuracy across behaviors.
Fig. 5. Comparison of vulnerability.
Fig. 6 shows the initialization and training network data analysis. Further analysis
of the misidentified image frames revealed two main problems. First, in some frames
the subject is partially occluded due to abnormal behavior and posture, so the camera
cannot fully capture the key information, which degrades the model's recognition
performance. Second, in some cases poor lighting prevents the camera from accurately
capturing the image content, which in turn hampers the extraction of key human-posture
feature points and leads to incorrect recognition.
Fig. 6. Initialization and training network data analysis.
Fig. 7 shows the similarity of the YOLOv3 layers. The algorithm analyzes each video
frame and labels it "safe" or "unsafe". A safe status indicates that the athletes'
behavior is normal, while an unsafe status indicates an abnormality that needs
attention. 540 frames from self-recorded videos were used for evaluation, comprising
366 normal frames and 174 abnormal frames. Table 1 displays the evaluation results.
Fig. 7. CKA similarity between all layers U-Real layers in YOLOv3.
Table 1. Results of algorithm behavior detection and evaluation.
| Evaluation index | Epoch | Result |
| Accuracy | 100 | 95.12% |
| Precision | 100 | 91.24% |
| Recall | 100 | 99.50% |
| F1 score | 100 | 93.24% |
| Missed alarm rate | 100 | 1.15% |
| False alarm rate | 100 | 6.03% |
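The indices reported in Table 1 follow the standard definitions over confusion counts. The sketch below uses illustrative counts, not the counts behind Table 1, and the function name is ours.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard detection metrics from confusion counts.

    Missed-alarm rate: fraction of abnormal frames judged normal.
    False-alarm rate: fraction of normal frames judged abnormal.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    missed_alarm = fn / (tp + fn)
    false_alarm = fp / (fp + tn)
    return accuracy, precision, recall, f1, missed_alarm, false_alarm

# Illustrative counts only.
acc, p, r, f1, miss, fa = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```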
To evaluate the proposed spatiotemporal-feature-point-based abnormal behavior detection
algorithm, we employ an experimental data set consisting of 5 video segments of normal
motion and 5 simulated video segments containing conflict behaviors. Fig. 8 shows the normalized energy comparison. By calculating the average displacement
of feature points on key frames and plotting the displacement-change curve, we observe
that normal motion behavior and abnormal motion behavior are clearly separated
on the curve. This shows that the proposed algorithm is effective in identifying
and distinguishing normal and abnormal behaviors.
Fig. 8. Normalized energy comparison.
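The displacement-change curve above can be sketched as follows, assuming key points are already matched across consecutive key frames (e.g. by joint index); the helper names are ours.

```python
import math

def mean_displacement(prev_kpts, cur_kpts):
    """Average Euclidean displacement of matched key points
    between two consecutive key frames."""
    dists = [math.dist(p, q) for p, q in zip(prev_kpts, cur_kpts)]
    return sum(dists) / len(dists)

def displacement_curve(frames):
    """Displacement-change curve over a clip: one value per frame pair."""
    return [mean_displacement(a, b) for a, b in zip(frames, frames[1:])]
```

Plotting this curve over a clip is what separates normal and abnormal motion: sustained large per-frame displacements push the curve upward during conflict behavior.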
Fig. 9 shows the analysis of NMS execution time and quantity. The experiment uses a
kinetic-energy-change threshold of 4000 for motion. Observing the kinetic-energy-change
curve of the image frames, we find that the average kinetic energy is significantly
higher than the threshold in the interval from frame 30 to frame 110, while before and
after this interval it is below the threshold. This shows that by analyzing the kinetic-energy
changes of image frames, we can accurately judge whether abnormal motion
behavior is present.
Fig. 9. NMS execution time and quantity.
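The threshold test described above can be sketched as follows. The 4000 threshold comes from the text; the interval-finding helper is our own illustration.

```python
def abnormal_intervals(energy, threshold=4000.0):
    """Contiguous frame intervals whose average kinetic energy
    exceeds the threshold (threshold value taken from the text)."""
    intervals, start = [], None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i                       # interval opens
        elif e <= threshold and start is not None:
            intervals.append((start, i - 1))  # interval closes
            start = None
    if start is not None:                   # clip ends while still abnormal
        intervals.append((start, len(energy) - 1))
    return intervals
```

On a curve shaped like the one described in the text (low before frame 30, high through frame 110, low afterwards), this returns the single interval (30, 110).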
Fig. 10 shows the normalized RGB value analysis. The athlete abnormal behavior detection
algorithm proposed in this paper is evaluated by combining the average displacement of
feature points with the kinetic-energy change. The experimental data set contains 150
video clips of safe movement behavior and 150 video clips of abnormal movement behavior,
each lasting about 5 seconds. The evaluation demonstrates the system's high accuracy
in detecting athletes' abnormal behaviors.
Fig. 10. Normalized RGB value analysis.
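One way the two cues might be combined is the following sketch. The paper does not specify its fusion rule, so the AND rule and the displacement threshold here are assumptions; only the 4000 energy threshold comes from the text.

```python
def classify_clip(mean_disp, mean_energy,
                  disp_threshold, energy_threshold=4000.0):
    """Fuse the two cues: flag a clip unsafe when both the average
    feature-point displacement and the kinetic energy are high.
    The AND rule and disp_threshold are assumptions, not the paper's."""
    if mean_disp > disp_threshold and mean_energy > energy_threshold:
        return "unsafe"
    return "safe"
```

Requiring both cues to fire is one way to trade a lower false-alarm rate for a slightly higher missed-alarm rate; an OR rule would do the opposite.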
According to the system test results, detection accuracy is high for both safe and
abnormal motion, but the system raises an alarm only when abnormal behavior is detected,
which leads to a certain false alarm rate. To analyze the causes of false alarms in
depth, we selected 15 safe-motion video clips and 15 abnormal-motion video clips from
the test video set, consisting of normal and abnormal key image frames. Analysis of
these image frames revealed several possible causes of false alarms.
Fig. 11 is an experimental time histogram analysis that details the system's performance
on various motion-recognition tasks. Specifically, the recognition accuracy for safe
sports reaches 97.04%, showing that the system can accurately distinguish athletes'
normal, safe behaviors in most cases. Meanwhile, the recognition accuracy for abnormal
motion reaches 95.03%, showing that the system is highly sensitive to abnormal behavior.
However, although the overall accuracy is high, the system still has a certain
misjudgment rate. Here the misjudgment rate is the sum of the probability of
erroneously identifying a frame of safe motion as abnormal motion and the probability
of erroneously identifying a frame of abnormal motion as safe motion. In-depth analysis
of these misjudged frames shows that the main causes include the following:
Since the video data were simulated recordings, with the shooting angle limited to the
lower right in front of the activity area, this viewing-angle limitation may prevent
the system from fully capturing the athlete's movements in some cases. In particular,
when athletes suddenly make large movements, their bodies may temporarily block the
camera, making it impossible for the system to accurately capture and analyze the
details of the movements and causing them to be misjudged as fighting or other abnormal behavior.
When two people in the video perform large-scale actions at the same spot
simultaneously and for an extended time, the system may also misjudge. This is because
the system may struggle to distinguish such coordinated actions from real fighting
behavior, especially when the actions are complex, fast, and hard to predict.
To reduce the misjudgment rate and improve the overall performance of the system, the
following measures can be considered: first, optimize the camera's shooting angle and
position to ensure that the athlete's movements are captured comprehensively and
clearly; second, introduce advanced image-processing techniques and algorithms to
enhance the system's ability to recognize complex actions and scenes; third, strengthen
the monitoring and analysis of system misjudgments so that problems are discovered and
corrected promptly.