Gao Yanfei (Department of Police General Education, Zhengzhou Police University, Zhengzhou 450000, China; Yanfei_Gao72@outlook.com)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
OC-SVM, OpenPose model, Behavioral anomaly monitoring, Smart community, Security system
1. Introduction
Police security is the key to building a harmonious community environment. In modern community management, an efficient public security system based on information technology can improve police efficiency and strengthen the overall sense of security in the community. Unfortunately, many property management companies and grassroots police units make little use of information technology and still rely on manual surveillance for security monitoring, which is inefficient and prone to security omissions when operators are distracted. Jonathan et al. highlighted the impact of rising community crime rates on the residents’ quality of life and property values [1]. This underscores the importance of effective community policing and security systems.
Prior research has evolved from basic manual surveillance methods to more sophisticated automated systems. Collins et al. [2] proposed automated surveillance systems for video streaming data, while Socha et al. [3] used video surveillance to improve public space safety and examined how artificial intelligence technology can be used to build smart communities. These early efforts laid the groundwork for integrating machine learning into smart community systems [4,5]. Yao et al. analyzed a video surveillance system [6] that sends strong alerts to the users and the public security system when abnormal situations are recognized. Unfortunately, the system has a minimum delay of 7.3 seconds in responding to abnormal behaviors. Shehzed et al. applied video surveillance to human contours [7] and used unsupervised clustering to determine abnormal events. Recently, Bhati’s studies on intrusion detection using Coarse Gaussian SVM [8,9] highlighted the evolution of machine learning techniques in security systems. In addition, Tiwari et al. [10] analyzed the prediction of COVID-19 in India using an ensemble regression approach, demonstrating the application of machine learning in diverse fields, including public health.
Two main areas are most likely to cause safety issues in a community: hazards to pedestrians caused by objects falling from buildings [11,12] and life safety issues for the elderly caused by falls [13]. In recent years, the global aging rate has been accelerating, and some statistics show that more than 80% of the accidents endangering the lives of the elderly are caused by falls [14,15]. If a fall by an elderly person can be detected in time and treated quickly [16], the extent of health damage can be reduced by more than 85% [17]. Many researchers have conducted in-depth studies on fall monitoring [18], and the field has advanced significantly in recent years. For example, Mirmahboub proposed a fall monitoring algorithm based on the target outer frame [19], which used background differencing to find the external contours of the human body in the video and then used SVMs to differentiate between fall events and non-fall events. Ma proposed an approach that combines extreme learning machine classification with shape features to determine whether a fall had occurred [20]. Harrou proposed a monitoring algorithm based on the multivariate exponentially weighted moving average [21] to sharpen the distinction of whether abnormal behavior has occurred. These studies have contributed to detecting abnormal behaviors, particularly falls, which are a major concern in community safety.
This paper proposes a novel monitoring system architecture to monitor neighborhood safety conditions more efficiently and in real time. The system combines a lightweight convolutional neural network (CNN), MobileNetV3, with multiple discriminative features to improve the adaptability, speed, and accuracy of abnormal behavior detection. By replacing some of the structures in the OpenPose model, the algorithm in this paper reduces the computational effort and simplifies the model structure. In addition, vectors are generated from the coordinates of the key points of the human body, and the angles between these vectors and the ground, together with the aspect ratio of the human body’s calibration frame, serve as the discriminative features of a fall. By incorporating the OC-SVM algorithm, the proposed system improves the anomaly detection accuracy and minimizes false positives, a common issue in traditional surveillance systems. The OC-SVM algorithm can handle unbalanced and unlabeled data and is particularly effective in distinguishing between normal and anomalous behaviors, enhancing the overall reliability of the system. This study also designed a smart community policing security system that collects data from sources such as IoT sensors and video cameras and adopts virtualization technology to allocate computation, storage, and network resources dynamically through logical abstraction. This improves the availability and scalability of the system and ultimately enables real-time abnormal behavior detection in multiple scenarios. Finally, the effectiveness and versatility of the algorithm are demonstrated through experimental analysis on the Multiple Cameras Fall dataset and a comparison with traditional methods.
2. Abnormal Behavior Monitoring Algorithms
The abnormal behavior monitoring algorithm proposed in this paper consists of two steps: feature extraction and behavioral discrimination.
2.1 MobileNetV3-based OpenPose Feature Extraction
In this study, the OpenPose model was improved to increase the accuracy and real-time performance of anomalous behavior detection. The original OpenPose model consists of the first 10 layers of the VGG-19 network and a two-branch multi-stage CNN for predicting the key points and part affinity fields [22]. However, its application in real-time scenarios is limited by its computational intensity. In this study, inspired by Lightweight OpenPose [23], VGG-19 was replaced with MobileNetV3 [24], a lightweight neural network model known for its efficiency and effectiveness in processing video data. MobileNetV3 reduces the number of parameters and the computational complexity significantly using techniques such as depth-wise separable convolutions and average pooling. This makes MobileNetV3 particularly suitable for applications requiring the real-time processing and analysis of video data.
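The saving from depth-wise separable convolutions can be illustrated with a simple weight count. The sketch below is illustrative only: the layer sizes (a 3×3 kernel with 64 input and 128 output channels) and the function names are assumptions chosen for the example, not values from the paper, and biases are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a standard convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depth-wise convolution followed by a 1x1 point-wise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
reduction = standard / separable  # how many times fewer weights the separable form needs
```

For these assumed sizes, the separable form needs roughly one-eighth of the weights of the standard convolution; the actual saving in MobileNetV3 depends on its real layer configuration.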
Fig. 1. Structural diagram of the improved OpenPose feature extraction.
As shown in Fig. 1, to capture small changes in the body and pose in greater detail, the second stage of the OpenPose architecture in this paper replaces the original 7${\times}$7 convolution kernel with a combination of smaller kernels: one 1${\times}$1 and two 3${\times}$3 convolution kernels, followed by a ResNet-style residual structure. This design reduces the number of parameters and the computational cost while increasing the learning and expressive capabilities of the network and helping prevent overfitting. The residual connections accommodate the increased depth brought about by these changes. The feature extraction structure of the improved OpenPose model processes the video sequences through MobileNetV3. After obtaining the feature maps, the key point and part affinity field predictions are performed through two branches to output the prediction results quickly and accurately.
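The effect of replacing one 7×7 kernel with one 1×1 and two 3×3 kernels can be checked with a weight count. This is a minimal sketch under stated assumptions: every layer is assumed to keep the same number of input and output channels (128 here, a hypothetical value), biases are ignored, and the function name is illustrative.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a standard convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


CHANNELS = 128  # hypothetical channel width, not taken from the paper

# One 7x7 convolution, as in the original OpenPose stage.
original = conv_params(7, CHANNELS, CHANNELS)

# One 1x1 convolution followed by two 3x3 convolutions, as in the modified stage.
replacement = conv_params(1, CHANNELS, CHANNELS) + 2 * conv_params(3, CHANNELS, CHANNELS)

ratio = replacement / original  # fraction of the original weights that remain
```

Under these assumptions the replacement keeps 19/49 of the original weights, independent of the channel width, which is consistent with the reduction in computational cost described above.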
When an image or video sequence is input into the model, the upper branch generates a part confidence map $\mathrm{C}_{\mathrm{n},\mathrm{m}}^{\mathrm{*}}\left(\mathrm{Q}\right)$ for the key points of the human body in each image frame, where m is the index of the person; n is the human body key point; Q is any point in the confidence map $\mathrm{C}_{\mathrm{n},\mathrm{m}}$; $\mathrm{X}_{\mathrm{n},\mathrm{m}}$ is the true location of the human body’s key point; $\sigma _{\mathrm{n},\mathrm{m}}$ describes the probability distribution of the corresponding key point. Ideally, each key point corresponds only to the unique maximum value in the confidence map. Therefore, the maximum value $\mathrm{C}_{\mathrm{n}}^{\mathrm{*}}\left(\mathrm{Q}\right)=\max _{\mathrm{m}}\mathrm{C}_{\mathrm{n},\mathrm{m}}^{\mathrm{*}}\left(\mathrm{Q}\right)$ can be obtained to determine the location of the point Q, and the pixel coordinates of that location are taken as the coordinates of the key point n. The lower branch is used to predict the part affinity field between two neighboring key points, $\mathrm{n}_{1}$ and $\mathrm{n}_{2}$, with the integral value B computed along $\mathrm{Q}\left(\mathrm{t}\right)=\left(1-\mathrm{t}\right)\mathrm{n}_{2}+\mathrm{tn}_{1}$, $\mathrm{t}\in \left(0,1\right)$, where $\parallel \mathrm{n}_{2}-\mathrm{n}_{1}\parallel _{2}$ denotes the length of the limb and $\mathrm{Q}\left(\mathrm{t}\right)\in \left[\mathrm{n}_{1},\mathrm{n}_{2}\right]$. If the point Q is on the limb, $\mathrm{k}_{\mathrm{b}}\left(\mathrm{Q}\left(\mathrm{t}\right)\right)=\nu $, where $\nu =\left(\mathrm{n}_{2}-\mathrm{n}_{1}\right)/\parallel \mathrm{n}_{2}-\mathrm{n}_{1}\parallel _{2}$ is the unit vector along the limb; otherwise, $\mathrm{k}_{\mathrm{b}}\left(\mathrm{Q}\left(\mathrm{t}\right)\right)=0$. The integral value between each key point and its neighboring key points is calculated, and a larger integral value means that the pair of neighboring key points is closer to the real skeleton connection. Therefore, the correct connection for each type of limb can be obtained by selecting the maximum value of B and connecting the limbs that share the same key points to form the human skeleton.
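The confidence map and the integral value described above are not written out in this text. Assuming they follow the standard OpenPose formulation [22], they can be sketched with the symbols defined above as follows. The exact scaling of $\sigma _{\mathrm{n},\mathrm{m}}$ in the exponent is an assumption.

```latex
% Confidence map of key point n for person m (Gaussian peak at the true location)
\mathrm{C}_{\mathrm{n},\mathrm{m}}^{*}\left(\mathrm{Q}\right)
  =\exp\left(-\frac{\parallel \mathrm{Q}-\mathrm{X}_{\mathrm{n},\mathrm{m}}\parallel _{2}^{2}}
                   {\sigma _{\mathrm{n},\mathrm{m}}^{2}}\right)

% Integral of the part affinity field along the candidate limb from n_1 to n_2
\mathrm{B}=\int _{0}^{1}\mathrm{k}_{\mathrm{b}}\left(\mathrm{Q}\left(\mathrm{t}\right)\right)\cdot
  \frac{\mathrm{n}_{2}-\mathrm{n}_{1}}{\parallel \mathrm{n}_{2}-\mathrm{n}_{1}\parallel _{2}}\,\mathrm{dt}
```

In this form, B is largest when the predicted field is aligned with the direction of the candidate limb along its whole length, which matches the selection rule stated above.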
Twenty-one key body parts were identified (Fig. 2), numbered from 0 to 20 for illustration purposes. No. 0 corresponds to the nose; No. 1 corresponds to the neck; Nos. 5 and 2 correspond to the left and right shoulders, respectively; Nos. 6 and 3 to the left and right elbows; Nos. 7 and 4 to the left and right wrists; No. 8 to the center of gravity; Nos. 13 and 9 to the left and right hips; Nos. 14 and 10 to the left and right knees; Nos. 15 and 11 to the left and right ankles; Nos. 16 and 12 to the left and right feet; Nos. 19 and 17 to the left and right eyes; and Nos. 20 and 18 to the left and right ears. This numbering system helps describe and analyze the key parts of the human body more accurately and conveniently.
Fig. 2. Human body key point and skeleton detection results.
The algorithm in this paper takes the upper left corner of the image as the coordinate
origin and assigns a coordinate to each key point. Specifically, these coordinates
are (x0, y0) for the nose, (x1, y1) for the neck, and so on, up to (x18, y18) for
the right ear and (x20, y20) for the left ear. Such a definition of the coordinates
helps pinpoint the location of each key point in the image.
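The numbering and coordinate convention above can be captured in a small lookup table. The sketch below is one possible representation; the identifier names (`KEYPOINT_NAMES`, `label_keypoints`) and the string labels are hypothetical and are not part of the paper.

```python
# Index-to-name mapping following the numbering given in the text (0 to 20).
KEYPOINT_NAMES = {
    0: "nose", 1: "neck",
    2: "right_shoulder", 3: "right_elbow", 4: "right_wrist",
    5: "left_shoulder", 6: "left_elbow", 7: "left_wrist",
    8: "center_of_gravity",
    9: "right_hip", 10: "right_knee", 11: "right_ankle", 12: "right_foot",
    13: "left_hip", 14: "left_knee", 15: "left_ankle", 16: "left_foot",
    17: "right_eye", 18: "right_ear", 19: "left_eye", 20: "left_ear",
}


def label_keypoints(coords):
    """Pair 21 (x, y) tuples, ordered by key point index, with their names.

    Coordinates are in pixels with the origin at the upper left corner of the image.
    """
    if len(coords) != len(KEYPOINT_NAMES):
        raise ValueError("expected one (x, y) pair per key point")
    return {KEYPOINT_NAMES[i]: xy for i, xy in enumerate(coords)}


# Usage with placeholder coordinates.
labeled = label_keypoints([(i, 2 * i) for i in range(21)])
```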
2.2 Principle of OC-SVM Algorithm
The OC-SVM model, proposed by Schölkopf in 1999 [25], is a one-class support vector machine (SVM), an unsupervised learning algorithm used mainly for outlier detection. Unlike traditional SVMs, which are typically used for binary classification tasks, OC-SVM is designed to detect anomalies without labeled examples of the anomalous class. The primary advantage of OC-SVM lies in its ability to handle unbalanced datasets, where the number of normal instances far outweighs the number of anomalies. This characteristic makes OC-SVM particularly well-suited for real-time anomaly detection in community policing scenarios, where abnormal behaviors are rare compared to normal activities. The algorithm is described as follows:
Let the sample data be $\left\{\chi _{1},\chi _{2},\cdots ,\chi _{\mathrm{m}}\right\}\in \mathrm{X}^{\mathrm{n}}$, where $\mathrm{m}$ is the number of samples. The separating hyperplane is expressed as $\omega ^{\mathrm{T}}\phi \left(\chi \right)-\rho =0$, where $\phi \left(\chi \right)$ is the function that maps the samples to the feature space, and $\omega $ and $\rho $ are the normal vector and offset of the separating hyperplane in the feature space, respectively. The objective is to maximize the distance between the separating hyperplane and the origin. The optimization problem to be solved by OC-SVM is therefore formulated with slack variables $\xi _{\mathrm{i}}$, which allow outliers to exist, and a parameter $\nu $, which controls the upper limit of the fraction of outliers and the lower limit of the fraction of support vectors. After introducing Lagrange multipliers, the optimal hyperplane is found by solving the dual of this optimization problem (4), where $\alpha _{\mathrm{i}}$ is the Lagrange coefficient corresponding to the sample $\chi _{\mathrm{i}}$, and the kernel function $\kappa \left(\chi _{\mathrm{i}},\chi _{\mathrm{j}}\right)=\left\langle \phi \left(\chi _{\mathrm{i}}\right),\phi \left(\chi _{\mathrm{j}}\right)\right\rangle $ replaces the inner product in the feature space. After solving the optimization problem (4), the samples $\chi _{\mathrm{i}}$ whose Lagrange coefficients satisfy $\alpha _{\mathrm{i}}>0$ are the support vectors. From these support vectors, the normal vector of the hyperplane $\omega =\sum _{\mathrm{i}=1}^{\mathrm{m}}\alpha _{\mathrm{i}}\phi \left(\chi _{\mathrm{i}}\right)$ and the hyperplane offset $\rho =\omega ^{\mathrm{T}}\phi \left(\chi _{\mathrm{SV}}\right)=\sum _{\mathrm{i}=1}^{\mathrm{m}}\alpha _{\mathrm{i}}\kappa \left(\chi _{\mathrm{i}},\chi _{\mathrm{SV}}\right)$ are obtained, where $\chi _{\mathrm{SV}}$ refers to some support vector. This in turn leads to the classification decision function $\mathrm{f}\left(\mathrm{x}\right)$ of Eq. (5).
The human posture data to be tested are substituted into Eq. (5). If the output of $\mathrm{f}\left(\mathrm{x}\right)$ is 1, the point is normal data; if the output is ${-}$1, the point is an anomaly that needs to be attended to.
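The primal problem, its dual, and the decision function referred to above are not written out in this text. Assuming the paper follows the standard one-class SVM formulation of Schölkopf [25], they can be sketched in the notation of this section as follows. This is a reconstruction under that assumption, not a transcription of the original equations.

```latex
% Primal problem: separate the mapped samples from the origin with maximum margin
\min _{\omega ,\,\xi ,\,\rho }\ \frac{1}{2}\parallel \omega \parallel ^{2}
  +\frac{1}{\nu \mathrm{m}}\sum _{\mathrm{i}=1}^{\mathrm{m}}\xi _{\mathrm{i}}-\rho
\quad \text{s.t.}\quad
  \omega ^{\mathrm{T}}\phi \left(\chi _{\mathrm{i}}\right)\geq \rho -\xi _{\mathrm{i}},\quad
  \xi _{\mathrm{i}}\geq 0,\quad \mathrm{i}=1,\ldots ,\mathrm{m}

% Dual problem in terms of the Lagrange coefficients
\min _{\alpha }\ \frac{1}{2}\sum _{\mathrm{i}=1}^{\mathrm{m}}\sum _{\mathrm{j}=1}^{\mathrm{m}}
  \alpha _{\mathrm{i}}\alpha _{\mathrm{j}}\kappa \left(\chi _{\mathrm{i}},\chi _{\mathrm{j}}\right)
\quad \text{s.t.}\quad
  0\leq \alpha _{\mathrm{i}}\leq \frac{1}{\nu \mathrm{m}},\quad
  \sum _{\mathrm{i}=1}^{\mathrm{m}}\alpha _{\mathrm{i}}=1

% Decision function: +1 for normal data, -1 for anomalies
\mathrm{f}\left(\mathrm{x}\right)=\operatorname{sgn}\left(
  \sum _{\mathrm{i}=1}^{\mathrm{m}}\alpha _{\mathrm{i}}\kappa \left(\chi _{\mathrm{i}},\mathrm{x}\right)-\rho \right)
```

The dual is written here as a minimization; maximizing its negative is equivalent. The expressions for $\omega $ and $\rho $ given in the text follow from this form.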
2.3 Abnormal Behavior Judgment based on OC-SVM
Using the improved OpenPose, the key points of the human body and the relationships between them, such as the posture of the arms (defined by the relative positions and angles between the shoulders, elbows, and wrists) and the posture of the legs (defined by the relative positions and angles between the hips, knees, and ankles), can be extracted from video and image frames. These relationships and angles are transformed into a series of feature vectors that provide the data basis for subsequent abnormal behavior judgments.
Fig. 3. Abnormal behavior discriminant feature vector.
The key points involved in the algorithm of this paper are the neck, center of gravity point, right knee, right ankle, left knee, and left ankle, whose coordinates are (x1,y1), (x8,y8), (x10,y10), (x11,y11), (x14,y14), and (x15,y15), respectively. As shown in Fig. 3, the feature vectors can be obtained as $\mathbf{V}_{1}=\left(\mathrm{x}_{8}-\mathrm{x}_{1},\,\mathrm{y}_{8}-\mathrm{y}_{1}\right)$, $\mathbf{V}_{2}=\left(\mathrm{x}_{15}-\mathrm{x}_{14},\,\mathrm{y}_{15}-\mathrm{y}_{14}\right)$, and $\mathbf{V}_{3}=\left(\mathrm{x}_{10}-\mathrm{x}_{11},\,\mathrm{y}_{10}-\mathrm{y}_{11}\right)$. They represent the human spine vector, left calf vector, and right calf vector, respectively.
The solid arrows in Fig. 3 indicate the human skeleton involved in the algorithm. The dashed arrows indicate the direction vector x = (1,0) of the x-axis, which can be used to represent the direction of the real ground because it is always parallel to the ground. The angles of $\mathbf{V}_{1}$, $\mathbf{V}_{2}$, and $\mathbf{V}_{3}$ to x are denoted $\theta _{1}$, $\theta _{2}$, and $\theta _{3}$, respectively, where $\theta _{1}$ is the angle between the human spine and the ground; $\theta _{2}$ is the angle between the left calf and the ground; $\theta _{3}$ is the angle between the right calf and the ground. The aspect ratio of the human body also changes when the person falls or exhibits other abnormal behaviors. In this paper, $\mathrm{X}_{\max }$, $\mathrm{Y}_{\max }$, $\mathrm{X}_{\min }$, and $\mathrm{Y}_{\min }$ of all the coordinate points define the calibration frame of the human body, which provides another feature of abnormal behavior, where $\mathrm{X}_{\max }-\mathrm{X}_{\min }$ is the width of the calibration frame of the human body, and $\mathrm{Y}_{\max }-\mathrm{Y}_{\min }$ is the height of the calibration frame of the human body. The aspect ratio of the human body calibration frame is denoted $\mathrm{R}$.
In the video frame, $\theta _{1}$, $\theta _{2}$, $\theta _{3}$, and $\mathrm{R}$ obtained from the improved OpenPose are substituted into Eq. (5). If the $\mathrm{f}\left(\mathrm{x}\right)$ output of the OC-SVM algorithm is ${-}$1, an abnormal behavior, such as falling, is judged to have occurred. If the $\mathrm{f}\left(\mathrm{x}\right)$ output of the OC-SVM algorithm is 1, the human body is in a normal posture, and no warning is required.
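The feature computation and the final judgment can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that are not stated in the text: the angle is computed with the arccosine of the normalized dot product with x = (1,0), the aspect ratio R is taken as width divided by height, an RBF kernel is used, and the support vectors, coefficients, offset, and `gamma` are hand-picked hypothetical values rather than trained ones. All function names are illustrative.

```python
import math


def angle_to_ground(v):
    """Angle in degrees between a 2-D vector v and the ground direction x = (1, 0)."""
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    cos_theta = max(-1.0, min(1.0, v[0] / norm))  # dot product with (1, 0), clamped
    return math.degrees(math.acos(cos_theta))


def fall_features(kp):
    """Return (theta1, theta2, theta3, R) from a dict {key point index: (x, y)}."""
    def diff(a, b):
        return (kp[a][0] - kp[b][0], kp[a][1] - kp[b][1])

    v1 = diff(8, 1)    # spine: neck -> center of gravity
    v2 = diff(15, 14)  # left calf: left knee -> left ankle
    v3 = diff(10, 11)  # right calf, with the sign convention used in the text
    xs = [x for x, _ in kp.values()]
    ys = [y for _, y in kp.values()]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    ratio = width / height if height else float("inf")  # assumption: R = width / height
    return (angle_to_ground(v1), angle_to_ground(v2), angle_to_ground(v3), ratio)


def rbf_kernel(a, b, gamma):
    """RBF kernel (an assumed choice; the text does not name the kernel)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def ocsvm_decision(x, support_vectors, alphas, rho, gamma):
    """Sign of sum_i alpha_i * k(sv_i, x) - rho: +1 is normal, -1 is anomalous."""
    score = sum(a * rbf_kernel(sv, x, gamma) for a, sv in zip(alphas, support_vectors))
    return 1 if score - rho >= 0 else -1


# Placeholder poses using only the six key points named in the text.
standing = {1: (100, 50), 8: (100, 150), 10: (90, 220), 11: (90, 290),
            14: (110, 220), 15: (110, 290)}
lying = {1: (50, 200), 8: (150, 200), 10: (220, 210), 11: (290, 210),
         14: (220, 190), 15: (290, 190)}

# Hypothetical model parameters, not trained values.
SUPPORT_VECTORS = [(90.0, 90.0, 90.0, 0.10), (85.0, 88.0, 92.0, 0.15)]
ALPHAS = [0.5, 0.5]
RHO = 0.5
GAMMA = 1e-3

standing_label = ocsvm_decision(fall_features(standing), SUPPORT_VECTORS, ALPHAS, RHO, GAMMA)
lying_label = ocsvm_decision(fall_features(lying), SUPPORT_VECTORS, ALPHAS, RHO, GAMMA)
```

With these placeholder values, the upright pose lies close to the hypothetical support vectors and is labeled normal, while the horizontal pose is far from them and is labeled anomalous. In practice, the coefficients would come from training on normal posture data, and the calibration frame would use all detected key points.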
2.4 Model Realization Process
The basic idea of this paper for the security anomaly monitoring of video data is to collect all data related to community policing, including video and image data, and to construct the security operation model of each application module from these data. The inter-frame difference method is then used to calculate the gray values of the image and extract the target object [26]. The model replaces the first 10 layers of VGG-19 in the OpenPose model with the lightweight neural network MobileNetV3, which serves as the feature extraction network of the algorithm to improve the speed and accuracy of detection and capture the human posture accurately in real time. At the same time, the 7${\times}$7 convolutional kernel in the OpenPose two-branch structure is replaced with one 1${\times}$1 and two 3${\times}$3 convolutional kernels to reduce the computational burden. In addition, to define the fall state of the human body more accurately, the coordinates of the key points of the human body are processed to generate three vectors representing the position and direction of the human spine and the left and right calves, and the angles between these vectors and the ground are calculated. These angles, together with the aspect ratio of the human body’s calibration frame, are used as the fall discrimination features to monitor fall events accurately. Fig. 4 presents the specific implementation process.
Fig. 4. Flowchart of the Security Exception Detection Algorithm.
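The inter-frame difference step at the start of the pipeline can be sketched as follows. This is a simplified illustration, not the paper's implementation: frames are plain 2-D lists of gray values, the threshold of 25 is a hypothetical value, and the function names are illustrative.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose gray value changed by more than `threshold` between frames."""
    return [
        [1 if abs(curr - prev) > threshold else 0 for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the changed pixels, or None if nothing moved."""
    coords = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Usage with two tiny placeholder frames (4 rows, 5 columns).
previous = [[0] * 5 for _ in range(4)]
current = [row[:] for row in previous]
current[1][1] = current[1][2] = current[2][2] = 200  # a small moving region

motion_mask = frame_difference_mask(previous, current)
target_box = bounding_box(motion_mask)
```

The bounding box of the changed pixels gives the region that would be passed on to pose estimation; a real system would typically add noise filtering before this step.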
3. OC-SVM based System Design
This smart community policing security system is designed to detect potential safety
hazards and improve the residential well-being of community residents. With the integration
of computer vision and machine learning technologies, the architecture of the smart
community policing security system designed in this paper is divided into five key
layers (Fig. 5): the data sensing layer, network transmission layer, basic support layer, data service
layer, and functional application layer.
Fig. 5. Architecture diagram of intelligent community security system.
Data Sensing Layer: The police security system of the smart community is directly connected to the community environment and collects data from different types of IoT sensors (e.g., RF sensors), video cameras, and vehicle identification cameras, which provide data support for the subsequent analysis and monitoring and allow a prompt response to security incidents.
The layer design allows for the easy addition or removal of sensors and cameras, making
it suitable for both small neighborhoods and large urban areas. This flexibility is
crucial for tailoring the system to the specific needs and characteristics of different
communities. Table 1 lists the corresponding data types.
Table 1. Correspondence Between Sensor Types and Data Types.
Sensor Type | Data Type | Role in Security Monitoring
Video Camera | Video Stream | Real-time area surveillance
RF Sensor | Signal Strength | Positioning and tracking of people and objects
Vehicle Recognition Camera | Image/Video Stream | Vehicle identification and tracking
Access Control Sensor | Access Control Signal | Personnel entry and exit control
RFID [27] | Tag Information | Item tracking and management
Facial Recognition Camera | Image/Video Stream | Identity verification and authentication
Table 2. Data Transport Performance Indicators.
Network Layer | Bandwidth (Mbps) | Latency (ms) | Throughput (Mbps) | Stability Rating
Dedicated Video Network | 500 | 30 | 450 | 9
Local Area Network | 1000 | 10 | 950 | 10
Government External Network | 200 | 50 | 180 | 8
Network Transmission Layer: The data then flows to the network transmission layer, which acts as the communication backbone of the intelligent community policing security system. The dedicated video network carries continuous video data streams and is characterized by high bandwidth and low latency. The local area network (LAN) is mainly used for transmitting face recognition data, vehicle recognition data, RFID data, and access control data, connecting various sensors and data processing units in specific areas and buildings. The government external network is used mainly to transmit sensitive data to public security and medical departments to ensure a prompt response to security emergencies in the community. Encryption protocols and strict access control are essential to protect the privacy of the community residents. Table 2 lists the transport performance metrics of the three networks.
Basic Support Layer: After obtaining data from the network transmission layer, the basic support layer uses logical abstraction to simplify the core resources of the system. By abstracting computing, storage, and network resources and encapsulating them into a dedicated pool, this layer provides a scalable architecture that adapts to growing future demands and improves the reliability of the system. The network architecture is designed to handle varying data loads, ensuring stable performance regardless of the scale of deployment.
Data Service Layer: The data service layer serves the data from the basic support layer by cleaning and standardizing the acquired video and photo data, unifying them into accessible, structured data, and constructing a multi-sensor database for easy storage and management. This layer can manage and analyze large datasets. Implementing cloud computing and virtualization technologies within these layers allows for efficient data processing and storage, further enhancing the scalability of the system.
Platform Service Layer: The platform service layer integrates middleware services, such as microservices, container services, and message brokers, providing an environment for efficient development, deployment, and management. Integrating big data analysis tools, the improved OpenPose, and the OC-SVM algorithm provides powerful support for the real-time monitoring of abnormal behavior.
Application Layer: The application layer obtains interpretable results by calling
services from the platform service layer through interfaces and databases from the
data service layer. This layer comprises the Community Overview Dashboard, Abnormal
Warning, Elderly Fall Warning, and Abnormal Vehicle modules. Among them, the community
overview dashboard displays various security anomalies in the current community and
can be used as a command center for police security. The anomaly warning includes
warnings of falling objects and abnormal behavior of outsiders and uses the OC-SVM
algorithm component to mark abnormal behaviors that deviate from normal patterns.
The Elderly Fall Warning module monitors falls in real time and sends warnings to
the public security department, community managers, and community medical departments.
4. Results and Discussion
4.1 Experimental Data Collection
The effectiveness of the proposed system was validated on the Multiple Cameras Fall dataset, which was selected as the primary data source because of its comprehensive representation of real-world scenarios involving falls, a critical safety concern in community environments. The dataset includes a diverse range of fall events captured under various conditions and from multiple angles, making it a suitable benchmark for testing the robustness and accuracy of the anomaly detection system.
Developed by the University of Montreal in 2010, the Multiple Cameras Fall dataset consists of video recordings from eight ordinary IP cameras, capturing twenty-four scenarios. These scenarios include fall events and various non-fall activities, such as cleaning, lying on the couch, and sitting. This diversity provides a realistic and challenging test environment for the proposed system and tests whether it can accurately differentiate between falls and other common daily activities (Fig. 6). Moreover, the dataset presents various complexities typical of real-life settings, such as different angles of falls, occlusion by indoor objects, variations in body types, and diverse backgrounds, including different clothing and indoor environments. These factors are crucial for assessing the ability of the system to function effectively in real-world community policing scenarios, where similar challenges are frequently encountered. Table 3 provides details of the dataset.
Fig. 6. Different Fall Positions for The Multiple Cameras Fall Dataset.
Table 3. Content of Multiple Cameras Fall Dataset.
# | Features | Descriptions
1 | Angle | For different angles, the fall is in different directions.
2 | Shelter | Problems with obscuring indoor objects, cameras, etc.
3 | Body Differences | Physical differences between different bodies (height and size).
4 | Background | Different clothing, different indoor environments.
The Multiple Cameras Fall dataset was processed on a computer equipped with an Intel Core i7 processor and 16 GB of RAM, running the Windows 10 operating system. PyTorch, an open-source machine learning library, was used for video data processing, algorithm implementation, and performance simulation. The pre-processing of the dataset involves several key steps. First, video frames are standardized to a resolution of 640${\times}$480 pixels, and a Gaussian blur is applied to reduce noise, which enhances the clarity of human figures against varying backgrounds. Color normalization is then conducted to adjust the contrast and brightness, improving figure-background distinction. Subsequently, human figures are isolated from the background through segmentation techniques. Finally, frames are converted to grayscale to simplify the data for effective feature extraction. These steps prepare the data for accurate abnormal behavior detection.
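Two of the pre-processing steps above, Gaussian blurring and grayscale conversion, can be sketched in plain Python. This is an illustration only: the 3×3 kernel, the edge handling, and the BT.601 luma weights are assumptions, the resizing, color normalization, and segmentation steps are omitted, and the two functions are shown in isolation rather than in the exact order used in the paper.

```python
GAUSSIAN_3X3 = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # integer weights that sum to 16


def to_grayscale(rgb):
    """Convert a 2-D list of (r, g, b) pixels to gray values using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def gaussian_blur(gray):
    """Apply a 3x3 Gaussian kernel to a 2-D list of gray values, replicating edge pixels."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)  # clamp to the image border
                    xx = min(max(x + dx, 0), width - 1)
                    acc += GAUSSIAN_3X3[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16
    return out


# Usage: a single bright pixel is spread over its neighbors by the blur.
impulse = [[0, 0, 0], [0, 16, 0], [0, 0, 0]]
blurred_impulse = gaussian_blur(impulse)

uniform_rgb = [[(10, 20, 30)] * 3 for _ in range(3)]
uniform_gray = to_grayscale(uniform_rgb)
blurred_uniform = gaussian_blur(uniform_gray)
```

A production pipeline would normally rely on an image-processing library for these operations; the plain-Python version is used here only to make the arithmetic visible.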
4.2 Experimental Results
Behavioral anomalies unfold continuously over time, e.g., a significant change in speed occurs during and after a fall. In deep learning, the performance of behavioral anomaly detection models is generally judged by two metrics: precision and sensitivity [28]. Precision is the proportion of correct predictions among the samples predicted to be positive, i.e., the proportion of predicted falls that are actual falls. Sensitivity, also known as recall, is the proportion of correct predictions among the samples whose true outcome is positive, i.e., the proportion of actual falls that are predicted as falls. With TP, FP, and FN denoting the numbers of true positives, false positives, and false negatives, respectively, the formulae for precision and sensitivity are as follows:
$\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}},\quad \mathrm{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$
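The two metrics, and the F1-score that combines them, can be computed directly from the counts of true positives (TP), false positives (FP), and false negatives (FN). The counts in this sketch are hypothetical and are not results from the paper; the function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are predicted as positive (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for the "fall" class, for illustration only.
tp, fp, fn = 80, 20, 10
fall_precision = precision(tp, fp)
fall_recall = recall(tp, fn)
fall_f1 = f1_score(fall_precision, fall_recall)
```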
In this study, all the sample videos in the dataset were normalized to a spatial resolution of 224${\times}$224 and a frame rate of 30 fps. The normalized dataset was then divided into a training set and a test set at a 7:3 ratio, and the results on the test set were analyzed according to the evaluation metrics. This study examined three of the most common daily movements of the human body (walking, sitting down, and falling) to verify whether the model can accurately discriminate abnormal behaviors.
The algorithm in this paper was tested against a traditional baseline. In the traditional algorithm, the background difference method was first used to find the external contour of the human body. The size of the extracted human body contour was then evaluated, and the OC-SVM algorithm was used to distinguish between falling and non-falling events. Table 4 lists the test results obtained by this method.
Table 4. Results of the Traditional Algorithms for Security Exception Detection.
Human Behavior States | Precision (%) | Recall (%) | Frame Rate (Frames/second)
Fall | 78.9 | 77.4 | 10.96
Walk | 73.0 | 70.1 | 10.96
Sit | 63.2 | 61.8 | 10.96
Average of All Class | 71.7 | 69.6 | 10.96
Table 5. Results of OpenPose and OC-SVM for Security Exception Detection.
Human Behavior States | Precision (%) | Recall (%) | Frame Rate (Frames/second)
Fall | 86.9 | 84.3 | 4.45
Walk | 82.5 | 79.1 | 4.45
Sit | 76.9 | 73.7 | 4.45
Average of All Class | 82.1 | 79.0 | 4.45
Table 5 lists the test results of the proposed method, which obtains images using the inter-frame difference method, extracts features with the OpenPose model improved using MobileNetV3, and recognizes falling behavior with the OC-SVM algorithm.
The class-averaged precision of the traditional and proposed methods was 71.7% and 82.1%, respectively, an improvement of 10.4 percentage points over the traditional method. The class-averaged recall of the traditional and proposed methods was 69.6% and 79.0%, respectively, an improvement of 9.4 percentage points. The monitoring speed of the traditional method was 10.96 frames/second, while the detection speed of the method proposed in this paper was 4.45 frames/second.
Table 6. Comparison of the OpenPose and OC-SVM and the multimodal approaches.
Method | Accuracy (%) | Precision (%) | F1-Score (%)
Proposed | 88.74 | 82.12 | 75.43
Martínez-Villaseñor et al. [29] | 95.00 | 77.70 | 72.80
The proposed method achieved a notable balance in the performance metrics, with an accuracy, precision, and F1-Score of 88.74%, 82.12%, and 75.43%, respectively (Table 6). Although the Martínez-Villaseñor approach showed a higher accuracy of 95.00%, its precision was 4.42 percentage points lower. This indicates that the proposed system achieves a higher rate of true positive detections relative to the total number of positive detections it makes. The F1-Score, which is the harmonic mean of precision and recall, underscores the effectiveness of the proposed approach, demonstrating its capability to maintain balanced performance between precision and recall.
A PR curve was plotted for each category based on precision and recall to evaluate the performance of the model. The average precision (AP) of a category summarizes its PR curve, and mAP is the average of the APs of multiple categories. Figs. 7 and 8 show the PR curves for the traditional and proposed methods, respectively.
Fig. 7. PR curve of the Traditional Algorithms for Security Exception Detection.
Fig. 8. PR curve of the OpenPose and OC-SVM for Security Exception Detection.
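As a rough illustration of how a per-category AP and the resulting mAP can be obtained, the sketch below ranks the samples of each category by confidence and averages the precision at every true-positive rank, a common approximation of the area under the PR curve. The function names and the toy scores are assumptions made for this example; the paper does not specify which AP interpolation it uses.

```python
from typing import Dict, List, Tuple

# One (confidence score, ground-truth label) pair per sample of a category.
Scored = List[Tuple[float, bool]]


def average_precision(scored: Scored) -> float:
    """Approximate AP as the mean precision at each true-positive rank."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    if total_pos == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


def mean_average_precision(per_category: Dict[str, Scored]) -> float:
    """mAP: the unweighted mean of the per-category APs."""
    aps = [average_precision(s) for s in per_category.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data (hypothetical, not the paper's results).
toy = {
    "falling": [(0.9, True), (0.8, False), (0.7, True)],
    "walking": [(0.9, True), (0.6, True), (0.3, False)],
}
for name, scored in toy.items():
    print(name, round(average_precision(scored), 3))
print("mAP", round(mean_average_precision(toy), 3))
```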
The traditional algorithm had a monitoring accuracy of 0.786 for the falling category,
0.713 for the walking category, 0.625 for the sitting category, and 0.73 after averaging
all the categories. The proposed method obtains the image using the inter-frame difference
method, extracts features with the OpenPose model improved by MobileNetV3, and identifies
abnormal behaviors using the OC-SVM algorithm. It improved the accuracy in the falling
category by 0.106, the walking category by 0.147, and the sitting category by 0.209, while
the average accuracy improved by 0.132. Hence, the method proposed in this paper has
superior accuracy and monitoring speed. This outcome can meet the real-time requirements
in designing an intelligent community policing security system that will improve community
security and residents’ happiness.
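The inter-frame difference step mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: frames are small grayscale grids stored as nested lists, and the threshold of 25 and the function names are hypothetical choices.

```python
from typing import List

Frame = List[List[int]]  # grayscale intensities, 0-255, row-major


def frame_difference_mask(prev: Frame, curr: Frame, threshold: int = 25) -> Frame:
    """Mark pixels whose intensity changed by more than `threshold`."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_ratio(mask: Frame) -> float:
    """Fraction of pixels flagged as moving in a binary mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Two tiny hypothetical frames: only the middle column changes.
prev_frame = [[10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 200, 10], [10, 180, 12]]
mask = frame_difference_mask(prev_frame, curr_frame)
print(mask, motion_ratio(mask))
```

In a pipeline of the kind described here, only frames whose motion ratio exceeds some small value would need to be passed on to pose estimation, which reduces the work done on static scenes.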
5. Conclusion
This paper presented a significant advance in community policing by developing a smart
security system utilizing the OC-SVM algorithm and MobileNetV3-improved OpenPose model.
By combining the lightweight convolutional neural network MobileNetV3 and the improved
OpenPose model, this system can effectively extract human posture features and realize
the real-time detection of abnormal behaviors. This paper first reviewed the traditional
OpenPose model and its shortcomings in real-time surveillance applications. Subsequently,
this study optimized the computational efficiency of feature extraction and behavioral
discrimination by introducing a parallel computing mechanism. The specific conclusions
are as follows:
The OpenPose model improved with MobileNetV3 was used for feature extraction, effectively
improving the speed and accuracy of data processing. Using the optimized model, experiments
conducted on the Multiple Cameras Fall dataset showed that the precision and recall
of this system in fall detection reached 86.9% and 84.3%, respectively, which were
significantly better than those of the traditional behavior monitoring methods.
The abnormal behavior monitoring algorithm based on the OC-SVM proposed in this paper
realized a rapid response to abnormal behaviors by precisely analyzing feature vectors
and human body postures. The experimental results showed that, compared to the traditional
algorithm, the method proposed in this study had higher accuracy and real-time performance
in recognizing abnormal behavior.
Simulation experiments verified the efficiency and reliability of this system. In
terms of processing speed, the system achieved a processing speed of 4.45 frames per
second, which meets the requirements of real-time monitoring. In addition, the system
maintains stable performance in different environments, which proves its usefulness
in smart community security management.
In summary, the smart community policing security system based on the OC-SVM algorithm
designed in this study has significant advantages in enhancing community security
and residents’ happiness. The successful implementation of this system provides a
new technical solution and application model for security monitoring in smart cities,
which can effectively assist security managers in security supervision and emergency
responses. Future research directions will include exploring the integration of additional
sensory inputs to improve the detection capabilities. The potential for adapting this
system to different environments and applications, such as healthcare or industrial
safety, also presents exciting avenues for expansion. In addition, the system is expected
to play an important role in the future construction of smart cities.
REFERENCES
Jonathan O E, Olusola A J, Bernadin T C A, et al. Impacts of crime on socio-economic
development. Mediterranean Journal of Social Sciences, 2021, 12(5): 71.
Collins R T, Lipton A J, Kanade T, et al. A system for video surveillance and monitoring.
VSAM final report, 2000, (1-68): 1.
Socha R, Kogut B. Urban video surveillance as a tool to improve security in public
spaces. Sustainability, 2020, 12(15): 6210.
Li X, Lu R, Liang X, et al. Smart community: an internet of things application. IEEE
Communications magazine, 2011, 49(11): 68-75.
Barrett B F D, DeWit A, Yarime M. Japanese smart cities and communities: Integrating
technological and institutional innovation for Society 5.0. Smart Cities for Technological
and Social Innovation. Academic Press, 2021: 73-94.
Yao S, Ardabili B R, Pazho A D, et al. Real-World Community-in-the-Loop Smart Video
Surveillance--A Case Study at a Community College. arXiv preprint arXiv:2303.12934,
2023.
Shehzed A, Jalal A, Kim K. Multi-person tracking in smart surveillance system for
crowd counting and normal/abnormal events detection. 2019 International conference
on applied and engineering mathematics (ICAEM). IEEE, 2019: 163-168.
Bhati B S, Rai C S. Intrusion detection technique using Coarse Gaussian SVM. International
Journal of Grid and Utility Computing, 2021, 12(1): 27-32.
Bhati B S, Rai C S. Analysis of support vector machine-based intrusion detection techniques.
Arabian Journal for Science and Engineering, 2020, 45: 2371-2383.
Tiwari D, Bhati B S. A deep analysis and prediction of covid-19 in India: using ensemble
regression approach. Artificial Intelligence and Machine Learning for COVID-19, 2021:
97-109.
Weaver III A, Ojiambo W, Kemp J, et al. Pedestrian Walkways: Hidden Hazards Related
to Common Landscaping Practices. Professional Safety, 2022, 67(07): 14-22.
Li Y, Esmaeili B, Gheisari M, et al. Using Unmanned Aerial Systems (UAS) for Assessing
and Monitoring Fall Hazard Prevention Systems in High-rise Building Projects. arXiv
preprint arXiv:2209.13137, 2022.
Ang G C, Low S L, How C H. Approach to falls among the elderly in the community. Singapore
medical journal, 2020, 61(3): 116.
Vaishya R, Vaish A. Falls in older adults are serious. Indian journal of orthopedics,
2020, 54: 69-74.
Johnson J, Rodriguez M A, Al Snih S. Life-space mobility in the elderly: current perspectives.
Clinical interventions in aging, 2020: 1665-1674.
Carpenter C R, Cameron A, Ganz D A, et al. Older adult falls in emergency medicine:
2019 update. Clinics in geriatric medicine, 2019, 35(2): 205-219.
Tanwar R, Nandal N, Zamani M, et al. Pathway of trends and technologies in fall detection:
a systematic review. Healthcare. MDPI, 2022, 10(1): 172.
Ramanujam E, Padmavathi S. A vision-based posture monitoring system for the elderly
using intelligent fall detection technique. Guide to Ambient Intelligence in the IoT
Environment: Principles, Technologies and Applications, 2019: 249-269.
Mirmahboub B, Samavi S, et al. Automatic monocular system for human fall detection
based on variations in silhouette area. IEEE Transactions on Biomedical Engineering,
2013, 60(2): 427-436.
Ma X, Wang H, et al. Depth-based human fall detection via shape features and improved
extreme learning machine. IEEE Journal of Biomedical and Health Informatics, 2014, 18(6): 1915-1922.
Harrou F, Zerrouki N, Sun Y, et al. Vision-based fall detection system for improving
safety of elderly people. IEEE Instrumentation and Measurement Magazine, 2017, 20(6): 49-55.
Chen W, Jiang Z, Guo H, et al. Fall detection based on key points of human-skeleton
using Open Pose. Symmetry, 2020, 12(5): 744.
Osokin D. Real-time 2d multi-person pose estimation on cpu: Lightweight openpose.
arXiv preprint arXiv:1811.12004, 2018.
Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3. Proceedings of the IEEE/CVF
international conference on computer vision. 2019: 1314-1324.
Scholkopf B, Mika S, Burges C J C, et al. Input space versus feature space in kernel-based
methods. IEEE transactions on neural networks, 1999, 10(5): 1000-1017.
Weng M, Huang G, Da X. A new interframe difference algorithm for moving target detection.
2010 3rd international congress on image and signal processing. IEEE, 2010, 1: 285-289.
Weinstein R. RFID: a technical overview and its application to the enterprise. IT
professional, 2005, 7(3): 27-33.
Ringberg H, Soule A, Rexford J, et al. Sensitivity of PCA for traffic anomaly detection.
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems. 2007: 109-120.
Martínez-Villaseñor L, Ponce H, Brieva J, Moya-Albor E, Núñez-Martínez J, Peñafort-Asturiano
C. UP-fall detection dataset: A multimodal approach. Sensors, 2019, 19(9): 1988.
Author
Yanfei Gao was born in Henan, China, in 1986. From 2005 to 2009, she studied at Northwestern
University and received her bachelor's degree in 2009. From 2009 to 2012, she studied
at Northwestern University and received her master's degree in 2012. Since 2012, she
has been working at Zhengzhou Police University. Her research interests include
sociology and public security.