2.1. Tracking Algorithm Based on Compressed Sensing
The Nyquist sampling theorem states that, to preserve the integrity of a signal during
acquisition, the sampling frequency must be greater than twice the highest frequency
in the signal for the signal to be reconstructed accurately [15-17]. Traditional signal
sampling is based on this theorem, but most of the sampled data are redundant. In signal
or image compression, usually only the important data are retained and a large amount
of redundant data is discarded. In 2004, the concept of compressed sensing was proposed
by Donoho et al. It is a new signal acquisition and coding theory that makes full use
of signal sparsity. When the signal is sparse, it can be reconstructed exactly or
approximately from a small number of projected values. Compressed sensing is a procedure
in which the high-dimensional raw signal $x$ is measured by a measurement matrix $\Phi$.
The resulting measurement signal $y$ is far shorter than the raw signal $x$, so the
signal $x$ is compressed [18].
The theory of compressed sensing breaks through the limit of the Shannon sampling
theorem. The Shannon sampling theorem provides the theoretical basis for the digital
processing of signals and ensures that no signal information is lost during sampling,
whereas compressed sensing allows a sparse signal to be recovered from far fewer
sampling points obtained by random measurement. The basic formula is as follows:
$$y = \Phi x$$
where $\Phi \in R^{m\times n}$ is the random measurement matrix, $x \in R^{n\times 1}$
is the high-dimensional original signal, and $y \in R^{m\times 1}$ is the low-dimensional
data after compression, with $m \ll n$.
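The measurement step can be illustrated with a minimal sketch. The Gaussian matrix, the dimensions, and the function names `gaussian_measurement_matrix` and `measure` are assumptions made for illustration only; they are not taken from the paper.

```python
import random


def gaussian_measurement_matrix(m, n, rng):
    """Return an m x n matrix (list of rows) with i.i.d. N(0, 1/m) entries (assumed form)."""
    return [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compute the compressed measurement y = phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


rng = random.Random(0)
n, m = 1000, 50                      # high and low dimensions, with m much smaller than n
x = [0.0] * n                        # sparse signal: only three non-zero entries
for index, value in [(3, 1.5), (500, -2.0), (999, 0.7)]:
    x[index] = value

phi = gaussian_measurement_matrix(m, n, rng)
y = measure(phi, x)
print(len(x), len(y))                # lengths of the original and compressed signals
```

Only the forward measurement is shown; recovering $x$ from $y$ requires a sparse reconstruction method, which is outside this sketch.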
In this paper, CT denotes the compressive tracking algorithm, a tracking method built
on compressed sensing. It should not be confused with computed tomography, the imaging
technology with the same abbreviation, which scans the examined object in different
directions with X-rays and reconstructs two- or three-dimensional images carrying
spatial anatomical structure information.
The basic process is as follows. During tracking, features are first extracted from
the positive samples (the tracked target) and the negative samples (the background).
The dimension of the multi-scale image features is then reduced with a random projection
matrix that satisfies the restricted isometry property (RIP). Finally, a naive Bayes
classifier classifies the reduced features.
In this algorithm, the random measurement matrix $\Phi$ performs the feature
dimensionality reduction. In its definition, $s$ is a random number generated between
2 and 4.
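A minimal sketch of such a projection is given below. It assumes the widely used sparse random matrix whose entries are $\sqrt{s}\cdot\{+1, 0, -1\}$ with probabilities $1/(2s)$, $1 - 1/s$ and $1/(2s)$. This specific form, the dimensions, and the function names are assumptions for illustration and are not stated in the text.

```python
import math
import random


def sparse_measurement_matrix(m, n, s, rng):
    """Build an m x n sparse random matrix (assumed form).

    Each entry is +sqrt(s) with probability 1/(2s), -sqrt(s) with
    probability 1/(2s), and 0 with probability 1 - 1/s.
    """
    scale = math.sqrt(s)
    phi = []
    for _ in range(m):
        row = []
        for _ in range(n):
            u = rng.random()
            if u < 1.0 / (2.0 * s):
                row.append(scale)
            elif u < 1.0 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        phi.append(row)
    return phi


def project(phi, x):
    """Reduce the feature vector x to len(phi) dimensions, skipping zero entries."""
    return [sum(p * v for p, v in zip(row, x) if p != 0.0) for row in phi]


rng = random.Random(42)
s = rng.uniform(2.0, 4.0)            # s drawn between 2 and 4, as described in the text
phi = sparse_measurement_matrix(m=10, n=200, s=s, rng=rng)
x = [rng.random() for _ in range(200)]   # stand-in for high-dimensional image features
v = project(phi, x)                  # low-dimensional feature vector
```

Because most entries are zero, each reduced feature is a signed, weighted sum of only a few of the original features, which keeps the projection cheap.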
After dimensionality reduction with the above measurement matrix, each resulting feature
is in fact a weighted sum of several regions in the image. The next step is to update
the naive Bayes classifier with the positive and negative sample features after
dimensionality reduction [19, 20]. Finally, the region adjacent to the target position
in the previous frame is traversed, the similarity of each candidate is computed with
the Bayesian criterion, and the accurate location of the tracking target in the current
frame is obtained. In the Bayes classifier response $H(v)$, the conditional distributions
are $p(v_i \mid y = 1) \sim N(\mu_i^1, \sigma_i^1)$ and
$p(v_i \mid y = 0) \sim N(\mu_i^0, \sigma_i^0)$, where $\mu^1$ and $\sigma^1$ are
respectively the mean and variance of the positive sample, and $\mu^0$ and $\sigma^0$
are the mean and variance of the negative sample. The candidate position that maximizes
$H(v)$ is where the target is most likely to appear in the current frame.
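The selection of the candidate with the largest response can be sketched as follows. The sketch assumes equal class priors, so that $H(v)$ reduces to a sum of log-likelihood ratios, and it treats each $\sigma$ as a standard deviation. Both choices, and all function names, are assumptions made for illustration.

```python
import math


def gaussian_log_pdf(v, mu, sigma):
    """Log density of N(mu, sigma^2) at v; sigma is a standard deviation here."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (v - mu) ** 2 / (2.0 * sigma * sigma)


def classifier_response(v, mu1, sigma1, mu0, sigma0):
    """H(v) = sum_i [log p(v_i | y=1) - log p(v_i | y=0)], with equal priors assumed."""
    return sum(
        gaussian_log_pdf(vi, m1, s1) - gaussian_log_pdf(vi, m0, s0)
        for vi, m1, s1, m0, s0 in zip(v, mu1, sigma1, mu0, sigma0)
    )


def best_candidate(candidates, mu1, sigma1, mu0, sigma0):
    """Return the index of the candidate feature vector with the largest H(v)."""
    scores = [classifier_response(v, mu1, sigma1, mu0, sigma0) for v in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy parameters: two reduced features per sample.
mu1, sigma1 = [1.0, 2.0], [0.5, 0.5]     # positive (target) class
mu0, sigma0 = [0.0, 0.0], [1.0, 1.0]     # negative (background) class
candidates = [[0.1, -0.2], [0.9, 2.1], [0.5, 1.0]]
best = best_candidate(candidates, mu1, sigma1, mu0, sigma0)
```

In this toy setting the second candidate lies closest to the positive-class means, so it receives the largest response.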
After the latest location of the target is determined, positive and negative samples
are taken again and reduced in dimension to further update the classifier. The update
is controlled by a learning parameter, which is an empirical value: the larger it is,
the slower the update. After the classifier is updated, the search for the target
position continues in the next frame. Through such repeated iterations, target tracking
is realized.
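One common way to realize such an update is a running average of the Gaussian parameters, sketched below. The exact update rule is an assumption, as are the name `lam` for the learning parameter and its default value; $\sigma$ is again treated as a standard deviation.

```python
import math


def update_gaussian(mu, sigma, mu_new, sigma_new, lam=0.85):
    """Blend the stored parameters with those estimated from the new samples.

    lam is the (hypothetical) learning parameter in [0, 1]: the larger it is,
    the more weight the old parameters keep and the slower the update.
    """
    mu_upd = lam * mu + (1.0 - lam) * mu_new
    var_upd = (
        lam * sigma ** 2
        + (1.0 - lam) * sigma_new ** 2
        + lam * (1.0 - lam) * (mu - mu_new) ** 2
    )
    return mu_upd, math.sqrt(var_upd)


# Stored positive-class parameters for one feature, and new estimates from the current frame.
mu_i, sigma_i = update_gaussian(mu=1.0, sigma=0.5, mu_new=1.4, sigma_new=0.6)
```

With `lam = 1` the classifier never changes, and with `lam = 0` it is replaced by the estimates from the current frame alone.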
The tracking algorithm based on compressed sensing proposed in this paper is compared
with the CT, TLD and OAB tracking algorithms on more than 20 different video sequences
[21, 22].
(1) CT
Features: The CT algorithm achieves target tracking by calculating the correlation
between the target template and the candidate regions in the frame to be tracked.
The Fourier transform is mainly used to calculate the correlation score between the
target template and the candidate regions, and the region with the highest score is
selected as the most likely target position.
Advantages: The algorithm is simple and efficient, and it is robust to target scale
changes and partial occlusion.
Disadvantages: It is sensitive to complex backgrounds, illumination changes and other
factors, and is prone to target drift.
(2) TLD (tracking-learning-detection)
Features: The TLD algorithm combines three functions: target tracking, target learning
and target detection. The accuracy and robustness of tracking are improved by constantly
updating the target model, learning the target appearance and movement pattern, and
detecting and correcting the target.
Advantages: It can deal with the deformation, scale change and partial occlusion of
targets, which is suitable for long-term target tracking.
Disadvantages: It requires a lot of training data and computing resources, and places
high demands on hardware.
(3) OAB (online AdaBoost)
Features: OAB is a target tracking algorithm based on the AdaBoost algorithm. It trains
a series of weak classifiers and combines them according to their weights to track
the target.
Advantages: Good robustness and accuracy, suitable for target tracking in complex
scenarios.
Disadvantages: The effect may be affected if the light changes greatly or the target
appearance changes quickly.
As shown in Fig. 2, the tracking performance of CT, TLD and OAB degrades significantly
when the vehicle moves faster, while the algorithm proposed in this paper remains
basically stable. As shown in the tracking results of frames 60 and 100 in Fig. 2, when
the vehicle speed exceeds the moving speed of the tracking target, obvious drift occurs
in the tracking process of the CT algorithm, whereas the proposed method effectively
prevents the tracked target from drifting.
Fig. 2. Algorithm training.
2.2. Virtual Avatar Motion Location Based on Tracking Algorithm
The motor system of the human body consists of bones, joints and skeletal muscles.
The bones of the whole body are connected by joints to form the skeleton, which supports
weight, protects internal organs, and gives the human body its basic shape. In this
study, the virtual avatar bone model is organized hierarchically and is regarded as
a tree-like structure of bone segments connected by joints [23]. In this tree structure,
the Hip joint of the human body is taken as the root node, and every node is rotated
in the local coordinate system of its parent [24]. To locate the movement of the virtual
avatar, the spatial positions of the human joints must be found, i.e., the motion pose
must be perceived. Following the tree structure of the virtual avatar bone model, this
study takes the upper limb of the human body as an example to introduce how the spatial
positions of the virtual avatar's bone joints are solved [25-27].
As shown in Fig. 3(a) and (b), the parent of the wrist joint $W$ is the elbow joint
$E$. With the elbow as the origin, the forearm extends along the $Y$-axis of the local
coordinate system. Suppose that the coordinates of the elbow joint $E$ are known in
the global coordinate system; the coordinates of the child node, the wrist joint $W$,
can then be solved by using the rotation matrix $R$ of the elbow joint $E$. Let the
length of the forearm bone be ARM_LENGTH; then the translation vector along the $Y$-axis
of the local coordinate system is $T_{axis} = (0, ARM\_LENGTH, 0)$. Given the global
coordinates and the rotation matrix of the elbow joint $E$, the displacement of the
child node $W$ with respect to its parent node is obtained by applying the rotation
matrix to $T_{axis}$. By Eq. (7), the coordinates of the child node wrist joint $W$
in the global coordinate system are then the elbow coordinates plus this displacement.
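The elbow-to-wrist step can be sketched as a single rotate-and-translate operation. The numerical values, the rotation about the $Z$-axis, and the helper names `rot_z`, `mat_vec` and `child_position` are hypothetical and serve only as an illustration.

```python
import math


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the Z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def mat_vec(r, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))


def child_position(p_parent, r_parent, t_axis):
    """Global position of a child joint: parent position plus the rotated bone vector."""
    offset = mat_vec(r_parent, t_axis)
    return tuple(p + o for p, o in zip(p_parent, offset))


ARM_LENGTH = 0.25                       # forearm length in metres (assumed value)
p_elbow = (0.3, 1.2, 0.0)               # elbow position in the global frame (assumed)
r_elbow = rot_z(math.pi / 2)            # elbow rotated 90 degrees about Z (assumed)
t_axis = (0.0, ARM_LENGTH, 0.0)         # forearm lies along the local Y-axis

p_wrist = child_position(p_elbow, r_elbow, t_axis)
```

A 90 degree rotation about $Z$ turns the local $Y$-axis into the global negative $X$ direction, so the wrist ends up ARM_LENGTH to the side of the elbow at the same height.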
According to the above solution method taking the upper limb of the human body as
an example, the method of solving the spatial position of each joint of the whole
body is summarized:
Virtual avatar skeleton model initialization calibration. Set the coordinates of its
hip joint $P_{Hip}$ and its rotation matrix $R_{Hip}$ as well as the length of the
avatar’s individual bones.
Virtual avatar bone model modeling. Through the depth-first traversal algorithm, the
root node hip joint is visited first, and then each adjacent joint node $J$ is visited
successively from its parent node. Suppose that the global coordinates and the rotation
matrix of the parent node are known. The growth axis of each bone is determined in
the local coordinate system of the parent node, and the translation vector is computed
in that local coordinate system, where $L$ represents the bone length:
1) If it grows along the $X$-axis, $T_{axis} = (L,0,0)$;
2) If it grows along the $Y$-axis, $T_{axis} = (0,L,0)$;
3) If it grows along the $Z$-axis, $T_{axis} = (0,0,L)$;
Virtual avatar real-time motion modeling. In each frame of the virtual avatar's motion,
the child node's translation coordinates relative to its parent are computed from the
translation vector $T_{axis}$. The child node's global coordinates are then the sum
of its parent's global coordinates and these translation coordinates. Based on the
resulting coordinates, the positions of the hip and of the top of the head can be
computed.
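The three steps above can be sketched as a depth-first traversal of the joint tree. The `Joint` class, the bone lengths, and the convention that each joint stores a rotation relative to its parent are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def mat_vec(r, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))


@dataclass
class Joint:
    name: str
    t_axis: tuple = (0.0, 0.0, 0.0)     # bone vector from the parent, in the parent's local frame
    rotation: tuple = IDENTITY          # rotation of this joint relative to its parent
    children: list = field(default_factory=list)


def solve_positions(joint, p_parent, r_parent, out=None):
    """Depth-first traversal that fills out[name] with global joint positions."""
    if out is None:
        out = {}
    offset = mat_vec(r_parent, joint.t_axis)            # rotated bone vector
    position = tuple(p + o for p, o in zip(p_parent, offset))
    r_global = mat_mul(r_parent, joint.rotation)        # accumulated rotation for the children
    out[joint.name] = position
    for child in joint.children:
        solve_positions(child, position, r_global, out)
    return out


# Step 1: initialization with assumed bone lengths; both bones grow along the Y-axis.
head = Joint("Head", t_axis=(0.0, 0.3, 0.0))
spine = Joint("Spine", t_axis=(0.0, 0.5, 0.0), children=[head])
hip = Joint("Hip", children=[spine])    # root node: zero bone vector

# Steps 2 and 3: traverse from the root, starting at the calibrated hip position.
positions = solve_positions(hip, p_parent=(0.0, 1.0, 0.0), r_parent=IDENTITY)
```

For the root node the bone vector is zero, so the call starts from the calibrated hip position, and the hip's own rotation is passed on to its children.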
A forward kinematic equation for calculating the distance between the hip and the head
is derived, in which $R_{Hip}$, $R_{Spine}$ and $R_{Head}$ denote the rotation matrices
of the joints Hip, Spine and Head, respectively, and $P_{Hip}$ denotes the position
of the hip joint.
If the human motion data are applied directly to a virtual avatar skeletal model, some
of the 3D spatial positions of the bones in the target model will change. When the
changes are large enough to produce motion distortion, a motion redirection algorithm
must be defined to modify the human motion data.
From the forward kinematics, the human lower limb posture can be obtained from the
lower limb bone lengths and the motion data, and the height of the hip joint in the
global coordinate system, $Z_{Hip}$, can be derived from the constraint between the
lower limb and the horizontal plane. Assume that the neck and spine are collinear,
that is, the three points $P_{Hip}$, $P_{Spine}$ and $P_{Head}$ lie on a straight line.
The optical sensor worn on the head gives the absolute coordinates $P_{Headend}$ in
three-dimensional space. With the distance from $P_{Head}$ to $P_{Headend}$ denoted
$T_1$, Eq. (13) yields the absolute coordinates of the top of the human spine,
$P_{Head}(X_{Head},Y_{Head},Z_{Head})$.
Let the coordinates of the hip joint in three-dimensional space be
$P_{Hip} = (X_{Hip}, Y_{Hip}, Z_{Hip})$, where $Z_{Hip}$ is obtained from the lower
limb posture calculation based on the spatial position constraint between the human
lower limb and the ground plane. The distance from $P_{Hip}$ to $P_{Head}$ is the
length of the human spine $T_2$, from which the pitch angle $\beta$ of the human spine
is calculated.
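As an illustration only, the sketch below computes the angle between the spine and the vertical axis from the hip and head coordinates. It assumes that $Z$ is the vertical axis and that the pitch angle is measured from the vertical. The paper's own formula for $\beta$ may use a different convention, and the function name is hypothetical.

```python
import math


def spine_angle_from_vertical(p_hip, p_head):
    """Angle (radians) between the hip-to-head segment and the vertical Z-axis.

    Assumes Z is the height axis; this is one possible convention for a pitch angle.
    """
    dx, dy, dz = (h - p for h, p in zip(p_head, p_hip))
    t2 = math.sqrt(dx * dx + dy * dy + dz * dz)     # spine length T_2
    if t2 == 0.0:
        raise ValueError("hip and head positions coincide")
    cosine = max(-1.0, min(1.0, dz / t2))           # clamp against rounding error
    return math.acos(cosine)


p_hip = (0.0, 0.0, 0.9)                             # assumed coordinates in metres
p_head = (0.0, 0.3, 1.3)                            # spine leaning forward along Y
beta = spine_angle_from_vertical(p_hip, p_head)
```

An upright spine gives an angle of zero under this convention, and the angle grows as the torso leans away from the vertical.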
The yaw angle $\alpha$ of the human spine captured by the sensor and the pitch angle
$\beta$ obtained from Eq. (14) are shown in Fig. 3(c).
Then, the calculation gives
Fig. 3. Avatar positioning.