Compared with other images, fisheye images have a much wider field of view, so panoramic
stitching of fisheye images is studied here to achieve remote roaming of scenic spots.
To ensure that fisheye images are stitched accurately, image correction, registration,
spherical projection mapping, and fusion are investigated so that the stitched images
conform to normal vision.
2.1. Fisheye Image Correction and Registration
When stitching fisheye images, the images must first be corrected, that is, the non-linear
storage of the fisheye image must be converted into linear storage. Because the pixel
columns of a fisheye image resemble lines of longitude on the Earth, when a fisheye
image is converted to a normal image the horizontal coordinate of each pixel stays
the same while its vertical coordinate changes. The calculation that maps the pixel
points of the fisheye image to the camera plane is shown in Eq. (1).
In Eq. (1), $(x', y')$ represents the mapped pixel plane coordinates. $(x, y)$ means the pixel
coordinates of the fisheye image; $(x_0, y_0)$ represents the center point coordinates.
The method for converting spherical longitude and latitude coordinates of fisheye
images is shown in Eq. (2).
Fig. 1. The RANSAC algorithm.
In Eq. (2), $(X,Y)$ represent the spherical latitude and longitude coordinates of the pixel
point. $r_0$ represents the distance between the distortion point and the center point.
The correction calculation for fisheye images can be obtained by combining the above
equation with Eq. (1), as shown in Eq. (3).
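As a rough illustration of this longitude-latitude style correction, the following sketch assumes a circular fisheye region centred in the frame and, for brevity, uses nearest-neighbour resampling. In the spirit of Eqs. (1)–(3), it keeps each pixel's horizontal coordinate fixed and re-samples only the vertical coordinate:

```python
import numpy as np

def correct_fisheye_latlon(img):
    """Longitude-latitude style correction sketch: each column of the
    circular fisheye image is treated like a meridian, so x stays fixed
    while y is re-sampled (nearest-neighbour here for brevity)."""
    h, w = img.shape[:2]
    x0, y0 = w / 2.0, h / 2.0        # assumed centre of the fisheye circle
    R = min(x0, y0)                  # assumed radius of the effective region
    out = np.zeros_like(img)
    for x in range(w):
        dx = x - x0
        if abs(dx) >= R:
            continue                 # outside the fisheye circle
        half = np.sqrt(R**2 - dx**2)  # half-height of the circle at this column
        for y in range(h):
            # map the full column span [y0-R, y0+R] onto the chord [y0-half, y0+half]
            yi = int(round(y0 + (y - y0) * half / R))
            if 0 <= yi < h:
                out[y, x] = img[yi, x]
    return out
```

The centre column (where the chord equals the full diameter) maps onto itself, while columns near the rim are stretched vertically, which matches the "horizontal coordinate unchanged, vertical coordinate changed" behaviour described above.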
After extracting the effective region from the fisheye image, the center and radius
of the circular region are calculated and a correction model is established, so that
the pixel coordinates of the fisheye image can be converted. After the coordinate
conversion is completed, grayscale interpolation is applied to the pixel points to
convert the fisheye image into a normal image. After image correction, the images
must also be registered, which directly affects the final stitching result. Image
registration first extracts and matches feature points with the SIFT algorithm, and
then filters the matches with the RANSAC algorithm. The RANSAC algorithm is shown in Fig. 1.
As shown in Fig. 1, for a set of noisy sample points, the RANSAC algorithm estimates
model parameters through random sampling and can thus find the correct model among
a large number of noisy samples. The main relationship models of the RANSAC algorithm
are the fundamental matrix and the homography matrix: the homography matrix characterizes
transformations between planes, while the fundamental matrix characterizes the epipolar
geometric relationships between matching points [15]. The mathematical model of the homography matrix is denoted in Eq. (4).
Fig. 2. Principles of the epipolar geometry.
In Eq. (4), $(x_1, y_1)$ represents the pixel coordinates after the plane transformation,
and $H$ represents the homography matrix. The mathematical model of the fundamental
matrix is shown in Eq. (5).
In Eq. (5), $p_1$ and $p_2$ represent the matching feature points of different images,
$F$ represents the fundamental matrix, $K_1$ and $K_2$ represent the intrinsic matrices
of the different cameras, and $t$ and $R$ represent the relative external poses of
the cameras. The principle of epipolar geometry is denoted in Fig. 2.
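A minimal RANSAC loop with the homography model of Eq. (4) can be sketched as follows; it uses synthetic point matches rather than real image features, and the four-point direct linear transform and the 2-pixel inlier threshold are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: solve for H (up to scale) from >= 4 point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)          # null-space vector of the DLT system
    return H / H[2, 2]

def ransac_homography(src, dst, iters=200, thresh=2.0, seed=0):
    """Randomly sample 4 matches, fit H, count inliers, keep the best model."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best_H, best_inliers = None, np.zeros(n, bool)
    for _ in range(iters):
        idx = rng.choice(n, 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        # project src through H and measure the reprojection error
        p = np.c_[src, np.ones(n)] @ H.T
        proj = p[:, :2] / p[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

With a mostly-clean match set, the winning model is almost always fitted from an uncontaminated sample, so the gross mismatches fall outside the threshold and are rejected.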
As shown in Fig. 2, the line connecting the two cameras (the baseline) intersects
the two images at their epipoles. Any plane containing the baseline is an epipolar
plane, which intersects the two image planes in two straight lines. When the 3D position
of a point changes, the epipolar plane effectively rotates around the baseline, and
the resulting family of planes is called the epipolar plane pencil. All epipolar lines
in an image plane intersect at its epipole. However, because the homography matrix
produces many false rejections and the fundamental matrix filters poorly, the reliability
of the feature-filtering results of the RANSAC algorithm is low [16, 17]. Therefore,
the study introduces ray vectors and rotation matrices to improve the RANSAC algorithm.
The ray vector is shown in Fig. 3.
Fig. 3. Schematic diagram of the ray vector.
As shown in Fig. 3, the camera intrinsic parameters can be used to recover the incident
ray of each pixel in the image, and the camera model can be used to map the ray vectors
of feature points into the camera coordinate system. Because the ray vector is independent
of the lens itself, using it for feature point filtering effectively avoids false
and missed rejections. The method for calculating the ray vector is shown in Eq. (6).
In Eq. (6), $v$ represents the ray vector. $r$ represents the projection position of feature
points of an image on the normalized focal length plane. $f$ stands for camera focal
length. $[u,\upsilon]^T$ represents the coordinate of feature points. The screening
of ray vectors is carried out through a rotation matrix, and the method for solving
the rotation matrix is shown in Eq. (7).
In Eq. (7), $M$ represents the solution of the equation, which is the rotation matrix between
cameras. $v_a$ and $v_b$ represent ray vectors for different images, respectively.
If the diagonal values of $MM^T$ are 1, singular value decomposition is performed
on the matrix $M$. If the singular values of the matrix are close to 1, the left and
right singular matrices are multiplied to obtain a normalized unit orthogonal solution,
which is used for inlier screening. The criteria for inlier screening are shown in Eq. (8).
In Eq. (8), $\theta$ represents the angle between vectors. Because the final model
of the improved RANSAC algorithm is computed from only three pairs of matching vectors,
it is not statistically robust and requires further optimization. The optimization
objective function is shown in Eq. (9).
In Eq. (9), $n$ represents the number of matched ray vector pairs. $v_i$ and $v'_i$ both represent
matching ray vectors. $R(\cdot)$ represents rotation transformation. $r$ represents
a rotation vector.
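The ray-vector screening pipeline of Eqs. (6)–(9) can be sketched as follows; the orthogonal-Procrustes SVD step, the 1-degree angle threshold, and the Rodrigues parameterization of $R(r)$ are standard stand-ins and may differ from the paper's exact formulation:

```python
import numpy as np

def ray_vector(uv, K):
    """Eq. (6)-style back-projection: recover the incident ray of each pixel
    from an assumed 3x3 intrinsic matrix K and normalise it to unit length."""
    pts = np.c_[uv, np.ones(len(uv))]        # homogeneous pixel coordinates
    rays = pts @ np.linalg.inv(K).T          # incident directions in the camera frame
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

def estimate_rotation(va, vb):
    """Rotation between ray sets with vb ~ R va, solved by SVD of the
    correlation matrix (an orthogonal-Procrustes stand-in for Eq. (7))."""
    U, _, Vt = np.linalg.svd(vb.T @ va)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # enforce det(R) = +1
    return U @ D @ Vt

def inlier_mask(va, vb, R, max_angle_deg=1.0):
    """Eq. (8)-style screening: keep pairs whose angle between R va and vb
    is below a threshold."""
    cos_t = np.clip(np.sum((va @ R.T) * vb, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_t)) < max_angle_deg

def rodrigues(r):
    """Rotation matrix R(r) from a rotation vector (axis times angle)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def objective(r, v, v_prime):
    """Eq. (9)-style refinement objective: total squared residual between the
    rotated rays R(r) v_i and their matched rays v'_i."""
    return float(np.sum((v @ rodrigues(r).T - v_prime) ** 2))
```

In this sketch, `estimate_rotation` supplies the coarse model, `inlier_mask` performs the screening, and `objective` is what a subsequent non-linear optimizer would minimize over the rotation vector $r$.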
2.2. Spherical Projection Mapping and Fusion of Fisheye Images
Because the fisheye images are collected from four different directions, stitching
them directly would cause severe image distortion. Therefore, the images must first
be projected onto a sphere so that they all lie in the same coordinate system before
stitching can be performed. The schematic diagram of the spherical projection mapping
of the image is denoted in Fig. 4.
Fig. 4. Schematic diagram of spherical projection mapping.
In Fig. 4, A and B represent the horizontal tilt angle and the vertical tilt angle
of the camera, respectively. As shown in Fig. 4, the world coordinate system and the
camera coordinate system are constructed with the same origin, and the origin is connected
to each pixel point of the image; the intersection of this ray with the sphere is
the projection of the pixel onto the sphere. The coordinate calculation method for
spherical projection mapping is shown in Eq. (10).
In Eq. (10), $(x_2, y_2, z_2)$ and $(x_1, y_1, z_1)$ respectively represent the pixel coordinates
before and after projection. $r_p$ represents the spherical radius after projection.
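A minimal sketch of this ray-sphere projection, together with a longitude/latitude unfolding of the sphere onto a plane; the pinhole-style ray construction at focal depth `f` and the azimuth/elevation parameterization are assumptions, not necessarily the paper's exact Eqs. (10) and (11):

```python
import numpy as np

def project_to_sphere(xy, f, r_p=1.0):
    """Eq. (10)-style mapping: the ray from the origin through pixel
    (x1, y1) on the image plane at depth f is intersected with a sphere
    of radius r_p."""
    x, y = xy[:, 0], xy[:, 1]
    d = np.sqrt(x**2 + y**2 + f**2)           # length of the ray to the image plane
    return r_p * np.c_[x, y, np.full_like(x, f)] / d[:, None]

def unfold_sphere(xyz, r_p=1.0):
    """Eq. (11)-style unfolding: unwrap sphere points into a flat
    longitude/latitude plane (x3, y3)."""
    lon = np.arctan2(xyz[:, 0], xyz[:, 2])                  # azimuth
    lat = np.arcsin(np.clip(xyz[:, 1] / r_p, -1.0, 1.0))    # elevation
    return np.c_[r_p * lon, r_p * lat]
```

Every projected point lies exactly on the sphere of radius `r_p`, and the principal point (the pixel on the optical axis) unfolds to the origin of the plane.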
By unwrapping the projected image on the sphere, an unfolded image lying in the same
coordinate plane can be obtained. The coordinate calculation method after plane
expansion is shown in Eq. (11).
In Eq. (11), $(x_3, y_3)$ represents the coordinates of the unfolded plane. After
the above operations, the flat unfolded image of the fisheye image is obtained, and
image stitching can proceed. The stitching method used in the study is the optimal
suture line method; however, factors such as brightness differences between images
can cause obvious stitching marks and ghosting of moving objects. To solve this problem,
a nonlinear weighted fusion algorithm is used to achieve smooth transitions in the
stitched images. The best suture line search is shown in Fig. 5.
As shown in Fig. 5, pixel values are searched in four different directions along the
dotted line. If a corresponding feature point is found, it is added to the search
queue; if multiple feature points are found, the one with the smallest color difference
is added to the queue and the search continues [18]. The pixel values in the queue
are set to 1 and the remaining pixel values are set to 0. After weighted optimization
of all feature points in the queue, multiple stitching lines can be obtained. The
optimal suture line algorithm is shown in Fig. 6.
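A minimal dynamic-programming seam search over a precomputed difference (cost) map, in the spirit of the suture line search described above; the three-neighbour transition rule is a common simplification, not necessarily the paper's exact search:

```python
import numpy as np

def best_seam(cost):
    """Dynamic-programming vertical seam: in each row, extend the cheapest
    of the three neighbouring seams from the row above, then backtrack
    from the minimum of the last row."""
    h, w = cost.shape
    acc = cost.astype(float).copy()   # accumulated seam cost
    back = np.zeros((h, w), int)      # backtracking pointers
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            j = lo + int(np.argmin(acc[y - 1, lo:hi]))
            back[y, x] = j
            acc[y, x] += acc[y - 1, j]
    seam = [int(np.argmin(acc[-1]))]
    for y in range(h - 1, 0, -1):
        seam.append(back[y, seam[-1]])
    return seam[::-1]                 # seam column index for every row
```

On a cost map built from the grayscale difference of the two overlapping images, the returned seam follows the path of least visible change, which is where the stitch is least noticeable.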
Fig. 6. Best suture line algorithm.
As shown in Fig. 6, the algorithm first searches for the overlapping area of each
pair of images. This is achieved with a dynamic programming algorithm that subtracts
the pixel grayscale values of the two images: where the result changes significantly
compared with the original grayscale values, the region is an overlapping area. Next,
superpixel segmentation is performed on the overlapping area using a simple linear
iterative clustering (SLIC) algorithm. A suture line search is then conducted in the
overlapping area to find the best suture line. Next, the pixel values are readjusted
and reallocated to further optimize the suture line [19, 20]. Finally, the optimal
fusion region is selected and processed with a nonlinear weighted fusion algorithm
to make the transition across the stitching area natural. The calculation method for
superpixel segmentation is shown in Eq. (12).
In Eq. (12), $d_c$ denotes the color distance. $l_i$ and $l_j$ represent the imaging distance
of different images. $a_i$ and $a_j$ represent superpixel block edge pixels of different
images. $b_i$ and $b_j$ represent pixels in different images except for edge pixels.
$d_s$ is spatial distance. $D'$ represents the distance between a pixel point and
the center of a pixel block. $p$ represents the max spatial distance within a class.
$Q$ represents the length of pixel blocks. The non-linear weighted fusion expression
is shown in Eq. (13).
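The combined superpixel distance of Eq. (12) can be sketched with the standard SLIC form, where the grid interval `S` and the compactness weight `m` are assumed parameters that trade colour similarity against spatial compactness:

```python
import numpy as np

def slic_distance(dc, ds, S, m=10.0):
    """Standard SLIC combined distance: colour distance d_c plus spatial
    distance d_s normalised by the grid interval S and weighted by the
    compactness parameter m (a stand-in for the paper's D' in Eq. (12))."""
    return np.sqrt(dc**2 + (ds / S) ** 2 * m**2)
```

A larger `m` penalises spatial spread more heavily and yields more compact, regular superpixel blocks; a smaller `m` lets blocks follow colour boundaries more freely.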
Fig. 7. Fisheye image stitching method.
In Eq. (13), $f(x)$ represents the nonlinear fusion function. $[a,b]$ represents the optimal
fusion area range. $k$ represents the best fusion line. $t$ is a constant. The proposed
fisheye image stitching method is shown in Fig. 7.
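A sketch of a nonlinear transition weight over the fusion range $[a, b]$; the smoothstep polynomial used here is a stand-in for the paper's $f(x)$ in Eq. (13), chosen because its zero slope at both ends gives a seamless blend:

```python
import numpy as np

def blend_weight(x, a, b):
    """Nonlinear transition weight over the fusion range [a, b]: 0 on the
    left-image side, 1 on the right-image side, with zero slope at both
    ends (a smoothstep stand-in for the paper's f(x))."""
    t = np.clip((np.asarray(x, float) - a) / (b - a), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def fuse_row(left, right, a, b):
    """Blend one overlapping scanline: weight 0 keeps the left image,
    weight 1 keeps the right image."""
    x = np.arange(len(left))
    w = blend_weight(x, a, b)
    return (1.0 - w) * left + w * right
```

Compared with a linear ramp, the nonlinear weight changes gradually near the edges of the fusion region, which is what suppresses visible stitching marks at the boundary.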
As shown in Fig. 7, after the fisheye image is input, it is first standardized and
mapped to obtain an equidistant fisheye image. Feature point matching and spherical
projection are then performed. Based on the feature point matching results, the rotation
matrix is obtained through RANSAC and the projection parameters are calculated. Then,
based on the matching inliers, the adjacency table is determined and combined with
the projection parameters to make global and local optimal adjustments. Spherical
projection is then performed to obtain the deformed image, and the panoramic image
is obtained by image fusion.