Maaz Ahmed1
Kang-Sun Choi1
(Department of Electrical, Electronics and Communications Engineering, Korea University
of Technology and Education, Cheonan, South Korea.)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
Document analysis, Skeletonization, Iterative Closest Point, Dice coefficient
1. Introduction
Archaeology and historical analysis play vital roles in uncovering the achievements
of past civilizations, helping us explore ancient innovations and technological milestones.
In this paper, we analyze historical texts from Korea and investigate the oldest printed
books produced using movable metal types. Korea, during the Goryeo Dynasty, is recognized
as the birthplace of this technology, with physical evidence of early printed works
such as the “Sangjeong Yemun” from 1234 and “Jikji” from 1377, the latter being the
oldest surviving book printed with movable metal type [1].
When the Joseon Dynasty succeeded Goryeo in 1392, the use of metal type continued,
primarily for printing texts in Chinese characters (Hanja), which were used for official
and scholarly purposes. After King Sejong created the Korean alphabet, Hangul, in
1443, this new script was also incorporated into movable metal type printing, although
Hanja remained dominant in formal contexts. Despite the historical importance of both
the Chinese and Hangul metal types, few examples survive today. Researchers analyze
the remaining books from this period using high-resolution scans to extract and group
character images, providing valuable insights into the scale of early metal type printing.
The introduction of movable metal type revolutionized the mass production of books,
greatly accelerating the spread of knowledge. During the Joseon Dynasty, metal movable
types, primarily made from copper, played a pivotal role in advancing Korea's printing
culture. The process involved assembling individual characters into a metal frame,
applying ink to the typefaces, and pressing them onto paper. As shown in Fig. 1, this method proved more efficient and durable than traditional woodblock printing,
as the metal types resisted deformation over time, ensuring consistent character reproduction.
Fig. 1. The process of metal letterpress printing starts with type selection, proceeding to typesetting and inking. The inked chase is then pressed onto the paper, and the printed pages are bound into a book.
Analyzing the fonts and stylistic features of printed books provides valuable insights
into the time periods in which they were produced, allowing us to trace the technological
evolution of printing methods. In early Korean printing, each letter could appear
multiple times within a single metal chase, leading to the use of multiple typefaces
for a single character. Factors such as printing pressure, amount of ink applied,
and the paper material can cause variations in the thickness and shape of strokes,
even when the same metal type is used. Additional distortions may arise from factors
such as noise, paper damage, or wear over time.
Traditional computational techniques, such as the Dice coefficient for measuring similarity,
image registration, classification, segmentation, and feature extraction [2-
5], are commonly used in digital image analysis. Applying these methods to digital images
of historical Korean books can help recover insights lost due to wars and natural
disasters. By studying printed characters, we can estimate the number of metal types
manufactured during a particular period.
In conventional analysis [6], each character in the scanned images is extracted and grouped by visual similarity,
as demonstrated in Fig. 2. Although each group represents the same character, variations arise due to the limitations
of early metal type production, such as inconsistencies in typeface creation. To refine
this, groups of identical characters are further divided into subgroups based on specific
metal type variants. By aligning and averaging the images within each subgroup, we
can recreate a 3D model of the typeface, and the number of subgroups provides an estimate
of the number of metal types used in printing during that era.
Fig. 2. Overall process of the type reconstruction: scanning, character segmentation, character classification, and character grouping following the method proposed by Jeong et al. [6], proceeding to the metal type grouping proposed in this paper.
A critical challenge in this process is to determine the similarity between images
of the same character, which requires a nuanced approach. Unlike traditional region-based
image analysis methods, character image comparison focuses on identifying whether
the characters were printed using the same metal type. Given the variability introduced
by manual inking and pressing, the stroke width can vary between prints. Therefore,
the similarity measure must capture structural differences in stroke shapes while
ignoring irrelevant variations in stroke width.
This paper proposes a novel approach designed to address these specific challenges.
Our method consists of three key components: skeletonization of character images,
alignment via skeleton registration, and a similarity measure applied to the aligned
skeletons. The proposed approach offers more reliable results compared to conventional
techniques [6].
We explore various image preprocessing techniques, including erosion and skeletonization, to enhance character comparison, and we evaluate several metrics for measuring similarity between images, such as the mean Euclidean distance, Dice coefficient, and Wasserstein distance. After extensive testing on numerous character pairs, we devised
a comparison algorithm that employs binarized character images, skeletonization, and
Iterative Closest Point (ICP) image registration, followed by similarity computation.
The resulting numerical similarity score helps determine whether two characters were
printed using the same metal type. Notably, the Dice coefficient proved to be the
most reliable metric for our analysis. Additionally, we compared our method with the
conventional approach [6], concluding that our algorithm performs efficiently in terms of both accuracy and
computational time.
In the subsequent sections, we detail our methodology and present the outcomes of
our experiments, along with their implications for historical research and technological
advancements.
2. Related Work
2.1. Character Extraction, Segmentation, and Recognition
Character extraction, segmentation, and recognition are crucial stages in optical
character recognition (OCR) [7]. These processes involve extracting characters from images and printed documents
for information extraction [8]. Various methods, such as water flow, neural networks, endpoint, Euler number, and
Fourier transform, are used for character extraction to prepare data for further analysis
[9].
Segmentation divides characters into subunits for detailed analysis. Techniques such as the Hough transform, neighborhood-connected units, and vertical/horizontal projection profiles are employed for text segmentation [10-12]. Segmentation is vital for converting images into matrices for in-depth analysis.
Character recognition identifies and converts characters into machine-readable text.
Template matching, statistical techniques like k-nearest neighbor and fuzzy set reasoning,
and stroke recognition methods are used for character recognition [13-15]. These techniques play a crucial role in OCR for data analysis.
2.2. Skeletonization
Skeletonization reduces binary images to one-pixel representations, aiding in effective
2-D and 3-D object display for recognition, retrieval, and analysis [16]. Pervouchine et al. applied skeletonization to extract strokes from handwritten documents,
benefiting forensic analysis but being content-dependent [17]. Ko et al. faced challenges adapting skeletonization to diverse characters, such
as Chinese, Korean, and Japanese fonts, proposing SkelGAN for Korean Hangul [18].
Various algorithms, like Zhang-Suen (ZS)-series and one-pass thinning (OPTA)-series
algorithms, enhance character skeletonization by addressing noise issues; ZS-series
deals with boundary noise, while OPTA offers faster thinning [19]. Al-Maadeed used text skeletonization with codebook generation for writer identification,
achieving high accuracy and addressing ethical concerns [20]. This approach works well with Greek and Latin languages and some other open-source
datasets [20].
2.3. Image Registration
Image registration aligns images from different time intervals, aiding spatial correspondence
in documents [21]. Pan et al. describe two common approaches: intensity-based, which focuses on grey
level intensity areas, and feature-based, which selects common features for optimal
results [22].
Zhao et al. highlight the interest in using image registration with a deep learning
method called recursive cascade architecture for deformed medical images and documents,
offering valuable data for research [23]. Wang et al. use the iterative closest point (ICP) method for image registration
based on point distances, addressing its time consumption with M-estimation and weighting
approaches [24]. He et al. apply ICP to register 3D points using geometric features, avoiding local
extremum issues and registration errors by considering density, surface normal, angle
between data points, and curvature features [25].
2.4. Wasserstein Distance
Wasserstein Distance is a valuable tool for measuring similarities or distances between
probability distributions, widely applied in document analysis, character similarity,
and comparison, as well as in forensic science, research, and probability theory [26]. In practice, it finds use in image retrieval and computer vision, focusing on the
cost of mass transportation between points for tasks like image synthesis and style
transfer.
Ye et al. employed Wasserstein Distance to measure text sequence similarity, leveraging
probability distribution functions, though it can be affected when character distances
vary [27]. Fournier et al. noted challenges in applying Wasserstein Distance, especially with
complex high-dimensional data, where it may not always effectively measure similarity
between probability distributions [28]. Kolouri et al. introduced modified sliced Wasserstein Distance, reducing computational
demands by using one-dimensional linear projections for more accurate probability
measurements, particularly suitable for generative modeling in research science [29]. Piccoli et al. utilized generalized Wasserstein distance to handle the transport
equation, effectively addressing the Cauchy problem for vector fields and closely
spaced sources, resembling Lipschitzian behavior [30].
2.5. Mean Euclidean Distance
The mean Euclidean distance calculates the average distance between two data points
in Euclidean space, primarily used for character identification in text files [31]. It is effective when data points are in the same dimension [31]. In statistical analysis and machine learning, it aids in measuring similarity and
dissimilarity by considering the structure and shapes of objects in documents.
You et al. introduced a method to approximate mean Euclidean distance using the Yinyang
k-means clustering approach, addressing memory consumption issues and enhancing similarity
detection between input dimensions [32]. Berthold et al. proposed modifying the k-means algorithm for time series comparisons
by adding squared Euclidean distance and incorporating the Pearson Correlation coefficient.
This approach effectively normalized input data for more accurate comparisons [33]. Hammoud et al. found Euclidean distance useful for pre-defined multi-dimensional
constellation sets, improving accuracy by eliminating noise through iterative methods
in 2D and 3D estimators [34]. Lee et al. highlighted the effectiveness of Euclidean Support Vector Machine (E-SVM)
for text document classification, especially with kernel functions. While E-SVM offers
better classification accuracy than conventional SVM, it comes with increased processing
time due to distance calculations between document data points [35].
2.6. Dice Coefficient
The Dice coefficient, or Sørensen-Dice index, assesses character similarity by calculating
the ratio of shared elements in datasets, commonly used in image segmentation, forensic
document analysis, and medical image processing to verify accuracy [36,
37]. Higher Dice coefficients signify greater similarity, correlating with improved image
segmentation performance.
The Dice coefficient is a measure of similarity between two given images. The Dice coefficient $D$ is defined as

$$D = \frac{2\,|N_r \cap N_i|}{|N_r| + |N_i|},$$

where $N_r$ is defined as the set of pixels in the reference image and $N_i$ is defined as the set of pixels in the input image.
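As a concrete illustration, the coefficient can be computed for binary images stored as NumPy arrays (a minimal sketch; the function name `dice_coefficient` is ours, not from the cited works):

```python
import numpy as np

def dice_coefficient(ref, inp):
    """Dice coefficient between two binary images (nonzero = foreground)."""
    ref = ref.astype(bool)
    inp = inp.astype(bool)
    total = ref.sum() + inp.sum()
    if total == 0:
        return 1.0  # two empty images are trivially identical
    return 2.0 * np.logical_and(ref, inp).sum() / total
```

Identical images give a score of 1, disjoint images a score of 0.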
Liang et al. emphasize the Dice coefficient's role in statistically validating manual
and automated medical report segmentation for patient diagnosis but highlight the
need to address false negatives, false positives, and class imbalance in model design
[38]. Oco et al. point out another significant use of the Dice coefficient in identifying
language similarity through trigram profiles in documents, particularly in Philippine
language documents. This approach has potential applications in phonetic transcription
and text analysis, revealing an average difference of 27% in language tree comparisons
[39].
2.7. Image Clustering
Korean historical books, such as The Song of Enlightenment, utilized moveable metal
types for printing, resulting in multiple versions with varying metal types. Yoo employed
image processing techniques, including clustering, to identify metal casting defects
in these books [40].
Lai and Lee applied k-means clustering for binarizing Korean characters, with 3-means
clustering outperforming 2-means clustering in character detection and binarization
quality [41]. Jo et al. used vertical projection and clustering for character segmentation in
Korean printed text, achieving high-speed segmentation with 99.25% accuracy and effectively
managing touching character issues [42]. Song explored the unique history of Korean characters, which differed from China
and Japan, highlighting the use of moveable metal types for preserving information
in books from the 14th to 19th centuries. Modern image analysis and grouping techniques
helped identify changes in characters due to the issues with smoothness and durability
of carved text on metal types [43]. Similarly, OK utilized image feature extraction and clustering techniques to characterize
Korean movable metal type ancient books, shedding light on history and differentiating
Korean texts from those of China and Japan [44]. This analysis aided in understanding the distinctiveness of Korean historical texts,
which were often considered similar to Chinese and Japanese [44].
3. The Proposed Method
Fig. 3 presents a brief visual representation of the character comparison part of the algorithm
proposed in this paper.
Fig. 3. Architecture for the character comparison, starting from pre-processing, proceeding
to ICP Registration and finally towards similarity computation.
3.1. Characters Extraction, Segmentation, and Classification
The data extraction process begins with high-resolution scans using a flatbed scanner
to minimize distortion. These scans include not only printed characters but also non-textual
elements like seals, lines, and noise, which must be removed from the dataset. A Trans-UNET
network, customized with a specific loss function, is employed to clean the noise
and rebuild any broken characters [45]. This architecture allows for accurate pixel-wise segmentation of character strokes,
even in degraded or noisy input images, which is critical for reconstructing historical
typefaces. The Trans-UNET network excels in capturing the intricate details of these
characters, ensuring a more precise output. The dataset includes 13,500 image patches,
cropped and augmented from the scanned images, with 12,150 patches used for training
and 1,350 for testing. Each patch has a resolution of 1024x1024 pixels, and random
rotations and scales are applied to ensure the generalization of the model.
To further refine character extraction, the Boundary Gaussian Distance (BGD) loss
function is used to enhance the smoothness of character boundaries [45]. This method improves upon traditional techniques by applying Gaussian smoothing
to ground-truth labels, resulting in clearer, more accurate segmentation, especially
when handling ancient documents with rough, noisy strokes. The combination of BGD
loss with Dice loss significantly enhances the clarity of the extracted characters,
effectively addressing issues like ink bleeds, noise, and missing strokes. The model
performs well in difficult cases such as blurred prints and faded characters, demonstrating
its effectiveness in processing degraded historical documents.
After noise-free pages are obtained, they are processed using the segmentation method by Beom et al. [6], which involves detecting each character and applying a convex hull for segmentation. The segmented characters are then fed to PaddleOCR, which classifies them with respect to their ASCII codes; the results are saved in directories corresponding to each character for further analysis [46].
3.2. Skeletonization
Once we obtain the binarized images of any two given characters, we pre-process the images before performing image registration. These pre-processing techniques address cases where the stroke width of a character differs even though it was printed using the same type-head as another character. Different widths occur because of the amount of ink applied during printing: the more ink, the thicker the character. In addition, without erosion or skeletonization, the union functions we later use for similarity measurement yield larger values because a larger area of the characters overlaps, and the distance functions yield smaller values because the characters lie closer to each other, even when they were printed using different type-heads.
For the skeletonization process, we use an iterative parallel thinning algorithm [47]. We start with a black pixel $p$ in a 3x3 window, with the neighbors named $x_1, x_2, x_3, \dots, x_8$; all nine values are collectively denoted by $N_{(p)}$. The odd-numbered $x_i$ are the 4-adjacent neighbors of $p$. The number of black pixels in $N_{(p)}$ is denoted by $b_{(p)}$. In our conditions, we also use the crossing number $X_H(p)$, defined as the number of times we cross from a white point to a black point when iterating over the neighbors in $N_{(p)}$ in order.
Fig. 4. A visual representation of $N_{(p)}$.
Fig. 4 shows a visual representation of $N_{(p)}$. The algorithm we used for skeletonization
consists of two sub-iterations for a single iteration. In the first sub-iteration,
we delete the pixel p if the conditions 1, 2, and 3 are met, and in the second sub-iteration,
we delete the pixel p if the conditions 1, 2, and 4 are met.
Condition 1: $X_H(p) = 1$, where

$$X_H(p) = \sum_{i=1}^{4} b_i, \qquad b_i = \begin{cases} 1, & \text{if } x_{2i-1} = 0 \text{ and } (x_{2i} = 1 \text{ or } x_{2i+1} = 1), \\ 0, & \text{otherwise}, \end{cases}$$

with indices taken modulo 8, so that $x_9 = x_1$.

Condition 2: $2 \leq \min\{n_1(p), n_2(p)\} \leq 3$, where

$$n_1(p) = \sum_{k=1}^{4} x_{2k-1} \lor x_{2k}, \qquad n_2(p) = \sum_{k=1}^{4} x_{2k} \lor x_{2k+1}.$$

Condition 3: $(x_5 \lor x_6 \lor \bar{x}_8) \land x_7 = 0$.

Condition 4: $(x_1 \lor x_2 \lor \bar{x}_4) \land x_3 = 0$.
In our case, we set the iteration count $n$ to infinity, so the skeletonization proceeds iteratively until the character strokes are one pixel thick.
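As an illustration, a two-subiteration parallel thinning pass of this kind can be sketched in NumPy (following the common Guo-Hall formulation, with neighbors numbered clockwise from the north; a simplified stand-in, not the exact implementation used in this work):

```python
import numpy as np

def guo_hall_thin(img):
    """Iteratively thin a binary image (foreground = 1) until one pixel thick."""
    img = np.pad(img.astype(np.uint8), 1)  # background border for safe indexing
    changed = True
    while changed:
        changed = False
        for subiter in range(2):
            marked = []
            for r, c in zip(*np.nonzero(img)):
                # 8-neighbours, clockwise starting from the north
                p2, p3, p4 = img[r-1, c], img[r-1, c+1], img[r, c+1]
                p5, p6, p7 = img[r+1, c+1], img[r+1, c], img[r+1, c-1]
                p8, p9 = img[r, c-1], img[r-1, c-1]
                # crossing number: white-to-black transitions around p
                C = ((not p2) & (p3 | p4)) + ((not p4) & (p5 | p6)) + \
                    ((not p6) & (p7 | p8)) + ((not p8) & (p9 | p2))
                n1 = (p9 | p2) + (p3 | p4) + (p5 | p6) + (p7 | p8)
                n2 = (p2 | p3) + (p4 | p5) + (p6 | p7) + (p8 | p9)
                N = min(n1, n2)
                # direction-dependent deletion condition per sub-iteration
                if subiter == 0:
                    m = (p6 | p7 | (not p9)) & p8
                else:
                    m = (p2 | p3 | (not p5)) & p4
                if C == 1 and 2 <= N <= 3 and m == 0:
                    marked.append((r, c))
            for r, c in marked:  # parallel deletion after the full scan
                img[r, c] = 0
                changed = True
    return img[1:-1, 1:-1]
```

Running this on a thick stroke, e.g. a filled rectangle, reduces it to a one-pixel-wide centerline.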
3.3. Image Registration Using Iterative Closest Point (ICP)
Image registration plays a critical role in ensuring accurate comparison between printed
characters, as unaligned images can result in significant errors when computing similarity
metrics. To align images before comparison, we use the Iterative Closest Point (ICP)
algorithm, a well-established method for registering two point-clouds [48]. In our process, we convert the two given images (a fixed and a moving image) into
3D point-clouds, where each pixel of the image corresponds to a point in space. Specifically,
we extract the pixel coordinates of the white pixels (representing character strokes)
from the 2D images and then add a third dimension (z = 0) to treat the images as 3D
point-clouds suitable for ICP processing.
The ICP algorithm consists of two primary steps: (1) finding correspondences between
points in the two point-clouds (i.e., matching the closest points from the moving
image to the fixed image), and (2) computing the transformation (rotation and translation)
that minimizes the distance between these correspondences. This process is repeated
iteratively until the transformation converges, meaning that the difference between
the moving and fixed point-clouds is minimized. During each iteration, we calculate
the distance between the closest corresponding points, and the transformation is adjusted
to bring the moving image closer to the fixed image. The algorithm stops when the
change in the transformation matrix becomes negligible, indicating that the best alignment
has been achieved.
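The two ICP steps described above can be sketched in a few lines of NumPy for 2D point sets, assuming noise-free data and brute-force nearest-neighbor search (illustrative function names, not the MATLAB routines used in this work):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(moving, fixed, max_iter=50, tol=1e-9):
    """Align `moving` onto `fixed`; returns registered points and mean residual."""
    src = moving.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # step 1: closest-point correspondences (brute force)
        d2 = ((src[:, None, :] - fixed[None, :, :]) ** 2).sum(axis=2)
        matches = fixed[d2.argmin(axis=1)]
        err = np.sqrt(d2.min(axis=1)).mean()
        if abs(prev_err - err) < tol:  # convergence: transform change negligible
            break
        prev_err = err
        # step 2: best rigid transform for these correspondences
        R, t = best_rigid_transform(src, matches)
        src = src @ R.T + t
    return src, err
```

For a small rotation and translation, the recovered alignment brings the moving points back onto the fixed set.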
To handle cases where the two images do not overlap perfectly, we use a matching threshold
$d_{max}$. This threshold is a part of generic ICP functions provided in MATLAB and
helps discard correspondences between points that are too far apart, which could otherwise
reduce the accuracy of the alignment. Once the ICP algorithm has converged, the final
transformation matrix is applied to the moving image, aligning it with the fixed image.
The registered image can then be compared directly with the fixed image using similarity
metrics.
In our approach, we convert 2D images into point-clouds, apply the ICP algorithm for
alignment, and then re-convert the registered point-cloud into an image format. This
ensures that the characters are aligned correctly before similarity computations are
performed, minimizing errors due to misalignment. The detailed steps of the ICP algorithm
and the transformation process are essential for accurately comparing characters that
may have subtle variations due to printing inconsistencies.
Algorithm 1: Generic ICP algorithm.
3.4. Similarity Computation
Substantial research exists on similarity computation for images and graphs. For our use case, we evaluated several state-of-the-art methods and then performed a comparative analysis to determine the best one for our character similarity measurement. The methods are described as follows:
3.4.1 Wasserstein Distance
The Wasserstein metric is a distance between probability measures defined on a given metric space $(M, p)$, where $p(x, y)$ is a distance function for the instances $x$ and $y$ in the set $M$ [49]:

$$W_p(P, Q) = \left( \inf_{\mu \in \Gamma(P, Q)} \int_{M \times M} p(x, y)^p \, d\mu(x, y) \right)^{1/p},$$
where $P$, $Q$ are two probability measures on M with a finite $p$-th moment and $\Gamma(P,Q)$
is the set of all measures on $M \times M$ with marginals $P$ and $Q$. Wasserstein
metric arises in the problem of optimal transport: $\mu(x, y)$ can be viewed as a
randomized policy for transporting a unit quantity of some material from a random
location $x$ to another location $y$ while satisfying the marginal constraint $x \sim
P$ and $y \sim Q$. If the cost of transporting a unit of material from $x \in P$ and
$y \in Q$ is given by $p(x, y)^p$ then $W_p(P,Q)$ is the minimum expected transport
cost.
To apply the Wasserstein metric, we build two projection histograms for each character image by summing the white pixels along each row and along each column, respectively. These histograms are normalized into probability distributions, which are then used as the inputs $P$ and $Q$ of the equation above, yielding a numerical distance value as the output.
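A minimal sketch of this projection-based comparison, assuming binary images whose white (foreground) pixels equal 1; for one-dimensional histograms on the same bins, the Wasserstein-1 distance reduces to the summed absolute difference of the cumulative sums (function names are illustrative):

```python
import numpy as np

def wasserstein_1d(p, q):
    """1-D Wasserstein-1 distance between two histograms on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize to probability distributions
    q = q / q.sum()
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def projection_distance(img_a, img_b):
    """Compare two binary character images via row/column projection profiles."""
    wr = wasserstein_1d(img_a.sum(axis=1), img_b.sum(axis=1))  # row profile
    wc = wasserstein_1d(img_a.sum(axis=0), img_b.sum(axis=0))  # column profile
    return wr + wc
```

Shifting a character sideways by one column, for instance, changes only the column profile and yields a distance of 1.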
3.4.2 Mean Euclidean Distance
We apply a distance function to our fixed image and the registered image we obtain
from the previous step. For the distance function, we are using Euclidean distance.
It is defined as the length of a straight line between two points in Euclidean space.
Given that we have two points, $A(x_A, y_A)$ and $B(x_B, y_B)$, the Euclidean distance $D$ between them is defined as follows:

$$D(A, B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2}.$$

This process is then repeated for each pair of elements in the point sets obtained from the images, yielding a distance matrix $d(A, B)$.
The next step consists of taking the minimum value of each column of the obtained matrix, calculated using the following equation:

$$d_{\min}(j) = \min_{i} d(A_i, B_j), \quad j = 1, \dots, n.$$

In the equation above, $n$ defines the number of columns in $d(A, B)$.
The final step consists of computing the mean distance. The mean distance is obtained as follows:

$$\bar{D} = \frac{1}{n} \sum_{j=1}^{n} d_{\min}(j).$$
Comparing the results obtained from this method on different images, we observed a clear pattern: the more similar the images, the smaller the mean distance.
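The three steps above, pairwise distances, column minima, and the mean, can be sketched as follows (a simplified illustration; `mean_min_distance` is a hypothetical helper name):

```python
import numpy as np

def mean_min_distance(A, B):
    """Mean over the points of B of each point's distance to its nearest point in A."""
    # pairwise Euclidean distance matrix d(A, B) of shape |A| x |B|
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
    # column minima d_min(j), then their mean
    return d.min(axis=0).mean()
```

For example, with A = {(0,0), (1,0)} and B = {(0,1), (5,0)}, the column minima are 1 and 4, so the mean distance is 2.5.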
We implement an algorithm to group similar images based on their Dice similarity score,
represented in Algorithm 2. It begins by loading all images $I$ from a specified folder
and sorting them according to their page numbers extracted from the file names. This
step ensures that images from the same page are not incorrectly grouped together.
Once the images are loaded and sorted, the algorithm calculates the Dice similarity
score, $D_{ij}$, for every unique pair of images $(I_i, I_j)$ that belong to different
page numbers. The Dice scores for all image pairs are stored in a matrix $D$ for later
use.
A threshold value $\theta$ is defined to determine if two images are considered similar
enough to be placed within the same group. The grouping process then iterates through
each image in $I$, starting from the first. This initial image is assigned to the
first group in $G$. The algorithm checks the Dice scores between the first image and
all subsequent images, adding any images with $D_{ij} > \theta$ to the first group.
Algorithm 2: Type Grouping Algorithm
This process of expanding the initial group continues until no further images exceed the threshold. New groups are then created, and the assignment is repeated for the remaining unassigned images.
Finally, the groups are organized and printed to view the final groupings. Unique group IDs $g_k$ are determined, and each group is represented as a cell in the array $G$ containing the image indices assigned to that group. This allows easy visualization of how the algorithm has clustered visually similar images based on their computed Dice similarity scores.
4. Experimental Results
After we obtain the character images sorted with respect to their ASCII codes, we
then proceed toward applying the comparison metrics. We start with the pre-processing
of our images. Fig. 5 shows the results of applying the pre-processing techniques discussed above to four characters, as an example to determine the best method for our use case. Since, as discussed in the introduction, stroke thickness can vary with the amount of ink or pressure applied, we had to abandon the erosion method, as it does not handle cases where the character thickness differs. For a thick and a thin character printed using the same metal type, the eroded images will not yield a high similarity measure, since we would obtain greater distances and smaller unions; skeletonization therefore proved to be the best method for this use case.
Fig. 5. Pre-processing results on four different characters. For each character, each
column shows the original, eroded and the skeletonized character respectively.
Fig. 6 shows a comparison of the metrics discussed in Subsection 3.4. In the overlaid images,
the red lines represent the original moving image, the green skeleton is our fixed
image, and the blue one is the registered image. As the results show, the point-set distance method fails entirely in this comparison. The Wasserstein distance performs well, with the distance value for the different-type images being roughly twice that for the similar images. We can further observe that the Dice coefficient for the same metal-type images is 2.5 times that of the different metal-type images, making Dice the best comparison metric for this pair of images. Examining the next example, the character “Sip,” we observe that the point-set distance function performs reasonably well, unlike in the previous example, but the difference is not significant enough to serve as a reliable comparison metric. For the Wasserstein distance in the second example, the distance value for the different-type images is around three times that of the same metal-type images, and almost the same ratio is observed for the Dice values.
Fig. 6. Results obtained after the comparative analysis using the methods proposed.
In the figure, MED refers to the Mean Euclidean Distance, WD refers to the Wasserstein
Distance and the last column shows the values for the Dice Coefficient.
Although the Wasserstein distance demonstrates promising results, we observed that its values were inconsistent across types, making it difficult to use as a comparison metric. When comparing values for additional sets of characters printed with the same and different metal types, the Dice coefficient produced consistent results, unlike the other methods; we therefore chose the Dice coefficient as our final comparison metric.
To compare the registration results of the proposed method with those of the conventional one, presented in Fig. 7, the skeletonized versions of the character strokes are shown in the overlaid images, where the fixed, registered, and overlapped data are illustrated in red, blue, and magenta, respectively.
Fig. 7. The proposed similarity measure for the character comparison. (a) Shows a
case where conventional registration failed to properly align the characters. (b)
Demonstrates a successful registration case where the alignment was achieved.
The conventional image registration method has a high failure rate due to the lack of trackable features in the binarized stroke regions. Registration failure yields unreliable Dice coefficients, potentially leading to erroneous metal-type grouping of different character images. Successful registration makes the Dice coefficient of the same-type set higher than that of the different-type set; nonetheless, the Dice coefficients for the two sets remain close, signifying that the conventional method lacks discriminatory power.
Using Algorithm 2, the groups are organized and printed to view the final groupings. Fig. 8 presents some of the results obtained with the grouping algorithm, where the characters marked with red dots are those grouped incorrectly. Using the proposed method, the grouping of the first character was perfect when checked against the manually overlaid pairs, whereas we found one anomaly in the second group. Analyzing the results of the conventional method, we observe that its failure rate in grouping the characters is significantly higher. Results were obtained for multiple character pairs, and they lead us to conclude that the proposed method is significantly better than the conventional method, which operates without the proposed pre-processing steps.
Fig. 8. Grouping Results using the Proposed and Conventional Method. The dotted images
show anomalies in the GT data that was labelled manually.
Table 1 compares the computation time of the proposed method with that of the conventional method. The proposed method is 45 times faster than the conventional one while also yielding much better accuracy. Furthermore, the additional pre-processing steps add negligible computation time.
Table 1. Computation time comparison in milliseconds between the conventional and the proposed method.

| Method       | Skeletonization | Registration | Dice |
|--------------|-----------------|--------------|------|
| Conventional | -               | 477.7        | 0.4  |
| Proposed     | 2               | 10.6         | 0.4  |
5. Conclusion
In this paper, we have presented a novel approach for character recognition and grouping
of ancient Korean movable metal types printed during the Joseon dynasty. A three-step
process of skeletonization, registration, and similarity metric computation was introduced
to accurately compare digitized character images from historical books.
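The three steps can be summarized as a single comparison pipeline. In this sketch the helper names (`skeletonize`, `register`, `dice`) are placeholders for the respective components (e.g. thinning, ICP alignment, and the Dice coefficient), not a definitive implementation:

```python
def compare_characters(img_a, img_b, skeletonize, register, dice):
    """Three-step pipeline: skeletonize both binary character images,
    register the second skeleton onto the first (e.g. via ICP), and
    score the overlap with a similarity metric such as Dice."""
    skel_a = skeletonize(img_a)
    skel_b = skeletonize(img_b)
    aligned_b = register(skel_b, skel_a)  # align skel_b to skel_a
    return dice(skel_a, aligned_b)
```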
Extensive experimentation evaluated various pre-processing techniques and similarity
metrics. The Dice coefficient emerged as the most reliable metric for discriminating
typeface variations. An algorithm applying the proposed approach was developed; it
outperformed conventional methods, achieving up to 45 times faster computation
while improving character grouping accuracy.
One limitation of our method is that the type grouping is not dynamic: the threshold
must be calibrated and provided manually. One direction for future work is a clustering
algorithm that dynamically determines the thresholds and the number of clusters and
classifies characters within the clusters. Another shortcoming concerns the skeletonization
method, as we sometimes observed spurious branches and junctions appearing as artifacts,
leading to erroneous results. We plan to address this problem by training a generative
network for skeletonization that focuses on the strokes, producing images free of
such artifacts.
The methodology provides a reliable means of analyzing surviving printed works to
gain a new understanding of the production scale and evolution of metal type printing
technologies during this pivotal period. Such insights expand our knowledge of the
associated cultural and intellectual developments enabled by the wider dissemination
of knowledge via movable type printing.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded
by the Korea government (MSIT) (RS-2025-24683900). This work was also supported by the
BK-21 FOUR program through the National Research Foundation of Korea (NRF) under the
Ministry of Education.
References
Park H. O. , 2014, The history of pre-Gutenberg woodblock and movable type printing
in Korea, Int. J. Humanit. Soc. Sci., Vol. 4, No. 9

Narang S. R. , Jindal M. K. , Kumar M. , 2019, Devanagari ancient documents recognition
using statistical feature extraction techniques, Sādhanā, Vol. 44, No. 6

Kim M. S. , Cho K. T. , Kwag H. K. , Kim J. H. , 2004, Digitalizing scheme of
handwritten hanja historical documents, Proc. of First International Workshop on Document
Image Analysis for Libraries, pp. 321-327

Maitra D. S. , Bhattacharya U. , Parui S. K. , 2015, CNN based common approach
to handwritten character recognition of multiple scripts, Proc. of 13th International
Conference on Document Analysis and Recognition (ICDAR), pp. 1021-1025

Kim M. S. , Cho K. T. , Kwag H. K. , Kim J. H. , 2004, Segmentation of handwritten
characters for digitalizing Korean historical documents, Document Analysis Systems
VI, pp. 114-124

Jeong B.-C. , Choi K.-S. , 2022, 3-D movable type reconstruction from old printed
documents using deep learning-based character extraction and recognition, Journal
of the Institute of Electronics and Information Engineers, Vol. 59, No. 9, pp. 74-83

Majid N. , Barney Smith E. H. , 2022, Character spotting and autonomous tagging:
offline handwriting recognition for Bangla, Korean and other alphabetic scripts, International
Journal on Document Analysis and Recognition, Vol. 25, No. 4, pp. 245-263

Hamad K. , Kaya M. , 2016, A detailed analysis of optical character recognition
technology, International Journal of Applied Mathematics Electronics and Computers,
No. Special Issue-1, pp. 244-249

Park T. , Jung J. , Cho J. , 2016, A method for automatically translating print
books into electronic Braille books, Science China Information Sciences, Vol. 59

Razak Z. , Zulkiflee K. , Salleh R. , Yaacob M. , Tay Y. , 2008, Off-line handwriting
text line segmentation: a review, International Journal of Computer Science and Network
Security, Vol. 8, No. 7, pp. 12-20

Tappert C. C. , Suen C. Y. , Wakahara T. , 1990, The state of the art in online
handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 12, No. 8, pp. 787-808

Likforman-Sulem L. , Zahour A. , Taconet B. , 2007, Text line segmentation of historical
documents: a survey, International Journal on Document Analysis and Recognition, Vol.
9, pp. 123-138

Narang S. R. , Jindal M. K. , Kumar M. , 2020, Ancient text recognition: a review,
Artificial Intelligence Review, Vol. 53, pp. 5517-5558

Das S. , Banerjee S. , 2014, Survey of pattern recognition approaches in Japanese
character recognition, International Journal of Computer Science and Information Technology,
Vol. 5, No. 1, pp. 93-99

Jo J. , Lee J. , Lee Y. , 2009, Stroke-based online hangul/Korean character recognition,
Proceedings of the Chinese Conference on Pattern Recognition, pp. 1-5

Jerripothula K. R. , Cai J. , Lu J. , Yuan J. , 2021, Image co-skeletonization
via co-segmentation, IEEE Transactions on Image Processing, Vol. 30, pp. 2784-2797

Pervouchine V. , Leedham G. , Melikhov K. , 2005, Handwritten character skeletonisation
for forensic document analysis, Proc. of the ACM Symposium on Applied Computing, pp.
754-758

Ko D. H. , Hassan A. U. , Majeed S. , Choi J. , 2021, SkelGAN: A font image skeletonization
method, Journal of Information Processing Systems, Vol. 17, No. 1, pp. 1-13

Ma X. , Ren X. , Tsviatkou V. Y. , Kanapelka V. K. , 2022, A novel fully parallel
skeletonization algorithm, Pattern Analysis and Applications, Vol. 25, pp. 1-20

Al-Maadeed S. , Hassaine A. , Bouridan A. , 2014, Using codebooks generated from
text skeletonization for forensic writer identification, Proc. of the IEEE/ACS International
Conference on Computer Systems and Applications, pp. 729-733

Ma J. , Zhou H. , Zhao J. , Gao Y. , Jiang J. , Tian J. , 2015, Robust feature
matching for remote sensing image registration via locally linear transforming, IEEE
Transactions on Geoscience and Remote Sensing, Vol. 53, No. 12, pp. 6469-6481

Pan M. , Tang J. , Rong Q. , Zhang F. , 2011, Medical image registration using
modified iterative closest points, International Journal for Numerical Methods in
Biomedical Engineering, Vol. 27, No. 8, pp. 1150-1166

Zhao S. , Dong Y. , Chang E. I. , Xu Y. , 2019, Recursive cascaded networks for
unsupervised medical image registration, Proc. of the IEEE/CVF International Conference
on Computer Vision, pp. 10600-10610

Wang X. , Zhao Z.-L. , Capps A. G. , Hamann B. , 2017, An iterative closest point
approach for the registration of volumetric human retina image data obtained by optical
coherence tomography, Multimedia Tools and Applications, Vol. 76, pp. 6843-6857

He Y. , Liang X. , Chen X. , Zhang Z. , Zhang J. , 2017, An iterative closest
points algorithm for registration of 3D laser scanner point clouds with geometric
features, Sensors, Vol. 17, No. 8

Panaretos V. M. , Zemel Y. , 2019, Statistical aspects of Wasserstein distances,
Annual Review of Statistics and Its Application, Vol. 6, pp. 405-431

Ye J. , Wu P. , Wang J. Z. , Li J. , 2017, Fast discrete distribution clustering
using Wasserstein barycenter with sparse support, IEEE Transactions on Signal Processing,
Vol. 65, No. 9, pp. 2317-2332

Fournier N. , Guillin A. , 2015, On the rate of convergence in Wasserstein distance
of the empirical measure, Probability Theory and Related Fields, Vol. 162, No. 3-4,
pp. 707-738

Kolouri S. , Nadjahi K. , Simsekli U. , Badeau R. , Rohde G. , 2019, Generalized
sliced Wasserstein distances, Advances in Neural Information Processing Systems, Vol.
32

Piccoli B. , Rossi F. , 2014, Generalized Wasserstein distance to handle the transport
equation, Archive for Rational Mechanics and Analysis, Vol. 211, pp. 335-358

Bottesch T. , Bühler T. , Kächele M. , 2016, Speeding up k-means by approximating
Euclidean distances via block vectors, Proc. of the International Conference on Machine
Learning, pp. 2578-2586

Faisal M. , Zamzami E. , 2020, Comparative analysis of inter-centroid K-means performance
using Euclidean distance, Canberra distance and Manhattan distance, Journal of Physics:
Conference Series, Vol. 1566, No. 1

You L. , Jiang H. , Hu J. , Chang C. , Chen L. , Cui X. , Zhao M. , 2022, GPU-accelerated
faster mean shift with Euclidean distance metrics, Proc. of the IEEE Annual Computer
Software and Applications Conference, pp. 211-216

Berthold M. R. , Höppner F. , 2016, On clustering time series using Euclidean distance
and Pearson correlation, arXiv preprint arXiv:1601.02213

Hammoud B. , Daou G. , Wehn N. , 2022, Multidimensional minimum Euclidean distance
approach using radar reflectivities for oil slick thickness estimation, Sensors, Vol.
22, No. 4

Jha S. , 2019, Neutrosophic image segmentation with Dice coefficients, Measurement,
Vol. 134, pp. 762-772

Alroy J. , 2015, A new twist on a very old binary similarity coefficient, Ecology,
Vol. 96, No. 2, pp. 575-586

Liang S. , Tang F. , Huang X. , Yang K. , Zhong T. , Hu R. , Liu S. , Yuan
X. , Zhang Y. , 2019, Deep-learning-based detection and segmentation of organs at
risk in nasopharyngeal carcinoma computed tomographic images for radiotherapy planning,
European Radiology, Vol. 29, pp. 1961-1967

Oco N. , Syliongka L. R. , Roxas R. E. O. , Ilao J. P. , 2013, Dice's coefficient
on trigram profiles as metric for language similarity, Proc. of the International
Conference on Oriental COCOSDA and Conference on Asian Spoken Language Research and
Evaluation, pp. 1-4

Yoo W. S. , 2022, Direct evidence of metal type printing in The Song of Enlightenment,
Korea, 1239, Heritage, Vol. 5, No. 4, pp. 1719-1734

Lai A.-N. , Lee G. , 2008, Binarization by local K-means clustering for Korean text
extraction, Proc. of the IEEE International Symposium on Signal Processing and Information
Technology, pp. 117-122

Wahyono , Jo K.-H. , 2012, A clustering strategy for touching characters in Korean
and English printed text segmentation, Proc. of the International Conference on Ubiquitous
Robots and Ambient Intelligence, pp. 23-25

Song M. K. , 2009, The history and characteristics of traditional Korean books and
bookbinding, Journal of the Institute of Conservation, Vol. 32, pp. 53-78

Ok Y. J. , 2020, The present situation and characteristics of Korean rare books in
the Shanghai Library of China, Journal of Studies in Bibliography, Vol. 84, pp. 99-120

Lee W. S. , Choi K. S. , 2024, Boundary Gaussian distance loss function for enhancing
character extraction from high-resolution scans of ancient metal-type printed books,
Electronics, Vol. 13, No. 10

Du Y. , 2020, PP-OCR: A practical ultra lightweight OCR system, arXiv preprint arXiv:2009.09941

Besl P. J. , McKay N. D. , 1992, A method for registration of 3-D shapes, IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, pp. 239-256

Shen J. , Qu Y. , Zhang W. , Yu Y. , 2017, Wasserstein distance guided representation
learning for domain adaptation, arXiv preprint arXiv:1707.01217

Maaz Ahmed is a graduate of the Korea University of Technology and Education, where
he earned a master's degree in future convergence engineering with a focus on computer
vision. His research interests include machine learning and computer vision, and
he has contributed to several innovative projects in both academic and industrial
settings.
Kang-Sun Choi received B.S. (1997), M.S. (1999), and Ph.D. (2003) degrees in electronic
engineering from Korea University, with his Ph.D. research focused on nonlinear filter design.
In 2011, he joined the School of Electrical, Electronics & Communication Engineering
at Korea University of Technology and Education, where he is currently a professor.
In 2017, he was a visiting scholar at the University of California, Los Angeles. From
2008 to 2010, he was a research professor in the Department of Electronic Engineering
at Korea University. From 2005 to 2008, he worked at Samsung Electronics, Korea, as
a senior software engineer. From 2003 to 2005, he was a visiting scholar at the University
of Southern California. His research interests are in the areas of deep learning-based
semantic segmentation, multimodal sensor calibration, human-robot interaction, and
culture technology. He is a recipient of an IEEE International Conference on Consumer
Electronics Special Merit Award (2012).