


1. Department of Electrical, Electronics and Communications Engineering, Korea University of Technology and Education, Cheonan, South Korea



Keywords: Document analysis, Skeletonization, Iterative Closest Point, Dice coefficient

1. Introduction

Archaeology and historical analysis play vital roles in uncovering the achievements of past civilizations, helping us explore ancient innovations and technological milestones. In this paper, we analyze historical texts from Korea and investigate the oldest printed books produced using movable metal types. Korea, during the Goryeo Dynasty, is recognized as the birthplace of this technology, with physical evidence of early printed works such as the “Sangjeong Yemun” from 1234 and “Jikji” from 1377, the latter being the oldest surviving book printed with movable metal type [1].

When the Joseon Dynasty succeeded Goryeo in 1392, the use of metal type continued, primarily for printing texts in Chinese characters (Hanja), which were used for official and scholarly purposes. After King Sejong created the Korean alphabet, Hangul, in 1443, this new script was also incorporated into movable metal type printing, although Hanja remained dominant in formal contexts. Despite the historical importance of both the Chinese and Hangul metal types, few examples survive today. Researchers analyze the remaining books from this period using high-resolution scans to extract and group character images, providing valuable insights into the scale of early metal type printing.

The introduction of movable metal type revolutionized the mass production of books, greatly accelerating the spread of knowledge. During the Joseon Dynasty, metal movable types, primarily made from copper, played a pivotal role in advancing Korea's printing culture. The process involved assembling individual characters into a metal frame, applying ink to the typefaces, and pressing them onto paper. As shown in Fig. 1, this method proved more efficient and durable than traditional woodblock printing, as the metal types resisted deformation over time, ensuring consistent character reproduction.

Fig. 1. The process of metal letterpress printing starts with the type selection, proceeding to typesetting and inking. The inked chase is then pressed on the paper, and the printed pages are then bound in the form of a book.

../../Resources/ieie/IEIESPC.2026.15.1.12/fig1.png

Analyzing the fonts and stylistic features of printed books provides valuable insights into the time periods in which they were produced, allowing us to trace the technological evolution of printing methods. In early Korean printing, each letter could appear multiple times within a single metal chase, leading to the use of multiple typefaces for a single character. Factors such as printing pressure, amount of ink applied, and the paper material can cause variations in the thickness and shape of strokes, even when the same metal type is used. Additional distortions may arise from factors such as noise, paper damage, or wear over time.

Traditional computational techniques, such as the Dice coefficient for measuring similarity, image registration, classification, segmentation, and feature extraction [2-5], are commonly used in digital image analysis. Applying these methods to digital images of historical Korean books can help recover insights lost due to wars and natural disasters. By studying printed characters, we can estimate the number of metal types manufactured during a particular period.

In conventional analysis [6], each character in the scanned images is extracted and grouped by visual similarity, as demonstrated in Fig. 2. Although each group represents the same character, variations arise due to the limitations of early metal type production, such as inconsistencies in typeface creation. To refine this, groups of identical characters are further divided into subgroups based on specific metal type variants. By aligning and averaging the images within each subgroup, we can recreate a 3D model of the typeface, and the number of subgroups provides an estimate of the number of metal types used in printing during that era.

Fig. 2. Overall process of the type reconstruction: scanning, character segmentation, character classification, and character grouping are performed using the method proposed by Jeong et al. [6], followed by the metal type grouping proposed in this paper.

../../Resources/ieie/IEIESPC.2026.15.1.12/fig2.png

A critical challenge in this process is to determine the similarity between images of the same character, which requires a nuanced approach. Unlike traditional region-based image analysis methods, character image comparison focuses on identifying whether the characters were printed using the same metal type. Given the variability introduced by manual inking and pressing, the stroke width can vary between prints. Therefore, the similarity measure must capture structural differences in stroke shapes while ignoring irrelevant variations in stroke width.

This paper proposes a novel approach designed to address these specific challenges. Our method consists of three key components: skeletonization of character images, alignment via skeleton registration, and a similarity measure applied to the aligned skeletons. The proposed approach offers more reliable results compared to conventional techniques [6].

We explore various image preprocessing techniques, including Erosion and Skeletonization, to enhance character comparison. We also explore different metrics for evaluating similarity between images, such as the mean average distance, Dice coefficient, and Wasserstein distance. After extensive testing on numerous character pairs, we devised a comparison algorithm that employs binarized character images, skeletonization, and Iterative Closest Point (ICP) image registration, followed by similarity computation. The resulting numerical similarity score helps determine whether two characters were printed using the same metal type. Notably, the Dice coefficient proved to be the most reliable metric for our analysis. Additionally, we compared our method with the conventional approach [6], concluding that our algorithm performs efficiently in terms of both accuracy and computational time.

In the subsequent sections, we detail our methodology and present the outcomes of our experiments, along with their implications for historical research and technological advancements.

2. Related Work

2.1. Character Extraction, Segmentation, and Recognition

Character extraction, segmentation, and recognition are crucial stages in optical character recognition (OCR) [7]. These processes involve extracting characters from images and printed documents for information extraction [8]. Various methods, such as water flow, neural networks, endpoint, Euler number, and Fourier transform, are used for character extraction to prepare data for further analysis [9].

Segmentation divides characters into subunits for detailed analysis. Techniques like the Hough transform, neighborhood-connected units, and vertical/horizontal projection profiles are employed for text segmentation [10-12]. Segmentation is vital for converting images into matrices for in-depth analysis.

Character recognition identifies and converts characters into machine-readable text. Template matching, statistical techniques such as k-nearest neighbor and fuzzy set reasoning, and stroke recognition methods are used for character recognition [13-15]. These techniques play a crucial role in OCR for data analysis.

2.2. Skeletonization

Skeletonization reduces binary images to one-pixel representations, aiding in effective 2-D and 3-D object display for recognition, retrieval, and analysis [16]. Pervouchine et al. applied skeletonization to extract strokes from handwritten documents, benefiting forensic analysis but being content-dependent [17]. Ko et al. faced challenges adapting skeletonization to diverse characters, such as Chinese, Korean, and Japanese fonts, proposing SkelGAN for Korean Hangul [18].

Various algorithms, like Zhang-Suen (ZS)-series and one-pass thinning (OPTA)-series algorithms, enhance character skeletonization by addressing noise issues; ZS-series deals with boundary noise, while OPTA offers faster thinning [19]. Al-Maadeed used text skeletonization with codebook generation for writer identification, achieving high accuracy and addressing ethical concerns [20]. This approach works well with Greek and Latin languages and some other open-source datasets [20].

2.3. Image Registration

Image registration aligns images from different time intervals, aiding spatial correspondence in documents [21]. Pan et al. describe two common approaches: intensity-based, which focuses on grey level intensity areas, and feature-based, which selects common features for optimal results [22].

Zhao et al. highlight the interest in using image registration with a deep learning method called recursive cascade architecture for deformed medical images and documents, offering valuable data for research [23]. Wang et al. use the iterative closest point (ICP) method for image registration based on point distances, addressing its time consumption with M-estimation and weighting approaches [24]. He et al. apply ICP to register 3D points using geometric features, avoiding local extremum issues and registration errors by considering density, surface normal, angle between data points, and curvature features [25].

2.4. Wasserstein Distance

Wasserstein Distance is a valuable tool for measuring similarities or distances between probability distributions, widely applied in document analysis, character similarity, and comparison, as well as in forensic science, research, and probability theory [26]. In practice, it finds use in image retrieval and computer vision, focusing on the cost of mass transportation between points for tasks like image synthesis and style transfer.

Ye et al. employed Wasserstein Distance to measure text sequence similarity, leveraging probability distribution functions, though it can be affected when character distances vary [27]. Fournier et al. noted challenges in applying Wasserstein Distance, especially with complex high-dimensional data, where it may not always effectively measure similarity between probability distributions [28]. Kolouri et al. introduced modified sliced Wasserstein Distance, reducing computational demands by using one-dimensional linear projections for more accurate probability measurements, particularly suitable for generative modeling in research science [29]. Piccoli et al. utilized generalized Wasserstein distance to handle the transport equation, effectively addressing the Cauchy problem for vector fields and closely spaced sources, resembling Lipschitzian behavior [30].

2.5. Mean Euclidean Distance

The mean Euclidean distance calculates the average distance between two data points in Euclidean space, primarily used for character identification in text files [31]. It is effective when data points are in the same dimension [31]. In statistical analysis and machine learning, it aids in measuring similarity and dissimilarity by considering the structure and shapes of objects in documents.

You et al. introduced a method to approximate mean Euclidean distance using the Yinyang k-means clustering approach, addressing memory consumption issues and enhancing similarity detection between input dimensions [32]. Berthold et al. proposed modifying the k-means algorithm for time series comparisons by adding squared Euclidean distance and incorporating the Pearson Correlation coefficient. This approach effectively normalized input data for more accurate comparisons [33]. Hammoud et al. found Euclidean distance useful for pre-defined multi-dimensional constellation sets, improving accuracy by eliminating noise through iterative methods in 2D and 3D estimators [34]. Lee et al. highlighted the effectiveness of Euclidean Support Vector Machine (E-SVM) for text document classification, especially with kernel functions. While E-SVM offers better classification accuracy than conventional SVM, it comes with increased processing time due to distance calculations between document data points [35].

2.6. Dice Coefficient

The Dice coefficient, or Sørensen-Dice index, assesses character similarity by calculating the ratio of shared elements in datasets, commonly used in image segmentation, forensic document analysis, and medical image processing to verify accuracy [36, 37]. Higher Dice coefficients signify greater similarity, correlating with improved image segmentation performance.

The Dice coefficient measures the similarity between two given images. The Dice coefficient $D$ is defined as:

(1)
$ D = \frac{2|N_r \cap N_i|}{|N_r| + |N_i|}, $

where $N_r$ is defined as the set of pixels in the reference image and $N_i$ is defined as the set of pixels in the input image.
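Eq. (1) translates directly into code on binary images. The following is a minimal NumPy sketch; the function and variable names are ours, not part of the paper's implementation:

```python
import numpy as np

def dice_coefficient(ref, inp):
    """Dice coefficient of two binary images (Eq. 1).

    ref, inp -- boolean arrays of equal shape; True marks stroke pixels.
    """
    intersection = np.logical_and(ref, inp).sum()
    return 2.0 * intersection / (ref.sum() + inp.sum())

# Identical images score 1.0; disjoint images score 0.0.
a = np.array([[1, 1], [0, 0]], dtype=bool)
b = np.array([[1, 0], [0, 0]], dtype=bool)
score = dice_coefficient(a, b)  # 2*1 / (2 + 1), roughly 0.667
```

Because the score is a ratio of overlap to total stroke area, it is bounded in $[0, 1]$ and insensitive to overall image size.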

Liang et al. emphasize the Dice coefficient's role in statistically validating manual and automated medical report segmentation for patient diagnosis but highlight the need to address false negatives, false positives, and class imbalance in model design [38]. Oco et al. point out another significant use of the Dice coefficient in identifying language similarity through trigram profiles in documents, particularly in Philippine language documents. This approach has potential applications in phonetic transcription and text analysis, revealing an average difference of 27% in language tree comparisons [39].

2.7. Image Clustering

Korean historical books, such as The Song of Enlightenment, utilized moveable metal types for printing, resulting in multiple versions with varying metal types. Yoo employed image processing techniques, including clustering, to identify metal casting defects in these books [40].

Lai and Lee applied k-means clustering for binarizing Korean characters, with 3-means clustering outperforming 2-means clustering in character detection and binarization quality [41]. Jo et al. used vertical projection and clustering for character segmentation in Korean printed text, achieving high-speed segmentation with 99.25% accuracy and effectively managing touching character issues [42]. Song explored the unique history of Korean characters, which differed from China and Japan, highlighting the use of moveable metal types for preserving information in books from the 14th to 19th centuries. Modern image analysis and grouping techniques helped identify changes in characters due to the issues with smoothness and durability of carved text on metal types [43]. Similarly, OK utilized image feature extraction and clustering techniques to characterize Korean movable metal type ancient books, shedding light on history and differentiating Korean texts from those of China and Japan [44]. This analysis aided in understanding the distinctiveness of Korean historical texts, which were often considered similar to Chinese and Japanese [44].

3. The Proposed Method

Fig. 3 presents a brief visual representation of the character comparison part of the algorithm proposed in this paper.

Fig. 3. Architecture for the character comparison, starting from pre-processing, proceeding to ICP Registration and finally towards similarity computation.

../../Resources/ieie/IEIESPC.2026.15.1.12/fig3.png

3.1. Characters Extraction, Segmentation, and Classification

The data extraction process begins with high-resolution scans using a flatbed scanner to minimize distortion. These scans include not only printed characters but also non-textual elements like seals, lines, and noise, which must be removed from the dataset. A Trans-UNET network, customized with a specific loss function, is employed to clean the noise and rebuild any broken characters [45]. This architecture allows for accurate pixel-wise segmentation of character strokes, even in degraded or noisy input images, which is critical for reconstructing historical typefaces. The Trans-UNET network excels in capturing the intricate details of these characters, ensuring a more precise output. The dataset includes 13,500 image patches, cropped and augmented from the scanned images, with 12,150 patches used for training and 1,350 for testing. Each patch has a resolution of 1024x1024 pixels, and random rotations and scales are applied to ensure the generalization of the model.

To further refine character extraction, the Boundary Gaussian Distance (BGD) loss function is used to enhance the smoothness of character boundaries [45]. This method improves upon traditional techniques by applying Gaussian smoothing to ground-truth labels, resulting in clearer, more accurate segmentation, especially when handling ancient documents with rough, noisy strokes. The combination of BGD loss with Dice loss significantly enhances the clarity of the extracted characters, effectively addressing issues like ink bleeds, noise, and missing strokes. The model performs well in difficult cases such as blurred prints and faded characters, demonstrating its effectiveness in processing degraded historical documents.

After noise-free pages are obtained, they are processed using the segmentation method by Beom et al. [6], which involves detecting each character and applying a convex hull for segmentation. The segmented characters are then fed to PaddleOCR, which classifies them with respect to their ASCII codes; the results are saved in directories corresponding to each character for further analysis [46].

3.2. Skeletonization

Once we obtain the binarized images of any two given characters, we pre-process them before performing image registration. These pre-processing steps handle cases where two characters printed with the same type-head differ in stroke width: the more ink applied during printing, the thicker the resulting character. Moreover, without erosion or skeletonization, the union-based measures we later use for similarity yield inflated values, because a larger area of the two characters overlaps, while the distance-based measures yield deflated values, because the strokes lie closer together, even when the characters were printed with different type-heads.

For the skeletonization process, we use an iterative parallel thinning algorithm [47]. We start with a black pixel $p$ in a 3x3 window, with the neighbors named $x_1, x_2, x_3, \dots, x_8$; all nine values are collectively denoted by $N_{(p)}$. The odd-numbered $x_i$ are the 4-adjacent neighbors of $p$. The number of black pixels in $N_{(p)}$ is denoted by $b_{(p)}$. Our conditions also use the crossing number $X_H(p)$, defined as the number of white-to-black transitions encountered when traversing $N_{(p)}$ in order.

Fig. 4. A visual representation of $N_{(p)}$.

../../Resources/ieie/IEIESPC.2026.15.1.12/fig4.png

Fig. 4 shows a visual representation of $N_{(p)}$. The algorithm we used for skeletonization consists of two sub-iterations for a single iteration. In the first sub-iteration, we delete the pixel p if the conditions 1, 2, and 3 are met, and in the second sub-iteration, we delete the pixel p if the conditions 1, 2, and 4 are met.

Condition 1:

(2)
$ X_H(p) = 1, $

where

(3)
$ X_H(p) = \sum_{i=1}^{4} b_i, $
(4)
$ b_i = \begin{cases} 1, & \text{if } x_{2i-1} = 0 \text{ and } (x_{2i} = 1 \text{ or } x_{2i+1} = 1), \\ 0, & \text{otherwise.} \end{cases} $

Condition 2:

(5)
$ 2 \le \min\{n_1(p), n_2(p)\} \le 3, $

where

(6)
$ n_1(p) = \sum_{k=1}^{4} x_{2k-1} \vee x_{2k}, $
(7)
$ n_2(p) = \sum_{k=1}^{4} x_{2k} \vee x_{2k+1}. $

Condition 3:

(8)
$ (x_2 \vee x_3 \vee x_8) \wedge x_1 = 0. $

Condition 4:

(9)
$ (x_6 \vee x_7 \vee x_4) \wedge x_5 = 0. $

In our case, we set the iteration count $n$ to infinity, so the skeletonization process runs iteratively until the strokes of the given image are one pixel thick.
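The iterate-until-stable structure of such two-subiteration thinning can be sketched in pure Python. Note that this is the classic Zhang-Suen scheme, used here only as a stand-in illustration: the deletion conditions of the algorithm in [47] (Conditions 1-4 above) differ slightly.

```python
def zhang_suen_thin(img):
    """Two-subiteration parallel thinning (Zhang-Suen variant).

    img -- 2D list of 0/1 values with a zero border; returns a thinned
    copy in which strokes are reduced to roughly one-pixel width.
    """
    img = [row[:] for row in img]          # work on a copy
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel directly above p
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    changed = True
    while changed:                          # iterate until stable
        changed = False
        for step in (0, 1):                 # the two sub-iterations
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    n = neighbours(y, x)
                    b = sum(n)              # number of black neighbours
                    # crossing number: 0 -> 1 transitions around p
                    a = sum(n[i] == 0 and n[(i + 1) % 8] == 1
                            for i in range(8))
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if step == 0:
                        edge = p2*p4*p6 == 0 and p4*p6*p8 == 0
                    else:
                        edge = p2*p4*p8 == 0 and p2*p6*p8 == 0
                    if 2 <= b <= 6 and a == 1 and edge:
                        to_delete.append((y, x))
            for y, x in to_delete:          # parallel deletion
                img[y][x] = 0
                changed = True
    return img
```

Deletions within a sub-iteration are collected first and applied together, which is what makes the thinning "parallel": every decision in a pass is based on the same snapshot of the image.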

3.3. Image Registration Using Iterative Closest Point (ICP)

Image registration plays a critical role in ensuring accurate comparison between printed characters, as unaligned images can result in significant errors when computing similarity metrics. To align images before comparison, we use the Iterative Closest Point (ICP) algorithm, a well-established method for registering two point-clouds [48]. In our process, we convert the two given images (a fixed and a moving image) into 3D point-clouds, where each pixel of the image corresponds to a point in space. Specifically, we extract the pixel coordinates of the white pixels (representing character strokes) from the 2D images and then add a third dimension (z = 0) to treat the images as 3D point-clouds suitable for ICP processing.

The ICP algorithm consists of two primary steps: (1) finding correspondences between points in the two point-clouds (i.e., matching the closest points from the moving image to the fixed image), and (2) computing the transformation (rotation and translation) that minimizes the distance between these correspondences. This process is repeated iteratively until the transformation converges, meaning that the difference between the moving and fixed point-clouds is minimized. During each iteration, we calculate the distance between the closest corresponding points, and the transformation is adjusted to bring the moving image closer to the fixed image. The algorithm stops when the change in the transformation matrix becomes negligible, indicating that the best alignment has been achieved.

To handle cases where the two images do not overlap perfectly, we use a matching threshold $d_{max}$. This threshold is a part of generic ICP functions provided in MATLAB and helps discard correspondences between points that are too far apart, which could otherwise reduce the accuracy of the alignment. Once the ICP algorithm has converged, the final transformation matrix is applied to the moving image, aligning it with the fixed image. The registered image can then be compared directly with the fixed image using similarity metrics.

In our approach, we convert 2D images into point-clouds, apply the ICP algorithm for alignment, and then re-convert the registered point-cloud into an image format. This ensures that the characters are aligned correctly before similarity computations are performed, minimizing errors due to misalignment. The detailed steps of the ICP algorithm and the transformation process are essential for accurately comparing characters that may have subtle variations due to printing inconsistencies.

Algorithm 1: Generic ICP algorithm.

../../Resources/ieie/IEIESPC.2026.15.1.12/al1.png
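The two ICP steps above, correspondence search and rigid-transform estimation, can be condensed into a short 2D sketch. This is a simplified illustration under our own naming (the paper's experiments rely on MATLAB's generic ICP functions, including the $d_{max}$ gating):

```python
import numpy as np

def icp_2d(fixed, moving, max_iter=50, tol=1e-6, d_max=None):
    """Minimal point-to-point ICP in 2D.

    fixed, moving -- (N,2)/(M,2) arrays of stroke-pixel coordinates.
    d_max -- optional matching threshold; pairs farther apart are discarded.
    Returns the aligned copy of `moving`.
    """
    src = moving.astype(float).copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # 1) correspondences: nearest fixed point for every moving point
        d2 = ((src[:, None, :] - fixed[None, :, :]) ** 2).sum(-1)
        idx = d2.argmin(axis=1)
        dist = np.sqrt(d2[np.arange(len(src)), idx])
        keep = dist < d_max if d_max is not None else np.ones(len(src), bool)
        p, q = src[keep], fixed[idx[keep]]
        # 2) best rigid transform (rotation R, translation t) via SVD
        mp, mq = p.mean(0), q.mean(0)
        H = (p - mp).T @ (q - mq)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:        # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mq - R @ mp
        src = src @ R.T + t
        err = dist[keep].mean()
        if abs(prev_err - err) < tol:   # converged: error stopped changing
            break
        prev_err = err
    return src
```

The convergence test mirrors the description above: iteration stops once the mean correspondence distance no longer changes appreciably, i.e., the transformation update has become negligible.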

3.4. Similarity Computation

Extensive research has addressed similarity computation for images and graphs. For our use case, we evaluated several established methods and performed a comparative analysis to determine the best one for character similarity measurement. The methods are described as follows:

3.4.1 Wasserstein Distance

The Wasserstein metric is defined as a distance between probability measures on a given metric space $(M, p)$, where $p(x, y)$ is a distance function for the instances $x$ and $y$ in the set $M$ [49].

(10)
$ W_p(P,Q) = \left( \inf_{\mu \in \Gamma(P,Q)} \int_{M \times M} p(x, y)^p d\mu(x, y) \right)^{1/p}, $

where $P$, $Q$ are two probability measures on $M$ with a finite $p$-th moment and $\Gamma(P,Q)$ is the set of all measures on $M \times M$ with marginals $P$ and $Q$. The Wasserstein metric arises in the problem of optimal transport: $\mu(x, y)$ can be viewed as a randomized policy for transporting a unit quantity of some material from a random location $x$ to another location $y$ while satisfying the marginal constraints $x \sim P$ and $y \sim Q$. If the cost of transporting a unit of material from $x$ to $y$ is given by $p(x, y)^p$, then $W_p(P,Q)$ is the minimum expected transport cost.

To apply the Wasserstein metric, we build two histograms for each image by summing the white pixels in each row and each column, respectively. Normalizing these histograms gives the probabilities of lit pixels per row and column, which we concatenate into the respective input arrays. These probability arrays are then fed into the equation above, yielding a numerical distance value as output.
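For one-dimensional distributions such as these projection histograms, the Wasserstein distance reduces to the L1 distance between cumulative distribution functions, which makes a compact sketch possible (function names are ours; SciPy's `scipy.stats.wasserstein_distance` provides an equivalent for weighted samples):

```python
import numpy as np

def wasserstein_1d(u, v):
    """1-D Wasserstein (earth mover's) distance between two histograms.

    u, v -- nonnegative weights over the same ordered bins; each is
    normalized to a probability distribution. In 1-D, W1 equals the
    L1 distance between the two CDFs.
    """
    u = np.asarray(u, float) / np.sum(u)
    v = np.asarray(v, float) / np.sum(v)
    return np.abs(np.cumsum(u) - np.cumsum(v)).sum()

def projection_profiles(img):
    """Row and column sums of white (stroke) pixels of a binary image."""
    img = np.asarray(img)
    return img.sum(axis=1), img.sum(axis=0)

# Moving all mass two bins to the right costs 2 bin-widths:
# wasserstein_1d([1, 0, 0], [0, 0, 1]) gives 2.0
```

In this form the metric is sensitive to how far stroke mass has shifted between the two projections, not just whether the profiles differ.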

3.4.2 Mean Euclidean Distance

We apply a distance function to our fixed image and the registered image we obtain from the previous step. For the distance function, we use the Euclidean distance, defined as the length of a straight line between two points in Euclidean space. Given two points $A(x_A, y_A)$ and $B(x_B, y_B)$, the Euclidean distance $d(A,B)$ between them is defined as follows:

(11)
$ d(A,B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2}. $

This process is then repeated for each element in the point sets obtained from the images.

The next step consists of taking the minimum value in each column of the obtained distance array, calculated using the following equation:

(12)
$ \mathrm{distance\_min}_j = \min_{i=1}^{n} d(A_i, B_j). $

In the equation above, $n$ is the number of points in the fixed set $A$, i.e., the number of rows of the pairwise distance array $d(A,B)$; the minimum is taken down each column $j$.

The final step consists of getting the mean distance. The mean distance is obtained as follows:

(13)
$ \mu = \frac{1}{N} \sum_{i=1}^{N} distance\_min_i. $

After comparing the results obtained from this method on different images, we observed a clear pattern: the more similar the images, the smaller the mean distance.
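Equations (11)-(13) combine into a single nearest-neighbour average, sketched below with NumPy (the function name is ours):

```python
import numpy as np

def mean_min_distance(A, B):
    """Mean nearest-neighbour Euclidean distance (Eqs. 11-13).

    A -- (n,2) fixed point set; B -- (m,2) registered point set.
    For each point B_j, take the minimum distance to any A_i (Eq. 12),
    then average those minima over all columns (Eq. 13).
    """
    # pairwise Euclidean distances d(A_i, B_j), shape (n, m)  -- Eq. 11
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return d.min(axis=0).mean()
```

A score of zero means every point of the registered image lies exactly on a point of the fixed image; larger values indicate growing structural mismatch.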

We implement an algorithm to group similar images based on their Dice similarity score, represented in Algorithm 2. It begins by loading all images $I$ from a specified folder and sorting them according to their page numbers extracted from the file names. This step ensures that images from the same page are not incorrectly grouped together. Once the images are loaded and sorted, the algorithm calculates the Dice similarity score, $D_{ij}$, for every unique pair of images $(I_i, I_j)$ that belong to different page numbers. The Dice scores for all image pairs are stored in a matrix $D$ for later use.

A threshold value $\theta$ is defined to determine if two images are considered similar enough to be placed within the same group. The grouping process then iterates through each image in $I$, starting from the first. This initial image is assigned to the first group in $G$. The algorithm checks the Dice scores between the first image and all subsequent images, adding any images with $D_{ij} > \theta$ to the first group.

Algorithm 2: Type Grouping Algorithm

../../Resources/ieie/IEIESPC.2026.15.1.12/al2.png

This process of expanding the initial group continues until no remaining image exceeds the threshold with respect to the group. New groups are then created, and the assignment is repeated for the remaining unassigned images.

Finally, the groups are organized and printed to view the final groupings. Unique group IDs $g_k$ are determined, and each group is represented as a cell in the array $G$ containing the image indices assigned to that group. This allows for easy visualization of how the algorithm has clustered visually similar images based on their computed Dice similarity scores.
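The grouping procedure can be sketched as a greedy, threshold-based pass over the precomputed score matrix. This simplified Python version (names ours) omits the file loading, page-number parsing, and transitive group expansion of the full Algorithm 2:

```python
def group_by_dice(D, pages, theta):
    """Greedy grouping of images by Dice score (simplified Algorithm 2).

    D     -- symmetric matrix of Dice scores, D[i][j] for images i, j
    pages -- page number of each image; same-page pairs are never grouped,
             since a metal type appears at most once per chase
    theta -- similarity threshold for joining a group
    Returns a list of groups, each a list of image indices.
    """
    n = len(pages)
    assigned = [False] * n
    groups = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [i]                 # image i seeds a new group
        assigned[i] = True
        for j in range(i + 1, n):
            if (not assigned[j] and pages[j] != pages[i]
                    and D[i][j] > theta):
                group.append(j)
                assigned[j] = True
        groups.append(group)
    return groups
```

The page-number guard encodes the constraint stated above: two character instances from the same page cannot have come from the same physical piece of type, so they must land in different groups regardless of their Dice score.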

4. Experimental Results

After we obtain the character images sorted with respect to their ASCII codes, we proceed to apply the comparison metrics, starting with pre-processing of the images. Fig. 5 shows the results of applying the pre-processing techniques discussed above to four example characters, used to determine the best method for our use case. As discussed in the introduction, the thickness of the strokes can vary with the amount of ink or pressure applied, so we had to abandon the erosion method, which does not handle cases where the character thickness differs. For a thick and a thin character printed using the same metal type, the eroded images do not yield a high similarity measure, since the distances between strokes are greater and their unions smaller; skeletonization therefore turned out to be the best method for this use case.

Fig. 5. Pre-processing results on four different characters. For each character, each column shows the original, eroded and the skeletonized character respectively.

../../Resources/ieie/IEIESPC.2026.15.1.12/fig5.png

Fig. 6 shows a comparison of the metrics discussed in Subsection 3.4. In the overlaid images, the red lines represent the original moving image, the green skeleton is our fixed image, and the blue one is the registered image. The results show that the point-set distance method fails entirely in this comparison. The Wasserstein distance performs well: its value for the different-type images is roughly twice that for the same-type images. The Dice coefficient is more discriminative still, with the value for same metal-type images about 2.5 times that for different metal-type images, making Dice the best comparison metric for this pair of images. Examining the next example, the "Sip" character, the point-set distance function performs reasonably well, unlike in the previous example, but the difference is not significant enough to serve as a reliable comparison metric. For the Wasserstein distance in the second example, the value for the different-type images is roughly three times that for the same metal-type images, and almost the same ratio is observed for the Dice values.

Fig. 6. Results obtained after the comparative analysis using the methods proposed. In the figure, MED refers to the Mean Euclidean Distance, WD refers to the Wasserstein Distance and the last column shows the values for the Dice Coefficient.

../../Resources/ieie/IEIESPC.2026.15.1.12/fig6.png

Although the Wasserstein distance demonstrates promising results, we observed them to be inconsistent across types, making it difficult to use as a comparison metric. Comparing the values for additional sets of characters of the same and different metal types, the results produced by the Dice coefficient were consistent, as opposed to those produced by the other methods; we therefore chose the Dice coefficient as our final comparison metric.

To compare the registration results of the proposed method with the conventional one, presented in Fig. 7, the skeletonized versions of the character strokes are shown in the overlaid images, where the fixed, registered, and overlapped data are illustrated in red, blue, and magenta, respectively.

Fig. 7. The proposed similarity measure for the character comparison. (a) Shows a case where conventional registration failed to properly align the characters. (b) Demonstrates a successful registration case where the alignment was achieved.

../../Resources/ieie/IEIESPC.2026.15.1.12/fig7.png

The conventional image registration method fails at a high rate due to the lack of trackable features in the binarized stroke regions. Registration failure yields unreliable Dice coefficients, potentially leading to erroneous metal-type grouping of different character images. When registration succeeds, the Dice coefficient of the same-type set is higher than that of the different-type set; nonetheless, the Dice coefficients of the two sets remain close, signifying that the conventional method lacks discriminatory power.

Using Algorithm 2, the groups are organized and printed to view the final groupings. Fig. 8 shows some of the results obtained using the grouping algorithm, where the characters marked with red dots are the ones grouped incorrectly. Using the proposed method, the grouping of the first character matched the manually overlaid pairs perfectly, whereas we found one anomaly in the second group. With the conventional method, the rate of failure in grouping the characters is significantly higher. Results were obtained for multiple character pairs, and they lead us to conclude that the proposed method is significantly better than the conventional method applied without the pre-processing steps.

Fig. 8. Grouping results using the proposed and conventional methods. The dotted images indicate anomalies relative to the manually labelled ground-truth (GT) data.


Table 1 compares the computation time of the proposed method with that of the conventional method. The proposed method is roughly 45 times faster while also yielding much better accuracy. Furthermore, even with the additional pre-processing steps, the proposed method adds little to the overall computation time.

Table 1. Computation time comparison in milliseconds for the conventional method as opposed to the proposed method.

Method        Skeletonization    Registration    Dice
Conventional  -                  477.7           0.4
Proposed      2                  10.6            0.4
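Per-stage wall-clock timings such as those in Table 1 can be gathered with a small helper; this sketch is our own and not the paper's measurement harness, and it averages over several runs to reduce jitter:

```python
import time

def time_ms(fn, *args, repeats: int = 10) -> float:
    """Average wall-clock time of fn(*args) in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) * 1000.0 / repeats

# Example: time a trivial stand-in for a pipeline stage.
elapsed = time_ms(sorted, list(range(1000)))
print(f"{elapsed:.3f} ms")
```

Timing each stage (skeletonization, registration, Dice computation) separately makes it easy to see where the proposed pipeline gains its speedup.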

5. Conclusion

In this paper, we have presented a novel approach for character recognition and grouping of ancient Korean movable metal types printed during the Joseon dynasty. A three-step process of skeletonization, registration, and similarity metric computation was introduced to accurately compare digitized character images from historical books.

Extensive experimentation evaluated various pre-processing techniques and similarity metrics. The Dice coefficient emerged as the most reliable metric for discriminating typeface variations. An algorithm applying the proposed approach was developed; it outperformed conventional methods, achieving up to 45 times faster computation while improving character grouping accuracy.

One limitation of our method is that the type-grouping procedure is not dynamic: the threshold must be calibrated and provided manually. As future work, we plan to create a clustering algorithm that dynamically determines the thresholds and the number of clusters and classifies characters within the clusters. Another shortcoming concerns the skeletonization method: we sometimes observed spurious branches and junctions appearing as artifacts, leading to erroneous results. We plan to address this by training a generative network for skeletonization that focuses on the strokes, producing images free of such artifacts.

The methodology provides a reliable means of analyzing surviving printed works to gain a new understanding of the production scale and evolution of metal type printing technologies during this pivotal period. Such insights expand our knowledge of associated cultural and intellectual developments enabled through wider dissemination of knowledge via moveable type printing.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-24683900). This work was also supported by the BK-21 FOUR program through the National Research Foundation of Korea (NRF) under the Ministry of Education.

References

[1] Park H. O., 2014, The history of pre-Gutenberg woodblock and movable type printing in Korea, Int. J. Humanit. Soc. Sci., Vol. 4, No. 9.
[2] Narang S. R., Jindal M. K., Kumar M., 2019, Devanagari ancient documents recognition using statistical feature extraction techniques, Sādhanā, Vol. 44, No. 6.
[3] Kim M. S., Cho K. T., Kwag H. K., Kim J. H., 2004, Digitalizing scheme of handwritten hanja historical documents, Proc. of First International Workshop on Document Image Analysis for Libraries, pp. 321-327.
[4] Maitra D. S., Bhattacharya U., Parui S. K., 2015, CNN based common approach to handwritten character recognition of multiple scripts, Proc. of 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1021-1025.
[5] Kim M. S., Cho K. T., Kwag H. K., Kim J. H., 2004, Segmentation of handwritten characters for digitalizing Korean historical documents, Document Analysis Systems VI, pp. 114-124.
[6] Jeong B.-C., Choi K.-S., 2022, 3-D movable type reconstruction from old printed documents using deep learning-based character extraction and recognition, Journal of the Institute of Electronics and Information Engineers, Vol. 59, No. 9, pp. 74-83.
[7] Majid N., Barney Smith E. H., 2022, Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts, International Journal on Document Analysis and Recognition, Vol. 25, No. 4, pp. 245-263.
[8] Hamad K., Kaya M., 2016, A detailed analysis of optical character recognition technology, International Journal of Applied Mathematics Electronics and Computers, No. Special Issue-1, pp. 244-249.
[9] Park T., Jung J., Cho J., 2016, A method for automatically translating print books into electronic Braille books, Science China Information Sciences, Vol. 59.
[10] Razak Z., Zulkiflee K., Salleh R., Yaacob M., Tay Y., 2008, Off-line handwriting text line segmentation: a review, International Journal of Computer Science and Network Security, Vol. 8, No. 7, pp. 12-20.
[11] Tappert C. C., Suen C. Y., Wakahara T., 1990, The state of the art in online handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 8, pp. 787-808.
[12] Likforman-Sulem L., Zahour A., Taconet B., 2007, Text line segmentation of historical documents: a survey, International Journal on Document Analysis and Recognition, Vol. 9, pp. 123-138.
[13] Narang S. R., Jindal M. K., Kumar M., 2020, Ancient text recognition: a review, Artificial Intelligence Review, Vol. 53, pp. 5517-5558.
[14] Das S., Banerjee S., 2014, Survey of pattern recognition approaches in Japanese character recognition, International Journal of Computer Science and Information Technology, Vol. 5, No. 1, pp. 93-99.
[15] Jo J., Lee J., Lee Y., 2009, Stroke-based online hangul/Korean character recognition, Proceedings of the Chinese Conference on Pattern Recognition, pp. 1-5.
[16] Jerripothula K. R., Cai J., Lu J., Yuan J., 2021, Image co-skeletonization via co-segmentation, IEEE Transactions on Image Processing, Vol. 30, pp. 2784-2797.
[17] Pervouchine V., Leedham G., Melikhov K., 2005, Handwritten character skeletonisation for forensic document analysis, Proc. of the ACM Symposium on Applied Computing, pp. 754-758.
[18] Ko D. H., Hassan A. U., Majeed S., Choi J., 2021, SkelGAN: A font image skeletonization method, Journal of Information Processing Systems, Vol. 17, No. 1, pp. 1-13.
[19] Ma X., Ren X., Tsviatkou V. Y., Kanapelka V. K., 2022, A novel fully parallel skeletonization algorithm, Pattern Analysis and Applications, Vol. 25, pp. 1-20.
[20] Al-Maadeed S., Hassaine A., Bouridan A., 2014, Using codebooks generated from text skeletonization for forensic writer identification, Proc. of the IEEE/ACS International Conference on Computer Systems and Applications, pp. 729-733.
[21] Ma J., Zhou H., Zhao J., Gao Y., Jiang J., Tian J., 2015, Robust feature matching for remote sensing image registration via locally linear transforming, IEEE Transactions on Geoscience and Remote Sensing, Vol. 53, No. 12, pp. 6469-6481.
[22] Pan M., Tang J., Rong Q., Zhang F., 2011, Medical image registration using modified iterative closest points, International Journal for Numerical Methods in Biomedical Engineering, Vol. 27, No. 8, pp. 1150-1166.
[23] Zhao S., Dong Y., Chang E. I., Xu Y., 2019, Recursive cascaded networks for unsupervised medical image registration, Proc. of the IEEE/CVF International Conference on Computer Vision, pp. 10600-10610.
[24] Wang X., Zhao Z.-L., Capps A. G., Hamann B., 2017, An iterative closest point approach for the registration of volumetric human retina image data obtained by optical coherence tomography, Multimedia Tools and Applications, Vol. 76, pp. 6843-6857.
[25] He Y., Liang X., Chen X., Zhang Z., Zhang J., 2017, An iterative closest points algorithm for registration of 3D laser scanner point clouds with geometric features, Sensors, Vol. 17, No. 8.
[26] Panaretos V. M., Zemel Y., 2019, Statistical aspects of Wasserstein distances, Annual Review of Statistics and Its Application, Vol. 6, pp. 405-431.
[27] Ye J., Wu P., Wang J. Z., Li J., 2017, Fast discrete distribution clustering using Wasserstein barycenter with sparse support, IEEE Transactions on Signal Processing, Vol. 65, No. 9, pp. 2317-2332.
[28] Fournier N., Guillin A., 2015, On the rate of convergence in Wasserstein distance of the empirical measure, Probability Theory and Related Fields, Vol. 162, No. 3-4, pp. 707-738.
[29] Kolouri S., Nadjahi K., Simsekli U., Badeau R., Rohde G., 2019, Generalized sliced Wasserstein distances, Advances in Neural Information Processing Systems, Vol. 32.
[30] Piccoli B., Rossi F., 2014, Generalized Wasserstein distance to handle the transport equation, Archive for Rational Mechanics and Analysis, Vol. 211, pp. 335-358.
[31] Bottesch T., Bühler T., Kächele M., 2016, Speeding up k-means by approximating Euclidean distances via block vectors, Proc. of the International Conference on Machine Learning, pp. 2578-2586.
[32] Faisal M., Zamzami E., 2020, Comparative analysis of inter-centroid K-means performance using Euclidean distance, Canberra distance and Manhattan distance, Journal of Physics: Conference Series, Vol. 1566, No. 1.
[33] You L., Jiang H., Hu J., Chang C., Chen L., Cui X., Zhao M., 2022, GPU-accelerated faster mean shift with Euclidean distance metrics, Proc. of the IEEE Annual Computer Software and Applications Conference, pp. 211-216.
[34] Berthold M. R., Höppner F., 2016, On clustering time series using Euclidean distance and Pearson correlation, arXiv preprint arXiv:1601.02213.
[35] Hammoud B., Daou G., Wehn N., 2022, Multidimensional minimum Euclidean distance approach using radar reflectivities for oil slick thickness estimation, Sensors, Vol. 22, No. 4.
[36] Jha S., 2019, Neutrosophic image segmentation with Dice coefficients, Measurement, Vol. 134, pp. 762-772.
[37] Alroy J., 2015, A new twist on a very old binary similarity coefficient, Ecology, Vol. 96, No. 2, pp. 575-586.
[38] Liang S., Tang F., Huang X., Yang K., Zhong T., Hu R., Liu S., Yuan X., Zhang Y., 2019, Deep-learning-based detection and segmentation of organs at risk in nasopharyngeal carcinoma computed tomographic images for radiotherapy planning, European Radiology, Vol. 29, pp. 1961-1967.
[39] Oco N., Syliongka L. R., Roxas R. E. O., Ilao J. P., 2013, Dice's coefficient on trigram profiles as metric for language similarity, Proc. of the International Conference on Oriental COCOSDA and Conference on Asian Spoken Language Research and Evaluation, pp. 1-4.
[40] Yoo W. S., 2022, Direct evidence of metal type printing in The Song of Enlightenment, Korea, 1239, Heritage, Vol. 5, No. 4, pp. 1719-1734.
[41] Lai A.-N., Lee G., 2008, Binarization by local K-means clustering for Korean text extraction, Proc. of the IEEE International Symposium on Signal Processing and Information Technology, pp. 117-122.
[42] Wahyono, Jo K.-H., 2012, A clustering strategy for touching characters in Korean and English printed text segmentation, Proc. of the International Conference on Ubiquitous Robots and Ambient Intelligence, pp. 23-25.
[43] Song M. K., 2009, The history and characteristics of traditional Korean books and bookbinding, Journal of the Institute of Conservation, Vol. 32, pp. 53-78.
[44] Ok Y. J., 2020, The present situation and characteristics of Korean rare books in the Shanghai Library of China, Journal of Studies in Bibliography, Vol. 84, pp. 99-120.
[45] Lee W. S., Choi K. S., 2024, Boundary Gaussian distance loss function for enhancing character extraction from high-resolution scans of ancient metal-type printed books, Electronics, Vol. 13, No. 10.
[46] Du Y., 2020, PP-OCR: A practical ultra lightweight OCR system, arXiv preprint arXiv:2009.09941.
[47] Tappert C. C., Suen C. Y., Wakahara T., 1990, The state of the art in online handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 8, pp. 787-808.
[48] Besl P. J., McKay N. D., 1992, A method for registration of 3-D shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, pp. 239-256.
[49] Shen J., Qu Y., Zhang W., Yu Y., 2017, Wasserstein distance guided representation learning for domain adaptation, arXiv preprint arXiv:1707.01217.
Maaz Ahmed

Maaz Ahmed is a graduate of the Korea University of Technology and Education, where he earned a master's degree in future convergence engineering with a focus on computer vision. His research interests include machine learning and computer vision, and he has contributed to several innovative projects in both academic and industrial settings.

Kang-Sun Choi

Kang-Sun Choi received a Ph.D. degree in nonlinear filter design in 2003, an M.S. degree in 1999, and a B.S. degree in 1997 in electronic engineering from Korea University. In 2011, he joined the School of Electrical, Electronics & Communication Engineering at Korea University of Technology and Education, where he is currently a professor. In 2017, he was a visiting scholar at the University of California, Los Angeles. From 2008 to 2010, he was a research professor in the Department of Electronic Engineering at Korea University. From 2005 to 2008, he worked in Samsung Electronics, Korea, as a senior software engineer. From 2003 to 2005, he was a visiting scholar at the University of Southern California. His research interests are in the areas of deep learning-based semantic segmentation, multimodal sensor calibration, human-robot interaction, and culture technology. He is a recipient of an IEEE International Conference on Consumer Electronics Special Merit Award (2012).