Tran An Cong, Ho Lai Thi, Nguyen Hai Thanh
(College of Information and Communication Technology, Can Tho University, Vietnam; tcan@cit.ctu.edu.vn, lai01633196630@gmail.com, nthai.cit@ctu.edu.vn)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Information extraction, Text recognition, Vietnamese invoices
1. Introduction
Commercial activity around the world shows that the commercial invoice is an essential
document for business and production activities. Although the main purpose of a commercial
invoice is payment certification, invoices have many other uses, such as record keeping,
legal protection, tax payment, and business analytics. Moreover, a commercial invoice
is a legal document allowing the seller to obtain money from the buyer. It therefore
records the details of money-related transactions, such as the total price in numbers
and in words, the price of each item, unit quantities, and the currency demanded,
and carries a seal and signature to ensure the payment for service is clear. Invoices
can be hand-written, printed, or electronic.
Although most countries have deployed e-invoices, some countries (such as Vietnam)
still accept both hand-written and electronic documents. Therefore, the ability to
process large numbers of hand-written and electronic invoices is essential. For a
long time, we have relied on hand-written invoices to process payments, and invoice
reconciliation often takes a long time: everything must be recorded in a ledger manually
or entered into software for future retrieval or reference. One limitation of these
procedures is their high cost. In addition, such repetitive tasks consume a lot of
time. However, this work can be done better, with less time and effort, by automating
it with information technology.
Numerous countries, including Vietnam, are speeding up the process of digital transformation
and information integration in various fields. For commercial activities, stores design
new systems and want to integrate data produced by older systems. Many companies hire
numerous people to record the data. In recent years, we have witnessed numerous achievements
in many fields through advancements in information technology. Such techniques can
aid the processing of text and can provide quick and efficient information extraction
from invoices. In this study, we leverage a robust deep learning technique, the graph
convolutional network (GCN), to analyze Vietnamese store invoices, achieving an accuracy
of 99.5% and an F1-score of 98.52%, which is promising for practical applications
in Vietnamese stores. Another contribution is a collection of Vietnamese invoices
that can be used in further studies.
The characteristics of our method can be summarized as follows.
· We collected invoices from a chain of G7 stores in Vietnam to analyze the efficiency
of a GCN in extracting information, as a case study on Vietnamese-language invoices.
· The proposed method extracted and recognized the essential areas of an invoice,
including the name of the store (called the store title in this work), the address
of the store, the date of issue, the total payment due, and other areas.
· We applied two GCNs for comparison: ChebConv and GCNConv. Furthermore, we evaluated
the advantages and disadvantages of the GCNs and compared their efficiency in terms
of running time and accuracy in the recognition tasks.
The remainder of this study is organized as follows. Section 2 outlines the related
work, and Section 3 introduces and describes our proposed Vietnamese invoice information
extraction method. Section 4 reports and discusses the experiments and findings. The
final section concludes this research and presents various perspectives.
2. Related Work
Information extraction automatically obtains the necessary information from a document;
the main extraction step is word classification (i.e., tagging), and the output is
usually stored in the form of key-value pairs.
The task of extracting information from invoices is based on textual content and the
positions of text frames, which are computed and classified so that the needed information
can be extracted. In [1], the authors deployed Chargrid [2] and Wordgrid [3] to extract information from scanned documents. A review paper [4] investigated many methods for performing information extraction tasks and listed
some of the existing challenges for further research. Finally, a survey [5] summarized methods for extracting information from unstructured and multidimensional
data.
Recognition and analysis of invoices have been investigated in numerous studies with
various optical character recognition (OCR) approaches. In [6], the authors implemented OCRMiner, extracting and indexing metadata from scanned
images of structured documents with OCR techniques, and the authors in [7] extracted value-added tax (VAT) information from invoices, reaching an accuracy of 96.2%.
The work in [8] extracted key information from invoices by using simulated complex scenes, combining
prior knowledge and data augmentation techniques such as adding random noise, color
jitter, horizontal lines, and random rotation. In another study [9], the authors deployed OCR to extract invoice numbers, dates, final payment amounts,
and related descriptions from bills and invoices, exporting and transferring them
to a database for later use. The authors in [10] implemented ZXing code technology and OCR for invoice identification tasks. Finally,
a method of extracting and indexing metadata of (semi-)structured documents was introduced
in [11]. Although only a very few samples were used for the training phase, the method obtained
performance that was comparable to a model trained with a large number of samples.
Several types of research on machine learning have provided interesting results from
invoice analysis. The work in [12] investigated some possibilities of deploying unsupervised outlier detection approaches
to detect potential fraud in invoice data. The authors in [13] implemented Light Gradient Boosting Machine and Random Forest for invoice analysis,
saving the data for deductive analysts. Machine learning techniques were attempted
in [14] to detect anomalies in invoices from Tunisia, leveraging techniques such as multivariate
Gaussian distribution and Light Gradient Boosting Machine. Finally, the authors in
[15] used AlexNet (a famous convolutional neural network) to classify three types of receipts
from hand-written and machine-printed invoices. Another study in [16] deployed a Stacked Propagation Network combined with a Graph Attention Network to
extract key information and data points from invoices and bank statements. The authors
in [17] attempted methods to indicate the invoice information area, and deployed a projection
technique to perform single-character cutting from electronic invoices.
The work in [18] used Support Vector Machines to evaluate risk pre-warnings from invoices, achieving
an accuracy of 97% when classifying three types of risk (denoted A, B, and C by the
authors). A graph convolutional network was tried in [19] to recognize and detect tables in invoices and extract their details. Ensemble
learning algorithms in [20] analyzed electronic invoices from financial transactions, and machine learning techniques
in [21] provided anomaly detection in electronic invoices. Finally, the authors in [22] used Random Forest to analyze and explore electronic invoices for automobile parts
manufacturers.
3. The Proposed Method
As mentioned in previous sections, extracting information from invoices is based on
textual content and the positions of text frames, which are classified to obtain the
necessary information; approaches are based on templates, natural language processing,
or graphs. The template-based method applies predefined rules to forms and documents
with a fixed structure that does not change much, and text/keyword matching determines
the corresponding fields. However, its most significant disadvantage is that each
rule must be defined separately for each form and does not adapt to new forms. The
natural language processing approach starts by converting the image to text and uses
a named entity recognition (NER) model to classify the text into the corresponding
information fields.
The advantage of this method over the template-based approach is its ability to adapt
to new data. However, it does not take advantage of location features, even though
location helps identify the respective fields; for example, an invoice contains many
price-like strings, which are easily confused with the total amount. To solve the
problems of the above two methods, a third, graph-based approach can resolve the classification
problem. In this study, we deployed GCNs to extract information from invoices.
Our overall proposed architecture for Vietnamese invoice information extraction consists
of the steps in Fig. 1. First, from invoice images captured by mobile phones or a scanner, we identify text
frames and recognize text using Character Region Awareness for Text detection (CRAFT) [23]; we then build graphs and identify features for the nodes, which are fed to the
two GCN architectures. The OCR task is crucial for recognizing text areas and identifying
text content. Next, the graph convolutional network builds links between text fields
and content on the invoice. After embedding the features in the graph, we divide the
dataset into two parts: a training set of 650 invoices with 19,159 data frames and
a test set of 81 invoices with 2424 data frames. During the testing phase, the pre-processing
steps are the same as in the training phase, and extraction is performed with the
model saved from the training stage. Details of the techniques are presented in the
following sections.
Fig. 1. The overall architecture for information extraction.
3.1 Character Region Identification for Text Detection
CRAFT [23] defines the text frames in the invoice image. The main goal is to localize
individual character areas and associate detected characters into text.
First, CRAFT predicts two scores for each character: a region (area) score, which
marks where characters are present and localizes each character, and an affinity (relationship)
score, which indicates how strongly one character tends to combine with another. The
affinity scores merge the characters into words; in the relation map, red indicates
character pairs with high affinity that should be merged into one word, as illustrated
in Fig. 2. Finally, we combine the region and affinity scores to give each word a bounding
box. The coordinates of recognized areas are given in the following order: upper left,
upper right, lower left, and lower right, where each corner is an (x, y) coordinate
pair of the area, as revealed in Fig. 3. A post-processing sketch is given after Fig. 3.
Fig. 2. The heatmap for (a) identifying text frames; (b) revealing their relationships.
Fig. 3. (a) The identified text frames; (b) their coordinates
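To make the combination of the two score maps concrete, the following is a minimal sketch (our illustration, not the CRAFT authors' implementation) of how a region map and an affinity map can be thresholded and merged into word bounding boxes; the threshold values and function names are illustrative assumptions.

```python
# A minimal sketch of CRAFT-style post-processing: combining the region (area)
# score map and the affinity (relationship) score map into word bounding boxes.
import cv2
import numpy as np

def word_boxes(region_score: np.ndarray, affinity_score: np.ndarray,
               text_thresh: float = 0.7, link_thresh: float = 0.4):
    """Merge character regions linked by high affinity into word boxes."""
    # Pixels that belong to a character OR that connect two characters.
    text_mask = (region_score >= text_thresh).astype(np.uint8)
    link_mask = (affinity_score >= link_thresh).astype(np.uint8)
    combined = np.clip(text_mask + link_mask, 0, 1)

    # Each connected component of the combined map is one word candidate.
    n_labels, labels = cv2.connectedComponents(combined, connectivity=4)
    boxes = []
    for lbl in range(1, n_labels):
        ys, xs = np.where(labels == lbl)
        x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
        # Corner order used in this paper: upper left, upper right,
        # lower left, lower right.
        boxes.append([(x0, y0), (x1, y0), (x0, y1), (x1, y1)])
    return boxes
```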
3.2 Recognizing Text and Labeling Sections
To recognize Vietnamese words, we leverage the VietOCR method (https://github.com/pbcquoc/vietocr,
accessed March 5, 2022), which combines convolutional neural networks and Transformer
models to perform recognition tasks. The VietOCR model generalizes well and achieves
high accuracy on new datasets, so we applied it to the text recognition problem, using
the vgg_transformer model during training. First, the text in an invoice is recognized
as seen in Fig. 4(a); we then choose and assign five main areas corresponding to the name of the store
(store title), the address of the store (address), the date issued (date), the total
payment (total), and anything not in the four named fields (NaN), as seen in Fig. 4(b). Finally, we check for missing labels and add them manually. A minimal usage sketch
follows Fig. 4.
Fig. 4. (a) An example of words recognized by VietOCR; (b) marking labels for them.
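As a usage illustration, the following minimal sketch shows how VietOCR's vgg_transformer model can be loaded and applied to a cropped text frame; the file name is a hypothetical example, and the API details follow the repository's documentation.

```python
# A minimal usage sketch of VietOCR (https://github.com/pbcquoc/vietocr)
# with the vgg_transformer weights used in this work.
from PIL import Image
from vietocr.tool.config import Cfg
from vietocr.tool.predictor import Predictor

config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cpu'          # or 'cuda:0' if a GPU is available
recognizer = Predictor(config)

# Each text frame detected by CRAFT is cropped and recognized separately.
crop = Image.open('invoice_frame_01.png')   # hypothetical cropped text frame
text = recognizer.predict(crop)
print(text)
```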
3.3 Graph Construction
There are many techniques to create graphs from documents. Most of them convert each
text area into a node and use different techniques to construct edges. This way will
create four edges for each node such that the edge connects the nearest text areas
in four directions (top, bottom, left, right).
From a node, if we can draw a vertical or horizontal line to another node, the two
are connected. At each node in each direction, we choose the edge with the shortest
length. For destination nodes with many connected edges, we choose the edge with the
shortest length. Finally, we create a graph by creating a unique edge at each source
node to the destination node (if any), giving preference to horizontal edges. The
steps for processing edges when building the graph are illustrated in Figs. 5 and 6.
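The following sketch (our illustration, not the exact implementation) captures the core of this rule: for each text frame, keep only the nearest neighbor in each direction. Destination-side pruning and the preference for horizontal edges on ties are omitted for brevity.

```python
# A sketch of the nearest-neighbour edge rule: from each text frame, connect
# to the closest frame in each of the four directions (top, bottom, left,
# right), keeping at most one edge per (source, direction).
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)

def build_edges(boxes: List[Box]) -> List[Tuple[int, int]]:
    edges = set()
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        nearest = {}   # direction -> (distance, node index)
        for j, (u0, v0, u1, v1) in enumerate(boxes):
            if i == j:
                continue
            x_overlap = not (x1 < u0 or u1 < x0)   # a vertical line fits
            y_overlap = not (y1 < v0 or v1 < y0)   # a horizontal line fits
            if x_overlap and v0 > y1:              # j is below i
                cand = ('down', v0 - y1, j)
            elif x_overlap and v1 < y0:            # j is above i
                cand = ('up', y0 - v1, j)
            elif y_overlap and u0 > x1:            # j is right of i
                cand = ('right', u0 - x1, j)
            elif y_overlap and u1 < x0:            # j is left of i
                cand = ('left', x0 - u1, j)
            else:
                continue
            d, dist, k = cand
            if d not in nearest or dist < nearest[d][0]:
                nearest[d] = (dist, k)
        for _, (_, k) in nearest.items():
            edges.add((min(i, k), max(i, k)))      # undirected, no duplicates
    return sorted(edges)
```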
The graph convolutional network is a type of convolutional neural network (CNN) that
operates directly on graphs and takes advantage of their structural information; it
solves the problem of classifying nodes in a graph. The general idea of a GCN is that
each node receives feature information from all of its neighbors along with its own
features; for example, using an average function, the same aggregation is performed
for every node, and the averages are fed into a neural network. In practice, aggregate
functions more complex than the average can be used, and layers can be stacked for
a deeper GCN, with each layer's output serving as input to the next. The number of
layers is the farthest distance a node's information can travel: with one GCN layer,
each node receives information only from its immediate neighbors. The aggregation
takes place independently and simultaneously for all nodes, and it repeats when another
layer is stacked on top of the first; at that point, the neighbors already carry information
about their own neighbors (from the previous step).
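Formally, this neighborhood aggregation corresponds to the layer-wise propagation rule of Kipf and Welling [25], which the GCNConv layers used later implement:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right), \qquad \tilde{A} = A + I_N,$$

where $A$ is the adjacency matrix, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $H^{(l)}$ holds the node features at layer $l$ (with $H^{(0)}$ the input features), $W^{(l)}$ is the layer's trainable weight matrix, and $\sigma$ is an activation function such as ReLU.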
This study deployed a GCN for invoice analysis to explore local patterns through location
and text features. Like a CNN, a GCN can capture local patterns, but instead of operating
on neighboring pixels, it connects strongly related nodes even when they lie far apart
in the image. In addition, the GCN can exploit location features (the position/coordinates
of a node in the image), which help the model distinguish information fields; for
example, the name of the supermarket/grocery store is usually shown at the top of
the bill. Similarly, the GCN's analysis of textual information is essential, for example,
to distinguish the address field from other data fields. In addition, multiple GCN
modules can be stacked on top of each other, which helps the model learn high-level
features better.
We deployed a GCNConv-based architecture comprising four traditional graph convolutional
layers (as illustrated in Fig. 7), whereas the ChebConv-based architecture comprises four Chebyshev spectral graph
convolution layers (each with a Chebyshev filter size of 3). For both architectures,
the outputs of the hidden graph convolutional layers have sizes 64, 32, and 16, respectively,
while the output layer has five outputs, corresponding to the five labels to be classified.
After the network has been initialized, it computes new features from the nodes, edges,
and weights, and the process is repeated in subsequent layers using the features output
by the layer before. The ReLU function activates the outputs of the hidden layers,
and the output layer uses the LogSoftMax function to calculate log probabilities for
classification. As a result, the network receives input with 776 features and produces
five outputs, corresponding to the five areas considered on the invoice (a code sketch
is given after Fig. 7).
Fig. 5. (a) The original graph in which all edges are shown; (b) the graph after unnecessary edges are removed; (c) the graph after the edges with the same targeted node are removed.
Fig. 6. The graphs of invoices built based on text frames.
Fig. 7. The proposed GCN architecture.
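The following is a sketch of this architecture in PyTorch Geometric under the sizes stated above (776 input features; hidden sizes 64, 32, 16; five output classes); it is an illustration consistent with the description rather than the exact training code. Replacing GCNConv with ChebConv(in, out, K=3) yields the Chebyshev variant.

```python
# A sketch of the four-layer graph convolutional architecture described above.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # or ChebConv for the Chebyshev variant

class InvoiceGCN(torch.nn.Module):
    def __init__(self, in_dim: int = 776, n_classes: int = 5):
        super().__init__()
        self.conv1 = GCNConv(in_dim, 64)
        self.conv2 = GCNConv(64, 32)
        self.conv3 = GCNConv(32, 16)
        self.conv4 = GCNConv(16, n_classes)   # output layer

    def forward(self, x, edge_index, edge_weight=None):
        x = F.relu(self.conv1(x, edge_index, edge_weight))
        x = F.relu(self.conv2(x, edge_index, edge_weight))
        x = F.relu(self.conv3(x, edge_index, edge_weight))
        x = self.conv4(x, edge_index, edge_weight)
        return F.log_softmax(x, dim=1)        # log probability per node/class
```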
3.4 Feature Embedding
We build initial properties for graph nodes by aggregating features from many attributes,
including Boolean features, position features, and text features. The Boolean features
check the attributes. For example, we check whether the text is numeric or has special
characters, etc., while the position feature is the relative distance from the current
text frame to the next two frames (horizontally and vertically). The text features
deploy the work in$^{https://metatext.io/models/sentence-transformers-distilbert-base-nli-stsb-mean-tokens}$,
and a Siamese network [24] (a natural language processing model) was implemented by the Transformer library
to compute embedded vectors for sentences in order to obtain a 768-D feature vector.
Finally, the network joins all the attributes and gets a 776-D feature vector (6 +
2 + 768) as the initial feature for each node in the graph.
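A sketch of this feature construction follows; the six Boolean checks shown are illustrative assumptions (the text specifies only examples such as numeric content and special characters), while the 768-D sentence embedding uses the distilbert-base-nli-stsb-mean-tokens model referenced above.

```python
# A sketch of the 776-D node feature vector: 6 Boolean flags, 2 relative
# position features, and a 768-D sentence embedding (6 + 2 + 768 = 776).
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

def node_features(text: str, dx_next: float, dy_next: float) -> np.ndarray:
    flags = np.array([                          # illustrative Boolean checks
        text.isdigit(),                         # purely numeric
        any(c.isdigit() for c in text),         # contains a digit
        any(not c.isalnum() and not c.isspace() for c in text),  # special chars
        text.isupper(),                         # all upper case
        text.istitle(),                         # title case
        any(c.isalpha() for c in text),         # contains letters
    ], dtype=np.float32)
    position = np.array([dx_next, dy_next], dtype=np.float32)  # relative dists
    embedding = encoder.encode(text)            # 768-D sentence vector
    return np.concatenate([flags, position, embedding])
```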
We deployed and compared two algorithms for training: GCNConv [25] and ChebConv [26]. ChebConv generalizes CNNs to graphs, with the main goal of defining filters that
operate on graphs efficiently. GCNConv is an efficient variant in which the spectral
filter is reduced to its first-order approximation to fit the graph with the convolution-based
architecture.
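For reference, ChebConv approximates a spectral graph filter by a truncated Chebyshev expansion (in our case with filter size $K = 3$):

$$g_\theta \star x \approx \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_N,$$

where $T_k$ are the Chebyshev polynomials, $\theta_k$ are learnable coefficients, $L$ is the normalized graph Laplacian, and $\lambda_{\max}$ is its largest eigenvalue; GCNConv keeps only the first-order term of this expansion with further simplifications.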
4. Experiments
4.1 Dataset
We experimented with a dataset collected from G7 stores (https://shopg7.com/), a mini
supermarket chain in Vietnam, with invoices captured by mobile phones and saved as
images at a resolution of 1920×2560. The experiment used 731 invoices with 21,583
data frames organized into five classes: store title, address, date, total, and other.
We extracted 19,159 data frames from 650 invoices for training, while the remaining
81 invoices with 2424 data frames were used as the test set. After the pre-processing
steps, the training data were stored as a dataset file (train_data.dataset) with the
attributes described in Tables 1 and 2.
We evaluated the classification accuracy on the text frames in the invoices and the
effectiveness of the two training models in order to choose the optimal model for
extracting information from invoices.
Table 1. Attributes of the dataset.

Attribute | Description
batch | Identifier of each node on the graph
edge_index | Index of edges
img_id | Image filename
ptr | Pointer to the graph of the next invoice
text | Text corresponding to each node
x | Feature vector of the nodes in the graph
y | The label of each node
Table 2. Sample distributions according to label.

Label | Number of samples
Store Title | 650
Address | 650
Date | 650
Total | 650
Other | 16,559
4.2 Environment Settings
The experiments were run on a computer with an Intel Core i7-4600U CPU at 2.1GHz and
8GB of RAM under the Windows 10 Pro 64-bit operating system and repeated 10 times.
The results were assessed on the test set with many metrics, including accuracy, and
a confusion matrix averaged the 10 repetitions from the training and test phases.
Both networks used a learning rate of 0.001 and the Adam optimizer [27] running for 2000 epochs.
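A minimal sketch of this training configuration is shown below; `train_data` stands for the pre-processed graph data (train_data.dataset, with the attributes of Table 1), and InvoiceGCN refers to the illustrative model sketched in Section 3.3.

```python
# A sketch of the training setup: Adam, learning rate 0.001, 2000 epochs,
# negative log-likelihood loss on the log-softmax node outputs.
import torch
import torch.nn.functional as F

model = InvoiceGCN()                                   # sketch from Section 3.3
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(2000):
    model.train()
    optimizer.zero_grad()
    out = model(train_data.x, train_data.edge_index)   # per-node log probs
    loss = F.nll_loss(out, train_data.y)               # matches log_softmax
    loss.backward()
    optimizer.step()
```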
4.3 The Results on ChebConv
We obtained an average accuracy of 99.81% on the training set and 99.59% on the test
set with ChebConv. In Table 3, Store Title, Address, and Date show perfect classification, with no labels mistakenly
classified, while Total and Other show some errors. These results indicate that the
ChebConv model is highly suitable for the problem of extracting invoice information.
Table 3. The average confusion matrix for ChebConv in the training and test phases.

Average performance on the training set:

 | Store Title | Address | Date | Total | Other
Store Title | 650.0 | 0.0 | 0.0 | 0.0 | 0.0
Address | 0.0 | 650.0 | 0.0 | 0.0 | 0.0
Date | 0.0 | 0.0 | 650.0 | 0.0 | 0.0
Total | 0.0 | 0.0 | 0.0 | 632.0 | 18.0
Other | 0.0 | 0.0 | 0.0 | 18.0 | 16541.0

Average performance on the test set:

 | Store Title | Address | Date | Total | Other
Store Title | 81.0 | 0.0 | 0.0 | 0.0 | 0.0
Address | 0.0 | 81.0 | 0.0 | 0.0 | 0.0
Date | 0.0 | 0.0 | 81.0 | 0.0 | 0.0
Total | 0.0 | 0.0 | 0.0 | 76.0 | 5.0
Other | 0.0 | 0.0 | 0.0 | 5.0 | 2095.0
4.4 The Results from GCNConv
Table 4 shows that every label was misclassified to some degree. In particular, more
than 30 cases with the Total label (nearly 40% on the test set) were misclassified.
Overall, however, the classification results were still relatively good: we obtained
an average accuracy of 96.58% on the training set and 95.26% on the test set. As can
be seen in Fig. 8, accuracy on both sets was fairly high, but it rose and fell unstably, particularly
on the test set.
Table 4. The average confusion matrix for GCNConv in the training and test phases.

Average performance on the training set:

 | Store Title | Address | Date | Total | Other
Store Title | 602.0 | 0.0 | 0.0 | 0.0 | 48.0
Address | 0.0 | 624.7 | 0.0 | 0.0 | 25.3
Date | 0.0 | 0.0 | 576.3 | 0.0 | 73.7
Total | 0.0 | 0.0 | 0.0 | 469.8 | 180.2
Other | 48.0 | 25.3 | 73.7 | 180.2 | 16231.8

Average performance on the test set:

 | Store Title | Address | Date | Total | Other
Store Title | 74.3 | 0.0 | 0.0 | 0.0 | 6.7
Address | 0.0 | 76.5 | 0.0 | 0.0 | 4.5
Date | 0.0 | 0.0 | 66.4 | 0.0 | 14.6
Total | 0.0 | 0.0 | 0.0 | 49.3 | 31.7
Other | 6.7 | 4.5 | 14.6 | 31.7 | 2042.5
4.5 Comparison of ChebConv and GCNConv
As illustrated in Tables 3 and 4, ChebConv had better accuracy than GCNConv in all classes. However, inference with
the ChebConv model was much slower than with GCNConv, with average execution times
of more than 2 hours 30 minutes versus about 16 minutes (as detailed in Table 5). Each architecture has its advantages, depending on the problem to be solved: although
GCNConv produced lower accuracy, it provided results nearly ten times faster than
ChebConv, whose main disadvantage is that it is not time efficient and is hard to
use in real time.
As shown in Fig. 8, the performance of ChebConv during the training phase was higher than GCNConv's,
with less overfitting. The two followed a fairly similar pattern over the first 750
epochs, both remaining above 95% accuracy. After that, however, GCNConv faced overfitting,
with a significant gap between training and test performance, and it seemed to converge
more slowly than ChebConv over the 2000 epochs. Both models nevertheless achieved
relatively high results in general, so applying a GCN to the problem of extracting
invoice information is entirely appropriate.
Based on these results, we built a website to automatically extract invoice information
from G7 stores with a GCN; because the ChebConv model had higher accuracy than GCNConv,
ChebConv was prioritized for the website. The website receives an image of a G7 store
invoice and, after processing, returns an image with classified information frames
along with the content of the extracted information, as illustrated in Fig. 9.
Fig. 8. The performance from (a) GCNConv; (b) ChebConv in the training and test phases.
Fig. 9. An illustration of the application with a Graph Convolutional Neural Network.
Table 5. Average and standard deviation of accuracy and execution time for ChebConv and GCNConv.

 | ChebConv | GCNConv
Accuracy in training | 99.81 (±0.07)% | 96.58 (±0.72)%
Accuracy in testing | 99.59 (±0.01)% | 95.26 (±0.67)%
Execution time | 157 (±23) mins | 16.1 (±2) mins
5. Conclusion
The results reveal that information extraction with a graph convolutional neural network
on Vietnamese invoices is promising, with an accuracy of 99.5%, a prerequisite for
building an automatic invoice information extraction system that handles many invoices
with high accuracy. Five areas of the invoices (store title, address, date, total
payment, and other information) were analyzed; these areas cover important information
that stores and consumers can cross-check for future reference or other requirements.
The model was trained only on G7 store invoices; however, we expect the method to
be applicable to other invoice templates as well. In addition, automatic invoice orientation
detection can be added for precise adjustment, allowing scaling to large and diverse
invoice datasets. The data can also be collected and extended for further research,
most likely to build in spelling correction of the text recognition output.
REFERENCES
Kerroumi M., Sayem O., Shabou A., 2021, VisualWordGrid: Information Extraction from
Scanned Documents Using a Multimodal Approach, in Document Analysis and Recognition
- ICDAR 2021 Workshops, Springer International Publishing, pp. 389-402
Katti A. R., et al., 2018, Chargrid: Towards Understanding 2D Documents, arXiv
Denk T. I., 2019, Wordgrid: Extending Chargrid with Word-level Information
Joan S. P. F., Valli S., Jan. 2018, A Survey on Text Information Extraction from Born-Digital
and Scene Text Images, Proc. Natl. Acad. Sci. India Sect. Phys. Sci., Vol. 89, No.
1, pp. 77-101
Adnan K., Akbar R., Oct. 2019, An analytical study of information extraction from
unstructured and multidimensional big data, J. Big Data, Vol. 6, No. 1
Ha H. T., Medved’ M., Nevěřilová Z., Horák A., 2018, Recognition of OCR Invoice Metadata
Block Types, in Text, Speech, and Dialogue, Springer International Publishing, pp.
304-312
Zhang J., Ren F., Ni H., Zhang Z., Wang K., Dec. 2019, Research on Information Recognition
of VAT Invoice Based on Computer Vision
Zhi X., Shen Z., Zhao B., Jul. 2021, A Method for Identifying the Key Information
of Electronic Invoicing in Complex Scenes
Kumar P., Revathy S., 2021, An Automated Invoice Handling Method Using OCR, in Data
Intelligence and Cognitive Informatics, Springer Singapore, pp. 243-254
Wang Y., 2022, Intelligent Invoice Identification Technology Based on Zxing Technology,
in Lecture Notes in Electrical Engineering, Springer Nature Singapore, pp. 87-93
Ha H. T., Horák A., Mar. 2022, Information extraction from scanned invoice images
using text analysis and layout features, Signal Process. Image Commun., Vol. 102,
pp. 116601
Hamelers L. H., Jan. 2021, Detecting and explaining potential financial fraud cases
in invoice data with Machine Learning.
Tutica L., Vineel K. S. K., Mishra S., Mishra M. K., Suman S., 2021, Invoice Deduction
Classification Using LGBM Prediction Model, in Lecture Notes in Electrical Engineering,
Springer Singapore, pp. 127-137
Oprea S.-V., Bâra A., Sep. 2021, Machine learning classification algorithms and
anomaly detection in conventional meters and Tunisian electricity consumption large
datasets, Comput. Electr. Eng., Vol. 94, pp. 107329
Tarawneh A. S., Hassanat A. B., Chetverikov D., Lendak I., Verma C., Apr. 2019, Invoice
Classification Using Deep Features and Machine Learning Techniques
Zhang C., Li B., Edirisinghe E., Smith C., Lowe R., 2022, Extract Data Points from
Invoices with Multi-layer Graph Attention Network and Named Entity Recognition, in
2022 IEEE International Conference on Artificial Intelligence and Computer Applications
(ICAICA), pp. 1-6
Li M., 2022, Smart Accounting Platform Based on Visual Invoice Recognition Algorithm,
in 2022 6th International Conference on Computing Methodologies and Communication
(ICCMC), pp. 1436-1439
Ding N., Zhang X., Zhai Y., Li C., Mar. 2021, Risk assessment of VAT invoice crime
levels of companies based on DFPSVM: a case study in China, Risk Manage., Vol. 23,
No. 1-2, pp. 75-96
Riba P., Dutta A., Goldmann L., Fornes A., Ramos O., Llados J., Sep. 2019, Table Detection
in Invoice Documents by Graph Neural Networks
Bardelli C., Rondinelli A., Vecchio R., Figini S., Nov. 2020, Automatic Electronic
Invoice Classification Using Machine Learning Models, Mach. Learn. Knowl. Extr., Vol.
2, No. 4, pp. 617-629
Tang P., et al., Oct. 2020, Anomaly detection in electronic invoice systems based on
machine learning, Inf. Sci., Vol. 535, pp. 172-186
Hong J., Yeo H., Cho N.-W., Ahn T., Oct. 2018, Identification of Core Suppliers Based
on E-Invoice Data Using Supervised Machine Learning, J. Risk Financ. Manag., Vol.
11, No. 4, pp. 70
Baek Y., Lee B., Han D., Yun S., Lee H., Jun. 2019, Character Region Awareness for
Text Detection
Koch G., Zemel R., Salakhutdinov R., 2015, Siamese Neural Networks for One-shot Image
Recognition, in Proceedings of the 32nd International Conference on Machine Learning,
pp. 8
Kipf T. N., Welling M., 2017, Semi-Supervised Classification with Graph Convolutional
Networks.
Kumthekar Y. V., 2020, Using ChebConv and B-Spline GNN models for Solving Unit Commitment
and Economic Dispatch in a day ahead Energy Trading Market based on ERCOT Nodal Model
Kingma D. P., Ba J., 2017, Adam: A Method for Stochastic Optimization
Author
An Cong Tran (tcan@cit.ctu.edu.vn) is a senior lecturer at the College of Information
and Communication Technology, Can Tho University, Vietnam. He earned his Bachelor's
degree in Computer Science at CTU in 2001 and became a lecturer at CTU thereafter.
In 2007, he received a Master's degree (Hons) in Computer Science from the Asian Institute
of Technology (AIT), Thailand. In 2013, he received a Doctoral degree in Computer
Science from Massey University, New Zealand. His PhD thesis focused on a symmetric
parallel approach for class expression learning. His current research interests include
description logic learning, ontology learning, applications of blockchain in the public
sector, and applications of deep learning methods.
Lai Thi Ho is a recent graduate of the College of Information and Communication Technology,
Can Tho University, Can Tho, Vietnam. Her research interests include machine learning,
deep learning, computer vision, and web programming.
Hai Thanh Nguyen is a lecturer at CICT, Can Tho University, Vietnam. He received his
B.S. degree in Informatics from Can Tho University, his Master's degree in Computer
Science and Engineering from National Chiao Tung University, Taiwan, and his PhD degree
in Computer Science from Sorbonne University, France. His current research includes
bioinformatics, health care systems, recommendation systems, and machine learning-based
applications. Contact him at nthai.cit@ctu.edu.vn.