Nayoung Yun¹, Sangkyu Lim¹, Seoyoung Hong², Jiwon Moon¹, Hakjun Lee¹, Sunmok Kim¹, Heung-Jae Lee¹, Ki-Baek Lee¹

¹ Department of Electrical Engineering, Kwangwoon University, Seoul 01897, Korea
({nayoung1124, khlim258, mjw426, cpfl410, nadasunmok, hjlee, kblee}@kw.ac.kr)
² Department of Electrical and Computer Engineering, New York University, NY, USA
(sh6480@nyu.edu)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
NLP, Sentence similarity, FAQ, Assist system
1. Introduction
Customer service is one of the most difficult tasks in product marketing because it
is not easy to satisfy customers while limiting costs [1]. Cheong et al. [2] showed that 53% of customers were not satisfied with customer support center service.
The results also revealed that customers wanted their problems solved quickly by customer
service representatives.
As one way to address this problem, a number of companies have constructed live chat
systems that connect customers to representatives in real time through the Internet
[3]. In addition, since costs increase with an increased number of representatives, attempts
to replace people with chatbots have also been initiated in order to keep costs down.
It is questionable, however, whether chatbots are capable of natural conversation
and of understanding exactly what the customer needs [4,5].
Another way, text classification through deep learning, can be used to preliminarily
classify customer questions before passing them to human representatives. However,
the accuracy of such classification systems is not good enough to help representatives,
and it is not easy to obtain the training data needed to improve them [6-10]. Attempts have been made to transform these classification problems into similarity-evaluation
problems based on recently proposed natural language processing (NLP) models,
such as BERT [11], BiMPM [12], and OpenAI GPT [13]. Nonetheless, although these NLP models were pre-trained to include extensive domain
information, they are not efficient enough to be used directly for customer service, and a
lot of additional data are required [14-18].
Consequently, in this paper, a novel representative assistance system is proposed
to overcome the difficulties with the previous approaches and to improve customer
service efficiency. The proposed system includes two main functions: FAQ recommendation
and automatic data acquisition. For FAQ recommendation, the system calculates a similarity
measure between an input question and every question in a well-defined customer service
FAQ list. Then, it recommends the top $\textit{k}$ FAQs to the representative.
In fact, customers frequently ask questions that have already been answered in the
FAQ list, or that are similar to those answered. Thus, the recommended FAQs can help the representative
answer more quickly and accurately by transforming a subjective problem into an
objective one. With this system, the representative chooses one of the recommendations,
and the choice is automatically saved as new data. Consequently, the system is updated
with newly collected data from the specific service domain, and its accuracy
improves incrementally.
This paper is organized as follows. Section 2 explains the proposed system. In Section
3, the experimental results are evaluated. Finally, Section 4 presents the conclusions.
2. The Proposed System
2.1 Building a Baseline NLP Model
The first main function of the proposed system is to recommend the $\textit{k}$ FAQs
from the list that are most similar to a customer's query. Fig. 1 shows the overall flow of the proposed system. First, it is necessary to train
a baseline NLP model to recommend the most similar $\textit{k}$ FAQs. Here, for the
baseline NLP model, the Quora Question Pairs (QQP) dataset [19] was used. The dataset contains roughly 400,000 sentence pairs with corresponding
labels. The original dataset is structured as shown in Fig. 2(a) and was modified for simplicity as shown in Fig. 2(b).
Fig. 1. The overall flow of the proposed system.
Fig. 2. An example of the data format.
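The simplification of the data format can be sketched as follows. This is a minimal sketch in plain Python; the field names (`question1`, `question2`, `is_duplicate`) follow the public QQP release, and the example row itself is invented for illustration.

```python
# Minimal sketch of simplifying QQP rows: the id/qid columns are dropped,
# keeping only (question1, question2, label) pairs for similarity training.
# Field names follow the public QQP release; the example row is invented.

def simplify_qqp(rows):
    """Reduce full QQP rows to sentence pairs with an integer label."""
    return [{"question1": r["question1"],
             "question2": r["question2"],
             "label": int(r["is_duplicate"])}
            for r in rows]

raw = [{"id": 0, "qid1": 1, "qid2": 2,
        "question1": "How do I reset my password?",
        "question2": "What is the way to reset a password?",
        "is_duplicate": 1}]

pairs = simplify_qqp(raw)
print(pairs[0]["label"])  # 1
```

A pair labeled 1 is a duplicate (semantically equivalent) pair; 0 marks a non-duplicate pair.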
2.2 Operating the System with the NLP Model
When a customer asks a question, the NLP model measures the similarity between the
customer’s question and every FAQ in a well-defined FAQ list. Then, the system shows
the representative the closest $\textit{k}$ FAQs. After that, the representative chooses
the one among them that best matches the customer’s question. New data are constructed
from these choices and are stored in the training dataset. Fig. 3 shows an example of the data-construction process with $\textit{k}$=3.
Fig. 3. An example of the data-construction process.
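The recommend-then-record loop described above can be sketched as follows. This is a minimal, self-contained sketch: the word-count cosine `similarity` is only a stand-in for the actual NLP model score (BiMPM, GPT, or BERT), and the FAQ texts are invented examples.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def similarity(q1, q2):
    """Stand-in scorer: cosine similarity over word counts.
    In the actual system an NLP model (BiMPM, GPT, or BERT) produces this score."""
    a, b = Counter(tokens(q1)), Counter(tokens(q2))
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(query, faq_list, k):
    """Return the k FAQs most similar to the query, best first."""
    return sorted(faq_list, key=lambda f: similarity(query, f), reverse=True)[:k]

def record_choice(query, chosen_faq, dataset):
    """The representative's choice becomes a new labeled training pair."""
    dataset.append({"question1": query, "question2": chosen_faq, "label": 1})

faqs = ["How do I change my password?",   # invented example FAQs
        "How do I delete my account?",
        "Why was my post removed?"]
training_data = []
query = "I want to change the account password"
top3 = recommend(query, faqs, k=3)
record_choice(query, top3[0], training_data)
```

Each recorded choice is a positive pair in the same simplified format as the QQP training data, so the accumulated choices can be fed directly into fine-tuning.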
2.3 Fine-tuning the Model
After the process described in Subsection 2.2 has been repeated many times, and enough
data have been added to the training dataset, the model can be fine-tuned with the
newly obtained data. The fine-tuning process is as follows. First, as shown in Fig. 1, the weights of the layers are copied to the model’s next version except for the
pooling layer, which is the last layer of the model. Instead, the pooling layer of
the next version is initialized. Then, the new version of the model is trained with
the data in the training dataset. When the training process is finished, the updated
model is applied to the system. The processes in subsections 2.2 and 2.3 are repeated
until no more improvement is achieved.
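The weight-transfer step above can be sketched with the model's parameters represented as a plain name-to-weights dictionary (layer names and shapes are invented for illustration): every layer is copied to the next version except the pooling layer, which is re-initialized before training on the new data.

```python
def next_version(weights, pooling_key="pooling"):
    """Copy all layer weights to the next model version, except the pooling
    (last) layer, which is re-initialized (here to zeros for simplicity;
    a real model would use its own random initializer)."""
    return {name: ([0.0] * len(w) if name == pooling_key else list(w))
            for name, w in weights.items()}

# Invented toy weights standing in for a real model checkpoint.
v0 = {"encoder.0": [0.2, -0.1],
      "encoder.1": [0.5, 0.3],
      "pooling":   [0.9, 0.7]}

v1 = next_version(v0)
# v1["encoder.0"] and v1["encoder.1"] are copied; v1["pooling"] is fresh.
```

The copied encoder layers preserve what the previous version learned, while the fresh pooling layer is free to adapt to the newly collected domain data.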
3. Performance Evaluation
The environmental settings for the experiments are as follows. As the initial FAQ
list for customer service, we chose 40 FAQs from the Facebook website. Customer questions
were collected from the Facebook user community. We then divided the collected questions
into two sets. One set was used for training the model, and the other for testing
the performance of the system in each version. The training dataset was built through
a role-playing simulation by five participants randomly recruited from a population
of graduate students who did not know the authors personally. The participants used
the proposed system as if they were representatives, choosing responses from the recommended
$\textit{k}$ FAQs with $\textit{k}$=5 for each query. The test dataset was built
by having the participants directly match questions to FAQs. Note that the training and test datasets included
questions related to FAQs 1-20 and FAQs 1-40, respectively, which means the system
did not learn information from FAQs 21-40.
In the experiments, BiMPM, OpenAI GPT, and BERT were employed as the NLP models, and
the results were compared. Each model was pre-trained with the QQP dataset and used
as the baseline model. The service’s operation and fine-tuning scenario was set up to reflect
the real-world customer service process illustrated in Fig. 4. The scenario consisted of four operation/fine-tuning steps and one
testing step, with 5,000 data entries gathered in each operation step and the
number of FAQs in the FAQ list increased at the beginning of Step 3. In Step 5 (the
testing step), since versions 1 and 2 were trained with data from FAQs 1-10, they
were tested with the test dataset that included FAQs 1-10 and then retested with the
dataset that had FAQs 21-40. Similarly, versions 3 and 4 were tested with the test
dataset including FAQs 1-20 and then retested with the dataset using FAQs 21-40.
Fig. 4. The test scenario of the experiments.
Fig. 5 shows the test accuracies for each model and version. Top $\textit{k}$ accuracy (the
y-axes) indicates the probability that the best answer exists among the top $\textit{k}$
recommendations from the system. For every NLP model, the accuracy of the proposed
system increased after each step in the scenario. Table 1 shows the test accuracy in detail. Most importantly, for the BERT
and OpenAI GPT models, which were already pre-trained with relatively large amounts of data in their initial
states, the test accuracies increased even on the test dataset that excluded previously experienced
information. Moreover, BiMPM showed significant accuracy improvement on the test
dataset that included the experienced information; this is an advantage because additional
data for a changed FAQ list can be readily and automatically accumulated during
service with the proposed system, as shown in the test scenario. OpenAI GPT showed
the best performance with the proposed system under the test configuration.
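The top-$\textit{k}$ accuracy metric can be computed as sketched below; the recommendation lists and correct answers here are invented toy values.

```python
def top_k_accuracy(ranked_recommendations, correct_faqs, k):
    """Percentage of queries whose correct FAQ appears among the top-k
    recommendations (the metric plotted on the y-axes of Fig. 5)."""
    hits = sum(1 for recs, gold in zip(ranked_recommendations, correct_faqs)
               if gold in recs[:k])
    return 100.0 * hits / len(correct_faqs)

# Invented toy values: three queries, each with a ranked recommendation list.
recs = [["FAQ 3", "FAQ 7", "FAQ 1"],
        ["FAQ 2", "FAQ 5", "FAQ 9"],
        ["FAQ 4", "FAQ 8", "FAQ 6"]]
gold = ["FAQ 3", "FAQ 9", "FAQ 2"]

print(top_k_accuracy(recs, gold, k=1))  # only the first query hits at k=1
print(top_k_accuracy(recs, gold, k=3))  # the second query also hits at k=3
```

By construction the metric is non-decreasing in $\textit{k}$, which is why the Top 5 columns in Table 1 always dominate the Top 1 columns.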
Fig. 5. The resulting top $\textit{k}$ accuracies for each model.
Table 1. The detailed results of test accuracies for each model.
| Version | Model | FAQs 1-10 (Top 1 / 3 / 5) | FAQs 1-20 (Top 1 / 3 / 5) | FAQs 21-40 (Top 1 / 3 / 5) |
|---|---|---|---|---|
| Baseline (version 0) | BiMPM | 37.90 / 60.00 / 71.60 | 47.77 / 59.96 / 72.71 | 43.16 / 63.31 / 74.24 |
| | GPT | 39.06 / 61.58 / 68.56 | 57.99 / 68.35 / 69.86 | 33.60 / 43.82 / 63.67 |
| | BERT | 42.52 / 62.73 / 70.58 | 41.08 / 60.79 / 67.63 | 41.51 / 54.68 / 61.58 |
| Fine-tuned (version 1) | BiMPM | 61.51 / 78.05 / 83.59 | N/A | 39.06 / 63.67 / 74.89 |
| | GPT | 65.28 / 81.51 / 86.88 | N/A | 60.79 / 77.91 / 86.19 |
| | BERT | 55.04 / 82.37 / 88.78 | N/A | 60.50 / 74.17 / 81.80 |
| Fine-tuned (version 2) | BiMPM | 64.60 / 81.00 / 87.55 | N/A | 41.65 / 64.96 / 76.97 |
| | GPT | 65.61 / 81.94 / 88.25 | N/A | 60.43 / 77.05 / 86.83 |
| | BERT | 55.04 / 73.24 / 82.73 | N/A | 63.45 / 70.36 / 80.12 |
| Fine-tuned (version 3) | BiMPM | N/A | 73.89 / 86.83 / 91.22 | 38.34 / 63.74 / 76.12 |
| | GPT | N/A | 81.87 / 90.94 / 91.87 | 62.59 / 79.42 / 87.41 |
| | BERT | N/A | 69.07 / 84.43 / 89.18 | 60.94 / 79.14 / 85.68 |
| Fine-tuned (version 4) | BiMPM | N/A | 76.44 / 89.82 / 93.34 | 40.36 / 64.75 / 75.47 |
| | GPT | N/A | 85.11 / 90.58 / 92.09 | 59.93 / 81.44 / 87.55 |
| | BERT | N/A | 65.29 / 80.98 / 87.84 | 52.59 / 79.42 / 86.76 |
4. Conclusion
In this paper, we proposed a novel system to assist customer service representatives
in answering customer questions. Since the proposed system automatically accumulates
new data during service calls with a representative, it can avoid the data-shortage
problem common in various service fields. In addition, as the experimental results
show, the more data gathered, the greater the accuracy becomes. This means the accuracy
of the proposed system improves from the automatically accumulated data as time goes
by. Above all, the proposed system transforms subjective problems into objective ones
so that representatives can save time in answering, and so customers are more satisfied.
Furthermore, this system can be applied to languages other than English.
ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant
funded by the Korea government (MSIT) (No. NRF-2019R1F1A1062979) and by the Excellent Researcher
Support Project of Kwangwoon University in 2021.
REFERENCES
[1] Novalia Agung W. A., 2018, "The Impact of Interpersonal Communication toward Customer Satisfaction: The Case of Customer Service of Sari Asih Hospital," MATEC Web of Conferences, 150, 05087.
[2] Cheong K. J., Kim J. J., So S. H., 2008, "A study of strategic call center management: Relationship between key performance indicators and customer satisfaction," Vol. 6, No. 2, pp. 268-276.
[3] Jane Lockwood, 2017, "An analysis of web-chat in an outsourced customer service account in the Philippines."
[4] Bhavika R. Ranoliya, Nidhi Raghuwanshi, Sanjay Singh, 2017, "Chatbot for university related FAQs," 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI).
[5] Chung M., Ko E., Joung H., Kim S. J., 2018, "Chatbot e-service and customer satisfaction regarding luxury brands," Journal of Business Research.
[6] Tetsuji Nakagawa, Kentaro Inui, Sadao Kurohashi, 2010, "Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables," in Proceedings of NIPS 2010.
[7] Honglun Zhang, Liqiang Xiao, Yongkun Wang, Yaohui Jin, 2017, "A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning," in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence.
[8] Baoyu Jing, Chenwei Lu, Deqing Wang, Fuzhen Zhuang, 2018, "Cross-Domain Labeled LDA for Cross-Domain Text Classification," 2018 IEEE International Conference on Data Mining (ICDM).
[9] Shang Gao, Arvind Ramanathan, Georgia Tourassi, 2018, "Hierarchical Convolutional Attention Networks for Text Classification," in Proceedings of the Third Workshop on Representation Learning for NLP, Association for Computational Linguistics, pp. 11-23.
[10] Jeremy Howard, Sebastian Ruder, 2018, "Universal Language Model Fine-tuning for Text Classification," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, pp. 328-339.
[11] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2018, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805.
[12] Zhiguo Wang, Wael Hamza, Radu Florian, 2017, "Bilateral Multi-Perspective Matching for Natural Language Sentences," arXiv:1702.03814.
[13] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, 2018, "Improving Language Understanding by Generative Pre-Training."
[14] Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes, 2017, "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 670-680, Copenhagen, Denmark, Association for Computational Linguistics.
[15] Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher, 2017, "Learned in Translation: Contextualized Word Vectors," in NIPS, arXiv:1708.00107.
[16] Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, Rico Sennrich, 2017, "Regularization techniques for fine-tuning in neural machine translation," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1489-1494, Copenhagen, Denmark, September 7-11, 2017, Association for Computational Linguistics.
[17] Kanako Komiya, Hiroyuki Shinnou, 2018, "Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus," in Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, Association for Computational Linguistics, pp. 60-67.
[18] Jinhyuk Lee, Wonjin Yoon, Sungdon Kim, Donghyeon Kim, Sunkyu Kim, Chang Ho So, Jaewoo Kang, 2019, "BioBERT: a pre-trained biomedical language representation model for biomedical text mining," arXiv:1901.08746.
[19] Chen Z., Zhang H., Zhang X., Zhao L., 2018, "Quora question pairs."
Author
Nayoung Yun received her BS degree in Electrical Engineering from Kwangwoon University,
Seoul, Korea, in 2021. She has been an MS student in the Department of Electrical Engineering,
Kwangwoon University, Seoul, Korea. She is interested in computer vision and transformer-based
deep learning models.
Sangkyu Lim graduated from Kwangwoon University, majoring in Electrical Engineering. Interested
in vision and multimodal NLP.
Seoyoung Hong received her BS degree in Electrical Engineering from Kwangwoon University,
Seoul, Korea, in 2021. Since 2021, she has been an MS student in the Department of
Electrical and Computer Engineering, New York University, NY, USA. Her research interests
include signal processing and deep learning.
Jiwon Moon graduated from Kwangwoon University, majoring in Electrical Engineering. Currently
a graduate student at the Nature-Inspired Intelligence Laboratory, Department of Electrical
Engineering, Kwangwoon Graduate School. Interested in vision and multimodal NLP.
Hakjun Lee graduated from Kwangwoon University, majoring in Electrical Engineering.
Currently a graduate student in the Nature-Inspired Intelligence Laboratory in the
Department of Electrical Engineering of the Kwangwoon Graduate School; research interests
include transformer-based deep learning models.
Sunmok Kim received his BS degree in electrical engineering from Kwangwoon University,
Seoul, Korea, in 2016. Since 2016, he has been an MS student in the Department of Electrical
Engineering, Kwangwoon University, Seoul, Korea. His research interests include machine
learning.
Heung-Jae Lee received the BS, MS, and PhD degrees from Seoul National University,
in 1983, 1986, and 1990, respectively, all in electrical engineering. He was a visiting
professor at the University of Washington from 1995 to 1996. His major research interests
are expert systems, neural networks, and fuzzy-system applications to power
systems, including computer applications. He is a full professor at Kwangwoon
University.
Ki-Baek Lee received his BS, MS, and PhD degrees in electrical engineering from
the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Rep. of Korea,
in 2005, 2008 and 2014, respectively. Since 2014, he has been an assistant professor
with the Department of Electrical Engineering, College of Electronics and Information
Engineering, Kwangwoon University, Seoul, South Korea. He has researched computational
intelligence and artificial intelligence, particularly in swarm intelligence, multi-objective
evolutionary algorithms, and machine learning. His research interests also include
real‐world applications such as sign‐language recognition, object picking, and customer
service automation.