
  1. (Department of Electrical Engineering, Kwangwoon University, Seoul 01897, Korea; {nayoung1124, khlim258, mjw426, cpfl410, nadasunmok, hjlee, kblee}@kw.ac.kr)
  2. (Department of Electrical and Computer Engineering, New York University, NY, USA; sh6480@nyu.edu)



Keywords: NLP, Sentence similarity, FAQ, Assist system

1. Introduction

Customer service is one of the most difficult tasks in product marketing because it is not easy to satisfy customers while limiting costs [1]. Cheong et al. [2] showed that 53% of customers were not satisfied with customer support center service. The results also revealed that customers wanted their problems solved quickly by customer service representatives.

As one way to address this problem, a number of companies have constructed live chat systems that connect customers to representatives in real time through the Internet [3]. In addition, since costs increase with an increased number of representatives, attempts to replace people with chatbots have also been initiated in order to keep costs down. It is questionable, however, whether chatbots are capable of natural conversation and of understanding exactly what the customer needs [4,5].

Another approach, text classification through deep learning, can be used to preliminarily classify customer questions before passing them to human representatives. However, the accuracy of such classification systems is not good enough to help representatives, and it is not easy to obtain the training data needed to enhance them [6-10]. Attempts have been made to transform these classification problems into similarity evaluation problems based on recently proposed natural language processing (NLP) models, such as BERT [11], BiMPM [12], and OpenAI GPT [13]. Nonetheless, although these NLP models are pre-trained to capture extensive domain information, they are not efficient enough to be used directly for customer service, and a lot of additional data are required [14-18].

Consequently, in this paper, a novel representative assistance system is proposed to overcome the difficulties with the previous approaches and to improve customer service efficiency. The proposed system includes two main functions: FAQ recommendation and automatic data acquisition. For FAQ recommendation, the system calculates a similarity measure between an input question and every question in a well-defined customer service FAQ list. Then, it recommends the top $\textit{k}$ FAQs to the representative.

In fact, consumers frequently ask questions that have already been answered in the FAQ list, or questions similar to them. Thus, the recommended FAQs can help the representative answer more quickly and accurately by transforming a subjective problem into an objective one. Using this system, the representative chooses one of the recommendations, and the choice is automatically saved as new data. Consequently, the system is updated with newly collected data from the specific service domain, and its accuracy improves incrementally.

This paper is organized as follows. Section 2 explains the proposed system. In Section 3, the experimental results are evaluated. Finally, Section 4 presents the conclusions.

2. The Proposed System

2.1 Building a Baseline NLP Model

The first main function of the proposed system is to recommend the $\textit{k}$ FAQs from the list that are most similar to a customer's query. Fig. 1 shows the overall flow of the proposed system. First, it is necessary to train a baseline NLP model to recommend the most similar $\textit{k}$ FAQs. Here, the Quora Question Pairs (QQP) dataset [19] was used for the baseline NLP model. The dataset contains roughly 400,000 sentence pairs with corresponding labels. The original dataset is structured as shown in Fig. 2(a) and was modified for simplicity as shown in Fig. 2(b).

Fig. 1. The overall flow of the proposed system.
../../Resources/ieie/IEIESPC.2022.11.4.248/fig1.png
Fig. 2. An example of the data format.
../../Resources/ieie/IEIESPC.2022.11.4.248/fig2.png
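
As a concrete illustration, the sketch below loads sentence pairs in the simplified format of Fig. 2(b). It assumes tab-separated rows of (question1, question2, label); the file layout and all names are hypothetical, not taken from the paper.

```python
# A minimal sketch of reading the simplified pair format in Fig. 2(b).
# Assumption: tab-separated rows of (question1, question2, label), where
# label is 1 when the two questions ask the same thing, else 0.
from dataclasses import dataclass
from typing import List

@dataclass
class QuestionPair:
    question1: str
    question2: str
    label: int

def load_pairs(path: str) -> List[QuestionPair]:
    """Parse one QuestionPair per line; the file name passed in
    (e.g., "qqp_simplified.tsv") is purely illustrative."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            q1, q2, label = line.rstrip("\n").split("\t")
            pairs.append(QuestionPair(q1, q2, int(label)))
    return pairs

# Example row: "How do I reset my password?\tHow can I change my password?\t1"
```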

2.2 Operating the System with the NLP Model

When a customer asks a question, the NLP model measures the similarity between the customer’s question and every FAQ in a well-defined FAQ list. Then, the system shows the representative the closest $\textit{k}$ FAQs. After that, the representative chooses one from among them that is similar to the customer’s question. New data are constructed from these choices and are stored in a training dataset. Fig. 3 shows an example of the data construction process with $\textit{k}$ = 3.

Fig. 3. An example of the data-construction process.
../../Resources/ieie/IEIESPC.2022.11.4.248/fig3.png
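
To make the operating loop concrete, the sketch below scores a query against every FAQ, returns the top $\textit{k}$, and records the representative's choice as new training pairs. The `similarity` scorer stands in for the NLP pair model, and labeling the rejected recommendations as negative pairs is an assumption about Fig. 3; all names are illustrative.

```python
# A sketch of the operating loop in Subsection 2.2; `similarity` stands in
# for the NLP pair model and is assumed to return a score in [0, 1].
from typing import Callable, List, Tuple

def recommend_top_k(query: str,
                    faqs: List[str],
                    similarity: Callable[[str, str], float],
                    k: int = 5) -> List[Tuple[str, float]]:
    """Score the query against every FAQ and return the k most similar."""
    scored = [(faq, similarity(query, faq)) for faq in faqs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

def record_choice(query: str,
                  recommended: List[Tuple[str, float]],
                  chosen_faq: str,
                  dataset: List[Tuple[str, str, int]]) -> None:
    """Store the representative's choice as labeled pairs. Treating the
    rejected recommendations as negative pairs is an assumption."""
    for faq, _ in recommended:
        dataset.append((query, faq, 1 if faq == chosen_faq else 0))
```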

2.3 Fine-tuning the Model

After the process described in Subsection 2.2 has been repeated many times, and enough data have been added to the training dataset, the model can be fine-tuned with the newly obtained data. The fine-tuning process is as follows. First, as shown in Fig. 1, the weights of the layers are copied to the model’s next version, except for the pooling layer, which is the last layer of the model; the pooling layer of the next version is instead newly initialized. Then, the new version of the model is trained with the data in the training dataset. When the training process is finished, the updated model is applied to the system. The processes in Subsections 2.2 and 2.3 are repeated until no further improvement is achieved.
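
A minimal sketch of this version update, written in PyTorch under the assumption that the model exposes its pooling layer as a linear module named `pooler`; the layer name and the choice of initializer are illustrative, not prescribed by the paper.

```python
# A sketch of the fine-tuning step in Subsection 2.3 (PyTorch).
# Assumption: the model's last layer is a linear pooling layer `pooler`.
import copy
import torch.nn as nn

def next_version(model: nn.Module) -> nn.Module:
    """Copy every layer's weights to the next version, re-initializing
    only the pooling layer before training on the accumulated data."""
    new_model = copy.deepcopy(model)                  # inherit all weights
    nn.init.xavier_uniform_(new_model.pooler.weight)  # re-initialize pooler
    nn.init.zeros_(new_model.pooler.bias)
    return new_model  # train this version on the new training dataset
```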

3. Performance Evaluation

The environmental settings for the experiments are as follows. As the initial FAQ list for customer service, we chose 40 FAQs from the Facebook website. Customer questions were collected from the Facebook user community. We then divided the collected questions into two sets: one for training the model and the other for testing the performance of each version of the system. The training dataset was built through a role-playing simulation by five participants randomly recruited from among graduate students who did not know the authors personally. The participants used the proposed system as if they were representatives, choosing a response from the $\textit{k}$ recommended FAQs ($\textit{k}$ = 5) for each query. The test dataset was built by having the participants directly match each question to its corresponding FAQ. Note that the training and test datasets included questions related to FAQs 1-20 and FAQs 1-40, respectively, which means the system did not learn information from FAQs 21-40.

In the experiments, BiMPM, OpenAI GPT, and BERT were employed as the NLP models, and the results were compared. Each model was pre-trained with the QQP dataset and used as the baseline model. The operation and fine-tuning scenario was set up to reflect the real-world customer service process illustrated in Fig. 4. The scenario consisted of four operating/fine-tuning steps and one testing step, with 5,000 data entries gathered in each operating step and the number of FAQs in the FAQ list increased at the beginning of Step 3. In Step 5 (the testing step), since versions 1 and 2 were trained with data from FAQs 1-10, they were tested with the test dataset that included FAQs 1-10 and then retested with the dataset that had FAQs 21-40. Similarly, versions 3 and 4 were tested with the test dataset including FAQs 1-20 and then retested with the dataset using FAQs 21-40.

Fig. 4. The test scenario of the experiments.
../../Resources/ieie/IEIESPC.2022.11.4.248/fig4.png

Fig. 5 shows the test accuracies for each model and version. Top $\textit{k}$ accuracy (the y-axes) indicates the probability that the best answer exists among the top $\textit{k}$ recommendations from the system. For every NLP model, the accuracy of the proposed system increased after each step in the scenario. Table 1 shows the test accuracies in detail. Most importantly, for the BERT and OpenAI GPT models, which were already pre-trained with relatively heavy data in their initial states, the test accuracies increased even on the test dataset that excluded previously experienced information (FAQs 21-40). Moreover, BiMPM showed a significant accuracy improvement on the test dataset that included the experienced information, which is an advantage because additional data for a changed FAQ list can be readily and automatically accumulated during service with the proposed system, as shown in the test scenario. OpenAI GPT showed the best performance with the proposed system under the test configuration.

Fig. 5. The resulting top $\textit{k}$ accuracies for each model.
../../Resources/ieie/IEIESPC.2022.11.4.248/fig5.png
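
For reference, the top $\textit{k}$ accuracy reported in Fig. 5 and Table 1 can be computed as sketched below: the percentage of test queries whose correct FAQ appears among the system's $\textit{k}$ recommendations. Variable names are illustrative.

```python
# A minimal sketch of the top-k accuracy metric used in Fig. 5 and Table 1.
from typing import List

def top_k_accuracy(correct_faq_ids: List[int],
                   recommended_ids: List[List[int]],
                   k: int) -> float:
    """correct_faq_ids[i] is the ground-truth FAQ for query i;
    recommended_ids[i] lists FAQs in descending similarity order."""
    hits = sum(1 for truth, recs in zip(correct_faq_ids, recommended_ids)
               if truth in recs[:k])
    return 100.0 * hits / len(correct_faq_ids)  # percentage, as in Table 1
```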
Table 1. Detailed test accuracies (%) for each model.

                           FAQs 1-10                FAQs 1-20                FAQs 21-40
                      Top 1  Top 3  Top 5      Top 1  Top 3  Top 5      Top 1  Top 3  Top 5
Baseline      BiMPM   37.90  60.00  71.60      47.77  59.96  72.71      43.16  63.31  74.24
(version 0)   GPT     39.06  61.58  68.56      57.99  68.35  69.86      33.60  43.82  63.67
              BERT    42.52  62.73  70.58      41.08  60.79  67.63      41.51  54.68  61.58
Fine-tuned    BiMPM   61.51  78.05  83.59       N/A    N/A    N/A       39.06  63.67  74.89
(version 1)   GPT     65.28  81.51  86.88       N/A    N/A    N/A       60.79  77.91  86.19
              BERT    55.04  82.37  88.78       N/A    N/A    N/A       60.50  74.17  81.80
Fine-tuned    BiMPM   64.60  81.00  87.55       N/A    N/A    N/A       41.65  64.96  76.97
(version 2)   GPT     65.61  81.94  88.25       N/A    N/A    N/A       60.43  77.05  86.83
              BERT    55.04  73.24  82.73       N/A    N/A    N/A       63.45  70.36  80.12
Fine-tuned    BiMPM    N/A    N/A    N/A       73.89  86.83  91.22      38.34  63.74  76.12
(version 3)   GPT      N/A    N/A    N/A       81.87  90.94  91.87      62.59  79.42  87.41
              BERT     N/A    N/A    N/A       69.07  84.43  89.18      60.94  79.14  85.68
Fine-tuned    BiMPM    N/A    N/A    N/A       76.44  89.82  93.34      40.36  64.75  75.47
(version 4)   GPT      N/A    N/A    N/A       85.11  90.58  92.09      59.93  81.44  87.55
              BERT     N/A    N/A    N/A       65.29  80.98  87.84      52.59  79.42  86.76

N/A indicates the version was not tested with that dataset (see the scenario in Fig. 4).

4. Conclusion

In this paper, we proposed a novel system to assist customer service representatives in answering customer questions. Since the proposed system automatically accumulates new data during service calls with a representative, it can avoid the data-shortage problem common in various service fields. In addition, as the experimental results show, the more data are gathered, the greater the accuracy becomes. This means the accuracy of the proposed system improves over time from the automatically accumulated data. Above all, the proposed system transforms subjective problems into objective ones, so representatives can answer more quickly and customers are more satisfied. Furthermore, the system can be applied to languages other than English.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1F1A1062979) and by the Excellent Researcher Support Project of Kwangwoon University in 2021.

REFERENCES

[1] Novalia Agung W. A., 2018, "The Impact of Interpersonal Communication toward Customer Satisfaction: The Case of Customer Service of Sari Asih Hospital," MATEC Web of Conferences, Vol. 150, 05087.
[2] Cheong K. J., Kim J. J., So S. H., 2008, "A Study of Strategic Call Center Management: Relationship between Key Performance Indicators and Customer Satisfaction," Vol. 6, No. 2, pp. 268-276.
[3] Jane Lockwood, 2017, "An Analysis of Web-chat in an Outsourced Customer Service Account in the Philippines."
[4] Bhavika R. Ranoliya, Nidhi Raghuwanshi, Sanjay Singh, 2017, "Chatbot for University Related FAQs," 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI).
[5] Chung M., Ko E., Joung H., Kim S. J., 2018, "Chatbot e-Service and Customer Satisfaction Regarding Luxury Brands," Journal of Business Research.
[6] Tetsuji Nakagawa, Kentaro Inui, Sadao Kurohashi, 2010, "Dependency Tree-based Sentiment Classification Using CRFs with Hidden Variables," in Proceedings of NAACL HLT 2010.
[7] Honglun Zhang, Liqiang Xiao, Yongkun Wang, Yaohui Jin, 2017, "A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning," in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI).
[8] Baoyu Jing, Chenwei Lu, Deqing Wang, Fuzhen Zhuang, 2018, "Cross-Domain Labeled LDA for Cross-Domain Text Classification," 2018 IEEE International Conference on Data Mining (ICDM).
[9] Shang Gao, Arvind Ramanathan, Georgia Tourassi, 2018, "Hierarchical Convolutional Attention Networks for Text Classification," in Proceedings of the Third Workshop on Representation Learning for NLP, Association for Computational Linguistics, pp. 11-23.
[10] Jeremy Howard, Sebastian Ruder, 2018, "Universal Language Model Fine-tuning for Text Classification," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328-339.
[11] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2018, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805.
[12] Zhiguo Wang, Wael Hamza, Radu Florian, 2017, "Bilateral Multi-Perspective Matching for Natural Language Sentences," arXiv preprint arXiv:1702.03814.
[13] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, 2018, "Improving Language Understanding by Generative Pre-Training."
[14] Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes, 2017, "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 670-680.
[15] Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher, 2017, "Learned in Translation: Contextualized Word Vectors," in NIPS; arXiv preprint arXiv:1708.00107.
[16] Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, Rico Sennrich, 2017, "Regularization Techniques for Fine-tuning in Neural Machine Translation," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 1489-1494.
[17] Kanako Komiya, Hiroyuki Shinnou, 2018, "Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus," in Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, Association for Computational Linguistics, pp. 60-67.
[18] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chang Ho So, Jaewoo Kang, 2019, "BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining," arXiv preprint arXiv:1901.08746.
[19] Chen Z., Zhang H., Zhang X., Zhao L., 2018, "Quora Question Pairs."

Authors

Nayoung Yun
../../Resources/ieie/IEIESPC.2022.11.4.248/au1.png

Nayoung Yun received her BS degree in Electrical Engineering from Kwangwoon University, Seoul, Korea, in 2021. She is an MS student in the Department of Electrical Engineering, Kwangwoon University, Seoul, Korea. She is interested in computer vision and transformer-based deep learning models.

Sangkyu Lim
../../Resources/ieie/IEIESPC.2022.11.4.248/au2.png

Sangkyu Lim graduated from Kwangwoon University with a major in Electrical Engineering and is interested in vision and multimodal NLP.

Seoyoung Hong
../../Resources/ieie/IEIESPC.2022.11.4.248/au3.png

Seoyoung Hong received her BS degree in Electrical Engineering from Kwangwoon University, Seoul, Korea, in 2021. Since 2021, she has been an MS student in the Department of Electrical and Computer Engineering, New York University, NY, USA. Her research interests include signal processing and deep learning.

Jiwon Moon
../../Resources/ieie/IEIESPC.2022.11.4.248/au4.png

Jiwon Moon graduated from Kwangwoon University with a major in Electrical Engineering and is currently a graduate student at the Nature-Inspired Intelligence Laboratory, Department of Electrical Engineering, Kwangwoon Graduate School, with interests in vision and multimodal NLP.

Hakjun Lee
../../Resources/ieie/IEIESPC.2022.11.4.248/au5.png

Hakjun Lee graduated from Kwangwoon University with a major in Electrical Engineering and is currently a graduate student in the Nature-Inspired Intelligence Laboratory, Department of Electrical Engineering, Kwangwoon Graduate School. Research interests include transformer-based deep learning models.

Sunmok Kim
../../Resources/ieie/IEIESPC.2022.11.4.248/au6.png

Sunmok Kim received his BS degree in electrical engineering from Kwangwoon University, Seoul, Korea, in 2016. Since 2016, he has been an MS student in the Department of Electrical Engineering, Kwangwoon University, Seoul, Korea. His research interests include machine learning.

Heung-Jae Lee
../../Resources/ieie/IEIESPC.2022.11.4.248/au7.png

Heung-Jae Lee received his BS, MS, and PhD degrees from Seoul National University in 1983, 1986, and 1990, respectively, all in electrical engineering. He was a visiting professor at the University of Washington from 1995 to 1996. His major research interests are expert systems, neural networks, and fuzzy-system applications to power systems, including computer applications. He is a full professor at Kwangwoon University.

Ki-Baek Lee
../../Resources/ieie/IEIESPC.2022.11.4.248/au8.png

Ki-Baek Lee received his BS, MS, and PhD degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Rep. of Korea, in 2005, 2008, and 2014, respectively. Since 2014, he has been an assistant professor with the Department of Electrical Engineering, College of Electronics and Information Engineering, Kwangwoon University, Seoul, South Korea. He has researched computational intelligence and artificial intelligence, particularly in swarm intelligence, multi-objective evolutionary algorithms, and machine learning. His research interests also include real-world applications such as sign-language recognition, object picking, and customer service automation.