
  1. (School of Humanities and Law, Yanching Institute of Technology, Langfang, Hebei 065201, China)



Cross-cultural communication, Out-of-vocabulary, Automatic translation, Neural network

1. Introduction

With the development of society, more and more languages are involved in communication [1]. In the context of cross-cultural communication, translation has become increasingly important for facilitating the exchange of information [2]. With the development of technology, machine translation has gradually matured. Machine translation automatically translates between languages using computer technology, which not only plays a positive role in areas of daily life such as tourism and finance [3] but also provides greater convenience for cross-cultural communication [4].

Compared with human translation, machine translation is faster and less expensive, so it has made great contributions to the world's development and communication. Given the important role of machine translation, finding ways to achieve more efficient and accurate automatic translation has attracted the attention of researchers [5]. Nagaraj et al. [6] translated Kannada text into English through neural machine translation (NMT). Compared to statistical machine translation (SMT), NMT achieved a better Bilingual Evaluation Understudy (BLEU) score and had an accuracy of 86.32%.

Under the premise of weakening grammar rules, Li et al. [7] proposed a machine translation method based on artificial intelligence and analyzed English grammar rules. They found that this method had great potential. For Hindi-English translation, Tiwari et al. [8] compared two NMT models, which were realized by Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) methods combined with an attention mechanism. Their study provided the best model and parameters for the task. Saengthongpattana et al. [9] studied Thai-English translation and compared Transformer, Recurrent Neural Network (RNN), and SMT models. They found that the Transformer model had the highest BLEU score, the SMT model had the most word-order errors, and the RNN model had the most errors and omissions in word selection.

This paper focuses on the automatic translation of out-of-vocabulary (OOV) words in the context of cross-cultural communication. We propose a new solution to the shortcomings of current Chinese-English automatic translation methods in OOV processing and show the reliability of the method for improving translation quality by comparing it with another Chinese-English translation model. This paper also presents a new idea for the processing of OOV words in the field of automatic translation, which could be studied for the automatic translation of more languages to further improve the usability of automatic translation.

2. Translation Algorithm for Out-of-vocabulary Words

2.1 Seq2seq Model based on the Attention Mechanism

NMT can directly translate a source language into a target language through an RNN [10]. Compared with SMT, NMT has higher efficiency and quality [11], so it has been widely applied in automatic translation [12]. NMT is composed of an encoder and decoder, and its process is represented by the following:

(1)
$c=Encode\left(w^{\left(s\right)}\right)$,
(2)
$w^{\left(t\right)}|w^{\left(s\right)}\sim Decode\left(c\right)$,

where $w^{\left(s\right)}$ and $w^{\left(t\right)}$ are the aligned sentence pair of the source language and translation, and $c$ is a context vector generated by the encoder.

The model is trained by maximizing the conditional log-likelihood of the parallel corpus using the formula $\log p\left(w^{\left(t\right)}|w^{\left(s\right)}\right)=\sum _{m=1}^{M}\log p\left(w_{m}^{\left(t\right)}|w_{1\colon m-1}^{\left(t\right)},c\right)$, where $M$ is the sentence length of the output translation, and $w_{m}^{\left(t\right)}$ is the $m$-th output target word. Among encoder-decoder structures, the seq2seq model is the simplest one [13] and is usually used to solve problems such as machine translation and speech recognition [14].

The Google Neural Machine Translation (GNMT) model is based on the seq2seq model [15] and uses two RNNs as the encoder and decoder. A Chinese and English sentence pair is represented as $\left(X,Y\right)$, where $X=\left(x_{1},x_{2},\cdots ,x_{M}\right)$ ($M$ is the length of the source language word sequence) and $Y=\left(y_{1},y_{2},\cdots ,y_{N}\right)$ ($N$ is the length of the translation word sequence). The encoder RNN in the seq2seq model produces $C=\left(h_{1},h_{2},\cdots ,h_{M}\right)=EncoderRNN\left(x_{1},x_{2},\cdots ,x_{M}\right)$. The conditional probability of the sentence pair is written as $P\left(Y|X\right)=P\left(Y|C\right)=\prod _{i=1}^{N}p\left(y_{i}|y_{0},y_{1},\cdots ,y_{i-1};C\right)$, where $y_{0}$ indicates the start of translation, ``<EOS>''.
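As an illustration of this encoder-decoder structure, the following is a minimal sketch of a single-layer LSTM seq2seq model in TensorFlow/Keras (the framework used in Section 3); the vocabulary sizes, dimensions, and single-layer setup are illustrative assumptions, not the exact GNMT configuration of Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers

SRC_VOCAB, TGT_VOCAB, EMB_DIM, HIDDEN = 30000, 30000, 256, 256  # assumed sizes

# Encoder: embeds the source sequence x_1..x_M and returns its final state as C.
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = layers.Embedding(SRC_VOCAB, EMB_DIM)(enc_in)
_, h, c = layers.LSTM(HIDDEN, return_state=True)(enc_emb)

# Decoder: predicts y_i conditioned on y_0..y_{i-1} and the encoder state.
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = layers.Embedding(TGT_VOCAB, EMB_DIM)(dec_in)
dec_out, _, _ = layers.LSTM(HIDDEN, return_sequences=True,
                            return_state=True)(dec_emb, initial_state=[h, c])
logits = layers.Dense(TGT_VOCAB)(dec_out)

model = tf.keras.Model([enc_in, dec_in], logits)
# Training maximizes the log-likelihood of the target words, i.e., minimizes cross-entropy.
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```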

An attention mechanism [16] is adopted to improve the performance of the seq2seq model in automatic translation. With the attention mechanism, the fixed context vector $c$ is no longer used. The conditional probability of the output at time $i$ is written as $p\left(y_{i}|y_{0},y_{1},\cdots ,y_{i-1};C\right)=g\left(y_{i-1},s_{i},c_{i}\right)$, where $s_{i}$ is the hidden state of the decoder, $s_{i}=f\left(y_{i-1},s_{i-1},c_{i}\right)$, and $c_{i}$ is the context vector of the encoder at time $i$. A score function is defined as $e_{ij}=a\left(s_{i-1},h_{j}\right)$, where $h_{j}$ is the output state of the encoder at position $j$ and $a$ is an arbitrary real-valued function. The scores are normalized as $a_{ij}=\exp \left(e_{ij}\right)/\sum _{k=1}^{M}\exp \left(e_{ik}\right)$, and the values $a_{ij}$ over all positions form the attention vector $a_{i}=\left(a_{i1},a_{i2},\cdots ,a_{iM}\right)$. The context vector at time $i$ is $c_{i}=\sum _{j=1}^{M}a_{ij}h_{j}$.
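The following NumPy sketch makes one attention step concrete; the dot-product score is used for $a$ here, which is only one possible choice since the text leaves $a$ as an arbitrary real-valued function.

```python
import numpy as np

def attention_step(s_prev, H):
    """One attention step: s_prev is the previous decoder state s_{i-1}, shape (d,);
    H holds the encoder outputs h_1..h_M as rows, shape (M, d)."""
    e = H @ s_prev                      # scores e_{ij} = a(s_{i-1}, h_j), dot-product form
    a = np.exp(e - e.max())
    a = a / a.sum()                     # attention weights a_{ij} (softmax over j)
    c = a @ H                           # context vector c_i = sum_j a_{ij} h_j
    return c, a

# Toy example: 4 encoder positions, hidden size 8.
rng = np.random.default_rng(0)
c_i, a_i = attention_step(rng.standard_normal(8), rng.standard_normal((4, 8)))
```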

2.2 Transformer Model

The Transformer model does not use an RNN; instead, it uses a self-attention mechanism to enable fast, parallel computation [17], which significantly improves translation quality. In essence, it is also a seq2seq model that can be divided into an encoding layer and a decoding layer [18]. The model represents every word with three vectors: Query (Q), Key (K), and Value (V). A word vector $e_{i}$, $e_{i}\in R^{1\times p}$, is multiplied by three weight matrices with dimensions of $p\times d$, denoted as $W^{Q}$, $W^{K}$, and $W^{V}$, to obtain the word's $q$, $k$, and $v$ vectors.

Next, a multi-head attention mechanism is used. The $k$ and $v$ vectors of every word are stacked into $n\times d$ matrices $K$ and $V$ ($n$ is the number of words). After splitting $q$, $K$, and $V$ into heads, $\left\{q_{i}\right\}_{i=1}^{m}$, $\left\{K_{i}\right\}_{i=1}^{m}$, and $\left\{V_{i}\right\}_{i=1}^{m}$ are obtained. For any $i\in \left\{1,\cdots ,m\right\}$ ($m$ is the number of heads), the self-attention is calculated as:

(3)
$C_{i}=attention\left(q_{i},K_{i},V_{i}\right)=softmax\left(\frac{q_{i}K_{i}^{T}}{\sqrt{d}}\right)V_{i}$.

Then, the multi-head attention is calculated:

(4)
$MultiHead\left(q,K,V\right)=Concat\left(\left\{C_{i}\right\}_{i=1}^{m}\right)W^{O}$.
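A minimal NumPy sketch of Eqs. (3) and (4) is given below; the projection matrices $W^{Q}$, $W^{K}$, $W^{V}$, and $W^{O}$ are omitted (treated as identities) to keep the example short, so it shows only the head splitting, scaled dot-product, and concatenation steps.

```python
import numpy as np

def multi_head_attention(q, K, V, m):
    """Sketch of Eqs. (3)-(4): q, K, V have shape (n, d_model); m is the number of heads."""
    n, d_model = q.shape
    d = d_model // m                                              # per-head dimension
    heads = []
    for i in range(m):
        qi, Ki, Vi = (x[:, i * d:(i + 1) * d] for x in (q, K, V))  # split into heads
        scores = qi @ Ki.T / np.sqrt(d)                            # q_i K_i^T / sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)                      # softmax
        heads.append(w @ Vi)                                       # C_i
    return np.concatenate(heads, axis=-1)                          # Concat({C_i}); W^O omitted

x = np.random.randn(5, 512)
out = multi_head_attention(x, x, x, m=8)   # self-attention: q, K, V from the same sequence
```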

In the Transformer model, a positional encoding is used to represent word order. Its dimension is the same as that of the word vector, $d_{model}=512$. The formulas are:

(5)
$PE\left(pos,2i\right)=\sin \left(\frac{pos}{10000^{\frac{2i}{d_{model}}}}\right)$,
(6)
$PE\left(pos,2i+1\right)=\cos \left(\frac{pos}{10000^{\frac{2i}{d_{model}}}}\right)$.
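A short sketch of Eqs. (5) and (6) follows; it returns the position-code matrix that is added to the word vectors.

```python
import numpy as np

def positional_encoding(max_len, d_model=512):
    """Sinusoidal position code: even dimensions use sin (Eq. (5)), odd use cos (Eq. (6))."""
    pos = np.arange(max_len)[:, None]                    # positions 0..max_len-1
    i = np.arange(d_model // 2)[None, :]                 # dimension index i
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                          # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                          # PE(pos, 2i+1)
    return pe

pe = positional_encoding(100)                            # (100, 512), added to the word vectors
```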

2.3 Out-of-vocabulary Processing

In NMT, words whose frequency in the corpus is too low for them to be included in the dictionary are called OOV words [19] and are usually represented by <UNK>. The semantics of the original word are lost in automatic translation, which degrades translation quality. Therefore, handling OOV words is an important task in NMT. We propose a semantics-based approach to replace OOV words in the corpus.

First, the skip-gram model [20] from the Word2vec tool is used to learn word vectors; its structure is shown in Fig. 1. The principle is to predict the surrounding words from the current word. Word vectors were learned for the Chinese and English corpora with a window size of 5 and a word vector dimension of 300.
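As a sketch, word vectors with these settings can be trained with the gensim implementation of Word2vec (sg=1 selects the skip-gram architecture); the toy sentences below stand in for the word-segmented Chinese and tokenized English corpora.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (word-segmented Chinese here;
# the English corpus is handled the same way). Real training uses the full training corpus.
sentences = [
    ["他", "昨天", "买", "了", "一件", "文化衫"],
    ["他", "送", "朋友", "一件", "T恤", "作为", "生日", "礼物"],
]

# window=5 and vector_size=300 follow the settings above; min_count=1 only suits the toy data.
model = Word2Vec(sentences, vector_size=300, window=5, sg=1, min_count=1, workers=4)
vec = model.wv["文化衫"]   # 300-dimensional word vector
```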

The semantic similarity is then calculated from the learned word vectors. Using cosine similarity, the similarity between an OOV word $w$ and a common (in-vocabulary) word $w'$ is:

(7)
$sim\left(w,w'\right)=\cos \left(vec\left(w\right),vec\left(w'\right)\right)$,
(8)
$w^{*}=\underset{w'\in IV}{\text{argmax}}sim\left(w,w'\right)$,

where $IV$ is the list of common (in-vocabulary) words. After obtaining candidate similar words, an n-gram language model is used to find the most appropriate replacement word to improve sentence fluency:

(9)
$ \begin{array}{l} score_{blm}=\gamma \left(p\left(w'|w_{i-1},w_{i-2}\right)+p\left(w'|w_{i+1},w_{i+2}\right)\right)+\\ \left(1-\gamma \right)\left(p\left(w'|w_{i-1}\right)+p\left(w'|w_{i+1}\right)\right), \end{array} $

where $score_{blm}$ is the score of a candidate word $w'$, $w_{i\pm 1}$ and $w_{i\pm 2}$ are the words surrounding the OOV word at position $i$, and $\gamma$ is an interpolation weight.
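The replacement step of Eqs. (7)-(9) can be sketched as follows; the interpolation weight gamma, the candidate cut-off top_k, and the n-gram probability functions are assumptions or placeholders for models trained on the corpus.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of Eq. (7)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def best_replacement(oov_vec, iv_vectors, context, trigram_p, bigram_p,
                     gamma=0.7, top_k=10):
    """Choose a replacement for an OOV word: rank in-vocabulary words by cosine
    similarity (Eq. (8)), then re-rank the top candidates with the bidirectional
    n-gram score of Eq. (9). iv_vectors maps in-vocabulary words to vectors;
    context = (w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}); trigram_p and bigram_p are
    probability lookup functions from an n-gram model trained on the corpus."""
    w_m2, w_m1, w_p1, w_p2 = context
    candidates = sorted(iv_vectors, key=lambda w: cosine(oov_vec, iv_vectors[w]),
                        reverse=True)[:top_k]
    def score_blm(w):
        return (gamma * (trigram_p(w, w_m1, w_m2) + trigram_p(w, w_p1, w_p2))
                + (1 - gamma) * (bigram_p(w, w_m1) + bigram_p(w, w_p1)))
    return max(candidates, key=score_blm)
```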

The candidate with the highest score is selected to replace the OOV word. In addition, words without semantic vectors in the corpus (low-frequency OOV words) are either ① retained or ② deleted. The automatic translation between Chinese and English after OOV processing is shown in Fig. 2. Based on word vector training and similarity calculation, the word most similar to an OOV word is found to replace it; the NMT model is then trained on the replaced corpus and translates the replaced source language.

Fig. 1. Skip-gram model.
../../Resources/ieie/IEIESPC.2023.12.6.466/fig1.png
Fig. 2. Automatic Chinese–English translation algorithm for OOV words.
../../Resources/ieie/IEIESPC.2023.12.6.466/fig2.png

3. Results and Analysis

Tests were conducted on the Windows 10 operating system with 8 GB of memory and an NVIDIA GeForce GTX 1070 Ti GPU. The models were built and trained using TensorFlow. The parameters of the seq2seq model and the Transformer model are shown in Table 1.

The LDC dataset was used in the experiments. The model was trained on LDC2004T07, LDC2004T08, LDC2005T06, and LDC2005T10. NIST05 was used as the development set, and NIST06 and NIST08 were used as the test sets. These datasets are described in the following:

LDC2004T07: Multiple-Translation Chinese (MTC) Part 3

LDC2004T08: Hong Kong Parallel Text

LDC2005T06: Chinese News Translation Text Part 1

LDC2005T10: Chinese English News Magazine Parallel Text

NIST05: NIST 2005 Open Machine Translation (OpenMT) Evaluation

NIST06: NIST 2006 Open Machine Translation (OpenMT) Evaluation

NIST08: NIST 2008 Open Machine Translation (OpenMT) Evaluation

The BLEU score was used as the evaluation index of the algorithm [21]. The higher the BLEU score is, the closer the translation is to the result of human translation. Based on n-grams, the BLEU score is calculated as:

(10)
$p_{n}=\frac{\sum _{c\in \left\{candidate\right\}}\sum _{n-gram\in c}count_{clip}\left(n-gram\right)}{\sum _{c'\in \left\{candidate\right\}}\sum _{n-gram'\in c'}count\left(n-gram'\right)}$,
(11)
$BLEU=BP\cdot \exp \left(\sum _{n=1}^{N}w_{n}\log p_{n}\right)$,
(12)
$BP=\left\{\begin{array}{l} 1,if~ c>r\\ exp\left(1-\frac{r}{c}\right),if~ c\leq r \end{array}\right.$,

where $BP$ is the brevity penalty, $w_{n}$ is the weight factor (uniform weights, $w_{n}=\frac{1}{N}$, are used), and $N$ is the maximum n-gram size, which is usually 4. The BLEU value is between 0 and 100, and the higher the similarity between the translation and the reference is, the larger the BLEU value is. The results of the seq2seq model and Transformer model on the test sets are shown in Fig. 3.
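For reference, a minimal sentence-level sketch of Eqs. (10)-(12) is given below; it uses uniform weights and the shortest reference length for the brevity penalty, which is a simplification of the corpus-level score reported in the experiments.

```python
from collections import Counter
import math

def bleu(candidate, references, N=4):
    """Single-sentence BLEU sketch: candidate is a token list, references a list of token lists."""
    precisions = []
    for n in range(1, N + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        max_ref = Counter()
        for ref in references:
            ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
            for g, cnt in ref_ngrams.items():
                max_ref[g] = max(max_ref[g], cnt)
        clipped = sum(min(cnt, max_ref[g]) for g, cnt in cand.items())   # count_clip, Eq. (10)
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    c, r = len(candidate), min(len(ref) for ref in references)           # candidate/reference lengths
    bp = 1.0 if c > r else math.exp(1 - r / c)                           # brevity penalty, Eq. (12)
    return 100 * bp * math.exp(sum(math.log(p) / N for p in precisions)) # Eq. (11), scaled to 0-100
```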

Fig. 3 shows that the quality of translation obtained by the Transformer model was higher than that of the seq2seq model. On NIST06, the BLEU score of the seq2seq model was 36.45, while that of the Transformer model was 37.26, which is 0.81 higher. On NIST08, the BLEU score of the seq2seq model was 30.16, while that of the Transformer model was 30.75, which is 0.59 higher. These results indicate that the Transformer model performed better than the seq2seq model in automatic Chinese-English translation.

Fig. 3 shows the results without OOV processing. Next, the OOV words were replaced using the method proposed in this paper to obtain a replaced corpus, and the seq2seq model was trained on it. The performance on the different datasets is shown in Table 2.

Table 2 shows that after OOV processing, if a low-frequency OOV word was retained, the BLEU score was higher than that of the seq2seq model (37.12 (+0.67) for NIST06 and 30.34 (+0.18) for NIST08). However, when the low-frequency OOV word was deleted, the BLEU score decreased (36.16 (-0.26) for NIST06 and 30.08 (-0.08) for NIST08). These results suggest that directly deleting OOV words might damage the sentence structure and result in ambiguity. Therefore, it is necessary to keep the low-frequency OOV word and replace it with <UNK> to maintain the integrity of the sentence. The BLEU scores of the Transformer model combined with OOV processing are shown in Table 3.

Table 3 shows that the results after Transformer+OOV processing followed a pattern similar to that of seq2seq+OOV processing. When the low-frequency OOV word was retained, the BLEU score was 37.89 (+0.63) for NIST06 and 30.84 (+0.09) for NIST08. When the low-frequency OOV word was deleted, the BLEU score was 37.17 (-0.09) for NIST06 and 30.33 (-0.42) for NIST08. Therefore, it was concluded from the results of both models that, after replacing the high-frequency OOV words based on similarity, replacing the low-frequency OOV words with <UNK> could achieve higher translation quality.

In the context of cross-cultural communication, the process of automatic translation between Chinese and English can easily lead to translation errors due to cultural differences, but after being processed by the OOV method designed in this study, the translation can be improved. The following sentence is shown as an example:

他昨天买了一件文化衫,作为朋友的生日礼物。

In this sentence, ``文化衫'' can be regarded as a low-frequency OOV word. When the Transformer model is used for translation, the result is:

He bought a cultural shirt yesterday as a birthday gift for his friend.

The Chinese word "文化衫" means a round-necked shirt with patterns and texts printed on it, which is used by young people to express their emotions, personality, and values. A direct translation of "cultural shirt" cannot express its meaning correctly.

If the word is deleted in OOV processing, the result obtained is:

He bought one yesterday as a birthday present for his friend.

The word "culture shirt" is deleted as an important part of the sentence, and the sentence loses its original meaning. In the proposed OOV processing method, the word is retained. After using similar word replacement, the result obtained is:

He bought a T-shirt yesterday as a birthday present for his friend.

The analysis of this example further demonstrates the reliability of the OOV processing method designed in this study for automatic Chinese-English translation in cross-cultural communication.

Fig. 3. Comparison of the BLEU score between the seq2seq model and Transformer model.
../../Resources/ieie/IEIESPC.2023.12.6.466/fig3.png
Table 1. Parameter settings.

Seq2seq model combined with attention mechanism:
  Number of network layers: 6
  Neuron type: LSTM
  Encoder: 6 layers of LSTM
  Decoder: 6 layers of LSTM
  Number of neurons: 256
  Word vector dimension: 256
  Batch size: 128
  Dropout: 0.2
  Learning rate: 1.0

Transformer model:
  Number of network layers: 6
  Word vector dimension: 512
  Hidden layer state dimension of feedforward neural network: 2048
  Head number: 8
  Batch size: 6250
  Dropout: 0.1

Table 2. BLEU scores after seq2seq+OOV processing.

  Model                   | Low-frequency OOV words | NIST06 | NIST08
  Seq2seq                 | -                       | 36.45  | 30.16
  Seq2seq+OOV processing  | Retained                | 37.12  | 30.34
  Seq2seq+OOV processing  | Deleted                 | 36.16  | 30.08

Table 3. BLEU scores after Transformer+OOV processing.

  Model                       | Low-frequency OOV words | NIST06 | NIST08
  Transformer                 | -                       | 37.26  | 30.75
  Transformer+OOV processing  | Retained                | 37.89  | 30.84
  Transformer+OOV processing  | Deleted                 | 37.17  | 30.33

4. Conclusion

This paper presented an automatic Chinese-English translation algorithm for cross-cultural communication. Assuming that the quality of Chinese-English automatic translation can be improved by processing OOV words, a method for processing OOV words was designed. Tests on two models showed that the Transformer model had higher BLEU scores than the seq2seq model, indicating better performance in automatic Chinese-English translation.

After OOV processing, retaining low-frequency OOV words effectively improved the BLEU score, indicating that the translation quality was improved. However, this research also had some limitations, such as the heavy reliance on dictionaries in the processing of OOV words and the fact that OOV words were studied only for Chinese and English. Therefore, in future work, more in-depth research on the processing of OOV words is needed to reduce the reliance on dictionaries, and the method will be applied to automatic translation in more languages to expand its applicability and promote better applications in solving translation tasks.

REFERENCES

[1] C. Xu, Q. Li, "Machine Translation and Computer Aided English Translation," Journal of Physics: Conference Series, Vol. 1881, No. 4, pp. 1-8, Jan. 2021.
[2] C. Yang, "A Study of Influences of Big Data on Machine Translation and Enlightenment for Translation Teaching in Cross-cultural Communication," 2020 International Conference on Information Science and Education (ICISE-IE), pp. 228-232, Dec. 2020.
[3] S. Narzary, M. Brahma, B. Singha, R. Brahma, B. Dibragede, S. Barman, S. Nandi, B. Som, "Attention based English-Bodo Neural Machine Translation System for Tourism Domain," 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), pp. 335-343, Aug. 2019.
[4] S. Li, "Research on the External Communication of Chinese Excellent Traditional Culture from the Perspective of Machine Translation," Journal of Physics: Conference Series, Vol. 1744, No. 3, pp. 1-8, 2021. http://dx.doi.org/10.1088/1742-6596/1744/3/032019
[5] T. Kano, S. Sakti, S. Nakamura, "Transformer-Based Direct Speech-To-Speech Translation with Transcoder," 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 958-965, Jan. 2021.
[6] P. K. Nagaraj, K. S. Ravikumar, M. S. Kasyap, M. H. S. Murthy, J. Paul, "Kannada to English Machine Translation Using Deep Neural Network," Ingénierie des Systèmes d'Information, Vol. 26, No. 1, pp. 123-127, Feb. 2021.
[7] X. Li, X. Hao, "English Machine Translation Model Based on Artificial Intelligence," Journal of Physics: Conference Series, Vol. 1982, No. 1, pp. 1-6, May 2021.
[8] G. Tiwari, A. Sharma, A. Sahotra, R. Kapoor, "English-Hindi Neural Machine Translation-LSTM Seq2Seq and ConvS2S," 2020 International Conference on Communication and Signal Processing (ICCSP), pp. 871-875, July 2020.
[9] K. Saengthongpattana, K. Kriengket, P. Porkaew, T. Supnithi, "Thai-English and English-Thai Translation Performance of Transformer Machine Translation," 2019 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), pp. 1-5, Oct. 2019.
[10] M. S. Kumar, D. Dipankar, B. Sivaji, "MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation," Journal of Intelligent Systems, Vol. 28, No. 3, pp. 447-453, May 2018.
[11] R. Baruah, R. K. Mundotiya, A. K. Singh, "Low Resource Neural Machine Translation: Assamese to/from Other Indo-Aryan (Indic) Languages," Transactions on Asian and Low-Resource Language Information Processing, Vol. 21, No. 1, pp. 19.1-19.32, 2022.
[12] Z. Tan, J. Su, B. Wang, Y. Chen, X. Shi, "Lattice-to-sequence attentional Neural Machine Translation models," Neurocomputing, Vol. 284, pp. 138-147, Apr. 2018.
[13] X. Li, V. Krivtsov, K. Arora, "Attention-based deep survival model for time series data," Reliability Engineering & System Safety, Vol. 217, pp. 293-304, 2022.
[14] J. Cho, S. Watanabe, T. Hori, M. K. Baskar, H. Inaguma, J. Villalba, N. Dehak, "Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6191-6195, Apr. 2019.
[15] D. D. Kalamkar, K. Banerjee, S. Sridharan, E. Georganas, M. E. Smorkalov, C. Xu, A. Heinecke, "Training Google Neural Machine Translation on an Intel CPU Cluster," 2019 IEEE International Conference on Cluster Computing (CLUSTER), pp. 1-10, Nov. 2019.
[16] W. Hu, Y. Zhang, Q. Guo, X. Huang, G. Li, W. Wang, Y. Meng, "Research on Short-Term Load Forecasting Method of Power System Based on Seq2Seq-Attention Model," 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2), pp. 227-232, Oct. 2020. http://dx.doi.org/10.1109/EI250167.2020.9346583
[17] H. Luo, S. Zhang, M. Lei, L. Xie, "Simplified Self-Attention for Transformer-Based End-to-End Speech Recognition," 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 75-81, Jan. 2021.
[18] K. Jin, X. Zhang, J. Zhang, "Learning to Generate Diverse and Authentic Reviews via an Encoder-Decoder Model with Transformer and GRU," 2019 IEEE International Conference on Big Data (Big Data), pp. 3180-3189, Dec. 2019.
[19] E. Egorova, L. Burget, "Out-of-Vocabulary Word Recovery using FST-Based Subword Unit Clustering in a Hybrid ASR System," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5919-5923, Apr. 2018.
[20] L. Nguyen, H. H. Chung, K. V. Tuliao, T. M. Y. Lin, "Using XGBoost and Skip-Gram Model to Predict Online Review Popularity," SAGE Open, Vol. 10, No. 4, Oct. 2020.
[21] H. K. Vydana, M. Karafiát, K. Zmolikova, L. Burget, H. Černocký, "Jointly Trained Transformers Models for Spoken Language Translation," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7513-7517, June 2021.

Author

Jiayan Duan
../../Resources/ieie/IEIESPC.2023.12.6.466/au1.png

Jiayan Duan was born in Hebei, China, in 1983. From 2003 to 2007, she studied at Beijing University of Chemical Technology and received a bachelor’s degree in 2007. From 2013 to 2015, she studied at Beijing International Studies University and received a master’s degree in 2015. Currently, she works at Yanching Institute of Technology. She has published eight academic papers and translated two books. Her main research interests include applied translation studies and translation teaching theory and practice.

Hongwei Ma
../../Resources/ieie/IEIESPC.2023.12.6.466/au2.png

Hongwei Ma was born in Changchun City, Jilin Province, China. He received an M.A. degree in foreign linguistics and applied linguistics from Changchun University of Technology in 2011 and an M.A. degree in English translation from Beijing Normal University in 2016. He has been working at Yanching Institute of Technology since 2021. He is engaged in research on English language education, translation and cross-cultural communication, and comparative literature.

Junxia Wang
../../Resources/ieie/IEIESPC.2023.12.6.466/au3.png

Junxia Wang was born in Shan'xi, China, in 1982. From 2000 to 2007, she studied at China University of Geosciences and obtained bachelor's and master's degrees. She has worked at Yanching Institute of Technology since 2007. She has undertaken two teaching-related projects supported by the Education Department of Hebei Province. She has published over 20 academic papers and 7 books. Her main research interests include applied linguistics and teaching.