Xuran Ni
(Department of International Educational Exchange, Tangshan Vocational and Technical
College, Tangshan, Hebei 063000, China )
Keywords
Machine translation, Tense recognition, Tense translation, Chinese-English translation, Bilingual evaluation understudy
1. Introduction
Translation is becoming increasingly important as cross-cultural communication becomes more frequent [1]. Human translation alone can no longer meet the huge current demand; therefore, machine translation (MT) has been studied widely [2]. MT automatically converts text in one language into text in another, which is highly efficient and makes communication easier [3]. With the development of deep learning, neural machine translation (NMT) has become the mainstream method, achieving better translation quality than traditional MT [4]. NMT has promising applications across many language pairs [5], and research on it is ongoing [6]. Choi et al. [7] designed a fine-grained attention mechanism, conducted experiments on En–De and En–Fi translation tasks, and reported that the method improved translation quality.
Sun et al. [8] examined Tibetan–Chinese NMT. They designed a method combining techniques such as stop lists and back-translation and found experimentally that the bilingual evaluation understudy (BLEU) score of Tibetan–Chinese NMT increased from an initial 5.53 to 19.03. Martinez et al. [9] proposed a subword segmentation method for NMT that incorporates character-level information, used a custom algorithm to select binary character n-gram features, and verified the advantage of the method in handling resource-constrained languages and its better BLEU scores. Ma [10] proposed a grammar-based approach that merges source-side grammatical structures into the attention mechanism and positional encoding; experimentally, the BLEU improvements were 2.32, 2.91, and 1.03 for English–Japanese, English–Chinese, and English–German translation tasks, respectively.
Chinese–English translation is used extensively; however, both MT and NMT remain weak at translating tenses. English tense is reflected in verb morphology, whereas Chinese verbs do not inflect and carry no tense information themselves. For example, in sentences such as "我要去吃饭了" (I am going to eat), "我正在吃饭" (I am eating), and "我吃过饭了" (I have eaten), the verb form does not change as the tense changes, which makes the quality of Chinese–English MT poorer than that of English–Chinese MT. Therefore, this paper focuses mainly on recognizing and translating different tenses in Chinese–English translation. Based on NMT, a neural network model was combined to recognize the different tenses of Chinese verbs and to keep tense translation consistent in Chinese–English translation, and the effectiveness of this method in improving translation quality was verified through experiments. This paper provides a new method for improving Chinese-to-English machine translation. The proposed method can be applied to further tense recognition and translation problems in other languages, promoting the further development of machine translation.
2. Machine Translation
2.1 Neural Machine Translation
NMT uses an encoder-decoder structure [11]. The encoder encodes the source-language sequence and outputs a fixed-length vector representation $C$, called the context vector. The decoder then decodes this vector representation to obtain the translated sequence. Before translation, the source language needs to be converted into numerical vectors so that the computer can process it. This process is called word embedding. Commonly used models include CBOW, Skip-Gram, and Word2vec [12].
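As a minimal illustration of the word-embedding step (a sketch only; the toy vocabulary and embedding dimension below are illustrative assumptions, not the settings used in this paper), a PyTorch embedding layer maps token indices to dense vectors:

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary; the real system builds its vocabulary from the corpus.
vocab = {"<bos>": 0, "<eos>": 1, "我": 2, "吃": 3, "饭": 4}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

# Convert a tokenized sentence to indices, then to dense vectors.
tokens = ["<bos>", "我", "吃", "饭", "<eos>"]
ids = torch.tensor([vocab[t] for t in tokens])
vectors = embedding(ids)          # shape: (5, 8), one vector per token
print(vectors.shape)
```

Two methods are frequently used in the decoding stage.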
Greedy search [13]: if the output sequence of the decoder is $\hat{Y}=\left(\hat{y}_{1},\hat{y}_{2},\cdots,\hat{y}_{T}\right)$ and the target vocabulary is written as $V$, the decoding process of greedy search is as follows. ① After the source-language sequence is encoded, the start symbol <bos> is input to the decoder to start decoding; ② the probability of every word in $V$ is calculated, and results are generated sequentially; ③ at moment $t$, the word with the highest probability is selected according to $\hat{y}_{t}=\underset{y\in V}{\operatorname{argmax}}\log p\left(y|\hat{y}_{0\sim t-1},x_{1\sim T'};\theta \right)$; ④ when the decoder generates the symbol <eos>, decoding ends and the final translation is obtained.
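The greedy strategy can be sketched in a few lines of Python; `decode_step`, which returns the log-probability distribution over $V$ given the partial output and the context vector, is a hypothetical stand-in for the actual NMT decoder:

```python
import torch

def greedy_decode(decode_step, context, bos_id, eos_id, max_len=50):
    """Greedy search: at each step keep only the single most probable token.

    decode_step(prev_ids, context) -> log-probabilities over the vocabulary
    (a hypothetical interface standing in for the real NMT decoder).
    """
    ys = [bos_id]
    for _ in range(max_len):
        log_probs = decode_step(torch.tensor(ys), context)   # shape: (|V|,)
        next_id = int(torch.argmax(log_probs))               # argmax over V
        ys.append(next_id)
        if next_id == eos_id:                                 # <eos> ends decoding
            break
    return ys
```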
Beam search [14]: greedy search falls easily into a local optimum. Beam search instead caches a number of partial results equal to the beam width and outputs the result with the highest overall probability; the result is more diverse and closer to the global optimum. Its decoding process is as follows. ① Let the beam width be $K$; at moment $t-1$, the candidate sequences in the beam are $C_{t-1}=\left\{\tilde{y}_{0\sim t-1}^{\left(1\right)},\tilde{y}_{0\sim t-1}^{\left(2\right)},\cdots,\tilde{y}_{0\sim t-1}^{\left(K\right)}\right\}$. ② At moment $t$, a greedy expansion is performed on the $K$ candidate sequences, and the $K$ best continuations are kept:
$C_{t}=\left\{\tilde{y}_{0\sim t}^{\left(1\right)},\tilde{y}_{0\sim t}^{\left(2\right)},\cdots,\tilde{y}_{0\sim t}^{\left(K\right)}\right\}=\underset{y_{t}\in V,\,y_{0\sim t-1}\in C_{t-1}}{\operatorname{argsort}^{K}}\sum_{t'=0}^{t}\log p\left(y_{t'}|\hat{y}_{0\sim t'-1},x_{1\sim T'};\theta \right).$
③ When a sequence outputs <eos>, the decoding of that sequence is completed. After all $K$ sequences are decoded, they are reordered using length (logarithmic) normalization, $\hat{Y}=\underset{\hat{Y}_{1},\cdots,\hat{Y}_{K}}{\operatorname{argmax}}\left(\frac{1}{\left| Y\right| }\right)^{\alpha }\log p\left(Y|X;\theta \right)$, where $\left| Y\right| $ is the sequence length and $\alpha$ is usually 0.6; the sequence with the largest normalized probability is output.
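A minimal sketch of beam search with the length normalization above follows; as before, `decode_step` is a hypothetical decoder interface, and the beam bookkeeping is simplified:

```python
import torch

def beam_search(decode_step, context, bos_id, eos_id, K=10, max_len=50, alpha=0.6):
    """Beam search with length normalization (a minimal sketch; decode_step is a
    hypothetical interface returning log-probabilities over the vocabulary)."""
    beams = [([bos_id], 0.0)]            # (partial sequence, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos_id:        # sequence already ended with <eos>
                finished.append((seq, score))
                continue
            log_probs = decode_step(torch.tensor(seq), context)   # shape: (|V|,)
            top_lp, top_id = torch.topk(log_probs, K)
            for lp, idx in zip(top_lp.tolist(), top_id.tolist()):
                candidates.append((seq + [idx], score + lp))
        if not candidates:
            break
        # Keep the K continuations with the highest summed log-probability.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:K]
    finished.extend(beams)
    # Rerank with (1/|Y|)^alpha length normalization, as in the formula above.
    best = max(finished, key=lambda c: c[1] / (len(c[0]) ** alpha))
    return best[0]
```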
2.2 Long Short-term Memory-based Neural Machine Translation
Recurrent neural network (RNN)-based NMT is commonly used [15]; however, RNNs are prone to vanishing gradients when dealing with long sequences [16]. LSTM-based NMT emerged to solve this problem [17]. LSTM is an improvement on the RNN and performs well in pattern recognition and data prediction [18]. An LSTM neuron mainly includes an input gate, a forget gate, an output gate, and a memory cell, and its computation is described as follows.
The input gate $I_{t}$, forget gate $F_{t}$, and output gate $O_{t}$ are calculated as
$I_{t}=\sigma \left(X_{t}W_{xi}+H_{t-1}W_{hi}+b_{i}\right)$,
$F_{t}=\sigma \left(X_{t}W_{xf}+H_{t-1}W_{hf}+b_{f}\right)$,
$O_{t}=\sigma \left(X_{t}W_{xo}+H_{t-1}W_{ho}+b_{o}\right)$,
where $\sigma$ is the sigmoid function, $X_{t}$ refers to the input at the current moment, $H_{t-1}$ is the hidden vector at the previous moment, and $W$ and $b$ are the weight and bias of every layer.
The candidate memory cell $\tilde{C}_{t}$ and the memory cell $C_{t}$ are calculated as
$\tilde{C}_{t}=\tanh \left(X_{t}W_{xc}+H_{t-1}W_{hc}+b_{c}\right)$,
$C_{t}=F_{t}\odot C_{t-1}+I_{t}\odot \tilde{C}_{t}$,
where $C_{t}$ and $C_{t-1}$ refer to the memory cells at the current and previous moments, respectively, and $\odot$ denotes element-wise multiplication.
The hidden state $H_{t}$ is calculated as $H_{t}=O_{t}\odot \tanh \left(C_{t}\right)$.
The Bi-LSTM model [19], which consists of a forward LSTM and a backward LSTM so that both past and future context can be used, was proposed to solve the problem that a unidirectional LSTM cannot learn from future text. Let the input vector of the current time step be $X_{n}$ and the weight matrix be $W_{n}$; then the forward pass of Bi-LSTM computes $\overrightarrow{H}_{n}=\mathrm{LSTM}\left(X_{n},\overrightarrow{H}_{n-1};W_{n}\right)$, the backward pass computes $\overleftarrow{H}_{n}=\mathrm{LSTM}\left(X_{n},\overleftarrow{H}_{n+1};W_{n}\right)$, and the two hidden states are concatenated as the output of the current time step, $H_{n}=\left[\overrightarrow{H}_{n};\overleftarrow{H}_{n}\right]$.
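A minimal Bi-LSTM encoder in PyTorch (the framework used in Section 4) is sketched below; the layer sizes are illustrative assumptions rather than the configuration reported in this paper:

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Minimal Bi-LSTM sketch: forward and backward hidden states are concatenated."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)   # forward + backward LSTM

    def forward(self, token_ids):                    # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        outputs, _ = self.bilstm(x)                  # (batch, seq_len, 2 * hidden_dim)
        return outputs
```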
3. Methods for Recognizing Different Tenses in Chinese
3.1 Collation of the Corpus Data
Before tense recognition is conducted, the bilingual corpus data first needs to be organized. The tense recognition method used in this paper targets verbs. The open-source Stanford POS toolkit was used to recognize English verb tenses, which are divided into the following categories (a small tagging sketch follows the list).
(1) VB: verb, base form
(2) VBP: verb, present tense, non-third-person singular
(3) VBZ: verb, present tense, third-person singular
(4) VBD: verb, past tense
(5) VBG: gerund or present participle
(6) VBN: past participle
(7) MD: modal verb
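For illustration only, the snippet below uses NLTK's default tagger, which outputs the same Penn Treebank tags; the paper itself used the Stanford POS toolkit, whose exact invocation is not reproduced here:

```python
# Illustration only: NLTK's default tagger also outputs Penn Treebank tags such as
# VB/VBD/VBZ; the paper itself used the Stanford POS toolkit.
import nltk
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = "He cleaned the room yesterday and has a cat .".split()
tags = nltk.pos_tag(tokens)
print(tags)   # e.g., [('He', 'PRP'), ('cleaned', 'VBD'), ...]

# Keep only verb-related tags as tense labels; other tokens are labeled None.
VERB_TAGS = {"VB", "VBP", "VBZ", "VBD", "VBG", "VBN", "MD"}
tense_labels = [(w, t if t in VERB_TAGS else None) for w, t in tags]
```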
For Chinese verb tenses, the attention alignment information in NMT is used to map English tenses onto the corresponding Chinese verbs. All Chinese verbs are labeled VV, and non-verbs are labeled None. The English tense information and Chinese verb information are converted into vectors and fed into the NMT model. When each Chinese word is generated, the tense information of the English word with the largest attention weight at the current decoding step is identified and transferred to the Chinese word at that step, yielding a tense labeling sequence aligned with the Chinese word positions. The tense information of the training data is obtained in this way. During training, the NMT model tends to be stable after the 10th epoch; therefore, the results of the 10th-12th epochs are used as the final data.
The process of data collation is as follows (a minimal merging sketch is given after the list).
(1) If two of the three epochs label the verb with tense A, the final tense of the verb is A.
(2) If two of the three epochs label the verb as None, the remaining non-None result is taken as the final tense.
(3) If the results of the three epochs are all different, the non-None result from the latest epoch is taken as the final tense.
3.2 NMT Model Combined with Tense Recognition
An LSTM is used to predict the tense of Chinese words. The source-side sequence is transformed into word vectors by the embedding layer, and the tense prediction for the current word is obtained from the LSTM network. For every Chinese verb, let the tense predicted by the LSTM be $T_{s}$ and the tense of the English verb obtained by translation be $T_{t}$. During NMT decoding, if the source-side position aligned by attention at the current time step is a verb, a constraint $T_{t}=T_{s}$ is added during beam search so that the tense of the candidate word on the target side is consistent with the Chinese tense on the source side. The NMT model combined with tense recognition is illustrated below with a simple sentence.
The sentence ``我昨天打扫了房间'' (I cleaned the room yesterday) is taken as an example. Before translation, the sentence is passed through the LSTM to obtain the tense annotation sequence: ``打扫'' is recognized as past tense, and this tense information is saved. The sentence then passes through the encoder-decoder module. At the time step that generates ``cleaned'', the source-side position aligned in the attention matrix is ``打扫''. In decoding, the beam width is set to 10; the 10 candidate words with the highest probabilities are selected, and the English tense of each is obtained. The word with the highest probability is ``clean'', which is in the present tense and inconsistent with the recognized tense, so it is eliminated. Thus, the word with the second-highest probability, ``cleaned'', rises to first place and becomes the final translation.
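The constraint can be sketched as a filter over the beam candidates at the current step; `get_english_tense` and the candidate list below are hypothetical stand-ins for the decoder's top-10 outputs and a tense lookup:

```python
# Sketch of the tense-consistency constraint described above. 'candidates' would be
# the top-K beam candidates at the current decoding step, and get_english_tense is a
# hypothetical lookup (e.g., the Penn Treebank tag of each candidate word).
def filter_by_tense(candidates, source_tense, get_english_tense):
    """Keep only candidates whose English tense matches the predicted Chinese tense."""
    kept = [(word, score) for word, score in candidates
            if get_english_tense(word) == source_tense]
    return kept if kept else candidates      # fall back if no candidate matches

candidates = [("clean", -0.20), ("cleaned", -0.35), ("cleans", -0.60)]
tense_of = {"clean": "VB", "cleaned": "VBD", "cleans": "VBZ"}.get
print(filter_by_tense(candidates, "VBD", tense_of))   # -> [('cleaned', -0.35)]
```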
4. Analysis of Results
The experiments were conducted on a Linux system. The deep learning framework used was PyTorch, which is flexible, easy to deploy, and supports NMT research well. The experimental dataset was a NIST dataset (https://catalog.ldc.upenn.edu/). The dataset contains audio files and corresponding text transcriptions from different languages and topics. The English part contains audio files and transcribed texts from scenarios such as news broadcasts, teleconferences, and interviews; the Mandarin part contains audio files and transcribed texts from scenarios such as teleconferences and narration. Its wide range of speech sources, high speech quality, and coverage of different languages and topics give it high reference value. The original corpus was used after removing sentences with a sequence length larger than 50. The BPE tool was used for segmentation, and <bos> and <eos> markers were added, for example, ['<bos>', '他', '有', '一只', '猫', '。', '<eos>'] and ['<bos>', 'He', 'has', 'a', 'cat', '.', '<eos>']. The NIST dataset contains subsets such as NIST 04 (MT04). This paper used MT05 as the validation set and MT04, MT06, and MT08 as the test sets. The details are listed in Table 1.
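As an illustration of the BPE segmentation step (the paper does not specify which BPE implementation or vocabulary size was used), SentencePiece can train and apply a BPE model as follows; the corpus file name and vocabulary size are assumptions:

```python
import sentencepiece as spm

# Train a BPE model on the raw corpus (illustrative settings; "train.zh" is a
# hypothetical corpus file).
spm.SentencePieceTrainer.train(
    input="train.zh", model_prefix="bpe_zh", vocab_size=32000, model_type="bpe")

sp = spm.SentencePieceProcessor(model_file="bpe_zh.model")
pieces = sp.encode("他有一只猫。", out_type=str)
print(["<bos>"] + pieces + ["<eos>"])   # add sentence boundary markers
```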
The tense labeling of the source corpus was obtained by training the NMT model as described in Section 3.1. The neural network for tense recognition was trained for 20 epochs with an initial learning rate of 0.001. The effect of tense recognition was evaluated by accuracy, and translation performance was evaluated using the BLEU score [20]:
$\mathrm{BLEU}=BP\cdot \exp \left(\sum _{n=1}^{N}w_{n}\log p_{n}\right),\qquad BP=\begin{cases}1, & c>r\\ e^{1-r/c}, & c\leq r\end{cases}$
where the candidate refers to the machine translation, $p_{n}$ is the n-gram precision of the candidate, $BP$ is the brevity penalty, $c$ and $r$ are the lengths of the machine translation and the reference translation, respectively, and $w_{n}$ is the n-gram weight.
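For reference, a corpus-level BLEU score can be computed with sacreBLEU as sketched below (the exact scoring tool used in this paper is not specified, so this is only an illustration):

```python
import sacrebleu

# Hypotheses are system outputs; references are the gold translations.
hypotheses = ["Manila fired the police intelligence official ."]
references = [["Manila removed an intelligence officer from the police ."]]
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 2))   # corpus-level BLEU on a 0-100 scale
```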
First, the effectiveness of LSTM and Bi-LSTM for tense recognition was compared. Table 2 lists the tense recognition accuracy of the two methods on the validation and test sets.
When LSTM was used as the neural network model for tense recognition, its accuracy was approximately 80% (maximum: 83.64%; minimum: 80.67%; average: 82.09%), while the accuracy was around 90% (maximum: 91.64%; minimum: 88.42%; average: 89.89%) when Bi-LSTM was used as the recognition model (Table 2). The average accuracy of the latter was 7.8 percentage points higher than that of the former, indicating that the Bi-LSTM model was more accurate in recognizing different Chinese tenses. By exploiting both past and future context, Bi-LSTM improved Chinese verb tense recognition significantly and was therefore more suitable for Chinese tense recognition.
Next, the translation quality of the NMT models combined with tense recognition was analyzed. The baselines were the RNN-based, LSTM-based, and Bi-LSTM-based NMT models, and Bi-LSTM-based tense recognition was combined with each baseline. The translation quality of the different models was compared, and Table 3 lists the results.
The BLEU scores of the NMT models combined with tense recognition were significantly higher than those of the baselines (Table 3). Among the baselines, the average BLEU score of the Bi-LSTM-based model was 33.43, which was 1.2 points higher than the RNN-based model (32.23) and 0.56 points higher than the LSTM-based model (32.87). The average BLEU score of the RNN-based model combined with tense recognition was 36.53, 4.3 points higher than its baseline; that of the LSTM-based model combined with tense recognition was 38.92, 6.05 points higher than its baseline; and that of the Bi-LSTM-based model combined with tense recognition was 40.33, 6.9 points higher than its baseline. These results show that combining tense recognition improved NMT quality.
Finally, the translation results of the NMT models combined with tense recognition were analyzed using example sentences.
According to Table 4, in the first example sentence, for the verb ``伤害'' in the source sentence, the reference translation is ``had tarnished'', the Bi-LSTM-based model without Chinese verb tense recognition produces ``harms'', and the NMT model combined with tense recognition produces ``harmed''. In the second example sentence, for the verb ``免除'' in the source sentence, the reference translation is ``removed'', the Bi-LSTM-based model produces ``sacks'', and the Bi-LSTM-based model combined with verb tense recognition produces ``fired''. These results suggest that the Bi-LSTM-based model combined with tense recognition is reliable in recognizing verb tense.
Table 1. Experimental Data Set.

| Data set | Role | Size |
|----------|------|------|
| NIST Zh-En | Training set | 1.2M |
| MT05 | Validation set | 1082 sentences |
| MT04 | Test set | 1788 sentences |
| MT06 | Test set | 1664 sentences |
| MT08 | Test set | 1357 sentences |
Table 2. Accuracy of Tense Recognition.

| Model | MT05 | MT04 | MT06 | MT08 |
|-------|------|------|------|------|
| LSTM | 83.64% | 82.72% | 81.33% | 80.67% |
| Bi-LSTM | 88.42% | 89.37% | 90.11% | 91.64% |
Table 3. Comparison of Translation Effects (BLEU Scores) Between Different Models.

| Model | MT05 | MT04 | MT06 | MT08 | Average |
|-------|------|------|------|------|---------|
| RNN | 33.64 | 37.64 | 32.07 | 25.57 | 32.23 |
| LSTM | 34.06 | 38.07 | 33.12 | 26.21 | 32.87 |
| Bi-LSTM | 34.42 | 38.66 | 33.87 | 26.78 | 33.43 |
| RNN + tense recognition | 39.64 | 42.11 | 35.67 | 28.68 | 36.53 |
| LSTM + tense recognition | 41.26 | 43.57 | 39.78 | 31.05 | 38.92 |
| Bi-LSTM + tense recognition | 42.33 | 44.56 | 41.64 | 32.77 | 40.33 |
Table 4. Example Sentence Analysis.

| Item | Sentence |
|------|----------|
| Original Chinese sentence (Example 1) | 大使馆的关闭曾激怒菲律宾政府,它说,所谓的威胁是过于夸大的,而关闭大使馆伤害菲国的形象。 |
| Reference translation | The closing of the embassies had angered the Philippine government, which said that the so-called threats were exaggerated and the closing had tarnished Philippines image. |
| Bi-LSTM | The closure of the embassies had angered the Philippine government, which said the alleged threat was exaggerated and the closure of the embassies harms the image of the Philippines. |
| Bi-LSTM + tense recognition | The closure of the embassies had angered the Philippine government, which said that the alleged threats were exaggerated and the closure of embassies harmed the image of Philippines. |
| Original Chinese sentence (Example 2) | 马尼拉曾免除了警方情报官员的职务,因为他透露有关澳洲与加拿大大使馆遭到恐怖威胁的未经证实的情报。 |
| Reference translation | Manila removed an intelligence officer from the police, for he had released unconfirmed intelligence about the terrorist threat to the Australia and Canadian embassies. |
| Bi-LSTM | Manila sacks a police intelligence officer after he revealed unsubstantiated intelligence about terrorist threats against the Australian and Canadian embassies. |
| Bi-LSTM + tense recognition | Manila fired the police intelligence official because he leaked the unverified intelligence about terrorist threats upon the Australian and Canadian embassies. |
5. Discussion
Tense recognition and translation is a vital problem in Chinese-to-English machine translation and significantly influences translation quality. In this paper, an LSTM-based neural network method was designed for tense recognition and translation, and it was analyzed experimentally on a NIST dataset.
The experimental results showed that Bi-LSTM achieved higher tense recognition accuracy than LSTM because it learns sufficient information in both directions through its forward and backward LSTMs: Bi-LSTM reached an average accuracy of 89.89% across the validation and test sets, 7.8 percentage points higher than LSTM, demonstrating its superiority in Chinese tense recognition. The machine translation results showed that the BLEU scores of the RNN-based, LSTM-based, and Bi-LSTM-based NMT models all improved to a certain extent after tense recognition was combined, demonstrating the reliability of tense recognition for improving translation results. The analysis of the example sentences in Table 4 suggested that Bi-LSTM identified and translated the Chinese tenses accurately, making the obtained translations closer to the semantics of the source sentences.
This study contributes to tense recognition and translation in Chinese-to-English translation. Maintaining tense consistency between Chinese and English through the recognition and translation of verbs effectively improves the quality of Chinese-to-English translation, providing a theoretical basis for further improving machine translation and a new idea for tense recognition and translation in other language pairs, such as English-to-Chinese and Chinese-to-French translation.
6. Conclusion
This paper studied Chinese-to-English machine translation, designed a neural network-based method for recognizing the different tenses of Chinese verbs, and combined it with NMT models. The experiments showed that the Bi-LSTM-based model recognized different tenses more accurately than the LSTM-based model, with an average accuracy of 89.89%. In the performance comparison of NMT models combined with tense recognition, the BLEU score of the Bi-LSTM-based NMT model combined with tense recognition was the highest, with an average value of 40.33. These results confirm that the Bi-LSTM-based model combined with tense recognition reliably improves the quality of Chinese–English translation and can be further promoted and applied in practice.
REFERENCES
A. V. Potnis, R. C. Shinde, S. S. Durbha, ``Towards Natural Language Question Answering
Over Earth Observation Linked Data Using Attention-Based Neural Machine Translation,''
IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, Vol.
2020, pp. 577-580, Sep. 2020.
A. G. Dorst, S. Valdez, H. Bouman, ``Machine translation in the multilingual classroom:
How, when and why do humanities students at a Dutch university use machine translation?,''
Translation and Translanguaging in Multilingual Contexts, Vol. 8, No. 1, pp. 49-66,
Feb. 2022.
C. Lalrempuii, B. Soni, P. Pakray, ``An Improved English-to-Mizo Neural Machine Translation,''
ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 20,
No. 4, pp. 1-21, May. 2021.
R. Kr\"{u}ger, ``Some Translation Studies informed suggestions for further balancing
methodologies for machine translation quality evaluation,'' Translation Spaces, Vol.
11, No. 2, pp. 213-233, March. 2022.
A. Ba, B. Bjd, A. Ra, ``Impact of Filtering Generated Pseudo Bilingual Texts in Low-Resource Neural Machine Translation Enhancement: The Case of Persian-Spanish,'' Procedia Computer Science, Vol. 189, pp. 136-141, Jul. 2021.
T. K. Lam, J. Kreutzer, S. Riezler, ``A reinforcement learning approach to interactive-predictive
neural machine translation,'' Proceedings of the 21st Annual Conference of the European
Association for Machine Translation, Vol. 2018, pp. 169-178, May. 2018.
H. Choi, K. Cho, Y. Bengio, ``Fine-Grained Attention Mechanism for Neural Machine Translation,'' Neurocomputing, Vol. 284, pp. 171-176, Mar. 2018.
Y. Sun, C. Yong, ``Research on Tibetan-Chinese neural network machine translation
with few samples,'' Journal of Physics: Conference Series, Vol. 1871, No. 1, pp. 1-8,
April. 2021.
A. Martinez, K. Sudoh, Y. Matsumoto, ``Sub-Subword N-Gram Features for Subword-Level
Neural Machine Translation,'' Journal of Natural Language Processing, Vol. 28, No.
1, pp. 82-103, Jan. 2021.
C. Ma, ``Syntax-based Transformer for Neural Machine Translation,'' Journal of Natural Language Processing, Vol. 28, No. 2, pp. 682-687, Jan. 2021.
J. Su, J. Chen, H. Jiang, C. Zhou, H. Lin, Y. Ge, Q. Wu, Y. Lai, ``Multi-modal neural machine translation with deep semantic interactions,'' Information Sciences, Vol. 554, pp. 47-60, Nov. 2020.
S. Tiun, U. A. Mokhtar, S. H. Bakar, S. Saad, ``Classification of functional and non-functional
requirement in software requirement using Word2vec and fast Text,'' Journal of Physics:
Conference Series, Vol. 1529, No. 4, pp. 1-6, April. 2020.
R. Zarkami, M. Moradi, R. S. Pasvisheh, A. Bani, K. Abbasi, ``Input variable selection
with greedy stepwise search algorithm for analysing the probability of fish occurrence:
A case study for Alburnoides mossulensis in the Gamasiab River, Iran,'' Ecological
Engineering, Vol. 118, pp. 104-110, May. 2018.
P. G. Shambharkar, P. Kumari, P. Yadav, R. Kumar, ``Generating Caption for Image using
Beam Search and Analyzation with Unsupervised Image Captioning Algorithm,'' 2021 5th
International Conference on Intelligent Computing and Control Systems (ICICCS), Vol.
2021, pp. 857-864, May. 2021.
Y. Liu, D. Zhang, L. Du, Z. Gu, J. Qiu, Q. Tan, ``A Simple but Effective Way to Improve the Performance of RNN-Based Encoder in Neural Machine Translation Task,'' 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), Vol. 2019, pp. 416-421, Jun. 2019.
Z. Liu, F. Qi, ``Research on advertising content recognition based on convolutional
neural network and recurrent neural network,'' International Journal of Computational
Science and Engineering, Vol. 24, No. 4, pp. 398-404, Jan. 2021.
K. Shuang, R. Li, M. Gu, J. Loo, S. Su, ``Major-minor long short-term memory for word-level
language model,'' IEEE Transactions on Neural Networks and Learning Systems, Vol.
31, No. 10, pp. 3932-3946, Dec. 2020.
S. Xu, R. Niu, ``Displacement prediction of Baijiabao landslide based on empirical
mode decomposition and long short-term memory neural network in Three Gorges area,
China,'' Computers & Geosciences, Vol. 111, pp. 87-96, Feb. 2018.
M. Banna, T. Ghosh, M. Nahian, K. A. Taher, M. S. Kaiser, M. Mahmud, M. S. Hossain, K. Andersson, ``Attention-based Bi-directional Long-Short Term Memory Network for Earthquake Prediction,'' IEEE Access, Vol. 9, pp. 56589-56603, Apr. 2021.
H. I. Liu, W. L. Chen, ``Re-Transformer: A Self-Attention Based Model for Machine
Translation,'' Procedia Computer Science, Vol. 189, No. 8, pp. 3-10, July. 2021.
Author
Xuran Ni was born in Hebei, China in 1983. From 2002 to 2006, she studied at Hebei
University and received her bachelor's degree in 2006. From 2011 to 2015, she studied
at Capital Normal University and received her Master's degree in 2015. She has published
17 papers. Her research interests include English teaching and reform.