
  1. (School of Literature and History, Longdong University, Qingyang 745000, China)
  2. (School of Foreign Languages, Henan University of Science and Technology, Luoyang 471000, China)



Keywords: BERT-BiLSTM, genetic algorithm, English translation, system optimization

1. Introduction

With the deepening of globalization and the widespread application of Internet technology, the demand for language communication is increasing daily. In cross-cultural communication, international trade, and academic research in particular, efficient multilingual translation has become a critical technical support [1, 2]. However, existing translation systems still face many challenges in practical application scenarios, especially when translating domain-specific English phrases or phrases with complex structures [3, 4]. On the one hand, English phrases place higher demands on the understanding and generation abilities of a translation system because of their rich grammatical structures and complex semantic relations [5]. On the other hand, improving translation efficiency while ensuring translation quality and meeting the needs of real-time, large-scale data processing remains an urgent problem [6]. Therefore, exploring more efficient and accurate English phrase translation methods is of great significance for promoting the barrier-free circulation of cross-language information.

Against this background, the development of NLP (Natural Language Processing) technology is remarkably rapid, among which machine translation is one of the essential branches of NLP, and its accuracy and efficiency directly affect the quality and speed of information transmission [7, 8]. In recent years, deep learning models such as BERT (Bidirectional Encoder Representations from Transformers) and BiLSTM (Bidirectional Long Short-Term Memory) have shown excellent performance in natural language processing tasks, which has extensively promoted the progress in the field of machine translation [9].

To address these problems, this study proposes an English phrase translation framework that combines the BERT pre-trained model with the sequence modeling capability of BiLSTM and introduces a genetic algorithm for parameter optimization. BERT, a cutting-edge NLP model, captures the context of words accurately with its bidirectional Transformer structure, effectively models long-distance dependencies, and strengthens semantic understanding, laying a solid foundation for translation. BiLSTM excels at processing sequential data: with its bidirectional memory, it makes better use of contextual information and strengthens the system's handling of complex syntax and long-range dependencies, so that translations are fluent and accurate.

A genetic algorithm is integrated into the system to optimize model parameters and translation strategies. By simulating natural selection and genetic mechanisms, it efficiently explores the search space, adjusts the BERT-BiLSTM hyperparameters, and refines translation rules to improve performance on specific tasks. The three components work together: BERT parses the source language, BiLSTM constructs the target language, and the genetic algorithm intelligently optimizes parameters and strategies to ensure iterative improvement of translation quality. Compared with manual parameter tuning, this approach reduces the tuning workload, alleviates the problem of locally optimal solutions to a certain extent, and improves robustness across a broader range of application scenarios.

By combining the advantages of these models, a translation system is constructed that can deeply understand the meaning of source-language phrases and quickly generate target-language translations, significantly improving the efficiency and accuracy of English phrase translation and demonstrating the potential of AI in language processing. The optimization of an efficient English phrase translation system based on BERT-BiLSTM and a genetic algorithm thus combines theoretical innovation with practical application, and it is a concrete measure to promote exchange and mutual learning among civilizations. It is expected that this research will bring substantial changes to the global language service industry and support intelligent, personalized, and high-quality language communication.

2. Research on English Phrase Recognition Method Based on BERT-BiLSTM

2.1. Bidirectional Encoder Representations from Transformers

The Transformer is built from feedforward neural networks and the self-attention mechanism. Self-attention attends not only to the current word but also to its preceding and following context, which significantly enhances the model's ability to comprehend context [10, 11]. The Transformer adopts an encoder-decoder structure, each side composed of six repeating modules. Each encoder block contains a multi-head attention sub-layer followed by a feedforward neural network sub-layer, with residual connections and layer normalization applied between sub-layers, and an output dimension of 512. The decoder block is similar: its first sub-layer is masked multi-head attention, and its remaining sub-layers correspond to those of the encoder, also adopting residual connections and layer normalization [12].

BERT is a pre-trained model based on the Transformer architecture. Through self-supervised learning on massive unlabeled text, it acquires rich language knowledge representations and is particularly strong at understanding context-sensitive vocabulary and analyzing complex sentence meanings.

BERT applies the attention mechanism, a cornerstone of modern deep learning, to improve the performance and alignment quality of machine translation [13]. The constituent elements of the Source can be regarded as a series of <Key_i, Value_i> data pairs. Given a specific Query item in the Target, the similarity between the Query and each Key is calculated to obtain the weight coefficient of the corresponding Value, and a weighted sum is then performed. The formula is shown in Eq. (1).

(1)
$ Attention(Query,Source) \nonumber\\ = \sum_{i=1}^{L} Similarity(Query,Key_i) * Value_i. $

Similarity(·) measures the correlation between the Query vector and the Key vector. Value_i represents the information content associated with each Key, a vector that stores specific information; L is the length of the input sentence; Query is the word sequence in the Target; and i (i = 1, 2, ..., L) indexes the sequence encoding. Source corresponds to the semantic encoding of each word in the input sentence, so the result obtained is more accurate [14]. The attention mechanism is calculated roughly as follows: first, the weight coefficients are computed from the Query and the Keys, which can be subdivided into calculating the similarity or correlation between the Query and each Key and then normalizing the results; finally, the Values are weighted by these coefficients and summed.

In the first stage, the dot product of the two vectors, their cosine similarity, or a neural network is used to calculate the similarity between the Query and each Key. The formulas are shown in Eqs. (2)-(4), where MLP denotes a multi-layer perceptron.

(2)
$ Similarity(Query,Key_i) = Query \cdot Key_i, $
(3)
$ Similarity(Query,Key_i) = \frac{Query \cdot Key_i}{\|Query\|\|Key_i\|}, $
(4)
$ Similarity(Query,Key_i) = MLP(Query \cdot Key_i). $

Query represents the characteristics of the content to be found or retrieved from the information set, and Key is matched against the Query to determine which parts of the information set the Query is related to. In the second stage, a softmax function is introduced to numerically normalize the results of the previous stage, as shown in formula (5), where e is the base of the natural exponential.

(5)
$ A_i = softmax(Similarity(Query,Key_i)) \nonumber\\ = \frac{e^{Similarity(Query,Key_i)}}{\sum_{j=1}^{L} e^{Similarity(Query,Key_j)}}. $

$A_i$ represents the normalized weight coefficient, and L is the input sentence length. In the third stage, each Value_i is weighted by its corresponding coefficient $A_i$ and summed, as shown in formula (6):

(6)
$ Attention(Query,Source) = \sum_{i=1}^{L} A_i \cdot Value_i. $

Here Value_i is the value vector weighted by the coefficient $A_i$. The attention mechanism is simple to compute, has few parameters, and is fast, which overcomes the inability of RNNs to be computed in parallel [15, 16]. Because it is not limited by distance, it effectively reduces the dependency complexity between the source and target sequences, and empirical results show that models built on it perform well. The key component of BERT is the self-attention mechanism. It first computes three new vectors of the same dimension, Query (Q), Key (K), and Value (V), and obtains the representation of each word by adjusting the association coefficient matrix between the words of a sentence, as shown in Eq. (7).

(7)
$ Attention(Q,K,V) = softmax\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V. $

Here Q, K, and V are the input vector matrices of the characters; the dimensions of Q and K are both $d_k$, and the vector dimension of V is $d_v$. The multi-head attention mechanism of BERT projects Q, K, and V through linear transformations, applies scaled dot-product attention to each projection, and concatenates the resulting self-attention outputs to extract sentence semantics. The formulas are shown in Eqs. (8)-(9).

(8)
$ MultiheadAttention \nonumber\\ = Concat(head_1,head_2,\cdots,head_h)\cdot W^o, $
(9)
$ head_i = SelfAttention(QW_i^Q,KW_i^K,VW_i^V). $

Among them, $W_i^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^K \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^V \in \mathbb{R}^{d_{model} \times d_v}$, and $W^O \in \mathbb{R}^{hd_v \times d_{model}}$ are weight parameters, h is the number of attention heads, and i is the head index. A language model computes the probability of the next word from left to right [17]. Given a training set of sentences composed of words $w_1, w_2, \ldots, w_m$, a neural network is trained to obtain a language model of occurrence probabilities, as shown in formula (10).

(10)
$ p(S) = p(w_1,w_2,\cdots ,w_m) \nonumber\\ = \prod_{i=1}^{m} p(w_i|w_1,w_2,\cdots ,w_{i-1}). $

Here p(S) denotes the occurrence probability of the input sequence S, $w_i$ is the i-th word, and m is the sequence length. Traditional language models are static and cannot represent word meaning and grammar dynamically. Pre-trained models such as ELMo, GPT, and BERT are pre-loaded with semantic information learned from large-scale corpora, which enhances word meaning expression, improves model robustness, and avoids repeated training from scratch [18, 19]. BERT is based on the Transformer encoder, has a deeper model structure, captures context efficiently, and significantly enhances feature extraction capability [20]. BERT training is divided into two stages, pre-training and fine-tuning, and its structure is shown in Fig. 1. Pre-training includes a masked language model and next-sentence prediction, and the fine-tuning stage adapts the model to specific tasks.
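For clarity, the following is a minimal NumPy sketch of the scaled dot-product attention of Eq. (7) and the multi-head extension of Eqs. (8)-(9); the sequence length, model dimension, and the random projection matrices standing in for the learned weights $W_i^Q$, $W_i^K$, $W_i^V$, and $W^O$ are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax, cf. Eq. (5).
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Eq. (7): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # pairwise similarity of queries and keys
    weights = softmax(scores, axis=-1)    # normalized weight coefficients A_i
    return weights @ V                    # weighted sum of the value vectors

def multi_head_attention(X, h=8, d_model=512, seed=0):
    # Eqs. (8)-(9): project the input into h heads, attend, concatenate, project back.
    rng = np.random.default_rng(seed)
    d_k = d_model // h
    heads = []
    for _ in range(h):
        W_q = rng.normal(size=(d_model, d_k))   # stands in for learned W_i^Q
        W_k = rng.normal(size=(d_model, d_k))   # stands in for learned W_i^K
        W_v = rng.normal(size=(d_model, d_k))   # stands in for learned W_i^V
        heads.append(scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v))
    W_o = rng.normal(size=(h * d_k, d_model))   # stands in for learned W^O
    return np.concatenate(heads, axis=-1) @ W_o

# Toy usage: a "sentence" of 5 token embeddings of dimension 512.
X = np.random.default_rng(1).normal(size=(5, 512))
print(multi_head_attention(X).shape)  # (5, 512)
```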

Fig. 1. BERT structure.


The pre-training of BERT includes a masked language model and next-sentence prediction (NSP) [21]. In the masking task, 15% of the tokens are randomly selected, of which 80% are replaced with [MASK], 10% remain unchanged, and 10% are replaced with random words; the Chinese dataset uses whole-word masking. The NSP task predicts whether two sentences are consecutive, using [CLS] and [SEP] to mark them: if they are consecutive, the [CLS] position outputs IsNext; otherwise, it outputs NotNext. The BERT input contains the [CLS] and [SEP] tags to identify the beginning and end of a sentence. The input is the sum of token, sentence (segment), and position vectors: the sentence vectors assist the NSP task, the position vectors supplement sequence-order information, and the model learns sequence features through absolute position encoding.
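As an illustration of the masking rule described above, the following Python sketch applies the 15% selection and 80/10/10 replacement scheme to a token list; the toy vocabulary and selection probabilities per call are assumptions for demonstration only.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, mask_token="[MASK]"):
    """BERT-style masking: of the selected tokens,
    80% -> [MASK], 10% unchanged, 10% -> random word."""
    masked, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            labels[i] = tok                       # the model must predict the original token
            r = random.random()
            if r < 0.8:
                masked[i] = mask_token            # 80%: replace with [MASK]
            elif r < 0.9:
                pass                              # 10%: keep the original token
            else:
                masked[i] = random.choice(vocab)  # 10%: replace with a random word
    return masked, labels

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(mask_tokens(tokens, vocab=["dog", "tree", "ran"]))
```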

2.2. Bidirectional Long Short-term Memory

BiLSTM is a bidirectional variant of the LSTM, the recurrent architecture that alleviates the gradient problems of recurrent neural networks. It uses context efficiently, mines semantics deeply, reduces manual feature engineering, improves entity recognition accuracy, and is widely used for extracting semantic information from text [22, 23]. Internally, each LSTM cell passes through three stages: the forgetting stage selectively discards old information through the forget gate; the memory stage stores new information through the input gate and updates the cell state; and the output stage determines the output and activates the state information. A one-way LSTM processes information in a single direction, whereas a bidirectional LSTM is better suited to processing textual context. The gate structure can be represented by formulas (11)-(12):

(11)
$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] +b_f), $
(12)
$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] +b_i), $

where $f_t$ is the forget gate, $i_t$ is the input gate, $\sigma$ is the sigmoid activation function, $W_f$ and $W_i$ are the weight matrices of the forget and input gates, $h_{t-1}$ is the hidden state at time t-1, $x_t$ is the current input, and $b_f$ and $b_i$ are bias terms. LSTM stores key information through its gating mechanism and forgets unimportant parts. Compared with a plain RNN, LSTM can dynamically capture information and has a stronger memory. BiLSTM deeply mines long-distance textual information through its memory cells and control gates, mitigates gradient vanishing, and has significant application value in information retrieval, automatic question answering, and knowledge graph construction.
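As a concrete illustration, the following PyTorch sketch instantiates the bidirectional LSTM layer described above; the input dimension (matching BERT's 768-dimensional token features) and the hidden size are assumed values, not the configuration reported in this paper.

```python
import torch
import torch.nn as nn

# A bidirectional LSTM over a batch of embedded token sequences.
# Internally, each direction applies the gate equations (11)-(12), the
# cell-state update, and the output gate at every time step.
bilstm = nn.LSTM(input_size=768,    # e.g., the dimension of BERT token features (assumed)
                 hidden_size=256,   # hidden units per direction (assumed)
                 num_layers=1,
                 batch_first=True,
                 bidirectional=True)

x = torch.randn(4, 20, 768)          # batch of 4 sequences, 20 tokens each
outputs, (h_n, c_n) = bilstm(x)
print(outputs.shape)                 # (4, 20, 512): forward and backward states concatenated
```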

2.3. BERT-BiLSTM Fusion Mechanism

The challenges we face include the increased difficulty of training and inference due to the high complexity of the model, the limited scale of training data that may limit the improvement of the generalization ability of the system, and the huge consumption of computing resources in the training and inference process. In response to these limitations, we explore building more efficient model architectures, leveraging larger and more diverse training datasets, and optimizing the use of computing resources.

The technical route of BERT-BiLSTM integration shows distinct advantages and strong potential. It combines the strengths of two deep learning models, markedly improving translation quality and speeding up the translation process, making it well suited to the needs of modern communication [24, 25].

In English phrase translation, BERT can keenly capture the specific contextual meaning behind each word, thus avoiding ambiguity and misunderstanding caused by literal translation and ensuring the accuracy of the translation. Meanwhile, BiLSTM, with its unique bidirectional memory unit, can simultaneously retain past and future contextual information in sequence data processing, which is crucial for correctly identifying and transforming grammatical structures in phrases [26, 27]. In the scenario of English phrase translation, BiLSTM can effectively track the relationship chain between words and help the model understand complex structures, such as attributive clauses and non-predicate verbs, so that the final translation is more in line with grammatical norms and reads naturally and fluently.

Integrating BERT and BiLSTM perfectly balances semantic deep mining and grammatical detail control [28]. In a specific implementation, the powerful pre-training function of BERT is usually used to obtain the high-dimensional semantic features of source language phrases. Then, these features are input into the BiLSTM network, and the latter can reorganize these features based on its excellent sequence processing skills to generate corresponding expressions in the target language. This process fully uses BERT’s advantages in word-level and sentence-level understanding. It gives full play to BiLSTM’s expertise in grammatical reconstruction, contributing to a significant leap in translation effect.
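A minimal PyTorch sketch of this fusion, in which BERT's contextual features are fed into a BiLSTM and projected to an output space, is shown below; the pre-trained checkpoint name, hidden size, and output dimension are illustrative assumptions rather than the exact architecture of the proposed system.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertBiLSTM(nn.Module):
    """BERT encodes the source phrase; a BiLSTM re-processes the contextual
    features; a linear layer maps each position to the output space."""
    def __init__(self, bert_name="bert-base-uncased", hidden=256, out_dim=30522):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, out_dim)

    def forward(self, input_ids, attention_mask):
        feats = self.bert(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state
        seq, _ = self.bilstm(feats)        # bidirectional re-encoding of BERT features
        return self.proj(seq)              # per-token scores over the output space

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["look forward to", "in terms of"],
                  padding=True, return_tensors="pt")
model = BertBiLSTM()
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)   # (2, sequence_length, out_dim)
```

In this arrangement, only the BiLSTM and projection layers need to be trained from scratch, while the BERT encoder can be fine-tuned lightly, which matches the efficiency argument made above.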

The BERT-BiLSTM convergence solution also brings additional efficiency gains. Since many common language patterns have been accumulated in the pre-training phase of BERT, only a small amount of fine-tuning often needs to be used in specific translation tasks, significantly reducing the data and computing resources required for model training. Coupled with the rapid response characteristics of BiLSTM when processing fixed-length inputs, it can effectively improve the operating efficiency of the overall system.

3. Efficient English Phrase Translation System Based on BERT-BiLSTM and Genetic Algorithm

3.1. Genetic Algorithm Optimization

The BERT-BiLSTM model combines the bidirectional Transformer of BERT and the bidirectional memory of BiLSTM to construct a powerful language processing architecture. BERT accurately understands the source language and captures the context of the words; BiLSTM further processes these representations to capture sequence dependencies and improve the handling of complex syntax and long-distance dependencies. To optimize this model, we introduce a genetic algorithm that simulates natural selection and heredity and efficiently explores the optimal solution in parameter space, covering initialization, selection, crossover, and mutation through to the termination criterion. The fitness function is designed to evaluate the performance of model parameters and guide the optimization process. Charts and flowcharts visually demonstrate the integration and interaction of BERT, BiLSTM, and the genetic algorithm. Implementation details, such as libraries and frameworks, hardware specifications, data preprocessing, and training and testing steps, are also described, providing comprehensive guidance for system deployment and performance evaluation.

We selected individuals based on fitness, balancing exploration and exploitation to avoid premature convergence. In crossover, we combined genetic information using various techniques to expand the search space. Mutation introduced random changes to maintain diversity. We also addressed initialization, elite selection, and termination conditions, all of which impact algorithm performance. Our genetic algorithm optimized translation system parameters such as the learning rate, batch size, and number of hidden nodes. Despite challenges such as high computational complexity and local optima, adaptive mutation, elite strategies, and parallel computing helped us find the optimal parameter combination, significantly improving translation accuracy, speed, and resource efficiency. The genetic algorithm outperformed traditional methods such as grid and random search.
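The following Python sketch illustrates how such a genetic search over the learning rate, batch size, and number of hidden nodes can be organized; the candidate value ranges, elite count, selection scheme, and the placeholder fitness function (which would in practice train and validate the BERT-BiLSTM model) are all assumptions for illustration.

```python
import random

SEARCH_SPACE = {                       # assumed hyperparameter ranges
    "lr":     [5e-4, 5e-5, 5e-6],
    "batch":  [4, 8, 16, 32, 64],
    "hidden": [128, 256, 512],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(ind):
    # Placeholder: in the real system this would train and evaluate the
    # BERT-BiLSTM model with these hyperparameters (e.g., validation BLEU or F1).
    return -abs(ind["lr"] - 5e-5) * 1e4 + ind["hidden"] / 512 + ind["batch"] / 64

def crossover(a, b):
    # Uniform crossover: each gene taken from one of the two parents.
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, p=0.1):
    # With probability p, resample a gene from its candidate range.
    return {k: (random.choice(v) if random.random() < p else ind[k])
            for k, v in SEARCH_SPACE.items()}

def evolve(pop_size=20, generations=30, elite=2):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:elite]                         # elite retention
        while len(nxt) < pop_size:
            a, b = random.sample(pop[:10], 2)     # select parents from the better half
            nxt.append(mutate(crossover(a, b)))
        pop = nxt
    return max(pop, key=fitness)

print(evolve())   # best hyperparameter combination found
```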

The genetic algorithm simulates genetic inheritance and natural selection to achieve optimization; its core idea is survival of the fittest. When resources are limited, excellent individuals survive, inferior individuals are eliminated, and mutations may occur during reproduction to form new individuals [29]. As a global search tool, the genetic algorithm encodes feasible solutions as chromosomes, evaluates and selects chromosomes with high fitness, generates new chromosomes through crossover and mutation, and iteratively eliminates low-fitness individuals while retaining high-fitness ones until the stopping condition is met and the optimal solution is output. Fig. 2 shows the genetic algorithm architecture. The main steps involve coding, population initialization, fitness function design, selection, crossover, and mutation operations.

Fig. 2. Genetic algorithm architecture.


Chromosome coding is the first step of the genetic algorithm; it affects crossover and mutation and determines both the feasibility of solutions and algorithm performance. Commonly used encodings include binary, natural-number, and floating-point coding. Although binary coding is efficient, an improper code length hampers the search, and adjacent codes may correspond to dissimilar solutions. Real-number coding is direct and clear, suits searches over large spaces, and accelerates convergence to the optimum, so this paper adopts it. Population initialization affects the algorithm's efficiency and the quality of the solution; standard methods include random, clustering-based, and heuristic initialization. Population size also matters: if it is too small, the search ability is reduced, and if it is too large, the amount of computation increases. The fitness function reflects the quality of individuals; the larger its value, the stronger the individual's ability to adapt to the environment and the greater its probability of being retained. Its design is usually tied to the objective function of the problem and directly affects the algorithm's convergence. The selection operation realizes survival of the fittest: individuals with high fitness are more likely to enter the next generation, which preserves excellent individuals in later generations.

The variable neighborhood search algorithm is a local search method. Starting from an arbitrary initial solution, it searches within a small neighborhood, expands the neighborhood when the solution quality no longer improves, and shrinks it again after an improvement. It is good at refining solutions but has relatively low efficiency and is suited to problems such as the TSP and graph coloring [30]. In variable neighborhood search, neighborhoods of order $k_1$ to $k_{max}$ are combined, and the local optimum is first sought in the smallest neighborhood $k_1$; if a solution better than the current optimum is found, the search returns to the smallest neighborhood and restarts, otherwise it proceeds step by step toward $k_{max}$.

Individuals with high fitness have more breeding opportunities, and the selection operation screens them accordingly. Genetic algorithms offer various selection strategies. In the roulette-wheel strategy, the selection probability is proportional to fitness, but its selective pressure weakens in later stages and outstanding individuals may be eliminated by mistake; elite retention preserves the best individuals, but can trap the search in a local optimum and hamper global exploration. This paper adopts dynamic elite retention combined with roulette selection: at the beginning of the iteration, few or no elites are retained and individuals are selected purely by roulette; as the iterations deepen and fitness improves, the number of retained elites gradually increases until it becomes constant in the middle of the run, with the remaining individuals still chosen by roulette. Combining a dynamic elite strategy with roulette selection not only prevents a single outstanding individual from dominating evolution and enhances population diversity but also ensures that high-quality individuals are retained, accelerating convergence and improving solution efficiency.
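A small sketch of roulette-wheel selection with dynamic elite retention is given below; the schedule that grows the number of retained elites with the generation index, and the positive-fitness assumption, are illustrative choices rather than the exact settings of this paper.

```python
import random

def roulette_select(population, fitnesses, k):
    # Selection probability proportional to fitness (fitnesses assumed positive).
    total = sum(fitnesses)
    weights = [f / total for f in fitnesses]
    return random.choices(population, weights=weights, k=k)

def next_generation(population, fitnesses, generation, max_elite=4):
    # Dynamic elite retention: no elites early on, growing to max_elite later.
    n_elite = min(max_elite, generation // 10)   # assumed schedule
    ranked = sorted(zip(population, fitnesses), key=lambda t: t[1], reverse=True)
    elites = [ind for ind, _ in ranked[:n_elite]]
    rest = roulette_select(population, fitnesses, k=len(population) - n_elite)
    return elites + rest

# Toy usage: six candidate individuals at generation 25 (two elites retained).
pop = [{"id": i} for i in range(6)]
fit = [1.0, 3.0, 2.0, 5.0, 0.5, 4.0]
print(next_generation(pop, fit, generation=25))
```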

In genetic algorithm applications, the crossover and mutation settings crucially affect search ability and solution quality. Too low a crossover probability leads to little gene exchange, reducing offspring diversity and risking premature convergence to local optima; too high a probability expands the search but may disrupt existing excellent solutions, reducing efficiency. Mutation broadens the search range: too low a mutation probability decreases population diversity and risks losing high-quality solutions, while too high a probability makes the search nearly random and drastically increases the time cost. Hence, these parameters must balance search efficiency and solution quality for the specific problem.

The computational resource consumption of model training and inference is critical. In the training stage, the deep structure and bidirectional LSTM layers of BERT-BiLSTM lead to long training times, and the iterative genetic algorithm further increases the time cost; large-scale datasets and model parameters significantly increase memory usage, and GPU-accelerated training brings additional energy consumption and cost. The inference stage consumes less time, but sufficient memory is still needed to process text and generate results. GPU utilization can reduce latency, yet when real-time requirements are high, more efficient algorithms or hardware acceleration are required.

3.2. Construction of College English Phrase Translation System

Data preprocessing includes three parts: data cleaning, word segmentation, and annotation. During data cleaning, we remove extraneous information, correct typos, and standardize text formatting to improve dataset quality and reduce noise that would degrade model performance. In the word segmentation stage, we precisely divide the text into meaningful units, such as words or phrases, a step that is critical to the translation system because it directly determines whether the model can correctly understand and translate the text. The annotation process involves labeling the data with the correct translation, either manually annotated by experts or aided by automated tools, to ensure that the model learns the correct translated phrases. We also discuss the combined impact of these preprocessing steps on model performance and present experimental results demonstrating that data cleaning, word segmentation, and annotation significantly improve the accuracy and efficiency of the translation system.
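A minimal sketch of the cleaning and word segmentation steps is shown below, using simple regular-expression cleaning rules and a WordPiece tokenizer from the transformers library; both the cleaning rules and the tokenizer choice are assumptions for illustration, not the exact preprocessing pipeline of this study.

```python
import re
from transformers import BertTokenizer

def clean(text):
    # Remove markup remnants and stray symbols, then normalize whitespace (assumed rules).
    text = re.sub(r"<[^>]+>", " ", text)          # strip residual HTML tags
    text = re.sub(r"[^\w\s'\-.,!?]", " ", text)   # drop unexpected symbols
    return re.sub(r"\s+", " ", text).strip().lower()

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
raw = "  Look forward to   <b>seeing</b> you!! "
cleaned = clean(raw)
tokens = tokenizer.tokenize(cleaned)              # word-piece segmentation
print(cleaned)
print(tokens)
```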

The English phrase recognition system covers five steps: preprocessing, positioning, correction, segmentation, and recognition, each of which must be efficient and accurate. The system integrates machine vision, image processing, and feature recognition technologies to achieve automatic recognition. It consists of hardware (such as a mobile phone) and software (image processing and character recognition algorithms), with the hardware supporting operation and the software processing the data. Text positioning is the foundation that directly affects the recognition effect and overall system performance; its key task is extracting the text information from the image. In this study, a genetic algorithm is introduced to optimize localization and improve the recognition accuracy of low-quality images. Character segmentation separates individual characters. During operation, the system first converts the color image to grayscale and then binarizes it so that it can be analyzed in depth to obtain accurate recognition results. This study also discusses the optimization of the English phrase translation system based on BERT-BiLSTM and the genetic algorithm, focusing on the generalization ability of the model to improve the practicability and adaptability of the system.
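The grayscale conversion and binarization steps can be sketched with OpenCV as follows; the input file name and the use of Otsu thresholding are assumptions for illustration.

```python
import cv2

# Load a captured phrase image (hypothetical file name), convert it to grayscale,
# then binarize it so characters can be segmented and recognized.
image = cv2.imread("phrase_photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu's method picks the threshold automatically, which is convenient for
# low-quality mobile-phone images; a fixed threshold would also work.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("phrase_binary.png", binary)
```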

4. Experimental Results and Analysis

To verify the feasibility of combining BERT-BiLSTM with a genetic algorithm for efficient English phrase translation, we conducted comparative experiments against linear weighting and BP neural network models. As shown in Fig. 3, each model stabilized after approximately 100 generations. Notably, the BERT-BiLSTM + genetic algorithm model showed continuous improvement in user satisfaction during iteration, outperforming the BP and linear weighting models after 20 generations. This result demonstrates BERT's translation capability and validates the effectiveness of BiLSTM and the genetic algorithm in optimizing the translation system. Analysis revealed that BERT accurately captures phrase semantics, BiLSTM enhances context understanding, and the genetic algorithm optimizes parameters to boost translation quality.

Fig. 3. Changes in the number of iterations of the model.


Table 1 shows the test results of the extraction models. The results clearly show that the BERT model has significant advantages over traditional translation systems, especially in multi-level feature extraction and generalization capability. Fig. 4 shows the model calculation results. By combining BERT with a bidirectional long short-term memory network (BiLSTM), the BERT-BiLSTM model not only inherits BERT's powerful text feature extraction but also obtains a globally optimal solution through label association, performing better in entity recognition. The experimental data show that the F1 value of the BERT-BiLSTM model reaches 0.8136, the precision (P) is 0.9413, and the recall (R) is 0.7165, which is better than the other comparison models, effectively verifying the effectiveness and superiority of the BERT-BiLSTM model in efficient English phrase translation tasks.

Table 1. Extraction model test results.

Model P (Precision) R (Recall) F1-score
LSTM-CRF 0.80724 0.681135 0.74193
BiLSTM-CRF 0.886095 0.69594 0.77952
BERT 0.98721 0.73626 0.843465
BERT-BiLSTM 0.989415 0.73248 0.84168
BERT-BiLSTM-CRF 0.988365 0.752325 0.85428

Fig. 4. Results of model calculation values.


Fig. 5 shows how the model accuracy and time consumption vary with batch size. When the batch size is 4, the accuracy is low and training takes a long time. When it is increased to 32, the accuracy reaches 89.57% and the training time is the shortest. Beyond 32, for example at 64, the accuracy drops to 88.57%, and with a batch size of 128, training fails due to memory limitations. Therefore, selecting an appropriate batch size within the memory limit is essential.

Fig. 5. Effect of different batch sizes on the model.


In model training, the learning rate setting has an important impact on convergence and accuracy. According to the experimental results shown in Fig. 6, when the learning rate is set to $5\times10^{-4}$, $5\times10^{-5}$, and $5\times10^{-6}$, the model accuracy first increases and then decreases. Specifically, a learning rate of $5\times10^{-5}$ is the most appropriate choice.

Fig. 6. Effect of different rates on the model.


Fig. 7 shows that the BERT-BiLSTM model performs excellently. Compared with Word2vec-CNN, the accuracy is increased by 52% and the F1 value by 29%. Compared with Word2vec-RNN, the accuracy is increased by 1.5% and the F1 value by 4.1%. Compared with Word2vec-BiLSTM, the accuracy is increased by 2.9% and the F1 value by 4.2%. Compared with BERT-RNN, the accuracy is increased by 2.5% and the F1 value by a substantial 10.2%. These data show that BERT-BiLSTM leads in both accuracy and F1 value, highlighting its efficiency and superiority in English phrase translation tasks.

Fig. 7. Evaluation results of the model.


In the research on optimizing the efficient English phrase translation system based on BERT-BiLSTM and the genetic algorithm, we carried out a detailed experimental evaluation. Fig. 8 shows the performance of the algorithms when clustering six polysemous words: the baseline algorithm reaches a silhouette coefficient of at most 0.4 (0.27 overall), whereas the improved algorithm reaches up to 0.9 (0.7 overall), the optimized-center algorithm up to 0.62 (0.42 overall), and the adaptive algorithm up to 0.88 but with slightly worse overall stability (0.58 overall). The experimental results show that the improved algorithm proposed in this paper achieves a better clustering effect. In addition, the silhouette coefficient is closely related to the number of word senses: when the number of senses is 2, the clustering effect is best and the silhouette coefficient approaches 1. When the number of senses increases to 4, sample points lie near cluster boundaries and noise or isolated points appear, which biases the mean, degrades the clustering effect, pushes the silhouette coefficient toward 0, and makes it easy to fall into a local optimum. The optimized system also shows significant improvements in key metrics such as translation accuracy, recall, and F1 score.

Fig. 8. Comparative experiment of clustering effect.


The experimental results in Fig. 9 show that our English phrase translation system, optimized with BERT-BiLSTM and the genetic algorithm, performs excellently, with an average translation accuracy of 74.9%, far exceeding the five compared methods. Among them, the word2vec model uses equivalent pseudo-word contexts and preserves semantic relations better than plain context vectors, but learning mixed feature rules relies on expert knowledge and corpora, so its accuracy is limited. The semantic relevance and encoding methods easily introduce noise, which affects accuracy, and the hidden Markov model is sensitive to word segmentation and lacks stability.

Fig. 9. Comparison of experimental accuracy.


The experimental comparison in Fig. 10 shows that using English definition and example-sentence information combined with BiLSTM achieves an accuracy of 80.57%; using the DBN model to select adjacent word features achieves 72.33%; and the context-translation supervision model achieves 78.97%. In this paper, BiLSTM with improved pre-trained word vectors is used, and the disambiguation accuracy increases by 5.5%.

Fig. 10. Comparison of the disambiguation results of different methods.


Fig. 11 shows that BERT features combined with BiLSTM outperform word2vec features combined with BiLSTM; BERT has strong generalization ability and describes multi-level relationships better. BERT-BiLSTM also outperforms BERT-LSTM, since BiLSTM promotes the learning of contextual information. The BERT-BiLSTM model with part-of-speech features performs best, with an accuracy of 87.73%.

Fig. 11. Performance of the model on different data sets.


5. Conclusion

This study proposes a new English phrase translation system combining the BERT pre-trained model, BiLSTM, and genetic algorithm. Moreover, it is deeply optimized, and the following conclusions are drawn:

Firstly, the BERT model is used for word vector representation, and these embeddings are then fed into BiLSTM for sequence modeling, thus capturing context information.

In order to further improve the performance of the system, a genetic algorithm is introduced in the model optimization stage. The genetic algorithm is used to optimize the model parameters, enabling the model to learn the mapping relationship between the source and target languages more accurately and improving translation quality and efficiency.

The experimental results show that on the English translation dataset, the BLEU score of our system reaches 28.93, an improvement of about 6% over the traditional neural machine translation system. In terms of FLOPs, our system also outperforms the traditional method, with a reduction of about 7%.

After using a genetic algorithm to optimize the parameters, the convergence speed of the model is also significantly improved. With the same computing resources, our model only needs 70% of the time to achieve the best performance.

The system uses BERT for word vector representation and BiLSTM to capture contextual information, and introduces a genetic algorithm to optimize model parameters, which significantly improves translation quality and efficiency. In terms of theoretical contributions, our research verifies the synergy between BERT, BiLSTM, and genetic algorithms in translation systems and provides new theoretical support for the optimization of deep learning models.

References

[1] Yang X., Lv F., Liu F., Lin G., 2023, Self-training vision language BERTs with a unified conditional model, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, No. 8, pp. 3560-3569.
[2] Bu Y., Chen T., Duan H., Liu M., Xue Y., 2024, A semi-supervised learning approach for semantic parsing boosted by BERT word embedding, Journal of Intelligent & Fuzzy Systems, Vol. 46, No. 3, pp. 6577-6588.
[3] Liu C., Zhu W., Zhang X., Zhai Q., 2023, Sentence part-enhanced BERT with respect to downstream tasks, Complex & Intelligent Systems, Vol. 9, No. 1, pp. 463-474.
[4] Deng L., Yin T., Li Z., Ge Q., 2023, Sentiment analysis of comment data based on BERT-ETextCNN-ELSTM, Electronics, Vol. 12, No. 13.
[5] Habbat N., Nouri H., Anoun H., Hassouni L., 2023, Sentiment analysis of imbalanced datasets using BERT and ensemble stacking for deep learning, Engineering Applications of Artificial Intelligence, Vol. 126.
[6] Mutinda J., Mwangi W., Okeyo G., 2023, Sentiment analysis of text reviews using lexicon-enhanced BERT embedding (LeBERT) model with convolutional neural network, Applied Sciences, Vol. 13, No. 3.
[7] Zhou X., 2023, Sentiment analysis of the consumer review text based on BERT-BiLSTM in a social media environment, International Journal of Information Technologies and Systems Approach, Vol. 16, No. 2.
[8] Duan R., Huang Z., Zhang Y., Liu X., Dang Y., 2021, Sentiment classification algorithm based on the cascade of BERT model and adaptive sentiment dictionary, Wireless Communications & Mobile Computing, Vol. 2021.
[9] Hao S., Zhang P., Liu S., Wang Y., 2023, Sentiment recognition and analysis method of official document text based on BERT-SVM model, Neural Computing & Applications, Vol. 35, No. 35, pp. 24621-24632.
[10] Jia N., Yao C., 2024, ShallowBKGC: A BERT-enhanced shallow neural network model for knowledge graph completion, PeerJ Computer Science, Vol. 10.
[11] Shen S., Liu J., Lin L., Huang Y., Zhang L., Liu C., Feng Y., Wang D., 2023, SsciBERT: A pre-trained language model for social science texts, Scientometrics, Vol. 128, No. 2, pp. 1241-1263.
[12] Siddharth M., Aarthi R., 2021, Text to image GANs with RoBERTa and fine-grained attention networks, International Journal of Advanced Computer Science and Applications, Vol. 12, No. 12, pp. 947-955.
[13] Prottasha N. J., Sami A. A., Kowsher M., Murad S. A., Bairagi A. K., Masud M., Baz M., 2022, Transfer learning for sentiment analysis using BERT-based supervised fine-tuning, Sensors, Vol. 22, No. 11.
[14] Acheampong F. A., Nunoo-Mensah H., Chen W., 2021, Transformer models for text-based emotion detection: A review of BERT-based approaches, Artificial Intelligence Review, Vol. 54, No. 8, pp. 5789-5829.
[15] Phuc D., Hung L., Pham A. B., Nguyen C. H., 2022, Using BERT and knowledge graph for detecting triples in Vietnamese text, Neural Computing & Applications, Vol. 34, No. 20, pp. 17999-18013.
[16] Chang C., Tang Y., Long Y., Hu K., Li Y., Li J., Wang C.-D., 2023, Multi-information preprocessing event extraction with BiLSTM-CRF attention for academic knowledge graph construction, IEEE Transactions on Computational Social Systems, Vol. 10, No. 5, pp. 2713-2724.
[17] Shen K., Yan D., Ye Z., Xu X., Gao J., Dong L., Peng C., Yang K., 2023, Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM, Signal, Image and Video Processing, Vol. 17, No. 7, pp. 3377-3385.
[18] Ranjan R., Daniel A. K., 2022, An optimized deep convolutional network sentiment classification model with word embedding and BiLSTM technique, Advances in Distributed Computing and Artificial Intelligence Journal, Vol. 11, No. 3, pp. 309-329.
[19] Zhou X., 2023, Sentiment analysis of the consumer review text based on BERT-BiLSTM in a social media environment, International Journal of Information Technologies and Systems Approach, Vol. 16, No. 2.
[20] Hsieh Y.-H., Zeng X.-P., 2022, Sentiment analysis: An ERNIE-BiLSTM approach to bullet screen comments, Sensors, Vol. 22, No. 14.
[21] Liu J., 2023, Sentiment classification of social network text based on AT-BiLSTM model in a big data environment, International Journal of Information Technologies and Systems Approach, Vol. 16, No. 2.
[22] Demirci D., Sahin N., Sirlancis M., Acarturk C., 2022, Static malware detection using stacked BiLSTM and GPT-2, IEEE Access, Vol. 10, pp. 58488-58502.
[23] He A., Abisado M., 2024, Text sentiment analysis of Douban film short comments based on BERT-CNN-BiLSTM-Att model, IEEE Access, Vol. 12, pp. 45229-45237.
[24] Sangeetha J., Kumaran U., 2023, Using BiLSTM structure with cascaded attention fusion model for sentiment analysis, Journal of Scientific & Industrial Research, Vol. 82, No. 4, pp. 444-449.
[25] Rao P. J., Rao K. N., Gokuruboyina S., 2022, An experimental study with fuzzy-wuzzy (partial ratio) for identifying the similarity between English and French languages for plagiarism detection, International Journal of Advanced Computer Science and Applications, Vol. 13, No. 10, pp. 393-401.
[26] Hilal A. M., Al-Wesabi F. N., Abdelmaboud A., Hamza M. A., Mahzari M., Hassan A. Q. A., 2022, A hybrid intelligent text watermarking and natural language processing approach for transferring and receiving an authentic English text via internet, The Computer Journal, Vol. 65, No. 2, pp. 423-435.
[27] Wang Y., 2021, An improved machine learning and artificial intelligence algorithm for classroom management of English distance education, Journal of Intelligent & Fuzzy Systems, Vol. 40, No. 2, pp. 3477-3488.
[28] Wang R., 2023, Research on effectiveness of college English blended teaching mode under small private online course based on machine learning, SN Applied Sciences, Vol. 5, No. 2.
[29] Zhang J., 2022, Research on multimedia and interactive teaching model of college English, International Journal of Computational Science and Engineering, Vol. 25, No. 6, pp. 587-592.
[30] Yin J., Cui J., 2023, Secure application of MIoT: Privacy-preserving solution for online English education platforms, Applied Sciences, Vol. 13, No. 14.
Jingyun Huang

Jingyun Huang obtained her B.A. degree in English from Zhixing College of Northwest Normal University in 2010. She obtained her M.A. degree in English language and literature from Northwest Normal University in 2018. Presently, she is working as a lecturer in the School of Literature and Historical Culture, Longdong University. Her areas of interest are corpus-based translation studies and related fields.

Bing Wang

Bing Wang: No information available.